440 days ago
Bruno V Bionode Hack 1.0
Fiona N == write your team name and link to your github repo below ==
Alec ( Billy Gene: bionode-pgp = bionode-pgp-billy-gene !!!!!
Shun L Team Eva: bionode-eva
Ismail M Team Express-yourselves: bionode-gxa
Nikolas P Team: bionode-monarch
Adam H The V Team: dbVar
651 days ago
Fiona N Notes for the day #CMDNAHack
Tweet with hashtag: 
Nadia #CMDNAHack please use when tweeting!
Fiona N And find us on Twitter: @DNAdigest @theContentMine @linguamatics 
•Intro from DNAdigest :ok: 
•Intro from ContentMine :ok: 
•Installation of tools :ok:
And off you go! :checkered_flag: 
Command line to reset keyboard mappings in VM
  • setxkbmap gb
Fiona N download papers using getpapers (different year for each table): 
  • getpapers --query '"human genomic" AND PUB_YEAR:[2010 TO 2010]' -o genome2010 -x
Peter M then normalize:
  • norma -q genome2010 -i fulltext.xml -o scholarly.html --transform nlm2html
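Since each table uses a different year, the download and normalize steps can be generated per year. A minimal sketch, assuming the getpapers and norma flags are exactly those shown above; the function name, the "genome" prefix argument and the 2010 to 2014 range are only illustrative. It prints the command lines rather than running them.

```python
import shlex

def pipeline_commands(year, prefix="genome"):
    """Return the getpapers and norma command lines for one publication year."""
    outdir = f"{prefix}{year}"  # e.g. genome2010, as in the notes
    query = f'"human genomic" AND PUB_YEAR:[{year} TO {year}]'
    getpapers = ["getpapers", "--query", query, "-o", outdir, "-x"]
    norma = ["norma", "-q", outdir, "-i", "fulltext.xml",
             "-o", "scholarly.html", "--transform", "nlm2html"]
    return [getpapers, norma]

# Hypothetical year range, one output directory per year.
for year in range(2010, 2015):
    for cmd in pipeline_commands(year):
        print(shlex.join(cmd))  # shell-quoted, ready to paste into a terminal
```

To actually execute a command, each list could be passed to subprocess.run(cmd, check=True) once the tools are installed.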
Fiona N Challenge: Can we build a word cloud? 
Peter M Run word frequencies
  • ami2-word -q genome2010 -i scholarly.html --w.words wordFrequencies --w.stopwords /org/xmlcml/ami2/plugins/word/stopwords.txt
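For anyone who wants to see what a word-frequency count amounts to before building the word cloud, here is a rough stand-in in plain Python. This is not how ami2-word is implemented; the tokenizer and the tiny stopword set are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, very small stopword list; a real run would load a full list.
STOPWORDS = {"the", "of", "and", "in", "a", "to", "was", "for"}

def word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercase alphabetic words, skipping stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stopwords)

freqs = word_frequencies("The genome data and the Genome platform")
print(freqs.most_common(3))
```

The resulting counts are what a word cloud scales its font sizes by.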
A regular expression for EGA can be found at http://www.ebi.ac.uk/miriam/main/collections/MIR:00000512
Antony Q ENA (European Nucleotide Archive): ^[A-Z]+[0-9]+$
Fiona N Here is the regular expression for ArrayExpress: ^[AEP]-\w{4}-\d+$
A regular expression to find DOIs: 
Jana G A regular expression for ENCODE data: 
Other repositories to look for the data identifiers
Need regex accession for: 
dbGaP http://www.ncbi.nlm.nih.gov/gap : see http://www.ncbi.nlm.nih.gov/books/NBK110024/ e.g. study accession id: "phs000879.v1.p1" 
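A possible starting point for the dbGaP regex, generalized from the single example above. The assumption here is that a study accession is "phs" plus six digits, followed by a version (.v) and a participant set (.p) number; please check against the NCBI page before relying on it.

```python
import re

# Assumption: phs + 6 digits, then .v<version> and .p<participant set>.
DBGAP_STUDY = re.compile(r"\bphs\d{6}\.v\d+\.p\d+\b")

def find_dbgap_ids(text):
    """Return all dbGaP-style study accessions found in running text."""
    return DBGAP_STUDY.findall(text)

print(find_dbgap_ids('study accession id: "phs000879.v1.p1"'))
```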
Jee-Hyub K Remove ^ ... $ in regular expressions.
If you remove ^ and $ from the regular expression, you will get some matches (it worked with arrayexpress and refseq).
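Why removing ^ and $ helps: the anchored patterns only match when the whole string is the identifier, which is fine for validating a database field but not for searching inside a paper. A small sketch using the ArrayExpress pattern from above; the example sentence and accession are made up, and the added word boundaries (\b) are my own suggestion to cut down on partial matches.

```python
import re

ANCHORED = r"^[AEP]-\w{4}-\d+$"  # ArrayExpress pattern as listed above

def unanchor(pattern):
    """Naively strip a leading ^ and trailing $ and wrap in word boundaries."""
    core = pattern
    if core.startswith("^"):
        core = core[1:]
    if core.endswith("$"):
        core = core[:-1]
    return r"\b(?:" + core + r")\b"

# Hypothetical sentence with a made-up accession number.
sentence = "Microarray data are available from ArrayExpress under accession E-MTAB-1234."

print(re.search(ANCHORED, sentence))             # None: the sentence is not just an ID
print(re.findall(unanchor(ANCHORED), sentence))  # matches inside the text
```

Note that very general patterns such as the ENA one will match many ordinary tokens once the anchors are gone, so the hits need checking.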
Peter M Run ami-regex [1]
(assuming the directory is called "genome2010")
we create  a file genome2010/regex.xml containing
<compoundRegex title="genome">
<regex weight="1.0" fields="genome">([Gg]enome)</regex>
<regex weight="1.0" fields="data">([Dd]ata)</regex>
</compoundRegex>
then run
ami2-regex -q genome2010 --r.regex genome2010/regex.xml
Typical output
<results title="genome"><result pre="e of 10 primary tumors and 10 effusions was analyzed using the Array-Ready Oligo set for the Human " name0="genome" value0="Genome" post="platform. Results for selected genes were validated using PCR, Western blotting, and immunohistoche" xpath="/html[1]/body[1]/div[1]/div[2]/p[1]"/>
<result pre="otting, and immunohistochemistry confirmed the array findings for BCAR1, CLDN4, VIL2, and DCN. Our " name0="data" value0="data" post="show that breast carcinoma cells in primary carcinomas and effusions have different gene expression" xpath="/html[1]/body[1]/div[1]/div[2]/p[1]"/>
<result pre="l, and their clinical relevance was analyzed in a larger series of breast carcinoma effusions. Our " name0="data" value0="data" post="demonstrate that in agreement with our previous observations, breast carcinoma cells in effusions a" xpath="/html[1]/body[1]/div[1]/div[3]/p[3]"/>
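To get counts out of the results, the XML can be tallied with a few lines of Python. A sketch only: the sample below is shortened from the output above, it assumes the real file ends with a closing </results> tag, and where ami2-regex writes the file is not covered here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened sample in the same shape as the typical output above.
SAMPLE = """<results title="genome">
<result pre="Oligo set for the Human " name0="genome" value0="Genome" post="platform." xpath="/html[1]/body[1]/div[1]/div[2]/p[1]"/>
<result pre="Our " name0="data" value0="data" post="show that" xpath="/html[1]/body[1]/div[1]/div[2]/p[1]"/>
<result pre="Our " name0="data" value0="data" post="demonstrate that" xpath="/html[1]/body[1]/div[1]/div[3]/p[3]"/>
</results>"""

def count_matches(xml_text):
    """Count <result> elements per regex field name (the name0 attribute)."""
    root = ET.fromstring(xml_text)
    return Counter(r.get("name0") for r in root.iter("result"))

print(count_matches(SAMPLE))
```

Counting on value0 instead of name0 would give the tally per matched string, which is more useful for accession numbers.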
now create your own file ids.xml like:
<compoundRegex title="ids">
<regex weight="1.0" fields="egad">(EGAD\d{11})</regex>
</compoundRegex>
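If you collect several identifier patterns, the ids.xml file can be generated instead of typed by hand. A sketch assuming the file format is just what the two examples above show; the "arrayexpress" field name is made up, and its pattern is the unanchored ArrayExpress regex from earlier wrapped in a capture group.

```python
import xml.etree.ElementTree as ET

# field name -> pattern; only "egad" comes from the notes, the other is illustrative.
PATTERNS = {
    "egad": r"(EGAD\d{11})",
    "arrayexpress": r"([AEP]-\w{4}-\d+)",
}

def build_compound_regex(title, patterns):
    """Serialize patterns into a <compoundRegex> document as a string."""
    root = ET.Element("compoundRegex", title=title)
    for field, pattern in patterns.items():
        regex = ET.SubElement(root, "regex", weight="1.0", fields=field)
        regex.text = pattern
    return ET.tostring(root, encoding="unicode")

print(build_compound_regex("ids", PATTERNS))
```

The printed string can be saved as ids.xml and passed to --r.regex in the same way as genome2010/regex.xml.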
Fiona N •Challenge 1 : Can we find data accession numbers? 
Jee-Hyub K
  • Europe PubMed Central provides web services for mined accession numbers.
  • Accession number search
Fiona N •Challenge 2 : Can we find data DOIs? 
•Challenge X : …?
•Interrupted by coffee breaks and lunch :)
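For Challenge 2, one possible starting regex for DOIs in running text. This is a commonly used heuristic rather than a validator, and a regex alone cannot tell a dataset DOI from an article DOI. The prefix filter is an assumption to be checked (10.5061 and 10.6084 are, as far as I know, the Dryad and figshare prefixes), and the example DOI is made up.

```python
import re

# Heuristic: "10." + 4-9 digit registrant code + "/" + suffix characters.
DOI = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

# Assumed data-repository prefixes (Dryad, figshare); verify before use.
DATA_PREFIXES = ("10.5061/", "10.6084/")

def find_dois(text):
    """Return DOI-like strings, trimming trailing sentence punctuation."""
    # Note: stripping ")" can also remove a parenthesis that belongs to the DOI.
    return [m.rstrip(".,;)") for m in DOI.findall(text)]

def find_data_dois(text):
    """Keep only DOIs whose prefix is in the assumed data-repository list."""
    return [d for d in find_dois(text) if d.startswith(DATA_PREFIXES)]

# Hypothetical sentence with a made-up DOI.
print(find_data_dois("Data are available at doi:10.5061/dryad.abc123."))
```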
Open Access scientific journals in biology that can be readily mined: 
  • PLoS
  • Biomed Central
  • eLife
651 days ago
Here are photos of the discussion boards from both 'open discussion' sessions at the DNAdigest Symposium
Fiona N In the first Discussion session we focused on
 "What are the challenges to data sharing?"
One group even produced two pages of notes! 
In the second discussion session we focused on 
"What are the best practices?" and wrote down recommendations in two groups
Justin C Group 3
Government/big picture
Who uses/accesses data
Problems with databases
Fiona N DNAdigest Symposium - Incentives for data sharing in genomics research
9:30 Arrivals (Tea/coffee is available)
9:45-10:00 Introduction to DNAdigest and the outline for the day
Part I “Multiple perspectives on data sharing”
10:00 Natalie Banner from Wellcome Trust
“Incentives for data sharing in genomics: a funder’s perspective”
10:25 Neil Walker from University of Cambridge
“Turning policy into practice: barriers to data sharing, as seen by the people who make the data”
10:50  Shahid Hanif from the Association of the British Pharmaceutical Industry (ABPI)
“The opportunities and challenges of sharing genomics data with the pharmaceutical industry”
11:15 – 13:00 Open space discussion
“What are the reasons to share? What are the hurdles?” (Tea/coffee is available)
13:00 – 13:30 Lunch break
Part II “Tools and workflows” 
13:35 – 14:00 Roland Roberts from PLoS
“How to publish research-related data?”
14:00 – 14:25 Jean Liu from Altmetric
“How can attention paid to datasets be identified and measured?”
14:25 – 14:50 Mark Hahnel from Figshare
“What is the point of openly available academic research data?”
14:50 – 16:30 Open Space discussion
“What are the best practices for data sharing, and how can the hurdles be overcome?”
16:30 – 16:45 Summary of learning points from each group.
16:45 – 17:00 Fiona Nielsen from DNAdigest
“How to win grants from H2020 by sharing data” (Tea/coffee and cake is available)

