379 days ago
Bruno V Bionode Hack 1.0
 
Teams
Fiona N == write your team name and link to your GitHub repo below ==
 
Alec ( Billy Gene: bionode-pgp = bionode-pgp-billy-gene !!!!!
 
Shun L Team Eva: bionode-eva
 
Ismail M Team Express-yourselves: bionode-gxa
 
Nikolas P Team: bionode-monarch
 
Adam H The V Team: dbVar
 
 
 
590 days ago
Fiona N Notes for the day #CMDNAHack
Tweet with hashtag: 
Nadia #CMDNAHack please use when tweeting!
 
Fiona N And find us on Twitter: @DNAdigest @theContentMine @linguamatics 
 
Schedule: 
•Intro from DNAdigest :ok: 
•Intro from ContentMine :ok: 
•Installation of tools :ok:
And off you go! :checkered_flag: 
 
Command line to reset keyboard mappings in VM
  • setxkbmap gb
 
Fiona N download papers using getpapers (different year for each table): 
  • getpapers --query '"human genomic" AND PUB_YEAR:[2010 TO 2010]' -o genome2010 -x
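Since each table takes a different year, the command can be generated per year. A minimal Python sketch, assuming getpapers is installed and on the PATH; the helper name `getpapers_command` and the `genome<year>` directory naming are illustrative and simply follow the example above. The `--query`, `-o` and `-x` options are copied from that command.

```python
import shlex


def getpapers_command(year, topic='"human genomic"', prefix="genome"):
    """Build the getpapers argument list for a single publication year."""
    query = f"{topic} AND PUB_YEAR:[{year} TO {year}]"
    # Options are the ones used in the notes: --query, -o (output dir), -x.
    return ["getpapers", "--query", query, "-o", f"{prefix}{year}", "-x"]


if __name__ == "__main__":
    for year in (2010, 2011, 2012):
        # Print only; pass the list to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(getpapers_command(year)))
```

Building an argument list rather than one string avoids shell-quoting problems with the quoted search phrase.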
 
Peter M then normalize:
  • norma -q genome2010 -i fulltext.xml -o scholarly.html --transform nlm2html
 
Fiona N Challenge: Can we build a word cloud? 
Peter M Run word frequencies
  • ami2-word -q genome2010 -i scholarly.html --w.words wordFrequencies --w.stopwords /org/xmlcml/ami2/plugins/word/stopwords.txt
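For the word cloud challenge, the counting step can be illustrated in a few lines of Python. This is only a sketch of the idea (lowercase, drop stopwords, count), not what ami2-word does internally; the stopword set and the function name `word_frequencies` are made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; a real run would load a full stopword file.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "with", "was", "were"}


def word_frequencies(text, stopwords=STOPWORDS, top=10):
    """Return the `top` most frequent non-stopword words as (word, count) pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(top)


if __name__ == "__main__":
    sample = "The genome data and the genome"
    print(word_frequencies(sample, top=2))
```

The resulting (word, count) pairs are the input a word cloud renderer needs.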
A regular expression for EGA can be found at http://www.ebi.ac.uk/miriam/main/collections/MIR:00000512
 
Antony Q ENA (European Nucleotide Archive): ^[A-Z]+[0-9]+$
 
Fiona N Here is the regular expression for ArrayExpress: ^[AEP]-\w{4}-\d+$
 
A regular expression to find DOIs: 
 
Jana G A regular expression to find ENCODE data: 
ENC(SR|BS|DO|AB|LB|FF)[0-9]{3}[A-Z]{3}
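A quick way to check the ENCODE pattern in Python. Square brackets would match only a single character, so the two-letter type codes are grouped with parentheses here. The accession strings in the example are illustrative, not taken from a paper.

```python
import re

# Two-letter type codes as listed in the notes: SR, BS, DO, AB, LB, FF.
ENCODE_RE = re.compile(r"ENC(?:SR|BS|DO|AB|LB|FF)[0-9]{3}[A-Z]{3}")

# The same codes inside [...] form a character class matching ONE character.
CHAR_CLASS_RE = re.compile(r"ENC[SR|BS|DO|AB|LB|FF][0-9]{3}[A-Z]{3}")


def find_encode_ids(text):
    """Return all ENCODE-style accessions found in `text`."""
    return ENCODE_RE.findall(text)


if __name__ == "__main__":
    text = "see ENCSR000AAA and ENCFF123XYZ"  # illustrative accessions
    print(find_encode_ids(text))
    print(CHAR_CLASS_RE.search(text))  # the character-class form finds nothing
```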
 
Other repositories to look for the data identifiers
 
Need regex accession for: 
dbGaP (http://www.ncbi.nlm.nih.gov/gap): see http://www.ncbi.nlm.nih.gov/books/NBK110024/, e.g. study accession ID "phs000879.v1.p1"
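A candidate pattern for dbGaP study accessions, as a starting point only. It is inferred from the single example above ("phs" + six digits + version + participant-set suffix) and should be checked against the NCBI documentation linked above; the name `DBGAP_STUDY_RE` is illustrative.

```python
import re

# Assumption: study accessions look like phs000879.v1.p1
# (phs + 6 digits, ".v" + version number, ".p" + participant-set number).
DBGAP_STUDY_RE = re.compile(r"phs\d{6}\.v\d+\.p\d+")


def find_dbgap_studies(text):
    """Return all dbGaP-style study accessions found in `text`."""
    return DBGAP_STUDY_RE.findall(text)


if __name__ == "__main__":
    print(find_dbgap_studies("dbGaP study phs000879.v1.p1 was used"))
```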
 
 
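Jee-Hyub K Remove ^ ... $ in regular expressions.
The anchors ^ and $ only match when the identifier is the entire string, so identifiers embedded in running text are missed. If you remove ^ and $ from the regular expression, you will get some matches (it worked with ArrayExpress and RefSeq).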
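The effect of the anchors can be seen with the ArrayExpress pattern from these notes. A small Python sketch; the sentence and the accession E-MTAB-1234 are illustrative only.

```python
import re

ANCHORED = re.compile(r"^[AEP]-\w{4}-\d+$")   # pattern as published in the registry
UNANCHORED = re.compile(r"[AEP]-\w{4}-\d+")   # same pattern without ^ and $

sentence = "Data are available in ArrayExpress under accession E-MTAB-1234."

if __name__ == "__main__":
    # Anchored: matches only if the whole string is the accession.
    print(ANCHORED.search(sentence))
    print(ANCHORED.search("E-MTAB-1234"))
    # Unanchored: finds the accession inside running text.
    print(UNANCHORED.findall(sentence))
```

The anchored form is still useful for validating a single identifier; the unanchored form is the one to use when mining full text.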
 
Peter M Run ami2-regex
(assuming the directory is called "genome2010")
We create a file genome2010/regex.xml containing
<compoundRegex title="genome">
<regex weight="1.0" fields="genome">([Gg]enome)</regex>
<regex weight="1.0" fields="data">([Dd]ata)</regex>
</compoundRegex>
 
then run
ami2-regex -q genome2010 --r.regex genome2010/regex.xml
 
Typical output
 
containing:
 
<results title="genome"><result pre="e of 10 primary tumors and 10 effusions was analyzed using the Array-Ready Oligo set for the Human " name0="genome" value0="Genome" post="platform. Results for selected genes were validated using PCR, Western blotting, and immunohistoche" xpath="/html[1]/body[1]/div[1]/div[2]/p[1]"/>
<result pre="otting, and immunohistochemistry confirmed the array findings for BCAR1, CLDN4, VIL2, and DCN. Our " name0="data" value0="data" post="show that breast carcinoma cells in primary carcinomas and effusions have different gene expression" xpath="/html[1]/body[1]/div[1]/div[2]/p[1]"/>
<result pre="l, and their clinical relevance was analyzed in a larger series of breast carcinoma effusions. Our " name0="data" value0="data" post="demonstrate that in agreement with our previous observations, breast carcinoma cells in effusions a" xpath="/html[1]/body[1]/div[1]/div[3]/p[3]"/>
 
Now create your own file ids.xml, like:
<compoundRegex title="ids">
<regex weight="1.0" fields="egad">(EGAD\d{11})</regex>
</compoundRegex>
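To test a regex file like this before running the full pipeline, the patterns can be read and tried on a sample sentence in Python. This sketch only mimics the file layout shown above (one `regex` element per pattern, named by its `fields` attribute); it does not reproduce ami2-regex itself (no weights, no pre/post context). The function names and the sample sentence are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Same content as the ids.xml example in the notes.
IDS_XML = r"""<compoundRegex title="ids">
<regex weight="1.0" fields="egad">(EGAD\d{11})</regex>
</compoundRegex>"""


def load_patterns(xml_text):
    """Map each regex element's `fields` name to its compiled pattern."""
    root = ET.fromstring(xml_text)
    return {r.get("fields"): re.compile(r.text) for r in root.iter("regex")}


def find_ids(xml_text, text):
    """Return {field name: [captured identifiers]} for every pattern in the file."""
    return {
        field: [m.group(1) for m in pattern.finditer(text)]
        for field, pattern in load_patterns(xml_text).items()
    }


if __name__ == "__main__":
    sample = "Dataset EGAD00001000001 is under controlled access."
    print(find_ids(IDS_XML, sample))
```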
 
 
Fiona N •Challenge 1 : Can we find data accession numbers? 
Jee-Hyub K
  • Europe PubMed Central provides web services for mined accession numbers.
  • Accession number search
 
Fiona N •Challenge 2 : Can we find data DOIs? 
•Challenge X : …?
•Interrupted by coffee breaks and lunch :)
•Summary 
Open Access scientific journals in biology that can be readily mined: 
 
  • PLoS
  • Biomed Central
  • eLife
 
...
590 days ago
Proactive P Notes for the day 11 Dec 2015: 
 
 
 
Fiona N Tweet with hashtag: 
#CMDNAHack please use when tweeting!
 
And find us on Twitter: @DNAdigest @theContentMine @linguamatics 
 
 
 
Peter M THIS COMMAND WORKS for PMR; I suggest you use different years
getpapers --query '"human genomic" AND PUB_YEAR:[2010 TO 2010]' -o genome2010 -x
 
Antony Q Delete empty directories:
cd genome2010
find . -type d -empty -delete
cd ..
 
Peter M then normalize:
norma -q genome2010 -i fulltext.xml -o scholarly.html --transform nlm2html
 
then word frequencies
ami2-word -q genome2010 -i scholarly.html --w.words wordFrequencies --w.stopwords /org/xmlcml/ami2/plugins/word/stopwords.txt
 
and regex
 
Antony Q ENA (European Nucleotide Archive) regex (from http://www.ebi.ac.uk/miriam/main/collections/MIR:00000372):
^[A-Z]+[0-9]+$
 
 
 
José M I usually use https://regex101.com to develop and test regular expressions. It might be useful for other people here, particularly regular expression newbies. For example: https://regex101.com/r/cA9aK7/1
 
  • Notes
 
698 days ago
Unfiled. Edited by Fiona Nielsen 698 days ago
Here are photos of the discussion boards from both 'open discussion' sessions at the DNAdigest Symposium
 
Fiona N In the first Discussion session we focused on
 "What are the challenges to data sharing?"
 
 
 
 
 
One group even produced two pages of notes! 
 
 
 
 
In the second discussion session we focused on 
"What are the best practices?" and wrote down recommendations in two groups
 
 
 
 
 
 
 
 
702 days ago
Unfiled. Edited by Justin Clark-Casey 702 days ago
Justin C Group 3
 
Topics
Consent
Government/big picture
Researchers
Who uses/accesses data
Problems with databases
 
702 days ago
Unfiled. Edited by Fiona Nielsen 702 days ago
Fiona N DNAdigest Symposium - Incentives for data sharing in genomics research
 
 
9:30 Arrivals (Tea/coffee is available)
9:45-10:00 Introduction to DNAdigest and the outline for the day
 
Part I “Multiple perspectives on data sharing”
10:00 Natalie Banner from Wellcome Trust
“Incentives for data sharing in genomics: a funder’s perspective”
10:25 Neil Walker from University of Cambridge
“Turning policy into practice: barriers to data sharing, as seen by the people who make the data”
10:50  Shahid Hanif from the Association of the British Pharmaceutical Industry (ABPI)
“The opportunities and challenges of sharing genomics data with the pharmaceutical industry”
 
11:15 – 13:00 Open space discussion
“What are the reasons to share? What are the hurdles?” (Tea/coffee is available)
 
13:00 – 13:30 Lunch break
Part II “Tools and workflows” 
13:35 – 14:00 Roland Roberts from PLoS
“How to publish research-related data?”
14:00 – 14:25 Jean Liu from Altmetric
“How can attention paid to datasets be identified and measured?”
14:25 – 14:50 Mark Hahnel from Figshare
“What is the point of openly available academic research data?”
14:50 – 16:30 Open Space discussion
“What are the best practices for data sharing, and how can the hurdles be overcome?”
 
16:30 – 16:45 Summary of learning points from each group.
 
16:45 – 17:00 Fiona Nielsen from DNAdigest
“How to win grants from H2020 by sharing data” (Tea/coffee and cake is available)
 
