
Fiona Nielsen

440 days ago
Bruno V Bionode Hack 1.0
 
Teams
Fiona N == write your team name and link to your github repo below ==
 
Alec ( Billy Gene: bionode-pgp = bionode-pgp-billy-gene !!!!!
 
Shun L Team Eva: bionode-eva
 
Ismail M Team Express-yourselves: bionode-gxa
 
Nikolas P Team: bionode-monarch
 
Adam H The V Team: dbVar
 
 
 
651 days ago
Fiona N Notes for the day #CMDNAHack
Tweet with hashtag: 
Nadia #CMDNAHack please use when tweeting!
 
Fiona N And find us on Twitter: @DNAdigest @theContentMine @linguamatics 
 
Schedule: 
•Intro from DNAdigest :ok: 
•Intro from ContentMine :ok: 
•Installation of tools :ok:
And off you go! :checkered_flag: 
 
Command line to reset keyboard mappings in VM
  • setxkbmap gb
 
Fiona N download papers using getpapers (different year for each table): 
  • getpapers --query '"human genomic" AND PUB_YEAR:[2010 TO 2010]' -o genome2010 -x
 
Peter M then normalize:
  • norma -q genome2010 -i fulltext.xml -o scholarly.html --transform nlm2html
 
Fiona N Challenge: Can we build a word cloud? 
Peter M Run word frequencies
  • ami2-word -q genome2010 -i scholarly.html --w.words wordFrequencies --w.stopwords /org/xmlcml/ami2/plugins/word/stopwords.txt
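
To give a feel for what a word-frequency step involves before building a word cloud, here is a minimal Python sketch. It is not how ami2-word is implemented; the function name word_frequencies, the tiny stopword set and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; a real run would load a fuller list from a file.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "was"}

def word_frequencies(text, stopwords=STOPWORDS):
    """Count lower-cased alphabetic words, skipping stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stopwords)

# Invented sample sentence, standing in for the text of one paper.
sample = "The genome of the tumour was sequenced and the genome data is shared."
for word, count in word_frequencies(sample).most_common(3):
    print(word, count)
```

The resulting counts are the kind of input a word cloud tool would need.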
A regular expression for EGA can be found at http://www.ebi.ac.uk/miriam/main/collections/MIR:00000512
 
Antony Q ENA (European Nucleotide Archive): ^[A-Z]+[0-9]+$
 
Fiona N Here is the regular expression for ArrayExpress: ^[AEP]-\w{4}-\d+$
 
A regular expression to find DOIs: 
 
Jana G A regular expression for ENCODE data: 
ENC(SR|BS|DO|AB|LB|FF)[0-9]{3}[A-Z]{3}
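
A note on the syntax: square brackets define a character class that matches a single character, whereas parentheses with | match one of several alternatives. The small Python sketch below compares the two forms. The identifier ENCSR000AAA is made up to fit the ENC + two-letter code + three digits + three letters shape; it is not taken from the pad.

```python
import re

# Character class: "[SR|BS|...]" matches exactly ONE character from the set.
class_form = re.compile(r"ENC[SR|BS|DO|AB|LB|FF][0-9]{3}[A-Z]{3}")
# Alternation group: "(?:SR|BS|...)" matches one of the two-letter codes.
group_form = re.compile(r"ENC(?:SR|BS|DO|AB|LB|FF)[0-9]{3}[A-Z]{3}")

candidate = "ENCSR000AAA"  # made-up identifier, for illustration only

print(bool(class_form.fullmatch(candidate)))
print(bool(group_form.fullmatch(candidate)))
```

Only the grouped form accepts a two-letter code after ENC, which is why the parenthesised version is the one to use.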
 
Other repositories to look for the data identifiers
 
Need regex accession for: 
dbGaP (http://www.ncbi.nlm.nih.gov/gap): see http://www.ncbi.nlm.nih.gov/books/NBK110024/, e.g. study accession ID: "phs000879.v1.p1" 
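
Going only by the example accession above, a first guess at a dbGaP study pattern could be "phs" plus six digits, then ".v" and a number, then ".p" and a number. This is an assumption inferred from that single example, not an official dbGaP specification, and the sample sentence is invented.

```python
import re

# Assumed shape, inferred from the example "phs000879.v1.p1":
# "phs" + 6 digits, then ".v" + a number, then ".p" + a number.
DBGAP_STUDY = re.compile(r"\bphs\d{6}\.v\d+\.p\d+\b")

# Invented sentence, standing in for text from a paper.
text = "Genotypes are available from dbGaP under phs000879.v1.p1 on request."
print(DBGAP_STUDY.findall(text))
```

The pattern should be checked against the NCBI documentation linked above before relying on it.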
 
 
Jee-Hyub K Remove ^ ... $ from the regular expressions.
The ^ and $ anchors only match when the accession is the entire string. If you remove them, you will get matches inside running text (it worked with ArrayExpress and RefSeq).
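
The effect of the anchors can be seen in a short Python sketch using the ArrayExpress pattern from above. The sentence and the accession E-MTAB-1234 are invented for illustration.

```python
import re

ANCHORED = re.compile(r"^[AEP]-\w{4}-\d+$")   # must match the whole string
UNANCHORED = re.compile(r"[AEP]-\w{4}-\d+")   # may match anywhere in the text

# Invented sentence, standing in for a line from a paper.
sentence = "Raw data were deposited in ArrayExpress under E-MTAB-1234."

print(ANCHORED.search(sentence))      # no match: the sentence is more than an ID
print(UNANCHORED.findall(sentence))   # finds the ID inside the sentence
```

One caveat: without anchors a loose pattern can over-match. The ENA pattern [A-Z]+[0-9]+, for instance, would also hit gene names such as BCAR1, so wrapping a pattern in \b word boundaries may be worth trying.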
 
Peter M Run ami-regex [1]
(assuming the directory is called "genome2010")
we create a file genome2010/regex.xml containing
<compoundRegex title="genome">
<regex weight="1.0" fields="genome">([Gg]enome)</regex>
<regex weight="1.0" fields="data">([Dd]ata)</regex>
</compoundRegex>
 
then run
ami2-regex -q genome2010 --r.regex genome2010/regex.xml
 
Typical output
 
containing:
 
<results title="genome"><result pre="e of 10 primary tumors and 10 effusions was analyzed using the Array-Ready Oligo set for the Human " name0="genome" value0="Genome" post="platform. Results for selected genes were validated using PCR, Western blotting, and immunohistoche" xpath="/html[1]/body[1]/div[1]/div[2]/p[1]"/>
<result pre="otting, and immunohistochemistry confirmed the array findings for BCAR1, CLDN4, VIL2, and DCN. Our " name0="data" value0="data" post="show that breast carcinoma cells in primary carcinomas and effusions have different gene expression" xpath="/html[1]/body[1]/div[1]/div[2]/p[1]"/>
<result pre="l, and their clinical relevance was analyzed in a larger series of breast carcinoma effusions. Our " name0="data" value0="data" post="demonstrate that in agreement with our previous observations, breast carcinoma cells in effusions a" xpath="/html[1]/body[1]/div[1]/div[3]/p[3]"/>
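
To summarise output like this, one option is to count hits per regex field. The Python sketch below assumes a results file is well-formed XML with a closing </results> tag and the attribute names shown above. The sample is a shortened, hand-closed stand-in, and count_hits is a made-up helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened stand-in for a results file; attribute names follow the output above.
sample = """<results title="genome">
  <result pre="for the Human " name0="genome" value0="Genome" post="platform." xpath="/html[1]/body[1]/div[1]/div[2]/p[1]"/>
  <result pre="Our " name0="data" value0="data" post="show that" xpath="/html[1]/body[1]/div[1]/div[2]/p[1]"/>
  <result pre="Our " name0="data" value0="data" post="demonstrate that" xpath="/html[1]/body[1]/div[1]/div[3]/p[3]"/>
</results>"""

def count_hits(xml_text):
    """Tally matches per regex field (the name0 attribute)."""
    root = ET.fromstring(xml_text)
    return Counter(r.get("name0") for r in root.iter("result"))

print(count_hits(sample))
```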
 
now create your own file ids.xml like:
<compoundRegex title="ids">
<regex weight="1.0" fields="egad">(EGAD\d{11})</regex>
</compoundRegex>
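
If you end up with many identifier patterns, writing the file by hand gets tedious. Here is a hedged Python sketch that generates the same structure from a dictionary. The element and attribute names are copied from the examples above; build_compound_regex is a hypothetical helper, not part of ami.

```python
import re
import xml.etree.ElementTree as ET

def build_compound_regex(title, patterns):
    """Return compoundRegex XML text for a {field: pattern} mapping."""
    root = ET.Element("compoundRegex", title=title)
    for field, pattern in patterns.items():
        re.compile(pattern)  # fail early on a malformed pattern
        node = ET.SubElement(root, "regex", weight="1.0", fields=field)
        node.text = f"({pattern})"  # wrap in a capture group, as in the examples
    return ET.tostring(root, encoding="unicode")

xml_text = build_compound_regex("ids", {"egad": r"EGAD\d{11}"})
print(xml_text)
```

Save the printed text as ids.xml and pass it to --r.regex as before.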
 
 
Fiona N •Challenge 1 : Can we find data accession numbers? 
Jee-Hyub K
  • Europe PubMed Central provides web services for mined accession numbers.
  • Accession number search
 
Fiona N •Challenge 2 : Can we find data DOIs? 
•Challenge X : …?
•Interrupted by coffee breaks and lunch :)
•Summary 
Open Access scientific journals in biology that can be readily mined: 
 
  • PLoS
  • Biomed Central
  • eLife
 
...
651 days ago
Proactive P Notes for the day 11 Dec 2015: 
 
 
 
Fiona N Tweet with hashtag: 
#CMDNAHack please use when tweeting!
 
And find us on Twitter: @DNAdigest @theContentMine @linguamatics 
 
 
 
Peter M THIS COMMAND WORKS for PMR, suggest you use different years
getpapers --query '"human genomic" AND PUB_YEAR:[2010 TO 2010]' -o genome2010 -x
 
Antony Q Delete empty directories:
cd genome2010
find . -type d -empty -delete
cd ..
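
For anyone without a find that supports these options, the same clean-up of empty directories can be sketched in Python. The helper name remove_empty_dirs and the paper1/paper2 directory names are made up for the demo, which runs on a throwaway temporary directory rather than on real downloads.

```python
import os
import tempfile

def remove_empty_dirs(root):
    """Remove empty directories below root, deepest first; return removed paths."""
    removed = []
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        # Re-check with listdir: children may already have been removed.
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed

# Demo on a throwaway tree: one empty directory, one with a file in it.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "paper1"))
    os.makedirs(os.path.join(tmp, "paper2"))
    with open(os.path.join(tmp, "paper2", "fulltext.xml"), "w") as fh:
        fh.write("<article/>")
    removed = remove_empty_dirs(tmp)
    remaining = sorted(os.listdir(tmp))

print(remaining)
```

Unlike a plain find -empty -delete, this leaves empty files alone and only removes directories.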
 
Peter M then normalize:
norma -q genome2010 -i fulltext.xml -o scholarly.html --transform nlm2html
 
then word frequencies
ami2-word -q genome2010 -i scholarly.html --w.words wordFrequencies --w.stopwords /org/xmlcml/ami2/plugins/word/stopwords.txt
 
and regex
 
Antony Q ENA (European Nucleotide Archive) regex (from http://www.ebi.ac.uk/miriam/main/collections/MIR:00000372):
^[A-Z]+[0-9]+$
 
 
 
José M More often than not, I use https://regex101.com to help me develop and test regular expressions. It might be useful for other people here, particularly regular expression newbies. For example: https://regex101.com/r/cA9aK7/1
 
  • Notes
 
760 days ago
Unfiled. Edited by Fiona Nielsen 760 days ago
Here are photos of the discussion boards from both 'open discussion' sessions at the DNAdigest Symposium
 
Fiona N In the first Discussion session we focused on
 "What are the challenges to data sharing?"
 
 
 
 
 
One group even produced two pages of notes! 
 
 
 
 
In the second discussion session we focused on 
"What are the best practices?" and wrote down recommendations in two groups
 
 
 
 
 
 
 
 
764 days ago
Unfiled. Edited by Fiona Nielsen 764 days ago
Fiona N DNAdigest Symposium - Incentives for data sharing in genomics research
 
 
9:30 Arrivals (Tea/coffee is available)
9:45-10:00 Introduction to DNAdigest and the outline for the day
 
Part I “Multiple perspectives on data sharing”
10:00 Natalie Banner from Wellcome Trust
“Incentives for data sharing in genomics: a funder’s perspective”
10:25 Neil Walker from University of Cambridge
“Turning policy into practice: barriers to data sharing, as seen by the people who make the data”
10:50 Shahid Hanif from the Association of the British Pharmaceutical Industry (ABPI)
“The opportunities and challenges of sharing genomics data with the pharmaceutical industry”
 
11:15 – 13:00 Open space discussion
“What are the reasons to share? What are the hurdles?” (Tea/coffee is available)
 
13:00 – 13:30 Lunch break
Part II “Tools and workflows” 
13:35 – 14:00 Roland Roberts from PLoS
“How to publish research-related data?”
14:00 – 14:25 Jean Liu from Altmetric
“How can attention paid to datasets be identified and measured?”
14:25 – 14:50 Mark Hahnel from Figshare
“What is the point of openly available academic research data?”
14:50 – 16:30 Open Space discussion
“What are the best practices for data sharing, and how can the hurdles be overcome?”
 
16:30 – 16:45 Summary of learning points from each group.
 
16:45 – 17:00 Fiona Nielsen from DNAdigest
“How to win grants from H2020 by sharing data” (Tea/coffee and cake are available)
 
789 days ago
Unfiled. Edited by Fiona Nielsen 789 days ago
Fiona N schedule for the day: 
9:30 Arrivals
name badges - grab a sticker and write your name 
And drink some coffee & tea! 
 
9:45-10:00 Briefing of the schedule for the day by Fiona  
  • introduction to the DNAdigest team of volunteers today: Kasia, ..., Nadia, Fiona Nielsen
  • Who are the participants today? 
  • Who uses human genomic data in their work? 
  • Who are other types of researchers? 
  • Who else do we have in the room? 
 
and the outline for the day
  • 1 slide intro to the schedule and the topics of the inspirational talks
  • you come up with ideas for the discussion sessions
  • grab post-it notes and pen 
 
10:00 Invited talk: The Funder's Perspective
Natalie Banner from Wellcome Trust
(insert bio and title/abstract)
 
10:25 Invited talk: The Researcher's Perspective
Neil Walker from University of Cambridge 
(insert bio and title/abstract)
 
Margarita K 10:50 - 13:00 
Fiona N What are the reasons to share? What are the hurdles? 
Margarita K workshop discussion in the form of Open Space (coffee is available) (2hr)
Fiona N
  • make your topic known to the session leader, who will assign locations to the groups
  • join the discussion! 
  • Make a hackpad to take notes from your discussion
--> click the second icon from the left in the top bar to create a new pad :) 
 
If we have more invited speakers, we can reduce to just one Open Space discussion, after lunch
 
 
Margarita K 13:00 - 13:30 Lunch break
 
 
Fiona N 13:35 - 14:00 Invited talk: The publisher's perspective (title tba)
Roland Roberts from PLoS (insert bio and title/abstract)
 
14:05 - 14:25 Invited talk: tools and workflows (title tba) 
Jean Liu from Altmetric (insert bio and title/abstract)
 
14:30 - 14:50 Invited talk: tools and workflows (title tba)
Mark Hahnel from Figshare (insert bio and title/abstract)
 
Each presentation will be approximately 20 minutes, including some time for Q&A.
 
14:50 - 16:30 second Open Space discussion (1.5hr)
What are the best practices for data sharing, and how can the hurdles be overcome? 
  • make your topic known to the session leader, who will assign locations to the groups
  • join the discussion! 
  • Make a hackpad to take notes from your discussion
 
16:30 - 17:00 Summary of learning points from each discussion group
 
 
 
If you enjoy participating in these events please donate to DNAdigest by texting DNAD14 £10 to 70070, so that we can continue organizing more of these interactive events in the future. :)
 
---
 
 
Venue in London
 
  • Invite speakers & communicate the plan for the event (Nadia & Fiona)
 
927 days ago
Unfiled. Edited by Fiona Nielsen 927 days ago
Ines S
  • (Fiona Nielsen I wrote in the "first" person, feel free to change the text the way you prefer)
Fiona N
  • Sounds good Ines de Santiago I am almost done putting this into a blog post with my own additions and links added. Should be sent round to all participants shortly :) 
 
Ines S
Fiona N
  • Yup, I have made my additions in the post on the blog, almost ready to publish
 
Ines S
Fiona N
  • Tim Richardson could you please help by pasting the right photo here? 
 
Ines S
  • Tim Richardson I was trying to end in a more positive way. Feel free to change everything I wrote! I'm not attached to it, I just wanted to start somehow!
Fiona N
  • Tim Richardson could you summarise the challenges 1) 2) 3) that need to be addressed to make progress on this? This is the last piece missing before we can publish the post on the DNAdigest blog
 
Ines S  So, this is where we are right now. We are open to more ideas and contributors to this project! Please join the discussion at http://dnadigest.hackpad.com and join our DNAdigest meetup group to discuss further. 
 
All code produced so far can be found at the DNAdigest bitbucket repository.
 
We were very happy to welcome all of our participants and hope to see you again soon at our next brainstorming sessions.
 
