Allyson Lister
SIG: DAM and BOSC Joint Session: BioHDF: Open binary file formats for large-scale data management, Mark Welsh, geospiza.com
...aka Toward Scalable Bioinformatics Infrastructures - Allyson Lister
@Allyson: Thanks for setting up all those talk threads! - Oliver Hofmann
Ouch: "Byte-ing off more than you can chew"!! - Allyson Lister
HDF5 is a model and file format for large complex data: http://www.hdfgroup.org/HDF5... - Brad Chapman
@Oliver - no problem - I just check before posting that someone hasn't done it before me :D - Allyson Lister
Allyson -- seconded. Great work. Nice to have others here. I refuse to comment on that pun-filled slide. - Brad Chapman
Problem: handling multiple next-gen sequencing datasets while still being able to drill down to the original sequence reads. The aim is to create domain-specific HDF5 extensions and move away from flat-file formats. - Oliver Hofmann
@Brad - it's definitely more fun to liveblog in a community of bloggers :) - Allyson Lister
HDF5 is a fairly complicated API. BioHDF layers a biology-oriented interface on top of it, targeted especially at next-generation sequencing. It appears to sit on top of one or more HDF databases. Having trouble finding the code itself. Geospiza BioHDF page is here: http://www.geospiza.com/researc... - Brad Chapman
I wonder when the time is right to start referring to "current-gen sequencing"; soon we'll have to start saying "next-next-gen" when talking about single-molecule sequencing. - Andy Jenkinson
The BioHDF page seems to be here: http://hdfgroup.com/project... Edit: Never mind, that link resolves to the HDF front page... - Oliver Hofmann
He has a bagful of thumb drives with the HDF software pre-loaded. good idea! - Allyson Lister
@Andy - very funny :) I know people who, similarly, hate to hear the phrase "post-genomic era"! - Allyson Lister
Do we get to keep the drives :) ? - Oliver Hofmann
Plug for Piotr's talk in the Bio* update session tomorrow. - Jim Procter
I'm still confused. Hierarchical data. Random access to the data. Is this not a file system? Why not use a filesystem? - Phil Lord
Allyson — thanks for blogging my talk at BOSC / ISMB. And, I'm afraid we're already using the term "next-next-gen" to describe technologies like those of Helicos or Oxford Nanopore. - Mark Welsh
Ah, the pun-filled slide... in my defense, those were other people's puns; however, I'm guilty of propagating them (anything for a laugh in a technology talk). - Mark Welsh
You can download the current BioHDF prototype software from the "BioHDF Command Line Tools" link here: http://www.hdfgroup.org/project... - Mark Welsh
The BioHDF slides are available here on slideshare (along with all the other talks from the conference): http://www.slideshare.net/bosc... - Mark Welsh
There will also be a scivee.tv presentation of our ISMB poster; it looks like it's not up yet, though. - Mark Welsh
HDF is kind of like a file system within a single binary file. However, those "files" (called datasets in HDF) are multi-dimensional arrays with each element being an arbitrarily complex data structure. - Mark Welsh
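To make the "file system within a single binary file" idea a little more concrete, here is a toy model in plain Python. It is not the HDF5 or h5py API, and every name in it (`ToyContainer`, `READ_FORMAT`, the `/run1/lane1/reads` paths, the record fields) is hypothetical. It assumes fixed-size compound records, one-dimensional for brevity, packed into one contiguous buffer. That layout is what lets you jump straight to record *i* by computing its byte offset, which is the kind of random access a plain directory of flat files does not give you within a file.

```python
import struct

# Hypothetical fixed-size compound record for one aligned read, loosely
# analogous to an HDF5 compound datatype. Fields: read id (uint32),
# reference position (uint32), mapping quality (uint8), 8-byte sequence.
# '<' means little-endian with no padding, so each record is 17 bytes.
READ_FORMAT = struct.Struct("<IIB8s")


class ToyContainer:
    """Toy in-memory stand-in for an HDF5-style file: datasets addressed by
    '/'-separated group paths. Illustration only, NOT the HDF5/h5py API."""

    def __init__(self):
        # path -> (record struct, number of rows, packed bytes)
        self._datasets = {}

    def create_dataset(self, path, record, rows):
        # Pack every row into one contiguous byte buffer.
        buf = b"".join(record.pack(*row) for row in rows)
        self._datasets[path] = (record, len(rows), buf)

    def list_group(self, group):
        # Return the immediate children of a group, like listing a directory.
        prefix = group.rstrip("/") + "/"
        children = {p[len(prefix):].split("/")[0]
                    for p in self._datasets if p.startswith(prefix)}
        return sorted(children)

    def read_row(self, path, i):
        # Random access: compute row i's byte offset directly
        # instead of scanning all the rows before it.
        record, n, buf = self._datasets[path]
        if not 0 <= i < n:
            raise IndexError(i)
        return record.unpack_from(buf, i * record.size)


f = ToyContainer()
f.create_dataset("/run1/lane1/reads", READ_FORMAT,
                 [(1, 100, 60, b"ACGTACGT"), (2, 250, 37, b"TTGACCAA")])
f.create_dataset("/run1/lane2/reads", READ_FORMAT, [(3, 42, 12, b"GGGGCCCC")])
print(f.list_group("/run1"))               # ['lane1', 'lane2']
print(f.read_row("/run1/lane1/reads", 1))  # (2, 250, 37, b'TTGACCAA')
```

Real HDF5 adds much more on top of this (true multi-dimensional shapes, chunking, compression, metadata attributes, on-disk storage), but the group/dataset split and offset-based access are the core of why it behaves like "a file system within a file".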
@Mark - thanks for the extra information and comments. It's always good to be able to go back and look over the slides. :) - Allyson Lister