Cool presentation. Slide 63, "Accessibility" with [24/7, East coast, West coast, Down the Corridor], should also have a very thin, broken dotted line that says "Australia".
- Andrew Perry
Very cool presentation. Would Google help with space in some of their data centers?
- Pedro Beltrao
Thanks one and all! The problem is really twofold: handling the raw data, and managing what people want to do with it, i.e. the analysis. We have the handling in place (_lots_ of disks, at present), but we need to start thinking about analysis.
- Matt Wood
So let's say I want to stop using local data and instead use EBI's databases from the cloud, requesting a filtered set of data and analyzing it in my own way. There might be some ways of doing that, but the workflow would be too slow because of the data movement. Are you suggesting that the analysis be done in the cloud as well? That I upload my own (smaller) data sets and analysis tools, and use some virtual computer with access to EBI's databases?
- Pedro Beltrao
That's one reason I thought we might need some form of CDN for all these data. Sequencing data sets are huge, but once all the images are processed the data are manageable IF we can distribute them at web scale (as opposed to hitting a particular server).
- Deepak Singh
Which is pretty much what Matt is talking about as I write this (I'm watching the screencast).
- Deepak Singh
@Pedro: exactly. You could imagine firing up a virtual machine that comes pre-packaged with the tools to access and search the distributed data store, along with useful analysis tools: MAQ, BLAST, whatever. However, we shouldn't be looking to be all things to all people. Such facilities should be focused on scientific requirements.
- Matt Wood
Just as a point of reference, one of the BioTeam offerings is essentially that ... a set of tools provided as a packaged virtual machine.
- Deepak Singh