Amazon Cloud Comp Workshop 6/2010
:
Dawei lin
Jon Sorenson, Pacific Biosciences: Cloud Computing Strategies for Next Generation of Sequence Analysis
June 8, 2010
Began by acknowledging that PacBio is part of the effort that is creating the data problem. -
Dawei lin
1-3 bases incorporated per second; 80K ZMWs (zero-mode waveguides) monitored in parallel -
Dawei lin
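A quick back-of-the-envelope check of what those two figures imply. This is only a sketch: it assumes every ZMW is productive for the whole run and borrows the 30-minute run length from the dataflow note, neither of which the talk notes confirm. The names `ZMW_COUNT`, `RUN_SECONDS` and `raw_bases` are made up for illustration.

```python
# Figures from the notes: 1-3 bases/s per ZMW, 80K ZMWs monitored in parallel.
ZMW_COUNT = 80_000
BASES_PER_SEC_LOW, BASES_PER_SEC_HIGH = 1, 3
RUN_SECONDS = 30 * 60  # assumption: a 30-minute run


def raw_bases(rate_per_zmw: int, zmws: int = ZMW_COUNT, seconds: int = RUN_SECONDS) -> int:
    """Upper bound on bases observed if every ZMW is productive for the whole run."""
    return rate_per_zmw * zmws * seconds


low = raw_bases(BASES_PER_SEC_LOW)
high = raw_bases(BASES_PER_SEC_HIGH)
print(f"{low:,} to {high:,} bases per 30-minute run")
```

Under those assumptions the ceiling is a few hundred million bases per run; real yield would be lower, since not every ZMW produces a usable read.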
Dataflow: 30 min -> 4TB; Movie2Trace -> Trace (100GB) -> Trace2Pulse -> Pulse (height, width, interbase time) -> 30 min -> 1GB -
Dawei lin
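To make the size of each reduction step concrete, here is a small sketch of the arithmetic. It reads the note as "4TB of movie, 100GB of trace, 1GB of pulse data per 30 minutes", which is an interpretation, and it assumes decimal units (1 TB = 1000 GB). `STAGES_GB` and `reduction_factors` are hypothetical names.

```python
# Sizes as quoted in the notes; decimal units (1 TB = 1000 GB) are an assumption.
STAGES_GB = [
    ("movie", 4000.0),
    ("trace", 100.0),
    ("pulse", 1.0),
]
RUN_SECONDS = 30 * 60


def reduction_factors(stages):
    """Return (from, to, factor) for each consecutive pair of stages."""
    return [
        (a_name, b_name, a_size / b_size)
        for (a_name, a_size), (b_name, b_size) in zip(stages, stages[1:])
    ]


factors = reduction_factors(STAGES_GB)
for src, dst, factor in factors:
    print(f"{src} -> {dst}: {factor:.0f}x smaller")

# Sustained rate needed just to move the raw movie data.
movie_rate_gb_s = STAGES_GB[0][1] / RUN_SECONDS
print(f"raw movie rate: {movie_rate_gb_s:.2f} GB/s")
```

On this reading, the raw movie stream is over 2 GB/s, which suggests why the early stages would have to run next to the instrument and only the much smaller pulse data is a plausible candidate for shipping elsewhere.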
Filter -> Mapping (de novo assembly, reference alignment) -> Consensus (simple, Bayesian, HMM) -> identify variants -> finished human genome, 150GB (for 30X human) -
Dawei lin
Because the reaction time is quick, it makes sense to do the analysis in real time too -
Dawei lin
It has 96 wells. Steps: design job, monitor jobs, view data. -
Dawei lin
SMRT View: PacBio's genome browser -
Dawei lin
Customers: genome centers, service labs, genomics institutes, core labs, individual PIs, clinical labs -
Dawei lin
The target is to have data analysis results back in 15 minutes -
Dawei lin
Software in the cloud is more maintainable and easier to budget for -
Dawei lin
Circular consensus is a way to produce high-quality sequences -
Dawei lin
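The idea behind consensus from repeated passes over the same molecule can be illustrated with a toy majority vote. This is not PacBio's algorithm: it assumes the passes are already aligned and of equal length, and it ignores insertions, deletions and quality values, which a real circular-consensus method has to handle. `majority_consensus` is a hypothetical helper.

```python
from collections import Counter


def majority_consensus(passes: list[str]) -> str:
    """Column-wise majority vote over equal-length, already-aligned reads.

    Ties resolve to the base seen first in that column.
    """
    if not passes:
        return ""
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("all passes must have the same length (pre-aligned)")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Three noisy passes over the same template, each with one substitution error.
passes = ["ACGTAC", "ACCTAC", "ACGTTC"]
print(majority_consensus(passes))
```

Because the errors in the example fall in different columns, each one is outvoted by the other two passes. That is the intuition for why more passes give higher-quality sequence.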
Strobe sequencing: μ=620, σ=40 -
Dawei lin
Event-based information model -
Dawei lin
visualization and standardization make complexity manageable -
Dawei lin
10,000 genomes is ~2PB of data. -
Dawei lin
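One way to sanity-check that figure is to multiply out the 150GB quoted earlier for a finished 30X human genome. The sketch assumes decimal units and that the 150GB figure is the per-genome footprint the speaker had in mind; the notes do not say so.

```python
GB_PER_GENOME = 150    # from the notes: finished 30X human genome
GENOMES = 10_000
GB_PER_PB = 1_000_000  # assumption: decimal units

total_pb = GB_PER_GENOME * GENOMES / GB_PER_PB
print(f"{GENOMES:,} genomes x {GB_PER_GENOME} GB = {total_pb} PB")
```

That gives 1.5 PB, the same order of magnitude as the "~2PB" quoted; the difference could be rounding or additional per-genome data not captured in these notes.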
Hadoop: wasn't good for generic customer deployment and did not play well with existing schedulers, but this has all changed. -
Dawei lin
PacBio still uses a lot of structured binary files. -
Dawei lin
Recreational and commercial genomics. -
Dawei lin
Cloud computing: infrastructure, application stack, niche genomics SaaS -
Dawei lin