Dawei lin
Jon Sorenson, Pacific Biosciences: Cloud Computing Strategies for Next Generation of Sequence Analysis
He began by acknowledging that PacBio is part of the effort that is creating the data problem. - Dawei lin
1-3 bases incorporated per second, with 80K ZMWs (zero-mode waveguides) monitored in parallel - Dawei lin
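A rough back-of-the-envelope sketch of what those two figures imply together. It assumes every ZMW is productive for the whole run and that a run lasts 30 minutes (the movie length quoted in the data flow note); both are simplifying assumptions, not numbers from the talk, so treat the result as an upper bound.

```python
# Upper-bound throughput estimate from the figures in the notes.
ZMW_COUNT = 80_000       # ZMWs monitored in parallel (from the notes)
RATE_RANGE = (1, 3)      # bases incorporated per second per ZMW (from the notes)
RUN_SECONDS = 30 * 60    # assumed run length: one 30-minute movie


def aggregate_bases(rate, zmws=ZMW_COUNT, seconds=RUN_SECONDS):
    """Total bases if every ZMW incorporated `rate` bases/s for the whole run."""
    return rate * zmws * seconds


low, high = (aggregate_bases(r) for r in RATE_RANGE)
print(f"{low:,} to {high:,} bases per run (upper bound)")
```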
Data flow: 30 min movie, 4 TB -> Movie2Trace -> trace, 100 GB -> Trace2Pulse -> pulse (height, width, interbase time), 1 GB per 30 min - Dawei lin
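The reduction at each stage of that data flow can be worked out directly from the quoted sizes. This is a minimal sketch that assumes decimal units (1 TB = 1000 GB) and that all three sizes refer to the same 30-minute acquisition; the stage labels are just names for illustration.

```python
# Data reduction per stage, using the sizes quoted in the notes.
TB = 10**12
GB = 10**9

stages = [
    ("movie", 4 * TB),    # raw movie for a 30-minute run
    ("trace", 100 * GB),  # after Movie2Trace
    ("pulse", 1 * GB),    # after Trace2Pulse
]

# Reduction factor between each pair of consecutive stages.
reductions = {
    f"{a}->{b}": size_a / size_b
    for (a, size_a), (b, size_b) in zip(stages, stages[1:])
}
overall = stages[0][1] / stages[-1][1]

# Sustained raw data rate the instrument side has to absorb.
raw_rate_gb_per_s = stages[0][1] / GB / (30 * 60)

for step, factor in reductions.items():
    print(f"{step}: {factor:.0f}x smaller")
print(f"overall: {overall:.0f}x, raw rate ~{raw_rate_gb_per_s:.1f} GB/s")
```

Under these assumptions the primary analysis shrinks the data by a factor of several thousand before anything needs to leave the instrument, which is what makes shipping the remainder to a cloud service plausible.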
Filter -> mapping (de novo assembly, reference alignment) -> consensus (simple, Bayesian, HMM) -> identify variants -> finished human genome, 150 GB (for a 30X human) - Dawei lin
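To make the "simple" consensus option concrete, here is a minimal sketch of a per-column majority vote over reads that are already aligned and gap-padded to the same length. This is an illustration of the general idea only, not PacBio's algorithm; the function name `simple_consensus` and the `-` gap symbol are assumptions made for the example.

```python
from collections import Counter


def simple_consensus(aligned_reads, gap="-"):
    """Majority-vote consensus over equal-length, gap-padded aligned reads.

    Columns where the gap symbol wins the vote are dropped from the output.
    Ties go to whichever symbol appears first in the column.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


reads = ["ACGT-A", "ACGTTA", "ACCT-A"]
print(simple_consensus(reads))
```

The Bayesian and HMM options mentioned in the talk would replace the plain vote with a probabilistic model of the error process; they are not sketched here.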
Because of the quick reaction time, it makes sense to do the analysis in real time too - Dawei lin
It has 96 wells. Steps: design jobs, monitor jobs, view data. - Dawei lin
SMRT View: PacBio's genome browser - Dawei lin
Customers: genome centers, service labs, genomics institutes, core labs, individual PIs, clinical labs - Dawei lin
The target is to have data analysis results back in 15 minutes - Dawei lin
Software in the cloud is more maintainable and easier to budget for - Dawei lin
Circular consensus is a way to make high-quality sequences - Dawei lin
Strobe sequencing: μ=620, σ=40 - Dawei lin
Event-based information model - Dawei lin
visualization and standardization make complexity manageable - Dawei lin
10,000 genomes is ~2PB of data. - Dawei lin
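A quick check of what that estimate implies per genome, again assuming decimal units. The comparison against the 150 GB figure for a finished 30X human genome (quoted in the pipeline note) is an inference made here, not something stated in the talk.

```python
# Per-genome storage implied by "10,000 genomes is ~2PB".
PB = 10**15
GB = 10**9

GENOMES = 10_000
TOTAL_BYTES = 2 * PB        # from the notes
FINISHED_GENOME = 150 * GB  # finished 30X human genome, from the notes

per_genome_gb = TOTAL_BYTES / GENOMES / GB
finished_only_pb = GENOMES * FINISHED_GENOME / PB

print(f"implied storage per genome: {per_genome_gb:.0f} GB")
print(f"10,000 finished genomes alone: {finished_only_pb:.1f} PB")
```

So the ~2 PB figure is consistent with roughly 200 GB per genome, somewhat more than the finished-genome size alone.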
Hadoop: it wasn't good for generic customer deployment and did not play well with existing schedulers, but these have all changed. - Dawei lin
PacBio still uses a lot of structured binary files. - Dawei lin
Recreational and commercial genomics. - Dawei lin
Cloud computing infrastructure, application stack, niche genomics SaaS - Dawei lin