Chris Miller

Bioinformatics Grad student at Baylor College of Medicine. My online home is at
Blog | Twitter | Blog | Custom RSS/Atom | Custom RSS/Atom
C: Visualization tools for NGS analysis results, suitable for biologists -
The visualizations that will be most useful depend to a huge extent upon the design of the study. There are many, many things you might want to explore in this data, so you're going to have to narrow it down to get reasonable recommendations. - Chris Miller
C: Can anyone suggest me a script based pipeline for exome sequencing with paired end reads generated by Illumina for tumor samples. -
This would probably be better posted as a new question. It's likely that only the two or three people involved in this thread will notice that you've posted it here. That said, this previous question may help: "What is the default quality encoding expected by BWA?" - Chris Miller
C: genome error rate calculation for mutation -
This sounds like a homework question. If it is, then pasting your assignment here is not the way to get help. If it is not homework, please explain why you're trying to figure this out and we may be able to provide assistance. - Chris Miller
A: Can anyone suggest me a script based pipeline for exome sequencing with paired end reads generated by Illumina for tumor samples. -
What you're asking here is probably beyond the scope of a Q/A site. To properly review all of these steps and provide feedback and suggestions would take hours. If you really need that level of support, then you're going to want to pay someone a consulting fee to help you get your pipeline set up. If you have specific questions about individual steps or commands, then Biostar can be a great resource, and please do feel free to ask questions. I'd encourage you to look through old posts first, as many of these topics have been addressed individually in the past. - Chris Miller
A: BreakDancer + SquareDancer -
The SquareDancer code is buried in one of the other repos. Here's a direct link to the Perl script: - Chris Miller
C: Epigenomics Contest: How Many Flaws Can You Find in this Paper? -
You're posting this article hoping to pick some sort of fight, as you've done previously (Why Does Biostar Cover Questions on Epigenetics, but not Intelligent Design?). That's not cool. There is probably a good post to be made that outlines specific criticisms of the paper and starts a healthy discussion about the merits of the science. This, unfortunately, is not that post. - Chris Miller
C: My NCBI Curriculum Vitae Web Application: SciENcv -
Aha - I missed the biosketch export at the bottom. That seems fairly useful. - Chris Miller
It's just a convention. In most applications, normalizing for the higher coverage associated with GC-rich regions is essentially the same as normalizing for the lower coverage in AT-rich regions. The point is that base composition correlates with some parameter that's being adjusted. - Chris Miller
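To see why the direction is only a convention, consider a common style of GC-stratified normalization: each window's read count is divided by the median count of windows with similar GC content, then rescaled by the global median. Whether you describe that as pulling GC-rich windows down or pushing AT-rich windows up, the result is the same. This is a minimal sketch, not any particular tool's method; the `(gc_fraction, read_count)` input and the 1%-wide GC strata are assumptions.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(bins):
    """Normalize read counts for GC bias.

    bins: list of (gc_fraction, read_count) tuples for fixed-width windows
    (a hypothetical input layout). Each count is divided by the median count
    of its GC stratum and multiplied by the global median count.
    """
    strata = defaultdict(list)
    for gc, count in bins:
        strata[round(gc, 2)].append(count)  # 1%-wide GC strata (an assumption)
    stratum_median = {key: median(counts) for key, counts in strata.items()}
    global_median = median(count for _, count in bins)

    normalized = []
    for gc, count in bins:
        m = stratum_median[round(gc, 2)]
        normalized.append(count * global_median / m if m > 0 else 0.0)
    return normalized
```

For example, windows at 40% GC with 100 reads and windows at 60% GC with 150 reads all come out at 125 after normalization; only the reference level, not the relative profile, depends on which stratum you treat as the baseline.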
C: Epigenomics Contest: How Many Flaws Can You Find in this Paper? -
This kind of axe-grinding post doesn't have any place here. If you'd like to offer critiques of this science and start a discussion, that's fine, but just posting a link and calling it "junk" isn't really constructive. - Chris Miller
A: My NCBI Curriculum Vitae Web Application: SciENcv -
Anybody tried it yet? Is this going to be worth filling out, or would we be better off just pointing people to a Google Scholar page? - Chris Miller
A: Asking the developers before asking Biostars -
We actually discussed this with Istvan about a year ago, and have since been pointing users of our software here for support. We have a tool (rss2jira) that monitors the RSS feed for specific keywords or tags and then notifies the appropriate people to come answer. We do try to be clear that bug reports don't belong here, and that patches and such should go on GitHub. This has had a number of positive effects. The answers to commonly asked questions about our tools are now publicly accessible, indexed, and on a site where lots of people can find them (as opposed to buried away in some obscure mailing list archive). It also drives traffic to Biostar and makes lots of users aware of the community. Finally, several people who started out asking questions about BreakDancer or SomaticSniper here have gone on to become regular contributors to the site. - Chris Miller
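The keyword-monitoring idea can be sketched in a few lines of Python. This covers only the matching step: it parses an RSS 2.0 document and reports which items mention a watched keyword in their title, description, or category. It is not rss2jira's code, and the keyword list and recipient handles are hypothetical; fetching the feed and creating tickets or notifications are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping of watched keywords (lowercase) to people to notify.
WATCHLIST = {
    "breakdancer": ["breakdancer-maintainer"],
    "somaticsniper": ["sniper-maintainer"],
}


def match_items(rss_xml, watchlist=WATCHLIST):
    """Return (title, link, recipients) for each RSS <item> whose title,
    description, or category mentions a watched keyword (case-insensitive)."""
    root = ET.fromstring(rss_xml)
    hits = []
    for item in root.iter("item"):
        text = " ".join(
            (el.text or "")
            for el in item
            if el.tag in ("title", "description", "category")
        ).lower()
        recipients = sorted(
            {who for kw, people in watchlist.items() if kw in text for who in people}
        )
        if recipients:
            hits.append((item.findtext("title", ""), item.findtext("link", ""), recipients))
    return hits
```

Plain substring matching keeps the sketch short; a real monitor would probably also remember which items it has already seen so that people aren't notified twice.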
A: to open file from dbGAP -
The first result for a search on "ncbi_enc" is this page: It says: "The data files distributed through the dbGaP are all encrypted by NCBI’s special encryption algorithm. These files have a file suffix “.ncbi_enc”, indicating that they are NCBI encrypted files." That page also contains a link to the archive and encryption utilities. - Chris Miller
A: What does the term low pass mean? -
That doesn't really make sense. "Low-pass" generally refers to a genome that's sequenced to a depth under 10x. With this data, you can call germline SNPs, find structural variants, etc. It's not particularly useful for cancer sequencing, though, as somatic variants are difficult to discern, and finding subclonal variants is essentially out of the question. - Chris Miller
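A quick binomial calculation shows why. If variant-supporting reads are sampled independently at a site with a given depth, the chance of seeing at least some minimum number of them falls off sharply as the variant allele fraction drops. This sketch ignores sequencing error and mapping artifacts, and the two-read threshold used in the usage note is a hypothetical caller setting.

```python
from math import comb


def prob_detect(depth, vaf, min_alt_reads):
    """Probability of observing at least `min_alt_reads` variant-supporting
    reads at a site, modeling the alt-read count as Binomial(depth, vaf).
    Assumes independent sampling and no sequencing error."""
    p_fewer = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1 - p_fewer
```

With at least two supporting reads required at 5x depth, a heterozygous germline site (VAF 0.5) is seen about 81% of the time (`prob_detect(5, 0.5, 2)`), but a subclonal variant at 10% VAF only about 8% of the time (`prob_detect(5, 0.1, 2)`).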
A: liftOver bam file -
This is a bad idea. Since your reads were mapped to a different genome assembly, you really need to realign your data. Because of the differences between the assemblies, there will undoubtedly be many places where reads map somewhere other than where liftOver would place them. Convert the BAM back to FASTQ with Picard, then redo the mapping with the aligner of your choice. - Chris Miller
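The convert-and-remap route might look like the following for paired-end data, using Picard SamToFastq and bwa mem as one possible aligner. This Python sketch only builds and prints the commands; the file names and the `picard.jar` location are placeholders. It uses the current Picard argument style (`-I`, `-FASTQ`, `-SECOND_END_FASTQ`); older Picard releases used the `I=value` form instead.

```python
import shlex


def realign_commands(bam, new_ref, prefix, picard_jar="picard.jar"):
    """Build (but do not run) commands that turn a paired-end BAM back into
    FASTQ and remap it to a new assembly. All paths are placeholders."""
    r1, r2 = f"{prefix}_R1.fastq", f"{prefix}_R2.fastq"
    to_fastq = ["java", "-jar", picard_jar, "SamToFastq",
                "-I", bam, "-FASTQ", r1, "-SECOND_END_FASTQ", r2]
    index = ["bwa", "index", new_ref]           # needed once per reference
    remap = ["bwa", "mem", new_ref, r1, r2]     # writes SAM to stdout
    return [to_fastq, index, remap]


for cmd in realign_commands("sample.old_assembly.bam", "new_assembly.fa", "sample"):
    print(shlex.join(cmd))
```

In practice you would redirect or pipe the bwa mem output into samtools to get a sorted BAM on the new assembly.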
A: Staff Scientist (Cancer Genomics) - The Genome Institute, Washington University, STL -
Just chiming in to say that WashU is a great place to work. We've got some fantastically interesting projects going on that will help to shape the future of genomic medicine. - Chris Miller
C: Intersect gene annotation with specific position or genomic interval -
I'm not immediately familiar with that format, but it probably contains lines labelled "intron", "cds_exon", "rna", etc. Grep out the lines you want, do your intersection, then collapse by gene name if necessary. - Chris Miller
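The grep, intersect, and collapse steps can be sketched in Python. Because the actual format isn't known, this assumes a hypothetical tab-delimited layout of chrom, start, end, feature label, gene name, with 1-based inclusive coordinates; adjust the column handling to match your file.

```python
from collections import defaultdict


def genes_hit(annotation_lines, chrom, pos, wanted=("cds_exon",)):
    """Return {gene: sorted feature labels} for annotation records that
    overlap chrom:pos. Assumes hypothetical tab-delimited columns:
    chrom, start, end, feature, gene (1-based, inclusive)."""
    hits = defaultdict(set)
    for line in annotation_lines:
        c, start, end, feature, gene = line.rstrip("\n").split("\t")[:5]
        if feature not in wanted:                         # the "grep" step
            continue
        if c == chrom and int(start) <= pos <= int(end):  # the intersection
            hits[gene].add(feature)                       # collapse by gene
    return {gene: sorted(features) for gene, features in hits.items()}
```

Collapsing into a set per gene means a position that falls in several overlapping transcript records of the same gene is reported once.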
A: Intersect gene annotation with specific position or genomic interval -
Instead of intersecting with some monolithic gene track containing everything, you're going to want to intersect with a track containing only exons. Since I don't know exactly what your data looks like, I can't tell you exactly how to accomplish this. If you don't have coordinates for specific exons in your data, you can download such tracks from UCSC genome browser or Ensembl easily enough. - Chris Miller
C: Lolliplot link on the TVAP website -
Glad to hear that you got it working, Christian. If you have a stand-alone version that you'd like to contribute back to the community, we'd be happy to put it up on our site (with proper credit given to you, naturally). One of our motivations for open-sourcing all of our code is to enable people to use our tools, even the ones we haven't had time to package up neatly yet. Even rough code might give someone else a cleaner place to start. - Chris Miller
C: Where do I start to make career in bioinformatics? -
Yes. You should not have to pay for a PhD in the hard sciences. In fact, you will get a modest stipend to cover your living expenses and such. One more suggestion: your English seems passable in writing, but brushing up on your spoken English certainly won't hurt your chances if you're granted an interview. - Chris Miller
C: CNVNator deletion calls all based on mapping quality zero reads? -
I'm afraid I don't quite understand what you're asking. Can you try rephrasing a little and giving a more thorough example? Defining the columns in the output you pasted would help as well. - Chris Miller
C: Retrieving data from TCGA database -
I think you misread my comment. Several years ago, I was not in a TCGA group and still got access and published using TCGA data. No one is hoarding data; it is freely accessible via the TCGA data portal and CGHub. (Here, for example, are all somatic mutations found in the AML cohort:) When such data contains information that is potentially identifying (like raw sequence reads and germline variant calls), the NIH requires that you fill out a short form so that they can verify you're using the data for research. This is not a difficult hurdle. - Chris Miller
A: permutation method in cancer data set (same as MutSic) -
It's not clear whether you're referring to MuSiC or MutSig. MutSig is just a method for determining the significance of recurrently mutated genes. MuSiC is a suite of tools that includes a significantly mutated gene (SMG) test, but also includes other tools, like one for inferring the significance of correlation and mutual exclusivity.
The difference between the two statements seems to be in the restrictions that they placed on the permutation test. The difference is subtle, but has to do with the difference between counting mutations and counting mutated genes. These are different numbers (you could have 25 samples with TP53 mutated, but they could be hit with 50 mutations, two per sample).
Test 1:
1) preserve the number of samples with a given gene mutated (if 23 samples have TP53 muts, this is maintained)
2) preserve the number of genes mutated in a given sample (if sample X has 50 mutated genes, this is maintained)
Test 2:
1) preserve the number of mutations in a given gene (if there are 50 TP53 mutations,... - Chris Miller
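One standard way to get a permutation with both Test 1 constraints is to treat the data as a binary sample-by-gene matrix and repeatedly apply 2x2 "checkerboard" swaps, which leave every row sum (genes mutated per sample) and column sum (samples with each gene mutated) unchanged. This Python sketch illustrates that technique; it is not claimed to be what MuSiC or MutSig implement, and because the matrix is binary it cannot represent Test 2's per-gene mutation counts.

```python
import random


def swap_randomize(matrix, n_swaps, seed=None):
    """Permute a binary sample-by-gene mutation matrix while preserving all
    row sums and column sums, via repeated checkerboard swaps.
    Requires at least two mutated cells. Returns a new matrix."""
    rng = random.Random(seed)
    m = [row[:] for row in matrix]  # work on a copy
    edges = [(i, j) for i, row in enumerate(m) for j, v in enumerate(row) if v]
    for _ in range(n_swaps):
        (a, x), (b, y) = rng.sample(edges, 2)
        # Skip swaps that would put two mutations in the same cell.
        if a == b or x == y or m[a][y] or m[b][x]:
            continue
        m[a][x] = m[b][y] = 0
        m[a][y] = m[b][x] = 1
        edges.remove((a, x))
        edges.remove((b, y))
        edges += [(a, y), (b, x)]
    return m
```

Each accepted swap moves sample a's mutation from gene x to gene y and sample b's from y to x, so both margins stay fixed; running many swaps and recomputing your statistic on each result gives the null distribution.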
A: How to calculate degree of deletion and amplification of CNV given SNP array data from TCGA? -
This is a straightforward coding exercise that can be accomplished with a few lines of Perl, or by using something like BEDTools. Essentially, you're going to take a file containing coordinates for every gene and intersect it with the regions of copy number alteration. Watch out for edge cases - what happens when a gene spans two or more copy number regions? - Chris Miller
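Here is the same gene-by-segment intersection sketched in Python, with one possible answer to the edge case: weight each overlapping segment's value by the length of its overlap with the gene. The tuple layouts and half-open coordinates are assumptions, and the per-segment value can be whatever your file reports (copy number or a log2 ratio). Reporting the minimum/maximum, or simply flagging genes that span several segments, are equally reasonable choices.

```python
def gene_copy_number(gene, segments):
    """Length-weighted mean segment value over the part of `gene` covered
    by segments, or None if nothing overlaps.

    gene: (chrom, start, end); segments: list of (chrom, start, end, value).
    Coordinates are assumed half-open [start, end)."""
    chrom, gstart, gend = gene
    covered = 0
    weighted = 0.0
    for schrom, sstart, send, value in segments:
        overlap = min(gend, send) - max(gstart, sstart)
        if schrom == chrom and overlap > 0:
            covered += overlap
            weighted += overlap * value
    return weighted / covered if covered else None
```

A gene at chr1:100-200 that is split evenly between a segment at copy number 2 and one at copy number 4 comes out at 3.0, which is exactly the kind of blended value you may want to flag rather than report silently.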
A: Difference between somatic and germline variant calling? -
To rehash/expand on what Dan said: if you're sequencing normal tissue, you generally expect single-nucleotide sites to fall into one of three variant allele fraction bins, 0%, 50%, or 100%, depending on whether they're homozygous reference, heterozygous, or homozygous variant. With tumors, you have to deal with a whole host of other factors:
- Normal admixture in the tumor sample, which lowers the variant allele fraction (VAF)
- Tumor admixture in the normal, which occurs when adjacent normals are used or, in hematological cancers, when there is some blood in the skin normal sample
- Subclonal variants, which may occur in any fraction of the cells, meaning that your het-site VAF might be anywhere from 50% down to below 1%, depending on the tumor's clonal architecture and the sensitivity of your method
- Copy number variants, copy-neutral loss of heterozygosity, or ploidy changes, all of which again shift the expected distribution of variant fractions
These, and other factors, make calling somatic variants difficult and still an area that is being... - Chris Miller
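Several of these factors combine in a simple, widely used model of the expected VAF for a somatic SNV in a bulk tumor sample: variant-carrying copies from tumor cells, divided by all copies of the locus from both tumor and contaminating normal cells. This Python sketch assumes diploid normal cells that carry no copy of the variant. It does not model tumor contamination of the matched normal, and it is not the method of any particular caller.

```python
def expected_vaf(purity, tumor_cn, mutated_copies=1, ccf=1.0, normal_cn=2):
    """Expected variant allele fraction of a somatic SNV in a bulk sample.

    purity: fraction of cells in the sample that are tumor
    tumor_cn: total copy number of the locus in tumor cells
    mutated_copies: copies carrying the variant in cells that have it
    ccf: cancer cell fraction carrying the variant (1.0 = clonal)
    normal_cn: copy number in normal cells (assumed diploid by default)
    """
    alt = purity * ccf * mutated_copies
    total = purity * tumor_cn + (1 - purity) * normal_cn
    return alt / total
```

For example, a clonal heterozygous mutation in a diploid tumor at 60% purity is expected near 30% VAF (`expected_vaf(0.6, 2)`); the same mutation present in only 20% of tumor cells in a pure sample drops to 10%, and copy-neutral LOH of the mutated allele (`mutated_copies=2`) in a pure sample pushes it to 100%.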
C: Mapping of Bisulfite Sequencing reads -
Thanks! That's really helpful. - Chris Miller
C: Retrieving data from TCGA database -
Yes, your comment is way off topic, but I'll briefly respond to say that I haven't seen this behavior. In grad school, I was in a small lab with no direct connection to TCGA and we had no problems getting access to the protected data. Yes, you need to state a rough research plan so that they can verify that you'll safeguard protected patient information. Yes, you also need to wait until the marker paper is published, as the people who worked so hard to generate the data get the first shot at one general paper describing the dataset. I don't feel like that's unreasonable. - Chris Miller
C: Difference between somatic and germline variant calling? -
Well, single-cell sequencing has a role to play (and would have more of one if whole-genome amplification weren't so lossy), but realistically, you can't sequence billions of cells from a tumor individually. Bulk sequencing is still going to have a role for quite a while. - Chris Miller
C: How to analyze your own exome -
This seems to be out of place in the tutorial section and reads more like an advertisement. Maybe it would fit better in the Forum section? - Chris Miller
A: Using a filtered MAF when calculating the background mutation rate -
I wouldn't use the non-filtered file. It doesn't make much sense to calculate a background mutation rate based on data that you don't believe is accurate. - Chris Miller
C: what is the temperature at which summer squash trypsin inhibitors get denature? -
Closing this as off-topic - it's not a bioinformatics question. - Chris Miller