Chris Miller

Bioinformatics Grad student at Baylor College of Medicine. My online home is at
Will Picard MarkDuplicates also un-mark duplicates? -
If I take a BAM that's already aligned and has duplicates marked, remove a bunch of reads, then re-run Picard's MarkDuplicates, will it correctly change the flags of reads that are no longer duplicates (but may have been before the reads were removed)? This question is proving surprisingly hard to google for. - Chris Miller
Vaccination works. Measles Incidence Rate Per 100,000 People, 1928 - 2003 (via
For all the policymakers that still don't get it: Building Bigger Roads Actually Makes Traffic Worse.
Answer: A: Tcga Lack Of Controls - Workarounds? -
It's often difficult to get appropriate matched normals for tumor methylation or expression data, as they'd have to be tissue-matched. If you're working with glioblastoma, you can't take a chunk of a patient's healthy brain. Same goes for blood cancers - it's really difficult to separate leukemic cells from non-leukemic cells and get a matched normal from the same patient. (For genomic DNA calls, you can just use non-proximal blood or skin.) Your best bet is to find healthy samples of that tissue type from another source. I'd start with GEO, and would definitely consider pooling the normals to help smooth out differences specific to only one of your normal samples. Also beware of batch effects, since it's likely that different facilities generated the data. - Chris Miller
Answer: A: Comparing Exomes -How to get ALL variants -
What you probably want to do is call variants on both samples using standard parameters, merge and deduplicate the list of called sites, then get read counts for each site from both samples. bam-readcount is one tool that makes extracting this information easy. With that information, you can filter the results to your desired level of confidence, including the prior information that a variant was called in the other sample. - Chris Miller
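The merge, deduplicate, and filter steps described above can be sketched roughly as follows. This is only an illustration: the site tuples, the read-count dictionaries, and the `strict`/`relaxed` thresholds are made-up assumptions, and in practice the counts would be parsed from a tool such as bam-readcount rather than typed in.

```python
# Hypothetical calls per sample: (chromosome, position, ref, alt).
sample_a_calls = [("1", 1234, "A", "G"), ("2", 3456, "C", "T")]
sample_b_calls = [("2", 3456, "C", "T"), ("3", 7890, "G", "A")]


def merge_sites(*call_lists):
    """Union of all called sites, deduplicated and sorted."""
    return sorted(set().union(*call_lists))


sites = merge_sites(sample_a_calls, sample_b_calls)

# Hypothetical (ref_reads, alt_reads) at every merged site, one dict per sample.
counts_a = {
    ("1", 1234, "A", "G"): (20, 18),
    ("2", 3456, "C", "T"): (25, 15),
    ("3", 7890, "G", "A"): (38, 2),
}
counts_b = {
    ("1", 1234, "A", "G"): (40, 0),
    ("2", 3456, "C", "T"): (22, 17),
    ("3", 7890, "G", "A"): (19, 21),
}


def has_support(site, counts, called_elsewhere, strict=8, relaxed=2):
    """Require fewer alt reads when the other sample already called the site.

    The thresholds are arbitrary placeholders, not recommended values.
    """
    _, alt_reads = counts.get(site, (0, 0))
    threshold = relaxed if called_elsewhere else strict
    return alt_reads >= threshold


set_a, set_b = set(sample_a_calls), set(sample_b_calls)

# For each site: (supported in sample A, supported in sample B).
report = {
    site: (
        has_support(site, counts_a, called_elsewhere=site in set_b),
        has_support(site, counts_b, called_elsewhere=site in set_a),
    )
    for site in sites
}

for site, (in_a, in_b) in report.items():
    print(site, "A" if in_a else "-", "B" if in_b else "-")
```

Using a lower threshold when the other sample already called the site is one simple way to encode that prior; a real analysis would likely also look at base and mapping quality.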
Answer: A: Spliting a dataframe based in a chromosome -
There are many ways to do this. Here is one:

    > a = data.frame(Chr=c(1,1,2,3),Position=c(1234,5678,3456,7890),LTR=c(2,3,4,5))
    > a
      Chr Position LTR
    1   1     1234   2
    2   1     5678   3
    3   2     3456   4
    4   3     7890   5
    > for(i in unique(a$Chr)){write.table(a[a$Chr==i,],paste("newsplit",i,".txt",sep=""),sep="\t",quote=F,row.names=F)}

- Chris Miller
Comment: C: Biostar 2.0, Second Beta, Help Us Test The New Site -
Possible solutions: 1) Give the moderator the whole blurb of text up front and let them edit as they see fit (I think the template for a polite response is a good idea). 2) Insert an intermediate step between the moderation form and posting, where the moderator can edit the auto-generated text block. 3) Include a polite note in a div with a slightly different color/font (think blockquote here) and let the moderator add text below it (thus preserving the separation between auto-text and my words). - Chris Miller
Answer: A: Which software should be used to get SNPs different from reference but same in a -
What you're looking to do is fairly straightforward. I'd just call SNPs with Samtools, then merge the lists with a little script. - Chris Miller
Comment: C: Would You Be Interested In Having A Usesthis Like Site For The Bioinformatics Co -
Certainly getting some big names (Heng Li, Ewan Birney, etc) would be nice for publicity, but just working your way through some of the top BioStar users would turn up lots of interesting tidbits, I'm sure. - Chris Miller
Answer: A: Biostar 2.0, Second Beta, Help Us Test The New Site -
I like many elements of the new design - it's a little cleaner. A few bug reports:
- The option to accept an answer should probably only appear if you are the person that asked the question.
- If I move an answer with a code block in it to a comment on the top level post, the upvote logo and count for that comment overlap the code block.
- The moderation comment auto-text could use some work. It seems like it would be good to make it visually clear which text was auto-filled and which came from the moderator (rather than just sticking it in a PS block). - Chris Miller
Answer: C: tbl2asn Permission denied on Mac -
Hello jolespin - this post does not fit the main topic of this site, as it is not a bioinformatics question. I'd suggest that you do some basic reading on file permissions in UNIX. FWIW, 'chmod +x yourfile' should make it executable, which may solve your problem. - Chris Miller
Answer: A: Somatic mutation calling without matched normal -
The bottom line is that without a matched normal, you're just not going to be able to call the somatic status for the vast majority of sites. That said, you can winnow down a list to those you *suspect* are somatic. Some ideas:
- Weed out sites with high frequency in the population.
- If your tumor is very impure, you can take advantage of the fact that the frequencies of somatic variants will be shifted away from 50%/100%. - Chris Miller
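As a rough sketch of those two ideas, the filter below drops variants that are common in the population and those whose allele fraction sits near the germline expectations of 50% or 100%. The `Variant` fields, the `max_pop_freq` cutoff, and the `germline_window` width are illustrative assumptions rather than recommended values, and the VAF test only makes sense for an impure tumor as described above.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    vaf: float       # variant allele fraction in the tumor, 0-1
    pop_freq: float  # allele frequency in a population database, 0-1


def looks_somatic(v, max_pop_freq=0.001, germline_window=0.1):
    """Flag variants that are rare in the population and whose VAF is
    shifted away from the germline expectations of 0.5 and 1.0.

    Both cutoffs are placeholder assumptions chosen for this example.
    """
    if v.pop_freq > max_pop_freq:
        return False  # common in the population: likely germline
    near_het = abs(v.vaf - 0.5) <= germline_window
    near_hom = abs(v.vaf - 1.0) <= germline_window
    return not (near_het or near_hom)


# Made-up variants for illustration only.
variants = [
    Variant("common_snp", vaf=0.48, pop_freq=0.12),
    Variant("rare_het", vaf=0.52, pop_freq=0.0),
    Variant("rare_hom", vaf=0.97, pop_freq=0.0),
    Variant("low_vaf", vaf=0.15, pop_freq=0.0),
]

suspects = [v.name for v in variants if looks_somatic(v)]
print(suspects)
```

This produces a list of suspects only; it will miss somatic variants that happen to land near 50% and keep rare germline variants in regions with copy-number changes.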
Comment: C: Bwa-Meth: Align And Tabulate Bs-Seq Reads -
Boo for rejection, but major props for the transparent and reproducible analysis. The paper will find a good home soon, I'm sure. - Chris Miller
Answer: A: What is a repetitive region? -
A little background reading seems to be in order, eh? Maybe start here: The high-level summary is that all of these refer to regions of the genome that have exactly or nearly the same sequence. Large swaths of our genome are made up of duplicated sequence - things like Alus, SINEs, and LINEs, just to name a few. These make many tasks, like mapping short reads and assembly, difficult, because it's hard to determine which of the many possible regions your particular read came from. - Chris Miller
Comment: C: How to make this kind of 3-D plotting for cancer subclones ? (picture attached) -
- Are your VAFs scaled between 0 and 100 (percentage) and not between 0 and 1 (fraction)?
- If your data really does look like this, you could modify the sc.plot2d function in the R code to tweak the xlim and ylim. - Chris Miller
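The scaling question in the first point can be checked with a few lines before handing data to any plotting code. This is a hypothetical helper, not part of sciClone, and it relies on the assumption that a maximum value of 1 or less means the input is in fractions; a sample whose true VAFs are all below 1% would fool it.

```python
def to_percent(vafs):
    """Return VAFs on a 0-100 scale, rescaling if they look like fractions.

    Heuristic (an assumption): if no value exceeds 1, treat the input as
    fractions in the range 0-1 and multiply by 100.
    """
    if not vafs:
        return []
    if min(vafs) < 0 or max(vafs) > 100:
        raise ValueError("VAFs must lie between 0 and 100")
    if max(vafs) <= 1:
        return [v * 100 for v in vafs]
    return list(vafs)


print(to_percent([0.25, 0.5]))   # fractions are rescaled
print(to_percent([12.0, 48.5]))  # percentages pass through unchanged
```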
Comment: C: Applying a task to several files in R -
I'm closing this up, since it's not a bioinformatics question, as noted below. This allows us to keep the site focused on the topics that the community can help with. Please consider asking your question on Stack Overflow, or better yet, taking a little time to work through a basic R tutorial, which will help with both of the questions you've asked today. - Chris Miller
Comment: C: Would You Be Interested In Having A Usesthis Like Site For The Bioinformatics Co -
Sure, but you're going to have to give me a little time to get my thoughts together. If you haven't heard from me in a week or so, shoot me an email and remind me - Chris Miller
Answer: A: Software To Call Somatic Indels? -
Pindel is another generally good somatic indel caller: - Chris Miller
Comment: C: How Can I Create A Protein-Protein Interface Schematic Or Diagram For Easy Viewi -
I don't know which program that is, but if you need to visualize networks/interactions, Cytoscape is often the answer. - Chris Miller
Answer: A: Applying a task to several files in R -
Here's some example code:

    > files = Sys.glob("*.txt")
    > files
    [1] "1.txt"     "2.txt"     "3.txt"
    > for(i in files){
      #your code here
    }

Now I'm closing this up, since it's really a programming question and not a bioinformatics question. Based on your last two questions, you would benefit from working through a short R tutorial. - Chris Miller
Comment: C: How to make this kind of 3-D plotting for cancer subclones ? (picture attached) -
1) That's not your sequencing coverage, that's the VAF. Seeing the highest clone at sub-20% suggests that your tumor purity is very low.  2) Those outlier points (and corresponding strange clusters) would almost certainly be cleaned up by adding copy-number information. You can derive it from exome data using VarScan, cn.mops, or other packages. - Chris Miller
Comment: C: What Are The Most Common Stupid Mistakes In Bioinformatics? -
You also occasionally see IUPAC codes pop up in fasta sequences. - Chris Miller
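A quick scan for such codes might look like the sketch below. It assumes nucleotide FASTA input and treats the standard IUPAC ambiguity letters (including N) as the ones worth flagging; the function name and the example records are made up for illustration.

```python
import io

# Standard IUPAC nucleotide ambiguity codes (everything other than A, C, G, T, U).
IUPAC_AMBIGUITY = set("RYSWKMBDHVN")


def find_ambiguity_codes(fasta_handle):
    """Map each record name to the sorted ambiguity codes seen in its sequence."""
    found = {}
    name = None
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            continue
        # Set intersection keeps only the ambiguity letters present on this line.
        codes = IUPAC_AMBIGUITY.intersection(line.upper())
        if codes:
            found.setdefault(name, set()).update(codes)
    return {record: sorted(codes) for record, codes in found.items()}


# Made-up FASTA content for illustration.
example = io.StringIO(">seq1 clean\nACGTACGT\n>seq2 has codes\nACGRTTNA\nacgy\n")
result = find_ambiguity_codes(example)
print(result)
```

Whether N should count as a problem depends on the downstream tool, so the set of flagged letters is worth adjusting.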
Comment: C: How Can I Create A Protein-Protein Interface Schematic Or Diagram For Easy Viewi -
Nice detective work! That link didn't work for me, so I pointed it to the pubmed page - Chris Miller
Comment: C: CNV analysis tool on exome data for NGS -
Lots of answers in these previous questions.  (If there weren't a bunch of answers on this question already, I would have closed it as a duplicate) - Chris Miller
Comment: C: Best Cnv Software? -
Paired tumor/normal samples or single samples? (somatic or germline CNVs?) - Chris Miller
Comment: C: Convert A Matrix Into A Hash Of Arrays -
Not a bioinformatics question. Closing. - Chris Miller
Answer: A: How to make this kind of 3-D plotting for cancer subclones ? (picture attached) -
I'll put in a plug for our sciClone package, which produces some nice plots, though nothing exactly like you're showing here. - Chris Miller
Comment: C: Where I can download the program CNV-TV -
I agree - implementations are crucial. (And I've been pretty harsh in some reviews of papers that don't provide one). - Chris Miller
RT @glentickle: Someone gave my wife a "Chemical Free Weed Killer" recipe. I made some corrections.