Chris Miller

Bioinformatics Grad student at Baylor College of Medicine. My online home is at http://www.chrisamiller.com/
Answer by Chris Miller for split.screen function in R - http://biostar.stackexchange.com/questio...
Try the layout() function, which will give you all sorts of fine-grained control over where your plots go and how they're spaced. For one example: http://stackoverflow.com/questio... - Chris Miller
The movie should come with dual titles: "The Joyous Miracle of Birth" and "There Will Be Blood"
At a childbirth class and I can't help but be amused at the contrast between the gory pictures and the soothing music and voiceovers.
Dogs are great self-esteem boosters. I can leave Charlie alone all day, but he'll always be ecstatically happy to see me when I get home.
This week's goal: live a life worth tweeting about. (I'm all about keeping standards low)
Oh, hi twitter. It's been a while.
I haven't looked at the source code, but I bet that it would be fairly easy to remove the code implementing that check. - Chris Miller
Can you give more information on what you're using these mapability tracks for? - Chris Miller
A great number of your genomic coordinates would change between the two mappings. To convert between the two, you can use the liftOver tool: genome.ucsc.edu/cgi-bin/hgLiftOver - Chris Miller
RT @GuyEndoreKaiser: Gingrich claims we'll have a moon base by his 2nd term. Do you know how hard it is to craft a sentence where moon base isn't the crazy part?
RT @sween: The difference between Firefly and the Republican debates? One presents a future where government crushes dissent. The other was cancelled.
Answer by Chris Miller for heatmaps in R with huge data - http://biostar.stackexchange.com/questio...
With really large data, even a big-ass server may not be enough. The best advice I can offer is to try reducing the size of your data set. Doing this will require intelligently thinking about what it is you're trying to represent. (related: are you really going to be able to pick out 500k individual points on a little graph the size of your computer screen?) If, for example, you're plotting expression data from an exon array, maybe you could merge the data and plot per-gene instead of per-exon. If you're looking for patterns of differential expression, maybe you could plot just the top 1000 most differentially expressed. (and so on) - Chris Miller
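A minimal sketch of the two reductions suggested above, in plain Python. The function names, the exon-to-gene mapping, and the toy values are hypothetical, and ranking by variance is only a stand-in for a real differential-expression statistic:

```python
from collections import defaultdict
from statistics import mean, pvariance


def merge_exons_to_genes(exon_values, exon_to_gene):
    """Collapse per-exon rows into per-gene rows by averaging each sample column."""
    grouped = defaultdict(list)
    for exon, values in exon_values.items():
        grouped[exon_to_gene[exon]].append(values)
    return {
        gene: [mean(column) for column in zip(*rows)]
        for gene, rows in grouped.items()
    }


def top_n_by_variance(rows, n):
    """Keep the n rows whose values vary most across samples (assumed ranking)."""
    ranked = sorted(rows, key=lambda name: pvariance(rows[name]), reverse=True)
    return {name: rows[name] for name in ranked[:n]}


# Toy data: three exons, two samples each (made-up numbers).
exons = {
    "exon1": [1.0, 3.0],
    "exon2": [3.0, 5.0],
    "exon3": [2.0, 2.0],
}
exon_to_gene = {"exon1": "GENE_A", "exon2": "GENE_A", "exon3": "GENE_B"}

genes = merge_exons_to_genes(exons, exon_to_gene)
top = top_n_by_variance(genes, 1)
```

Either step shrinks the matrix before it ever reaches the plotting code, which is usually cheaper than trying to render every row.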
Answer by Chris Miller for Need help in understanding a BAM file - http://biostar.stackexchange.com/questio...
If you want the hairy details of how bam indexing works, Section 4 of the SAM specification document is what you're looking for. If all you're looking for is enough understanding to work with the bam files, it's sufficient to understand that it works in a way that is similar to a table of contents in a book and allows accessors to grab the appropriate sequences much more quickly. - Chris Miller
Can you explain why you want to merge these into a single bam? In my experience, it's usually easier to keep one bam per sample. - Chris Miller
I think he wants to split among multiple physical processors or machines. - Chris Miller
There may be an actual question behind this, but you're going to have to give a lot more information before we can help you. If so, feel free to write up a more lengthy question and try again. - Chris Miller
I think VCF is going to be what you want to use here - Chris Miller
Answer by Chris Miller for small RNA (miRNA) mapping - http://biostar.stackexchange.com/questio...
Also consider your reference. If you only align to the standard human reference, you'll often miss reads that cross splice junctions. Some people handle this by using an aligner like Tophat that infers splice junctions, others append splice junctions to a standard reference and then do normal alignment. - Chris Miller
Answer by Chris Miller for M.S in Bioinformatics? - http://biostar.stackexchange.com/questio...
Start by looking through the previous posts that are found by searching with "career". There's lots of relevant info there. If after reading those, you still have specific questions, come back here, edit your post (or add a followup comment) and we'll be happy to answer them for you. - Chris Miller
I have 0 desire to get a Windows phone (not open source/platform), but I hope other people do. More competitors = more incentive to innovate
Agreed - looks like a custom R script to me. - Chris Miller
Answer by Chris Miller for why longer reads must be trimmed or divided into 36bp ? - http://biostar.stackexchange.com/questio...
From a quick glance at the abstract of that paper, I'm guessing that they wanted to be able to directly compare the results across many samples that were sequenced with different read lengths. Under typical circumstances, there shouldn't be any reason to split your reads up. In fact, longer reads allow you to map into repetitive regions that shorter reads can't access. This enhances your ability to detect CNV in these potentially unstable regions. - Chris Miller
Answer by Chris Miller for Bisulfite analysis with Illumina FastQ from different lanes - http://biostar.stackexchange.com/questio...
Three days isn't necessarily surprising, given the size of your files and depending on how many CPU cores you're throwing at the problem. If you have a cluster with many CPUs available, my advice would be to map the fastqs independently, then combine the results. (If you don't have this access, and are doing this on your desktop, you might just be better off waiting for it to finish at this point) Combining could be done directly after mapping by using "samtools merge" on your bams. If you prefer to wait until later to merge, there's no reason why a simple little script couldn't be used to combine the methylation reports. Those should simply be a list of genomic positions along with counts of methylated Cs, unmethylated Cs, and a ratio between the two. A few lines of perl/python/whatever language should be able to merge those for you. - Chris Miller
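A rough sketch of the "few lines of python" idea for combining per-lane methylation reports. The report layout here (tab-separated chrom, position, methylated count, unmethylated count) is an assumption, not the output format of any particular tool, and the ratio is taken to be methylated / (methylated + unmethylated):

```python
from collections import defaultdict


def merge_methylation_reports(reports):
    """Sum methylated/unmethylated counts per position across several reports.

    Each report is an iterable of text lines in an ASSUMED layout:
    chrom <tab> position <tab> methylated_count <tab> unmethylated_count
    """
    totals = defaultdict(lambda: [0, 0])
    for report in reports:
        for line in report:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comment/header lines
            chrom, pos, meth, unmeth = line.split("\t")[:4]
            counts = totals[(chrom, int(pos))]
            counts[0] += int(meth)
            counts[1] += int(unmeth)

    merged = []
    for (chrom, pos), (meth, unmeth) in sorted(totals.items()):
        depth = meth + unmeth
        # Recompute the ratio from the summed counts rather than averaging ratios.
        ratio = meth / depth if depth else 0.0
        merged.append((chrom, pos, meth, unmeth, ratio))
    return merged


# Two made-up lanes covering overlapping positions.
lane1 = ["chr1\t100\t3\t1", "chr1\t200\t0\t4"]
lane2 = ["chr1\t100\t1\t3", "chr2\t50\t2\t0"]
merged = merge_methylation_reports([lane1, lane2])
```

The ratio is recomputed from the pooled counts because averaging per-lane ratios would weight a low-coverage lane the same as a deep one.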
Answer by Chris Miller for bam and indexed bam files - http://biostar.stackexchange.com/questio...
A bai file isn't an indexed form of a bam - it's a companion to your bam that contains the index. A bam file is a binary blob that stores all of your aligned sequence data. You can view what's in the bam file using "samtools view bamfile.bam | less". Bam files can also have a companion file, called an index file. This file has the same name, suffixed with .bai. This file acts like an external table of contents, and allows programs to jump directly to specific parts of the bam file without reading through all of the sequences. Without the corresponding bam file, your bai file is useless, since it doesn't actually contain any sequence data. If you have a bam file without a corresponding index, you can generate one using "samtools index bamfile.bam". If your index file is named identically, with just the additional ".bai" suffix, you can be reasonably sure that it was generated from the same file. If you have any doubt, though, it's easy enough to delete your bai file, then generate a... - Chris Miller
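As a small illustration of the naming convention described above, here is a hedged Python sketch that looks for the companion index next to a bam. The function name and the timestamp comparison are assumptions added for illustration; an older index is only a hint that it may not match, not proof:

```python
import os


def index_status(bam_path):
    """Report whether the companion .bai exists and is at least as new as the bam."""
    bai_path = bam_path + ".bai"  # same name as the bam, suffixed with .bai
    if not os.path.exists(bai_path):
        return "missing"
    # Heuristic (assumption): an index older than its bam may be out of date.
    if os.path.getmtime(bai_path) < os.path.getmtime(bam_path):
        return "stale"
    return "ok"
```

If this reports "missing" or "stale", regenerating the index with "samtools index bamfile.bam" is the safe option.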
Read Count approach for DNA copy number variants detection - http://www.citeulike.org/user...
Answer by Chris Miller for how to annotate a human DNA position - http://biostar.stackexchange.com/questio...
You might try snpEff or Ensembl's Variant Effect Predictor - Chris Miller
Answer by Chris Miller for samtools pileup format - http://biostar.stackexchange.com/questio...
I'm assuming that those characters are preceded by a ^, which means they represent the mapping quality. From the page you linked: Also at the read base column, a symbol '^' marks the start of a read segment which is a contiguous subsequence on the read separated by `N/S/H' CIGAR operations. The ASCII of the character following '^' minus 33 gives the mapping quality. - Chris Miller
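The "ASCII minus 33" rule quoted above can be sketched in a few lines of Python. The function name and the example read-base string are made up for illustration:

```python
def mapq_from_pileup_bases(read_bases):
    """Return the mapping qualities encoded after each '^' in a pileup read-base column."""
    quals = []
    i = 0
    while i < len(read_bases):
        if read_bases[i] == "^" and i + 1 < len(read_bases):
            # The character right after '^' encodes mapping quality as ASCII - 33.
            quals.append(ord(read_bases[i + 1]) - 33)
            i += 2  # skip the quality character so a literal '^' quality is not re-parsed
        else:
            i += 1
    return quals


# 'I' is ASCII 73 and '!' is ASCII 33.
example = mapq_from_pileup_bases("^I.,^!.")
```

Here the two read starts decode to mapping qualities 40 and 0.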
Answer by Chris Miller for Where can I find Data sets of cancer publicly available? - http://biostar.stackexchange.com/questio...
If you hurry, you can grab reads from The Cancer Genome Atlas at the Sequence Read Archive here. This deposition of raw sequence data is hugely costly (and many would argue wasteful) so after Dec 31st, it won't be hosted there anymore. There may be a plan for hosting it elsewhere, but I'm not sure of the details at the moment. - Chris Miller
Heh - #GodIsNotGreat is trending. I'd like to think Hitchens would have been amused at the ignorance and outrage it's sparking.