I'm not familiar with the MuTect format, but IGV will load bed files happily (chr, start, stop, name - ie. chr1 1234 1235 A/T) I have to imagine some form of that info is the mutect output that you could munge into a bed file.
- Chris Miller
This will take the first million reads and give you counts of each read length. To look at more of the bam, get rid of the "head" command: samtools view file.bam | head -n 1000000 | cut -f 10 | perl -ne 'chomp;print length($_) . "\n"' | sort | uniq -c
- Chris Miller
Are you sure this is actually what you want to do? If you align reads from the whole genome to only chromosome 21, you will get false alignments. There are many regions with high similarity in the genome. For the purposes of illustration, let's imagine small regions on chromosome 1 and chromosome 21 that differ only by two base pairs. If you align only to chromosome 21, reads drawn from chromosome 1 will get matched up with their best alignment, which will be the highly similar region on chromosome 21. If you aligned against the whole genome, though, they would find a better (and correct) match on chr1. Thus, in many cases, it makes sense to align to the whole genome first, then subset out your chromosome 21 reads from the alignment file. If aligning to chr21 is really what you want to do, you should follow the instructions given by Sangwoo Kim - just create a new BWA index using only the chromosome 21 fasta.
- Chris Miller
To add to what Istvan says, the distribution of reads in a sequencing experiment is usually NOT Poisson, because of any number of biases (GC-content, mapability, others we don't fully understand). It's usually better approximated by a negative binomial distribution, which is essentially an overdispersed Poisson.
- Chris Miller
The relevant part of this error seems to be: "filename not available" Are you sure your bam path is correct and that the software can access it (permissions are set properly, etc)?
- Chris Miller
bamToBed would get you part of the way there, but there are two caveats: 1) You need per-chromosome bed files 2) readDepth actually only looks at the first and second columns. You can save yourself some disk by omitting the third column. Use the following command on your sorted bam file instead: for $chr in $(seq 1 22) X Y;do samtools view myfile.bam $chr:1-999999999 | cut -f 3,4 >$chr.bed;done If your bam uses the "chr" prefix ("chr1" instead of "1"), use this instead: for $chr in $(seq 1 22) X Y;do samtools view myfile.bam chr$chr:1-999999999 | cut -f 3,4 >chr$chr.bed;done Keep in mind that this will use a lot of disk space! Just make sure the files match your annotations. (either use "chr" or don't, but be consistent). Sidenote: If I had to write this package over again, it would read bam/sam natively, but a lot of this work was done before bam/sam was in widespread use. I'm working on a new followup tool that should be out this summer.
- Chris Miller
We currently have several openings at TGI. One fis or a statistically-oriented staff scientist, and two for more traditional software developers (though bioinformatics experience is a plus!). Though the job descriptions are often boring, the work is anything but! We're doing kick-ass genomic research on cancer and other human disease and building software pipelines designed to make biological insights from petabyte-scale data. Apply through the links below, but feel free to message me with questions and I'll either answer them or put you in touch with someone who can. Staff Scientist https://jobs.wustl.edu/psp... Software Developers (2) https://jobs.wustl.edu/psp......
- Chris Miller
Mark Gerstein is putting together a list of US programs in bioinformatics here and is soliciting additions: http://blog.gerstein.info/2013... I figured that if anyone can help him expand that list, it's the Biostar community.
- Chris Miller
There seem to be example files here - I'd suggest using them to determine the correct input format: http://ccmbweb.ccv.brown.edu/hotnet... Alternately, download the package and look at the enclosed README, which doesn't seem to be on their server. It appears to have useful information about formats. http://ccmbweb.ccv.brown.edu/hotnet...
- Chris Miller
FYI, a spam post still got through to the RSS last night. (id 62389). Just another data point, in case you hadn't noticed
- Chris Miller
RT @tlwriter: Of 79 residential nbhoods in STL City, 31 are either 90+% white or 90+% Af-American. 15 more are 80%+ one or the other. (cont...) #stlmayor
RT @Massgenomics: Probably the most impressive academic lab web site I've seen is that of the Stam Lab at the University of Washington: http://www.stamlab.org/
I feel like the 4-hour delay is a little long. In the interest of timely answers, can we cut it to 2 hours? Do we have enough international coverage with our moderators to cover spam deletion that quickly when it crops up at odd hours?
- Chris Miller