We've got a collection of 4 million sequences, exported as a FASTA file with just GIs in the headers. When I ran formatdb on this collection using `formatdb -i jeleskolab.faa`, it split the collection into two volumes, jeleskolab.00.<ending> and jeleskolab.01.<ending>. When I run blast2 using `-d /path/to/jeleskolab.faa` (to use the DB), I get very slow performance and different (and fewer) hits than when using `-j /path/to/jeleskolab.faa` (using the straight FASTA file as the subject). Any ideas what I screwed up?
- Chris Lasher
Also, does anyone have experience using a formatdb-formatted DB with the FASTA search program (from the Pearson lab)?
- Chris Lasher
I haven't used FASTA to search a BLAST DB. You're getting 00 and 01 because of the default database volume size (the -v switch); this was 4000 million letters last time I checked. Don't use /path/to/file; instead, set up a .ncbirc config file in $HOME that points to your BLASTDB directory. Did you use the '-o T' switch for formatdb (create indices)? And I assume it's protein; otherwise you need '-p F'. Try "formatdb - | less" for more help.
- Neil Saunders
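To check the collection against the volume size Neil mentions, a minimal Python sketch can count the sequence letters in the FASTA file. The function names are hypothetical, and the 4000-million default is taken from Neil's comment rather than verified against a particular formatdb version; how formatdb actually decides where to split is not modeled here.

```python
def count_letters(fasta_lines):
    """Count sequence letters, skipping header lines and surrounding whitespace."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def exceeds_volume(total_letters, volume_millions=4000):
    """Return True if the total is above a volume size given in millions of letters.

    The default of 4000 is the figure quoted in the thread (assumption).
    """
    return total_letters > volume_millions * 1_000_000
```

For the real file, something like `with open("jeleskolab.faa") as handle: print(count_letters(handle))` would give the total to compare.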
@Neil Thanks! I saw that the split files were due to the character limit. I had no clue about .ncbirc. (I can't find it in the man pages of blastall and formatdb; where do they document that?) I didn't use -o since the headers aren't in "Defline Format", i.e., not prefixed with "lcl|" (ell-see-ell pipe). I may have to write a one-liner to do that, though, if that's important for the speedup. Indeed, it's protein.
- Chris Lasher
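The header rewrite Chris describes could be sketched in Python roughly as follows. The function name and the file names in the usage note are hypothetical, and whether "lcl|" is the right prefix for these GI-only headers is an assumption carried over from the comment above; the prefix is a parameter so a different one can be swapped in.

```python
def prefix_headers(fasta_lines, prefix="lcl|"):
    """Yield FASTA lines with `prefix` inserted after '>' on each header line.

    Headers that already start with the prefix are passed through unchanged,
    so running the function twice does not double the prefix.
    """
    for line in fasta_lines:
        if line.startswith(">") and not line[1:].startswith(prefix):
            yield ">" + prefix + line[1:]
        else:
            yield line
```

A possible use is to stream the original file into a new one, e.g. `out.writelines(prefix_headers(src))` with `src` opened on the original FASTA and `out` on a new file, and then run formatdb on the new file.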
The ncbirc docs are scattered all over the place - I think they're in a standalone BLAST README somewhere. Can send you a sample if you like.
- Neil Saunders
Hmm, where do the blastdb files need to go once generated, if you're not supposed to use a path? Also, I'm using the Ubuntu (Debian) package for blast2, so I didn't think the ncbirc would be necessary.
- Chris Lasher
Well, I'm not really certain what '-j /path/to/fastafile' was producing, but the results are quite different, and the fact that it runs much faster than searching the formatted DB makes me very suspicious. I must not be running it correctly with that option.
- Chris Lasher
I'm not sure how Ubuntu does the installation - I prefer to set it up manually. It's possible that there's something like /etc/ncbi/.ncbirc. Basically, you put the DB files wherever you like, then specify their path in .ncbirc using BLASTDB=/path/to/directory. Then when running BLAST you just specify "-d nr" (or whatever) instead of "-d /path/to/nr".
- Neil Saunders
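As a rough illustration of the setup Neil describes, the Python sketch below writes a minimal .ncbirc. The `BLASTDB=` key comes from his comment; the `[BLAST]` section header, the function names, and the example directory are assumptions, so check them against the README shipped with your BLAST installation.

```python
from pathlib import Path


def ncbirc_text(blastdb_dir):
    """Build the text of a minimal .ncbirc pointing BLAST at a database directory.

    The [BLAST] section name is an assumption; BLASTDB= is the key from the thread.
    """
    return "[BLAST]\nBLASTDB={}\n".format(blastdb_dir)


def write_ncbirc(blastdb_dir, target=None):
    """Write the config to `target` (default: $HOME/.ncbirc) and return its path.

    This overwrites any existing file at that location.
    """
    path = Path(target) if target else Path.home() / ".ncbirc"
    path.write_text(ncbirc_text(blastdb_dir))
    return path
```

With the formatted files in a hypothetical directory such as /data/blastdb, calling `write_ncbirc("/data/blastdb")` would then let the search be run with `-d jeleskolab.faa` rather than a full path.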