Extracting GI and taxid from a blastdb

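You can extract data from a blastdb with blastdbcmd, which should be included in a BLAST installation. The -outfmt option takes a format string built from the specifiers listed below; they determine which metadata fields are written and in what order.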

From the man page:

 -outfmt <String>
   Output format, where the available format specifiers are:
       %f means sequence in FASTA format
       %s means sequence data (without defline)
       %a means accession
       %g means gi
       %o means ordinal id (OID)
       %i means sequence id
       %t means sequence title
       %l means sequence length
       %h means sequence hash value
       %T means taxid
       %X means leaf-node taxids
       %e means membership integer
       %L means common taxonomic name
       %C means common taxonomic names for leaf-node taxids
       %S means scientific name
       %N means scientific names for leaf-node taxids
       %B means BLAST name
       %K means taxonomic super kingdom
       %P means PIG

The following snippet shows how to extract the GI and taxid from a blastdb. The NCBI 16SMicrobial blastdb (available via FTP) is used for this example:

# Example:
# blastdbcmd -db <db label> -entry all -outfmt "%g %T" -out <outfile>
blastdbcmd -db 16SMicrobial -entry all -outfmt "%g %T" -out 16SMicrobial.gi_taxid.tsv

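This produces a file, 16SMicrobial.gi_taxid.tsv, with one GI and its taxid per line, separated by a single space as written in the format string: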

939733319 526714
636559958 429001
645319546 629680
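
Once the file exists, it can be queried with standard shell tools. The following is a minimal sketch, assuming the two-column, space-separated layout shown above. The scratch file names (sample.gi_taxid.txt, sample.gi_taxid.tsv) and the helper function taxid_for_gi are hypothetical and not part of BLAST; the sample rows are simply the three lines shown above, recreated so the sketch runs without a blastdb.

```shell
# Recreate the three sample rows shown above in a scratch file
# (hypothetical file name; in practice, use the file written by blastdbcmd).
printf '%s\n' \
  '939733319 526714' \
  '636559958 429001' \
  '645319546 629680' > sample.gi_taxid.txt

# Hypothetical helper: print the taxid recorded for a given GI.
# Usage: taxid_for_gi <gi> <file>
taxid_for_gi() {
  awk -v gi="$1" '$1 == gi { print $2; exit }' "$2"
}

# Rewrite the space-separated columns as true tab-separated values,
# so that the content matches a .tsv extension.
awk 'BEGIN { OFS = "\t" } { print $1, $2 }' sample.gi_taxid.txt > sample.gi_taxid.tsv

# Look up one GI from the sample rows.
taxid_for_gi 636559958 sample.gi_taxid.txt   # prints 429001
```

The lookup stops at the first matching line, so it assumes each GI appears only once in the file.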