class Bio::Blast
Description
The Bio::Blast class contains methods for running local or remote BLAST searches, as well as for parsing the output of such BLASTs (i.e. the BLAST reports). For more information on similarity searches and the BLAST program, see www.ncbi.nlm.nih.gov/Education/BLASTinfo/similarity.html.
Usage
  require 'bio'

  # To run an actual BLAST analysis:
  #   1. create a BLAST factory
  remote_blast_factory = Bio::Blast.remote('blastp', 'swissprot',
                                           '-e 0.0001', 'genomenet')
  # or:
  local_blast_factory = Bio::Blast.local('blastn', '/path/to/db')

  #   2. run the actual BLAST by querying the factory
  report = remote_blast_factory.query(sequence_text)

  # Then, to parse the report, see Bio::Blast::Report
Attributes
- blastall: Full path for blastall (default: ‘blastall’).
- db: Database name (-d option for blastall).
- filter: Filter option for blastall -F (T or F).
- format: Output report format for blastall -m: 0, pairwise; 1; 2; 3; 4; 5; 6; 7, XML BLAST output; 8, tabular; 9, tabular with comment lines; 10, ASN text; 11, ASN binary [integer].
- matrix: Substitution matrix for blastall -M.
- options: Options for blastall.
- output: Returns a String containing the BLAST execution output as is, in the format selected by Bio::Blast#format.
- program: Program name (-p option for blastall): blastp, blastn, blastx, tblastn or tblastx.
- server: Server to submit the BLASTs to.
Public Class Methods
This is a shortcut for Bio::Blast.new:
  Bio::Blast.local(program, database, options)
is equivalent to
  Bio::Blast.new(program, database, options, 'local')
Arguments:
- program (required): ‘blastn’, ‘blastp’, ‘blastx’, ‘tblastn’ or ‘tblastx’
- db (required): name of the local database
- options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)
- blastall: full path to blastall program (e.g. “/opt/bin/blastall”; DEFAULT: “blastall”)
Returns: Bio::Blast factory object
  # File lib/bio/appl/blast.rb
  def self.local(program, db, options = '', blastall = nil)
    f = self.new(program, db, options, 'local')
    if blastall then
      f.blastall = blastall
    end
    f
  end
Creates a Bio::Blast factory object.
To run any BLAST searches, a factory has to be created that describes a certain BLAST pipeline: the program to use, the database to search, any options and the server to use. E.g.
  blast_factory = Bio::Blast.new('blastn', 'dbsts', '-e 0.0001 -r 4', 'genomenet')
Arguments:
- program (required): ‘blastn’, ‘blastp’, ‘blastx’, ‘tblastn’ or ‘tblastx’
- db (required): name of the (local or remote) database
- options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)
- server: server to use (e.g. ‘genomenet’; DEFAULT = ‘local’)
Returns: Bio::Blast factory object
  # File lib/bio/appl/blast.rb
  def initialize(program, db, opt = [], server = 'local')
    @program  = program
    @db       = db

    @blastall = 'blastall'
    @matrix   = nil
    @filter   = nil

    @output   = ''
    @parser   = nil
    @format   = nil

    @options = set_options(opt, program, db)
    self.server = server
  end
Bio::Blast.remote does exactly the same as Bio::Blast.new, but sets the remote server ‘genomenet’ as its default.
Arguments:
- program (required): ‘blastn’, ‘blastp’, ‘blastx’, ‘tblastn’ or ‘tblastx’
- db (required): name of the remote database
- options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)
- server: server to use (DEFAULT = ‘genomenet’)
Returns: Bio::Blast factory object
  # File lib/bio/appl/blast.rb
  def self.remote(program, db, option = '', server = 'genomenet')
    self.new(program, db, option, server)
  end
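The relationship between new, local and remote can be pictured with a small stand-alone sketch. MiniFactory is a hypothetical class written only for this illustration; it records its arguments and runs nothing, so it is not the BioRuby implementation.

```ruby
# Hypothetical stand-in for Bio::Blast; it only stores its arguments.
class MiniFactory
  attr_reader :program, :db, :options, :server
  attr_accessor :blastall

  def initialize(program, db, options = '', server = 'local')
    @program  = program
    @db       = db
    @options  = options
    @server   = server
    @blastall = 'blastall'
  end

  # Shortcut: fixes the server to 'local' and optionally overrides
  # the path to the executable.
  def self.local(program, db, options = '', blastall = nil)
    f = new(program, db, options, 'local')
    f.blastall = blastall if blastall
    f
  end

  # Shortcut: identical to new except for the default server.
  def self.remote(program, db, options = '', server = 'genomenet')
    new(program, db, options, server)
  end
end

local_factory  = MiniFactory.local('blastn', '/path/to/db', '', '/opt/bin/blastall')
remote_factory = MiniFactory.remote('blastp', 'swissprot', '-e 0.0001')
```

In this sketch both shortcuts return an ordinary factory, so code that receives one does not need to know which constructor produced it.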
Bio::Blast.reports parses the given data and returns an array of report (Bio::Blast::Report or Bio::Blast::Default::Report) objects, or yields each report object when a block is given.
Supported formats: NCBI default (-m 0), XML (-m 7), tabular (-m 8).
Arguments:
- input (required): input data
- parser: type of parser; see Bio::Blast::Report.new
Returns: Undefined when a block is given. Otherwise, an Array containing report (Bio::Blast::Report or Bio::Blast::Default::Report) objects.
  # File lib/bio/appl/blast.rb
  def self.reports(input, parser = nil)
    begin
      istr = input.to_str
    rescue NoMethodError
      istr = nil
    end
    if istr then
      input = StringIO.new(istr)
    end
    raise 'unsupported input data type' unless input.respond_to?(:gets)

    # if proper parser is given, emulates old behavior.
    case parser
    when :xmlparser, :rexml
      ff = Bio::FlatFile.new(Bio::Blast::Report, input)
      if block_given? then
        ff.each do |e|
          yield e
        end
        return []
      else
        return ff.to_a
      end
    when :tab
      istr = input.read unless istr
      rep = Report.new(istr, parser)
      if block_given? then
        yield rep
        return []
      else
        return [ rep ]
      end
    end

    # preparation of the new format autodetection rule if needed
    # (note: the guard tests @@reports_format_autodetection_rule, but the
    # rule is stored in @@report_format_autodetection_rule below, so as
    # listed the rule is rebuilt on every call)
    if !defined?(@@reports_format_autodetection_rule) or
        !@@reports_format_autodetection_rule then
      regrule = Bio::FlatFile::AutoDetect::RuleRegexp
      blastxml = regrule[ 'Bio::Blast::Report',
                          /\<\!DOCTYPE BlastOutput PUBLIC / ]
      blast    = regrule[ 'Bio::Blast::Default::Report',
                          /^BLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ]
      tblast   = regrule[ 'Bio::Blast::Default::Report_TBlast',
                          /^TBLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ]
      tab      = regrule[ 'Bio::Blast::Report_tab',
                          /^([^\t]*\t){11}[^\t]*$/ ]
      auto = Bio::FlatFile::AutoDetect[ blastxml,
                                        blast,
                                        tblast,
                                        tab
                                      ]
      # sets priorities
      blastxml.is_prior_to blast
      blast.is_prior_to tblast
      tblast.is_prior_to tab
      # rehash
      auto.rehash
      @@report_format_autodetection_rule = auto
    end

    # Creates a FlatFile object with dummy class
    ff = Bio::FlatFile.new(Object, input)
    ff.dbclass = nil

    # file format autodetection
    3.times do
      break if ff.eof? or
        ff.autodetect(31, @@report_format_autodetection_rule)
    end
    # If format detection failed, assumed to be tabular (-m 8)
    ff.dbclass = Bio::Blast::Report_tab unless ff.dbclass

    if block_given? then
      ff.each do |entry|
        yield entry
      end
      ret = []
    else
      ret = ff.to_a
    end
    ret
  end
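Two steps of the listing above can be tried without BioRuby: accepting either a String or an IO-like object, and recognising tabular (-m 8) output by its twelve tab-separated fields. The sketch below is only an illustration using the standard library. The helper names to_line_source and tabular? are hypothetical, and checking just the first line is a simplification of the FlatFile autodetection.

```ruby
require 'stringio'

# Regexp copied from the listing above: eleven tabs, hence twelve fields.
TAB_LINE = /^([^\t]*\t){11}[^\t]*$/

# Accept a String-like or IO-like object and return something that
# responds to #gets, mirroring the first lines of the listing.
def to_line_source(input)
  str = begin
    input.to_str
  rescue NoMethodError
    nil
  end
  input = StringIO.new(str) if str
  raise 'unsupported input data type' unless input.respond_to?(:gets)
  input
end

# Hypothetical helper: true when the first line looks like -m 8 output.
def tabular?(input)
  first_line = to_line_source(input).gets
  !first_line.nil? && TAB_LINE.match?(first_line.chomp)
end

hit = %w[query1 sbjct1 98.50 200 3 0 1 200 1 200 1e-100 380].join("\t")
tabular?(hit + "\n")                      # String input
tabular?(StringIO.new(hit + "\n"))        # IO-like input
tabular?("BLASTP 2.2.18 [Mar-02-2008]\n") # default-format header line
```

The first two calls are expected to report tabular input and the third is not, since the header line contains no tabs.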
Note that this is the old implementation of Bio::Blast.reports. The aim of this method is to keep compatibility with older BLAST XML documents which might not be parsed by the new Bio::Blast.reports or by Bio::FlatFile. (Though we are not sure whether such documents exist or not.)
Bio::Blast.reports_xml parses the given data and returns an array of Bio::Blast::Report objects, or yields each Bio::Blast::Report object when a block is given.
It can be used only for XML format. For default (-m 0) format, consider using Bio::FlatFile or Bio::Blast.reports.
Arguments:
- input (required): input data
- parser: type of parser; see Bio::Blast::Report.new
Returns: Undefined when a block is given. Otherwise, an Array containing Bio::Blast::Report objects.
  # File lib/bio/appl/blast.rb
  def self.reports_xml(input, parser = nil)
    ary = []
    input.each_line("</BlastOutput>\n") do |xml|
      xml.sub!(/[^<]*(<?)/, '\1') # skip before <?xml> tag
      next if xml.empty?          # skip trailing no hits
      rep = Report.new(xml, parser)
      if rep.reports then
        if block_given?
          rep.reports.each { |r| yield r }
        else
          ary.concat rep.reports
        end
      else
        if block_given?
          yield rep
        else
          ary.push rep
        end
      end
    end
    return ary
  end
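The splitting step in the listing above relies on each_line accepting a custom record separator. The following sketch isolates that step with the standard library only. split_blast_xml is a hypothetical helper name, and the XML fragments are toy strings rather than real BLAST output.

```ruby
require 'stringio'

# Split concatenated BLAST XML documents into one String per document,
# using the closing tag plus newline as the record separator.
def split_blast_xml(input)
  docs = []
  input.each_line("</BlastOutput>\n") do |xml|
    xml = xml.sub(/[^<]*(<?)/, '\1') # drop anything before the first '<'
    next if xml.empty?               # nothing left, e.g. trailing blank lines
    docs << xml
  end
  docs
end

stream = "<?xml version=\"1.0\"?>\n<BlastOutput>first</BlastOutput>\n" \
         "<?xml version=\"1.0\"?>\n<BlastOutput>second</BlastOutput>\n\n"
documents = split_blast_xml(StringIO.new(stream))
documents.size # two documents; the trailing blank line is skipped
```

Unlike the listing, the sketch uses the non-mutating sub, which also works when the input lines are frozen strings.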
Public Instance Methods
Returns the blastall options as a single command-line string (kept for backward compatibility).
  # File lib/bio/appl/blast.rb
  def option
    # backward compatibility
    Bio::Command.make_command_line(options)
  end
Sets the blastall options from a command-line string (kept for backward compatibility).
  # File lib/bio/appl/blast.rb
  def option=(str)
    # backward compatibility
    self.options = Shellwords.shellwords(str)
  end
Sets the blastall options from an Array.
  # File lib/bio/appl/blast.rb
  def options=(ary)
    @options = set_options(ary)
  end
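The String and Array views of the options can be illustrated with Ruby's standard Shellwords module. OptionHolder is a hypothetical class, and Shellwords.join is used here as a stand-in for Bio::Command.make_command_line, whose exact quoting rules are not shown in this page and may differ.

```ruby
require 'shellwords'

# Hypothetical holder that keeps options as an Array of words.
class OptionHolder
  attr_accessor :options

  def initialize
    @options = []
  end

  # String view of the options (stand-in for Bio::Command.make_command_line).
  def option
    Shellwords.join(options)
  end

  # Parse a command-line string into words, honouring shell quoting.
  def option=(str)
    self.options = Shellwords.shellwords(str)
  end
end

holder = OptionHolder.new
holder.option = "-e 0.0001 -o 'my result.txt'"
holder.options # four words; the quoted file name stays a single element
```

Keeping the Array as the primary representation avoids re-parsing quoted arguments each time the command is built.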
This method submits a sequence to a BLAST factory, which performs the actual BLAST.
  # example 1
  seq = Bio::Sequence::NA.new('agggcattgccccggaagatcaagtcgtgctcctg')
  report = blast_factory.query(seq)

  # example 2
  str = <<END_OF_FASTA
  >lcl|MySequence
  MPPSAISKISNSTTPQVQSSSAPNLTMLEGKGISVEKSFRVYSEEENQNQHKAKDSLGF
  KELEKDAIKNSKQDKKDHKNWLETLYDQAEQKWLQEPKKKLQDLIKNSGDNSRVILKDS
  END_OF_FASTA
  report = blast_factory.query(str)
Bug note: When multi-FASTA is given and the format is 7 (XML) or 8 (tab), it should return an array of Bio::Blast::Report objects, but it returns a single Bio::Blast::Report object. This is a known bug and should be fixed in the future.
Arguments:
- query (required): single- or multiple-FASTA formatted sequence(s)
Returns: a Bio::Blast::Report (or Bio::Blast::Default::Report) object when a single query is given. When multiple sequences are given as the query, it returns an array of Bio::Blast::Report (or Bio::Blast::Default::Report) objects. If the result cannot be parsed, nil is returned.
  # File lib/bio/appl/blast.rb
  def query(query)
    case query
    when Bio::Sequence
      query = query.output(:fasta)
    when Bio::Sequence::NA, Bio::Sequence::AA, Bio::Sequence::Generic
      query = query.to_fasta('query', 70)
    else
      query = query.to_s
    end

    @output = self.__send__("exec_#{@server}", query)
    report = parse_result(@output)
    return report
  end
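The listing above picks the execution method by building its name from the server name. A minimal sketch of that dispatch follows. MiniRunner and the bodies of its exec_* methods are hypothetical placeholders that run no search.

```ruby
# Hypothetical runner; the exec_* methods only return a label.
class MiniRunner
  attr_reader :output

  def initialize(server)
    @server = server
  end

  # Build the method name from the server name and call it,
  # as the listing does with __send__.
  def query(text)
    @output = __send__("exec_#{@server}", text.to_s)
  end

  private

  def exec_local(text)
    "local:#{text.length}"
  end

  def exec_genomenet(text)
    "genomenet:#{text.length}"
  end
end

runner = MiniRunner.new('local')
runner.query('ACGT')
```

In this sketch an unknown server name surfaces as a NoMethodError at query time rather than when the runner is created.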
Sets the server to submit the BLASTs to. A matching exec_xxxx method should be defined in Bio::Blast or in a Bio::Blast::Remote::Xxxx module.
  # File lib/bio/appl/blast.rb
  def server=(str)
    @server = str
    begin
      m = Bio::Blast::Remote.const_get(@server.capitalize)
    rescue NameError
      m = nil
    end
    if m and !(self.is_a?(m)) then
      # lazy include Bio::Blast::Remote::XXX module
      self.class.class_eval { include m }
    end
    return @server
  end
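The lazy include in the listing above can be reproduced in isolation. In this sketch Backends, Backends::Genomenet and MiniClient are hypothetical names standing in for Bio::Blast::Remote, one of its modules and Bio::Blast. The exec method only returns a label.

```ruby
# Hypothetical namespace holding one backend module.
module Backends
  module Genomenet
    def exec_genomenet(query)
      "genomenet <- #{query}"
    end
  end
end

class MiniClient
  attr_reader :server

  def server=(name)
    @server = name
    mod = begin
      Backends.const_get(@server.capitalize)
    rescue NameError
      nil # no backend module with that name, e.g. 'local'
    end
    # Include the backend the first time it is requested.
    self.class.class_eval { include mod } if mod && !is_a?(mod)
    @server
  end
end

client = MiniClient.new
client.server = 'genomenet'
client.exec_genomenet('MPPSAISK')
```

Because the module is included into the class, every MiniClient instance gains exec_genomenet after the first assignment, not only the receiver.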