class Bio::Blast

Description

The Bio::Blast class contains methods for running local or remote BLAST searches and for parsing their output (the BLAST reports). For more information on similarity searches and the BLAST program, see www.ncbi.nlm.nih.gov/Education/BLASTinfo/similarity.html.

Usage

require 'bio'

# To run an actual BLAST analysis:
#   1. create a BLAST factory
remote_blast_factory = Bio::Blast.remote('blastp', 'swissprot',
                                         '-e 0.0001', 'genomenet')
#or:
local_blast_factory = Bio::Blast.local('blastn','/path/to/db')

#   2. run the actual BLAST by querying the factory
report = remote_blast_factory.query(sequence_text)

# Then, to parse the report, see Bio::Blast::Report

See also

References

Attributes

blastall[RW]

Full path for blastall (default: ‘blastall’).

db[RW]

Database name (-d option for blastall)

filter[RW]

Filter option for blastall -F (T or F).

format[RW]

Output report format for blastall -m

0, pairwise; 1; 2; 3; 4; 5; 6; 7, XML Blast output; 8, tabular; 9, tabular with comment lines; 10, ASN text; 11, ASN binary [integer].
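
The codes that the list above labels can be kept in a small lookup table. This is only a sketch: the constant and method names are illustrative assumptions, not part of Bio::Blast, and codes 1 to 6 are left out because the list does not describe them.

```ruby
# Labels for the blastall -m codes that the list above names.
# Codes 1 to 6 are omitted on purpose: the list gives no label for them.
BLASTALL_FORMATS = {
  0  => 'pairwise',
  7  => 'XML Blast output',
  8  => 'tabular',
  9  => 'tabular with comment lines',
  10 => 'ASN text',
  11 => 'ASN binary'
}.freeze

# Hypothetical helper: returns the label, or a placeholder for unlisted codes.
def format_label(code)
  BLASTALL_FORMATS.fetch(code) { "unlabelled format #{code}" }
end
```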

matrix[RW]

Substitution matrix for blastall -M

options[R]

Options for blastall

output[R]

Returns a String containing the BLAST execution output as is, in the format selected by Bio::Blast#format.

parser[W]
program[RW]

Program name (-p option for blastall): blastp, blastn, blastx, tblastn or tblastx

server[R]

Server to submit the BLASTs to

Public Class Methods

local(program, db, options = '', blastall = nil)

This is a shortcut for Bio::Blast.new:

Bio::Blast.local(program, database, options)

is equivalent to

Bio::Blast.new(program, database, options, 'local')

Arguments:

  • program (required): ‘blastn’, ‘blastp’, ‘blastx’, ‘tblastn’ or ‘tblastx’

  • db (required): name of the local database

  • options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)

  • blastall: full path to blastall program (e.g. “/opt/bin/blastall”; DEFAULT: “blastall”)

Returns

Bio::Blast factory object

# File lib/bio/appl/blast.rb
def self.local(program, db, options = '', blastall = nil)
  f = self.new(program, db, options, 'local')
  if blastall then
    f.blastall = blastall
  end
  f
end

new(program, db, opt = [], server = 'local')

Creates a Bio::Blast factory object.

To run any BLAST searches, a factory has to be created that describes a certain BLAST pipeline: the program to use, the database to search, any options and the server to use. E.g.

blast_factory = Bio::Blast.new('blastn','dbsts', '-e 0.0001 -r 4', 'genomenet')

Arguments:

  • program (required): ‘blastn’, ‘blastp’, ‘blastx’, ‘tblastn’ or ‘tblastx’

  • db (required): name of the (local or remote) database

  • options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)

  • server: server to use (e.g. ‘genomenet’; DEFAULT = ‘local’)

Returns

Bio::Blast factory object

# File lib/bio/appl/blast.rb
def initialize(program, db, opt = [], server = 'local')
  @program  = program
  @db       = db

  @blastall = 'blastall'
  @matrix   = nil
  @filter   = nil

  @output   = ''
  @parser   = nil
  @format   = nil

  @options = set_options(opt, program, db)
  self.server = server
end

remote(program, db, option = '', server = 'genomenet')

Bio::Blast.remote does exactly the same as Bio::Blast.new, but sets the remote server ‘genomenet’ as its default.


Arguments:

  • program (required): ‘blastn’, ‘blastp’, ‘blastx’, ‘tblastn’ or ‘tblastx’

  • db (required): name of the remote database

  • options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)

  • server: server to use (DEFAULT = ‘genomenet’)

Returns

Bio::Blast factory object

# File lib/bio/appl/blast.rb
def self.remote(program, db, option = '', server = 'genomenet')
  self.new(program, db, option, server)
end

reports(input, parser = nil) { |e| ... }

Bio::Blast.reports parses the given data and returns an array of report (Bio::Blast::Report or Bio::Blast::Default::Report) objects, or yields each report object when a block is given.

Supported formats: NCBI default (-m 0), XML (-m 7), tabular (-m 8).


Arguments:

Returns

Undefined when a block is given. Otherwise, an Array containing report (Bio::Blast::Report or Bio::Blast::Default::Report) objects.

# File lib/bio/appl/blast.rb
def self.reports(input, parser = nil)
  begin
    istr = input.to_str
  rescue NoMethodError
    istr = nil
  end
  if istr then
    input = StringIO.new(istr)
  end
  raise 'unsupported input data type' unless input.respond_to?(:gets)

  # if proper parser is given, emulates old behavior.
  case parser
  when :xmlparser, :rexml
    ff = Bio::FlatFile.new(Bio::Blast::Report, input)
    if block_given? then
      ff.each do |e|
        yield e
      end
      return []
    else
      return ff.to_a
    end
  when :tab
    istr = input.read unless istr
    rep = Report.new(istr, parser)
    if block_given? then
      yield rep
      return []
    else
      return [ rep ]
    end
  end

  # preparation of the new format autodetection rule if needed
  if !defined?(@@reports_format_autodetection_rule) or
      !@@reports_format_autodetection_rule then
    regrule = Bio::FlatFile::AutoDetect::RuleRegexp
    blastxml = regrule[ 'Bio::Blast::Report',
                        /\<\!DOCTYPE BlastOutput PUBLIC / ]
    blast    = regrule[ 'Bio::Blast::Default::Report',
                        /^BLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ]
    tblast   = regrule[ 'Bio::Blast::Default::Report_TBlast',
                        /^TBLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ]
    tab      = regrule[ 'Bio::Blast::Report_tab',
                        /^([^\t]*\t){11}[^\t]*$/ ]
    auto = Bio::FlatFile::AutoDetect[ blastxml,
                                      blast,
                                      tblast,
                                      tab
                                    ]
    # sets priorities
    blastxml.is_prior_to blast
    blast.is_prior_to tblast
    tblast.is_prior_to tab
    # rehash
    auto.rehash
    @@report_format_autodetection_rule = auto
  end

  # Creates a FlatFile object with dummy class
  ff = Bio::FlatFile.new(Object, input)
  ff.dbclass = nil

  # file format autodetection
  3.times do
    break if ff.eof? or
      ff.autodetect(31, @@report_format_autodetection_rule)
  end
  # If format detection failed, assumed to be tabular (-m 8)
  ff.dbclass = Bio::Blast::Report_tab unless ff.dbclass

  if block_given? then
    ff.each do |entry|
      yield entry
    end
    ret = []
  else
    ret = ff.to_a
  end
  ret
end
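
To see what the four autodetection patterns in the source above react to, they can be tried in isolation. The sketch below is a simplification under stated assumptions: it uses a plain Hash and a first-match search instead of Bio::FlatFile::AutoDetect, the symbols are illustrative labels rather than BioRuby class names, and the redundant backslashes in the XML pattern are dropped.

```ruby
# Patterns taken from the autodetection rule in the source above.
# Insertion order mirrors the priorities set there: xml, default, tblast, tab.
PATTERNS = {
  xml:     /<!DOCTYPE BlastOutput PUBLIC /,
  default: /^BLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/,
  tblast:  /^TBLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/,
  tab:     /^([^\t]*\t){11}[^\t]*$/
}.freeze

# Hypothetical helper: returns the label of the first matching pattern and
# falls back to :tab, as the source does when detection fails.
def guess_format(text)
  found = PATTERNS.find { |_label, pattern| pattern =~ text }
  found ? found.first : :tab
end

guess_format('BLASTP 2.2.26 [Sep-21-2011]')    # matches the default-report pattern
guess_format('TBLASTN 2.2.26 [Sep-21-2011]')   # matches the TBLAST pattern
```

The header lines in the two calls are made-up examples of the shape the patterns expect, not output captured from a real run.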

reports_xml(input, parser = nil) { |r| ... }

Note that this is the old implementation of Bio::Blast.reports. It is kept for compatibility with older BLAST XML documents that might not be parsed by the new Bio::Blast.reports or by Bio::FlatFile. (Though we are not sure whether such documents exist.)

Bio::Blast.reports_xml parses given data, and returns an array of Bio::Blast::Report objects, or yields each Bio::Blast::Report object when a block is given.

It can be used only for XML format. For default (-m 0) format, consider using Bio::FlatFile, or Bio::Blast.reports.


Arguments:

Returns

Undefined when a block is given. Otherwise, an Array containing Bio::Blast::Report objects.

# File lib/bio/appl/blast.rb
def self.reports_xml(input, parser = nil)
  ary = []
  input.each_line("</BlastOutput>\n") do |xml|
    xml.sub!(/[^<]*(<?)/, '\1') # skip before <?xml> tag
    next if xml.empty?          # skip trailing no hits
    rep = Report.new(xml, parser)
    if rep.reports then
      if block_given?
        rep.reports.each { |r| yield r }
      else
        ary.concat rep.reports
      end
    else
      if block_given?
        yield rep
      else
        ary.push rep
      end
    end
  end
  return ary
end
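
The splitting step in the source above relies on each_line accepting a custom record separator. A minimal sketch of that step alone, using StringIO from the standard library and two made-up, heavily truncated documents; the parsing call (Report.new) is replaced here by simply collecting the raw chunks.

```ruby
require 'stringio'

# Two invented stand-ins for concatenated BLAST XML documents.
data = "<?xml version=\"1.0\"?>\n<BlastOutput>first</BlastOutput>\n" \
       "<?xml version=\"1.0\"?>\n<BlastOutput>second</BlastOutput>\n"

chunks = []
StringIO.new(data).each_line("</BlastOutput>\n") do |xml|
  xml.sub!(/[^<]*(<?)/, '\1')  # drop anything before the first '<'
  next if xml.empty?           # skip empty trailing pieces
  chunks << xml                # the source would parse the chunk here
end
```

Each element of chunks then holds one complete document, ending with its closing tag.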

Public Instance Methods

option()

Returns options of blastall

# File lib/bio/appl/blast.rb
def option
  # backward compatibility
  Bio::Command.make_command_line(options)
end

option=(str)

Sets options for blastall from a single String (kept for backward compatibility).

# File lib/bio/appl/blast.rb
def option=(str)
  # backward compatibility
  self.options = Shellwords.shellwords(str)
end
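
As the source above shows, the option string is split with Shellwords.shellwords from the Ruby standard library before it is stored. A short sketch of what that split produces; the option strings themselves are only examples.

```ruby
require 'shellwords'

# Whitespace-separated words become separate array elements.
words = Shellwords.shellwords('-e 0.0001 -r 4')
# words is ["-e", "0.0001", "-r", "4"]

# Quoting keeps a value containing spaces together as one element.
quoted = Shellwords.shellwords("-d 'my db' -F F")
# quoted is ["-d", "my db", "-F", "F"]
```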

options=(ary)

Sets options for blastall

# File lib/bio/appl/blast.rb
def options=(ary)
  @options = set_options(ary)
end

query(query)

This method submits a sequence to a BLAST factory, which performs the actual BLAST.

# example 1
seq = Bio::Sequence::NA.new('agggcattgccccggaagatcaagtcgtgctcctg')
report = blast_factory.query(seq)

# example 2
str = <<END_OF_FASTA
>lcl|MySequence
MPPSAISKISNSTTPQVQSSSAPNLTMLEGKGISVEKSFRVYSEEENQNQHKAKDSLGF
KELEKDAIKNSKQDKKDHKNWLETLYDQAEQKWLQEPKKKLQDLIKNSGDNSRVILKDS
END_OF_FASTA
report = blast_factory.query(str)

Bug note: When multi-FASTA is given and the format is 7 (XML) or 8 (tab), it should return an array of Bio::Blast::Report objects, but it returns a single Bio::Blast::Report object. This is a known bug and should be fixed in the future.


Arguments:

  • query (required): single- or multiple-FASTA formatted sequence(s)

Returns

A Bio::Blast::Report (or Bio::Blast::Default::Report) object when a single query is given. When multiple sequences are given as the query, an array of Bio::Blast::Report (or Bio::Blast::Default::Report) objects. If the result cannot be parsed, nil is returned.
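
Because the return value may be a single object, an array, or nil, calling code may find it convenient to normalize it first. The helper below is a hypothetical sketch, not part of Bio::Blast, and it assumes that report objects do not respond to to_ary (otherwise flatten would expand them).

```ruby
# Hypothetical helper: always hands back an Array, whatever query returned.
def normalize_reports(result)
  # nil becomes [], a single object becomes [object], an Array stays flat.
  [result].flatten.compact
end

single = Object.new                    # stand-in for one report object
normalize_reports(single).each { |r| r }   # iterate without a type check
normalize_reports(nil).each { |r| r }      # nothing to iterate, no error
```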

# File lib/bio/appl/blast.rb
def query(query)
  case query
  when Bio::Sequence
    query = query.output(:fasta)
  when Bio::Sequence::NA, Bio::Sequence::AA, Bio::Sequence::Generic
    query = query.to_fasta('query', 70)
  else
    query = query.to_s
  end

  @output = self.__send__("exec_#{@server}", query)
  report = parse_result(@output)
  return report
end
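
The source above picks the execution method by building its name from @server and calling it with __send__. The same idiom, reduced to a self-contained sketch; DispatchSketch and its two exec_ methods are hypothetical stand-ins, not the BioRuby implementation.

```ruby
# Hypothetical stand-in for a factory; it only illustrates the dispatch idiom.
class DispatchSketch
  def initialize(server)
    @server = server
  end

  def query(text)
    # Build the method name from @server, as in the source above.
    __send__("exec_#{@server}", text)
  end

  private

  def exec_local(text)
    "local:#{text}"
  end

  def exec_genomenet(text)
    "genomenet:#{text}"
  end
end

DispatchSketch.new('local').query('ACGT')      # runs exec_local
DispatchSketch.new('genomenet').query('ACGT')  # runs exec_genomenet
```

A server name with no matching exec_ method raises NoMethodError in this sketch.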

server=(str)

Sets the server to submit the BLASTs to. The corresponding exec_xxxx method should be defined in Bio::Blast or in the Bio::Blast::Remote::Xxxx module.

# File lib/bio/appl/blast.rb
def server=(str)
  @server = str
  begin
    m = Bio::Blast::Remote.const_get(@server.capitalize)
  rescue NameError
    m = nil
  end
  if m and !(self.is_a?(m)) then
    # lazy include Bio::Blast::Remote::XXX module
    self.class.class_eval { include m }
  end
  return @server
end
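
The lazy include in the source above can be tried out without BioRuby. In the sketch below, RemoteSketch, Genomenet and FactorySketch are hypothetical names that only mirror the pattern: look up a module by the capitalized server name and include it if one is found.

```ruby
# Hypothetical namespace standing in for the remote-server modules.
module RemoteSketch
  module Genomenet
    def exec_genomenet(query)
      "submitted #{query.size} characters"
    end
  end
end

class FactorySketch
  attr_reader :server

  def server=(str)
    @server = str
    mod = begin
      RemoteSketch.const_get(@server.capitalize)
    rescue NameError
      nil   # no module with that name, e.g. for 'local'
    end
    # Include the module only once, and only if one was found.
    self.class.class_eval { include mod } if mod && !is_a?(mod)
    @server
  end
end

factory = FactorySketch.new
factory.server = 'genomenet'   # FactorySketch now includes RemoteSketch::Genomenet
```

Because the include is applied to the class, every instance of the class gains the module's methods once any instance has selected that server.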