class Bio::Blast

Description

The Bio::Blast class contains methods for running local or remote BLAST searches and for parsing their output (the BLAST reports). For more information on similarity searches and the BLAST program, see www.ncbi.nlm.nih.gov/Education/BLASTinfo/similarity.html.

Usage

require 'bio'

# To run an actual BLAST analysis:
#   1. create a BLAST factory
remote_blast_factory = Bio::Blast.remote('blastp', 'swissprot',
                                         '-e 0.0001', 'genomenet')
#or:
local_blast_factory = Bio::Blast.local('blastn','/path/to/db')

#   2. run the actual BLAST by querying the factory
report = remote_blast_factory.query(sequence_text)

# Then, to parse the report, see Bio::Blast::Report

Attributes

blastall[RW]

Full path for blastall. (default: 'blastall').

db[RW]

Database name (-d option for blastall)

filter[RW]

Filter option for blastall -F (T or F).

format[RW]

Output report format for blastall -m

0, pairwise; 1; 2; 3; 4; 5; 6; 7, XML BLAST output; 8, tabular; 9, tabular with comment lines; 10, ASN text; 11, ASN binary [integer].

matrix[RW]

Substitution matrix for blastall -M

options[R]

Options for blastall

output[R]

Returns a String containing the BLAST execution output as is, in the format selected by #format.

parser[W]
program[RW]

Program name (-p option for blastall): blastp, blastn, blastx, tblastn or tblastx

server[R]

Server to submit the BLASTs to

Public Class Methods

local(program, db, options = '', blastall = nil)

This is a shortcut for ::new:

Bio::Blast.local(program, database, options)

is equivalent to

Bio::Blast.new(program, database, options, 'local')

Arguments:

  • program (required): 'blastn', 'blastp', 'blastx', 'tblastn' or 'tblastx'

  • db (required): name of the local database

  • options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)

  • blastall: full path to blastall program (e.g. “/opt/bin/blastall”; DEFAULT: “blastall”)

Returns

Bio::Blast factory object

# File lib/bio/appl/blast.rb, line 78
def self.local(program, db, options = '', blastall = nil)
  f = self.new(program, db, options, 'local')
  if blastall then
    f.blastall = blastall
  end
  f
end
new(program, db, opt = [], server = 'local')

Creates a Bio::Blast factory object.

To run any BLAST searches, a factory has to be created that describes a certain BLAST pipeline: the program to use, the database to search, any options and the server to use. E.g.

blast_factory = Bio::Blast.new('blastn','dbsts', '-e 0.0001 -r 4', 'genomenet')

Arguments:

  • program (required): 'blastn', 'blastp', 'blastx', 'tblastn' or 'tblastx'

  • db (required): name of the (local or remote) database

  • options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)

  • server: server to use (e.g. 'genomenet'; DEFAULT = 'local')

Returns

Bio::Blast factory object

# File lib/bio/appl/blast.rb, line 316
def initialize(program, db, opt = [], server = 'local')
  @program  = program
  @db       = db

  @blastall = 'blastall'
  @matrix   = nil
  @filter   = nil

  @output   = ''
  @parser   = nil
  @format   = nil

  @options = set_options(opt, program, db)
  self.server = server
end
remote(program, db, option = '', server = 'genomenet')

::remote does exactly the same as ::new, but sets the remote server 'genomenet' as its default.


Arguments:

  • program (required): 'blastn', 'blastp', 'blastx', 'tblastn' or 'tblastx'

  • db (required): name of the remote database

  • options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)

  • server: server to use (DEFAULT = 'genomenet')

Returns

Bio::Blast factory object

# File lib/bio/appl/blast.rb, line 96
def self.remote(program, db, option = '', server = 'genomenet')
  self.new(program, db, option, server)
end
reports(input, parser = nil) { |e| ... }

::reports parses the given data and returns an array of report (Bio::Blast::Report or Bio::Blast::Default::Report) objects, or yields each report object when a block is given.

Supported formats: NCBI default (-m 0), XML (-m 7), tabular (-m 8).


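Arguments:

  • input (required): input data, either a String or an IO-like object that responds to gets

  • parser: parser to use (:xmlparser, :rexml or :tab); when nil, the format is autodetected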

Returns

Undefined when a block is given. Otherwise, an Array containing report (Bio::Blast::Report or Bio::Blast::Default::Report) objects.

# File lib/bio/appl/blast.rb, line 113
def self.reports(input, parser = nil)
  begin
    istr = input.to_str
  rescue NoMethodError
    istr = nil
  end
  if istr then
    input = StringIO.new(istr)
  end
  raise 'unsupported input data type' unless input.respond_to?(:gets)

  # if proper parser is given, emulates old behavior.
  case parser
  when :xmlparser, :rexml
    ff = Bio::FlatFile.new(Bio::Blast::Report, input)
    if block_given? then
      ff.each do |e|
        yield e
      end
      return []
    else
      return ff.to_a
    end
  when :tab
    istr = input.read unless istr
    rep = Report.new(istr, parser)
    if block_given? then
      yield rep
      return []
    else
      return [ rep ]
    end
  end

  # preparation of the new format autodetection rule if needed
  if !defined?(@@reports_format_autodetection_rule) or
      !@@reports_format_autodetection_rule then
    regrule = Bio::FlatFile::AutoDetect::RuleRegexp
    blastxml = regrule[ 'Bio::Blast::Report',
                        /\<\!DOCTYPE BlastOutput PUBLIC / ]
    blast    = regrule[ 'Bio::Blast::Default::Report',
                        /^BLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ]
    tblast   = regrule[ 'Bio::Blast::Default::Report_TBlast',
                        /^TBLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ]
    tab      = regrule[ 'Bio::Blast::Report_tab',
                        /^([^\t]*\t){11}[^\t]*$/ ]
    auto = Bio::FlatFile::AutoDetect[ blastxml,
                                      blast,
                                      tblast,
                                      tab
                                    ]
    # sets priorities
    blastxml.is_prior_to blast
    blast.is_prior_to tblast
    tblast.is_prior_to tab
    # rehash
    auto.rehash
    @@report_format_autodetection_rule = auto
  end

  # Creates a FlatFile object with dummy class
  ff = Bio::FlatFile.new(Object, input)
  ff.dbclass = nil

  # file format autodetection
  3.times do
    break if ff.eof? or
      ff.autodetect(31, @@report_format_autodetection_rule)
  end
  # If format detection failed, assumed to be tabular (-m 8)
  ff.dbclass = Bio::Blast::Report_tab unless ff.dbclass

  if block_given? then
    ff.each do |entry|
      yield entry
    end
    ret = []
  else
    ret = ff.to_a
  end
  ret
end
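
The autodetection step can be illustrated without BioRuby. The following is a minimal sketch, not the library's implementation: it reuses the four regular expressions from the listing above, tries them in the same priority order, and falls back to tabular when nothing matches. The detect_format helper, the constant names and the returned symbols are hypothetical and exist only for this illustration.

```ruby
# Patterns copied from the ::reports listing; everything else is hypothetical.
XML_RULE     = /\<\!DOCTYPE BlastOutput PUBLIC /
DEFAULT_RULE = /^BLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/
TBLAST_RULE  = /^TBLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/
TAB_RULE     = /^([^\t]*\t){11}[^\t]*$/

# Returns a symbol naming the first rule that matches, in priority order.
def detect_format(text)
  return :xml     if text =~ XML_RULE
  return :default if text =~ DEFAULT_RULE
  return :tblast  if text =~ TBLAST_RULE
  return :tab     if text =~ TAB_RULE
  :tab # nothing matched: assume tabular, as the listing does
end

# A tabular (-m 8) line has 12 tab-separated columns (made-up values).
tab_line = %w[q1 s1 98.5 200 3 0 1 200 1 200 1e-100 380].join("\t")

detect_format("BLASTP 2.2.18 [Mar-02-2008]\n")   # matches DEFAULT_RULE
detect_format("TBLASTN 2.2.18 [Mar-02-2008]\n")  # matches TBLAST_RULE
detect_format(tab_line)                          # matches TAB_RULE
```

The order matters because the rules are checked one after another; the XML rule is tested first, and the tabular rule doubles as the fallback.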
reports_xml(input, parser = nil) { |r| ... }

Note that this is the old implementation of ::reports. It is kept for compatibility with older BLAST XML documents that might not be parsed by the new ::reports or by Bio::FlatFile. (Though we are not sure whether such documents exist.)

::reports_xml parses given data, and returns an array of Bio::Blast::Report objects, or yields each Bio::Blast::Report object when a block is given.

It can be used only for XML format. For default (-m 0) format, consider using Bio::FlatFile, or ::reports.


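Arguments:

  • input (required): input data that responds to each_line (e.g. a String or an IO object)

  • parser: parser type, passed on to Bio::Blast::Report.new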

Returns

Undefined when a block is given. Otherwise, an Array containing Bio::Blast::Report objects.

# File lib/bio/appl/blast.rb, line 219
def self.reports_xml(input, parser = nil)
  ary = []
  input.each_line("</BlastOutput>\n") do |xml|
    xml.sub!(/[^<]*(<?)/, '\1') # skip before <?xml> tag
    next if xml.empty?          # skip trailing no hits
    rep = Report.new(xml, parser)
    if rep.reports then
      if block_given?
        rep.reports.each { |r| yield r }
      else
        ary.concat rep.reports
      end
    else
      if block_given?
        yield rep
      else
        ary.push rep
      end
    end
  end
  return ary
end
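
How the listing above cuts concatenated XML output into single documents can be shown with plain strings. This is only a sketch under stated assumptions: the sample data is invented, the split_blast_xml helper is hypothetical, and no Bio::Blast::Report objects are created. It uses the same record separator and the same substitution as ::reports_xml.

```ruby
# Hypothetical helper: returns one String per <BlastOutput> document.
def split_blast_xml(input)
  chunks = []
  # each_line with a custom separator yields one chunk per closing tag
  input.each_line("</BlastOutput>\n") do |xml|
    xml = xml.sub(/[^<]*(<?)/, '\1') # drop anything before the first "<"
    next if xml.empty?               # skip a trailing chunk without markup
    chunks << xml
  end
  chunks
end

# Invented sample: two documents, a blank line between them, trailing newline.
data = "<?xml version=\"1.0\"?>\n<BlastOutput>\n<a/>\n</BlastOutput>\n" \
       "\n<?xml version=\"1.0\"?>\n<BlastOutput>\n<b/>\n</BlastOutput>\n" \
       "\n"

docs = split_blast_xml(data)
# docs holds two strings, each starting with "<?xml" and ending with
# "</BlastOutput>\n"; the final newline-only chunk is skipped.
```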

Public Instance Methods

option()

Returns options of blastall

# File lib/bio/appl/blast.rb, line 373
def option
  # backward compatibility
  Bio::Command.make_command_line(options)
end
option=(str)

Sets options for blastall from a String, which is split into words with Shellwords.shellwords.

# File lib/bio/appl/blast.rb, line 379
def option=(str)
  # backward compatibility
  self.options = Shellwords.shellwords(str)
end
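
The splitting done by option= relies on Shellwords from the Ruby standard library. As a short sketch of what that call produces (the option string below is a made-up example, not a recommended set of blastall flags):

```ruby
require 'shellwords'

# Hypothetical option string; quoted arguments stay together as one word.
option_string = "-e 0.0001 -r 4 -o 'my results.txt'"

words = Shellwords.shellwords(option_string)
# words is ["-e", "0.0001", "-r", "4", "-o", "my results.txt"]
```

Shell-style splitting keeps a quoted value with spaces intact, which a plain split on whitespace would not.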
options=(ary)

Sets options for blastall

# File lib/bio/appl/blast.rb, line 254
def options=(ary)
  @options = set_options(ary)
end
query(query)

This method submits a sequence to a BLAST factory, which performs the actual BLAST.

# example 1
seq = Bio::Sequence::NA.new('agggcattgccccggaagatcaagtcgtgctcctg')
report = blast_factory.query(seq)

# example 2
str = <<END_OF_FASTA
>lcl|MySequence
MPPSAISKISNSTTPQVQSSSAPNLTMLEGKGISVEKSFRVYSEEENQNQHKAKDSLGF
KELEKDAIKNSKQDKKDHKNWLETLYDQAEQKWLQEPKKKLQDLIKNSGDNSRVILKDS
END_OF_FASTA
report = blast_factory.query(str)

Bug note: When multi-FASTA is given and the format is 7 (XML) or 8 (tab), it should return an array of Bio::Blast::Report objects, but it returns a single Bio::Blast::Report object. This is a known bug and should be fixed in the future.


Arguments:

  • query (required): single- or multiple-FASTA formatted sequence(s)

Returns

A Bio::Blast::Report (or Bio::Blast::Default::Report) object when a single query is given. When multiple sequences are given as the query, an array of Bio::Blast::Report (or Bio::Blast::Default::Report) objects. If the result cannot be parsed, nil is returned.

# File lib/bio/appl/blast.rb, line 357
def query(query)
  case query
  when Bio::Sequence
    query = query.output(:fasta)
  when Bio::Sequence::NA, Bio::Sequence::AA, Bio::Sequence::Generic
    query = query.to_fasta('query', 70)
  else
    query = query.to_s
  end

  @output = self.__send__("exec_#{@server}", query)
  report = parse_result(@output)
  return report
end
server=(str)

Sets the server to submit the BLASTs to. A matching exec_xxxx method must be defined in Bio::Blast or in the Bio::Blast::Remote::Xxxx module.

# File lib/bio/appl/blast.rb, line 264
def server=(str)
  @server = str
  begin
    m = Bio::Blast::Remote.const_get(@server.capitalize)
  rescue NameError
    m = nil
  end
  if m and !(self.is_a?(m)) then
    # lazy include Bio::Blast::Remote::XXX module
    self.class.class_eval { include m }
  end
  return @server
end
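
The interplay between server= and query (look up a module by name, include it on first use, then dispatch to exec_<server>) can be sketched in isolation. The code below is an illustration under stated assumptions, not BioRuby code: the Remote and Factory names, the exec_* bodies and their return values are all hypothetical stand-ins.

```ruby
# Hypothetical stand-in for Bio::Blast::Remote and one server module.
module Remote
  module Genomenet
    def exec_genomenet(query)
      "genomenet:#{query}"
    end
  end
end

# Hypothetical stand-in for the factory class.
class Factory
  attr_reader :server

  def server=(str)
    @server = str
    begin
      m = Remote.const_get(@server.capitalize)
    rescue NameError
      m = nil # no module of that name; rely on a method defined in the class
    end
    # include the module only once, the first time this server is selected
    self.class.class_eval { include m } if m and !is_a?(m)
    @server
  end

  def exec_local(query)
    "local:#{query}"
  end

  # Dispatches to exec_<server>, as query does in the listing above.
  def query(query)
    __send__("exec_#{@server}", query)
  end
end

factory = Factory.new
factory.server = 'genomenet'
remote_result = factory.query('ACGT') # handled by Remote::Genomenet
factory.server = 'local'
local_result = factory.query('ACGT')  # handled by Factory#exec_local
```

Because the module is included into the class rather than into a single object, every factory gains the exec method once any instance has selected that server.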