Bio::FastaFormat

Treats a FASTA formatted entry, such as:

>id and/or some comments                    <== comment line
ATGCATGCATGCATGCATGCATGCATGCATGCATGC        <== sequence lines
ATGCATGCATGCATGCATGCATGCATGCATGCATGC
ATGCATGCATGC

The leading ‘>’ can be omitted, and a trailing ‘>’ left over from the start of the next entry is removed automatically.

Examples

f_str = <<END_OF_STRING
>sce:YBR160W  CDC28, SRM5; cyclin-dependent protein kinase catalytic subunit [EC:2.7.1.-] [SP:CC28_YEAST]
MSGELANYKRLEKVGEGTYGVVYKALDLRPGQGQRVVALKKIRLESEDEG
VPSTAIREISLLKELKDDNIVRLYDIVHSDAHKLYLVFEFLDLDLKRYME
GIPKDQPLGADIVKKFMMQLCKGIAYCHSHRILHRDLKPQNLLINKDGNL
KLGDFGLARAFGVPLRAYTHEIVTLWYRAPEVLLGGKQYSTGVDTWSIGC
IFAEMCNRKPIFSGDSEIDQIFKIFRVLGTPNEAIWPDIVYLPDFKPSFP
QWRRKDLSQVVPSLDPRGIDLLDKLLAYDPINRISARRAAIHPYFQES
>sce:YBR274W  CHK1; probable serine/threonine-protein kinase [EC:2.7.1.-] [SP:KB9S_YEAST]
MSLSQVSPLPHIKDVVLGDTVGQGAFACVKNAHLQMDPSIILAVKFIHVP
TCKKMGLSDKDITKEVVLQSKCSKHPNVLRLIDCNVSKEYMWIILEMADG
GDLFDKIEPDVGVDSDVAQFYFQQLVSAINYLHVECGVAHRDIKPENILL
DKNGNLKLADFGLASQFRRKDGTLRVSMDQRGSPPYMAPEVLYSEEGYYA
DRTDIWSIGILLFVLLTGQTPWELPSLENEDFVFFIENDGNLNWGPWSKI
EFTHLNLLRKILQPDPNKRVTLKALKLHPWVLRRASFSGDDGLCNDPELL
AKKLFSHLKVSLSNENYLKFTQDTNSNNRYISTQPIGNELAELEHDSMHF
QTVSNTQRAFTSYDSNTNYNSGTGMTQEAKWTQFISYDIAALQFHSDEND
CNELVKRHLQFNPNKLTKFYTLQPMDVLLPILEKALNLSQIRVKPDLFAN
FERLCELLGYDNVFPLIINIKTKSNGGYQLCGSISIIKIEEELKSVGFER
KTGDPLEWRRLFKKISTICRDIILIPN
END_OF_STRING

f = Bio::FastaFormat.new(f_str)
puts "### FastaFormat"
puts "# entry"
puts f.entry
puts "# entry_id"
p f.entry_id
puts "# definition"
p f.definition
puts "# data"
p f.data
puts "# seq"
p f.seq
puts "# seq.class"
p f.seq.class
puts "# length"
p f.length
puts "# aaseq"
p f.aaseq
puts "# aaseq.class"
p f.aaseq.class
puts "# aaseq.composition"
p f.aaseq.composition
puts "# aalen"
p f.aalen

Constants

DELIMITER

Entry delimiter in flatfile text.

DELIMITER_OVERRUN

(Integer) excess read size included in DELIMITER.

Attributes

data[RW]

The sequence lines, as text.

definition[RW]

The comment line of the FASTA formatted data.

entry_overrun[R]

Public Class Methods

new(str)

Stores the comment and sequence information from one entry of the FASTA format string. If the argument contains more than one entry, only the first entry is used.

# File lib/bio/db/fasta.rb, line 119
def initialize(str)
  @definition = str[/.*/].sub(/^>/, '').strip       # 1st line
  @data = str.sub(/.*/, '')                         # rests
  @data.sub!(/^>.*/m, '')  # remove trailing entries for sure
  @entry_overrun = $&
end
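
The parsing steps in the constructor can be tried outside BioRuby. The following is a minimal sketch in plain Ruby, not the library's implementation: split_first_entry is a hypothetical helper, and the /m flag on the last pattern is an assumption made here so that '.' also matches newlines and the whole remainder of the string is cut off, not just the next comment line.

```ruby
# Hypothetical helper that mirrors the constructor's steps with plain strings.
# It is not part of BioRuby.
def split_first_entry(str)
  definition = str[/.*/].sub(/^>/, '').strip  # first line, leading '>' dropped
  data = str.sub(/.*/, '')                     # everything after the first line
  # Assumption: /m lets '.' cross newlines, so the match runs to the end.
  data.sub!(/^>.*/m, '')
  overrun = $&                                 # removed text, or nil if none
  [definition, data, overrun]
end

two_entries = ">id1 first entry\nATGC\nATGC\n>id2 second entry\nGGCC\n"
definition, data, overrun = split_first_entry(two_entries)
p definition
p data
p overrun
```

In this sketch the second entry ends up in overrun rather than in data, which matches the documented behaviour that only the first entry is used.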

Public Instance Methods

aalen()

Returns the length of the sequence as a Bio::Sequence::AA.

# File lib/bio/db/fasta.rb, line 209
def aalen
  self.aaseq.length
end
aaseq()

Returns the sequence as a Bio::Sequence::AA.

# File lib/bio/db/fasta.rb, line 204
def aaseq
  Sequence::AA.new(seq)
end
acc_version()

Returns accession number with version.

# File lib/bio/db/fasta.rb, line 265
def acc_version
  identifiers.acc_version
end
accession()

Returns an accession number.

# File lib/bio/db/fasta.rb, line 253
def accession
  identifiers.accession
end
accessions()

Parses the FASTA defline (using the identifiers method) and returns the accession numbers as an array of strings.

# File lib/bio/db/fasta.rb, line 260
def accessions
  identifiers.accessions
end
blast(factory)
Alias for: query
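comment()

Returns the comments embedded in the sequence data (lines beginning with ‘#’) as a hash keyed by the sequence position at which each comment appears. Returns nil unless the sequence data begins with a comment line.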

# File lib/bio/db/fasta.rb, line 183
def comment
  seq
  @comment
end
entry()

Returns the stored entry in FASTA format (same as to_s).

# File lib/bio/db/fasta.rb, line 127
def entry
  @entry = ">#{@definition}\n#{@data.strip}\n"
end
Also aliased as: to_s
entry_id()

Parses the FASTA defline (using the identifiers method) and returns a possibly unique identifier as a string.

# File lib/bio/db/fasta.rb, line 239
def entry_id
  identifiers.entry_id
end
fasta(factory)
Alias for: query
gi()

Parses the FASTA defline (using the identifiers method) and returns the GI, locus, accession, or accession with version number. If an entry has several such IDs, only the first is returned. Returns a string or nil.

# File lib/bio/db/fasta.rb, line 248
def gi
  identifiers.gi
end
identifiers()

Parses the FASTA defline and extracts the IDs. IDs are NSIDs (NCBI standard FASTA sequence identifiers) or “:”-separated IDs. Returns a Bio::FastaDefline instance.

# File lib/bio/db/fasta.rb, line 229
def identifiers
  unless defined?(@ids) then
    @ids = FastaDefline.new(@definition)
  end
  @ids
end
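
The defined?(@ids) check above caches the parsed defline, so repeated calls do not parse it again. A minimal sketch of that caching pattern in plain Ruby follows; LazyDefline and parse_count are hypothetical names used only for illustration, and the split call is a stand-in for FastaDefline.new.

```ruby
# Hypothetical class, not part of BioRuby, illustrating the defined?-based cache.
class LazyDefline
  attr_reader :parse_count

  def initialize(definition)
    @definition = definition
    @parse_count = 0
  end

  def identifiers
    unless defined?(@ids)
      @parse_count += 1
      # Stand-in for FastaDefline.new(@definition): take the first word.
      @ids = @definition.split(/\s+/).first
    end
    @ids
  end
end

defline = LazyDefline.new("sce:YBR160W  CDC28, SRM5")
defline.identifiers
defline.identifiers
p defline.parse_count
```

Unlike the common `@ids ||= ...` idiom, checking defined? also caches a nil or false result, so in this sketch an empty definition is still parsed only once.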
length()

Returns sequence length.

# File lib/bio/db/fasta.rb, line 189
def length
  seq.length
end
locus()

Returns locus.

# File lib/bio/db/fasta.rb, line 270
def locus
  identifiers.locus
end
nalen()

Returns the length of the sequence as a Bio::Sequence::NA.

# File lib/bio/db/fasta.rb, line 199
def nalen
  self.naseq.length
end
naseq()

Returns the sequence as a Bio::Sequence::NA.

# File lib/bio/db/fasta.rb, line 194
def naseq
  Sequence::NA.new(seq)
end
query(factory)

Executes FASTA/BLAST search by using a Bio::Fasta or a Bio::Blast factory object.

#!/usr/bin/env ruby
require 'bio'

factory = Bio::Fasta.local('fasta34', 'db/swissprot.f')
flatfile = Bio::FlatFile.open(Bio::FastaFormat, 'queries.f')
flatfile.each do |entry|
  p entry.definition
  result = entry.fasta(factory)
  result.each do |hit|
    print "#{hit.query_id} : #{hit.evalue}\t#{hit.target_id} at "
    p hit.lap_at
  end
end
# File lib/bio/db/fasta.rb, line 150
def query(factory)
  factory.query(entry)
end
Also aliased as: fasta, blast
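
Because query only calls factory.query with the FASTA text of the entry, any object that responds to query can stand in for the factory. The sketch below shows that delegation in plain Ruby; EchoFactory and MiniEntry are hypothetical stand-ins written for this example and are not BioRuby classes.

```ruby
# Hypothetical stand-in for a search factory: it only needs a #query method.
class EchoFactory
  def query(fasta_text)
    "received #{fasta_text.lines.first.chomp}"
  end
end

# Hypothetical stand-in that copies the entry/query methods shown in this page.
class MiniEntry
  def initialize(definition, data)
    @definition = definition
    @data = data
  end

  # Rebuilds the FASTA text from the stored definition and sequence lines.
  def entry
    ">#{@definition}\n#{@data.strip}\n"
  end
  alias to_s entry

  # Hands the FASTA text to whatever factory object is supplied.
  def query(factory)
    factory.query(entry)
  end
  alias fasta query
  alias blast query
end

mini = MiniEntry.new("id1 test", "\nATGC\nATGC\n")
puts mini.fasta(EchoFactory.new)
```

A stub like this can be handy for exercising calling code without a local FASTA or BLAST installation.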
seq()

Returns the sequence lines joined into a single string (a Bio::Sequence::Generic), with whitespace and digits removed.

# File lib/bio/db/fasta.rb, line 157
def seq
  unless defined?(@seq)
    unless /\A\s*^\#/ =~ @data then
      @seq = Sequence::Generic.new(@data.tr(" \t\r\n0-9", '')) # lazy clean up
    else
      a = @data.split(/(^\#.*$)/)
      i = 0
      cmnt = {}
      s = []
      a.each do |x|
        if /^# ?(.*)$/ =~ x then
          cmnt[i] ? cmnt[i] << "\n" << $1 : cmnt[i] = $1
        else
          x.tr!(" \t\r\n0-9", '') # lazy clean up
          i += x.length
          s << x
        end
      end
      @comment = cmnt
      @seq = Bio::Sequence::Generic.new(s.join(''))
    end
  end
  @seq
end
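
The clean-up and comment handling in seq can be followed with plain strings. This is a rough sketch under stated assumptions, not the library code: clean_sequence is a hypothetical helper that returns an ordinary String and Hash instead of a Bio::Sequence::Generic and the @comment instance variable.

```ruby
# Hypothetical helper, not part of BioRuby. Returns [sequence, comments],
# where comments is nil unless the data starts with a '#' line.
def clean_sequence(data)
  # No leading comment line: just delete whitespace and digits.
  return [data.tr(" \t\r\n0-9", ''), nil] unless /\A\s*^\#/ =~ data

  comments = {}
  offset = 0   # number of residues collected so far
  parts = []
  data.split(/(^\#.*$)/).each do |chunk|
    if /^# ?(.*)$/ =~ chunk
      text = $1
      if comments[offset]
        comments[offset] << "\n" << text   # several comments at one position
      else
        comments[offset] = text
      end
    else
      cleaned = chunk.tr(" \t\r\n0-9", '')
      offset += cleaned.length
      parts << cleaned
    end
  end
  [parts.join, comments]
end

annotated = "\n# signal peptide\nATGC ATGC\n# mature chain\n  11 GGCC\n"
sequence, comments = clean_sequence(annotated)
p sequence
p comments
```

Each comment is stored under the number of residues that precede it, so the keys can be read as positions in the cleaned sequence.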
to_biosequence()

Returns the sequence as a Bio::Sequence object.

Note: For efficiency, modifying the returned Bio::Sequence object may (but does not always) also change the sequence or definition in this FastaFormat object.

# File lib/bio/db/fasta.rb, line 220
def to_biosequence
  Bio::Sequence.adapter(self, Bio::Sequence::Adapter::FastaFormat)
end
Also aliased as: to_seq
to_s()
Alias for: entry
to_seq()
Alias for: to_biosequence
