class Bio::NBRF

Sequence data class for NBRF/PIR flatfile format.

Constants

DELIMITER

Delimiter of each entry. Bio::FlatFile uses it.

DELIMITER_OVERRUN

(Integer) excess read size included in DELIMITER.

Attributes

accession[RW]

Returns ID described in the entry.

data[RW]

Raw sequence text of the entry as read from the input. Whitespace and digits are removed only when seq is called.

definition[RW]

Returns the description line of the NBRF/PIR formatted data.

entry_id[RW]

Returns ID described in the entry.

entry_overrun[R]

piece of next entry. Bio::FlatFile uses it.

seq_type[RW]

Returns sequence type described in the entry.

P1 (protein), F1 (protein fragment)
DL (DNA linear), DC (DNA circular)
RL (RNA linear), RC (RNA circular)
N3 (tRNA), N1 (other functional RNA)

Public Class Methods

new(str)

Creates a new NBRF object. It stores the comment and sequence information from one entry of the NBRF/PIR format string. If the argument contains more than one entry, only the first entry is used.

# File lib/bio/db/nbrf.rb, line 45
def initialize(str)
  str = str.sub(/\A[\r\n]+/, '') # remove first void lines
  line1, line2, rest = str.split(/^/, 3)

  rest = rest.to_s
  rest.sub!(/^>.*/m, '') # remove trailing entries for sure
  @entry_overrun = $&
  rest.sub!(/\*\s*\z/, '') # remove last '*' and "\n"
  @data = rest

  @definition = line2.to_s.chomp
  if /^>?([A-Za-z0-9]{2})\;(.*)/ =~ line1.to_s then
    @seq_type = $1
    @entry_id = $2
  end
end
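
The parsing steps above can be tried without the bio gem. The following is a minimal standalone sketch, not the library's API: `parse_nbrf_entry` is a hypothetical helper that returns a plain Hash, and the sample entry text is invented for illustration.

```ruby
# Hypothetical helper mirroring the steps of Bio::NBRF#initialize on plain strings.
def parse_nbrf_entry(str)
  str = str.sub(/\A[\r\n]+/, '')          # drop leading blank lines
  line1, line2, rest = str.split(/^/, 3)  # header line, description line, remainder
  rest = rest.to_s.dup
  overrun = rest.slice!(/^>.*/m)          # cut off everything from the next '>' line onward
  rest.sub!(/\*\s*\z/, '')                # drop the terminating '*' and trailing whitespace
  result = { definition: line2.to_s.chomp, data: rest, entry_overrun: overrun }
  if (m = /^>?([A-Za-z0-9]{2});(.*)/.match(line1.to_s))
    result[:seq_type] = m[1]              # two-character type code, e.g. "P1"
    result[:entry_id] = m[2]              # text after the semicolon
  end
  result
end

# Invented sample data: two entries, only the first is parsed.
text = ">P1;SAMPLE1\nsample protein\nMDITIHNPLI RRPLFSWLAP\nSRIFDQIFGE*\n" \
       ">P1;SAMPLE2\nnext entry\nAAA*\n"
parsed = parse_nbrf_entry(text)
parsed[:seq_type]   # => "P1"
parsed[:entry_id]   # => "SAMPLE1"
parsed[:definition] # => "sample protein"
```

Note that the data field still contains the embedded space and newline at this point; cleaning is deferred until the sequence is requested.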

to_nbrf(hash)

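Creates NBRF/PIR formatted text from a hash with the keys :seq_type, :entry_id, :definition, :seq and :width. Any key can be omitted. A missing :seq_type is inferred from the class of :seq (P1 for Bio::Sequence::AA, RL or DL for Bio::Sequence::NA, XX otherwise). :width defaults to 70; pass :width => nil to disable line wrapping.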

# File lib/bio/db/nbrf.rb, line 167
def self.to_nbrf(hash)
  seq_type = hash[:seq_type]
  seq = hash[:seq]
  unless seq_type
    if seq.is_a?(Bio::Sequence::AA) then
      seq_type = 'P1'
    elsif seq.is_a?(Bio::Sequence::NA) then
      seq_type = /u/i =~ seq ? 'RL' : 'DL'
    else
      seq_type = 'XX'
    end
  end
  width = hash.has_key?(:width) ? hash[:width] : 70
  if width then
    seq = seq.to_s + "*"
    seq.gsub!(Regexp.new(".{1,#{width}}"), "\\0\n")
  else
    seq = seq.to_s + "*\n"
  end
  ">#{seq_type};#{hash[:entry_id]}\n#{hash[:definition]}\n#{seq}"
end
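
To illustrate the wrapping behaviour, here is a minimal sketch that reproduces the formatting logic on plain strings. `format_nbrf` is a hypothetical stand-in, not the library method: it skips the class-based type inference and simply falls back to XX. The entry ID and sequence are invented.

```ruby
# Hypothetical stand-in for Bio::NBRF.to_nbrf that works on plain strings.
def format_nbrf(hash)
  seq_type = hash[:seq_type] || 'XX'             # assumption: no class-based inference here
  width = hash.key?(:width) ? hash[:width] : 70  # an explicit nil disables wrapping
  seq = hash[:seq].to_s + '*'                    # '*' terminates the sequence
  if width
    seq = seq.gsub(/.{1,#{width}}/, "\\0\n")     # break into lines of at most width characters
  else
    seq += "\n"
  end
  ">#{seq_type};#{hash[:entry_id]}\n#{hash[:definition]}\n#{seq}"
end

wrapped = format_nbrf({ seq_type: 'P1', entry_id: 'TEST1',
                        definition: 'sample protein', seq: 'ACDEFGHIKL', width: 4 })
# wrapped == ">P1;TEST1\nsample protein\nACDE\nFGHI\nKL*\n"

unwrapped = format_nbrf({ entry_id: 'TEST1', definition: 'sample protein',
                          seq: 'ACDEFGHIKL', width: nil })
# unwrapped == ">XX;TEST1\nsample protein\nACDEFGHIKL*\n"
```

The terminating asterisk is appended before wrapping, so it counts toward the width of the last line.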

Public Instance Methods

aalen()

Returns the length of the protein (amino acid) sequence. If you call aalen on a nucleic acid sequence, a RuntimeError is raised. Use this method only when you know the sequence is AA.

# File lib/bio/db/nbrf.rb, line 157
def aalen
  aaseq.length
end

aaseq()

Returns the protein (amino acid) sequence. If you call aaseq on a nucleic acid sequence, a RuntimeError is raised. Use this method only when you know the sequence is AA.

# File lib/bio/db/nbrf.rb, line 143
def aaseq
  if seq.is_a?(Bio::Sequence::NA) then
    raise 'not nucleic but protein sequence'
  elsif seq.is_a?(Bio::Sequence::AA) then
    seq
  else
    Bio::Sequence::AA.new(seq)
  end
end

entry()

Returns the stored entry as NBRF/PIR formatted text (same as to_s).

# File lib/bio/db/nbrf.rb, line 84
def entry
  @entry = ">#{@seq_type or 'XX'};#{@entry_id}\n#{definition}\n#{@data}*\n"
end

Also aliased as: to_s

length()

Returns the sequence length.

# File lib/bio/db/nbrf.rb, line 115
def length
  seq.length
end

nalen()

Returns the length of the nucleic acid sequence. If you call nalen on a protein sequence, a RuntimeError is raised. Use this method only when you know the sequence is NA.

# File lib/bio/db/nbrf.rb, line 135
def nalen
  naseq.length
end

naseq()

Returns the nucleic acid sequence. If you call naseq on a protein sequence, a RuntimeError is raised. Use this method only when you know the sequence is NA.

# File lib/bio/db/nbrf.rb, line 122
def naseq
  if seq.is_a?(Bio::Sequence::AA) then
    raise 'not nucleic but protein sequence'
  elsif seq.is_a?(Bio::Sequence::NA) then
    seq
  else
    Bio::Sequence::NA.new(seq)
  end
end

seq()

Returns the sequence data as a Bio::Sequence::NA, Bio::Sequence::AA or Bio::Sequence object, according to the sequence type. The object is built on the first call and cached.

# File lib/bio/db/nbrf.rb, line 107
def seq
  unless defined?(@seq)
    @seq = seq_class.new(@data.tr(" \t\r\n0-9", '')) # lazy clean up
  end
  @seq
end
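
The lazy clean-up can be seen in isolation with plain strings. This is a minimal sketch under stated assumptions: `LazyEntry` is a hypothetical class that returns a String instead of a Bio::Sequence object, and the raw text is invented.

```ruby
# Hypothetical class showing the memoised clean-up used by seq, on plain strings.
class LazyEntry
  def initialize(data)
    @data = data
  end

  def seq
    unless defined?(@seq)
      # String#tr with an empty replacement deletes the listed characters:
      # spaces, tabs, line breaks and digits (such as position counters).
      @seq = @data.tr(" \t\r\n0-9", '')
    end
    @seq
  end
end

raw = "MDITIHNPLI 10\nRRPLFSWLAP\t20\r\n"
entry = LazyEntry.new(raw)
entry.seq # => "MDITIHNPLIRRPLFSWLAP"
```

Because the result is cached in an instance variable, repeated calls return the same object rather than cleaning the raw text again.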

seq_class()

Returns Bio::Sequence::AA, Bio::Sequence::NA, or Bio::Sequence, depending on sequence type.

# File lib/bio/db/nbrf.rb, line 91
def seq_class
  case @seq_type
  when /[PF]1/
    # protein
    Sequence::AA
  when /[DR][LC]/, /N[13]/
    # nucleic
    Sequence::NA
  else
    Sequence
  end
end
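
The dispatch on the type code can be sketched without the bio gem. In this minimal example, `seq_kind` is a hypothetical function that returns a symbol in place of the Sequence classes; the patterns are copied from the method above.

```ruby
# Hypothetical function: same case/when patterns as seq_class, symbols instead of classes.
def seq_kind(seq_type)
  case seq_type
  when /[PF]1/             then :protein  # P1, F1
  when /[DR][LC]/, /N[13]/ then :nucleic  # DL, DC, RL, RC, N1, N3
  else                          :generic  # unknown or missing type code
  end
end

seq_kind('P1') # => :protein
seq_kind('RC') # => :nucleic
seq_kind('N3') # => :nucleic
seq_kind('XX') # => :generic
seq_kind(nil)  # => :generic
```

A nil type code falls through to the generic branch because a Regexp never matches nil in a case expression.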

to_s()

Alias for: entry