class Bio::NBRF
Sequence data class for NBRF/PIR flatfile format.
Constants
- DELIMITER
Delimiter of each entry. Bio::FlatFile uses it.
- DELIMITER_OVERRUN
(Integer) excess read size included in DELIMITER.
Attributes
Returns ID described in the entry.
sequence data of the entry (???)
Returns the description line of the NBRF/PIR formatted data.
Returns ID described in the entry.
Piece of the next entry. Bio::FlatFile uses it.
Returns sequence type described in the entry.
P1 (protein), F1 (protein fragment), DL (DNA linear), DC (DNA circular), RL (RNA linear), RC (RNA circular), N3 (tRNA), N1 (other functional RNA)
Public Class Methods
Creates a new NBRF object. It stores the comment and sequence information from one entry of the NBRF/PIR format string. If the argument contains more than one entry, only the first entry is used.
# File lib/bio/db/nbrf.rb, line 45
def initialize(str)
  str = str.sub(/\A[\r\n]+/, '') # remove first void lines
  line1, line2, rest = str.split(/^/, 3)

  rest = rest.to_s
  rest.sub!(/^>.*/m, '') # remove trailing entries for sure
  @entry_overrun = $&
  rest.sub!(/\*\s*\z/, '') # remove last '*' and "\n"
  @data = rest

  @definition = line2.to_s.chomp
  if /^>?([A-Za-z0-9]{2})\;(.*)/ =~ line1.to_s then
    @seq_type = $1
    @entry_id = $2
  end
end
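The parsing steps in the constructor can be traced without loading BioRuby. The following is a minimal standalone sketch that applies the same regular expressions to a made-up entry and returns a Hash instead of setting instance variables. The names `parse_nbrf_entry` and `SAMPLE` are hypothetical and not part of the library.

```ruby
# Standalone sketch mirroring the steps of Bio::NBRF#initialize.
# parse_nbrf_entry and SAMPLE are hypothetical, for illustration only.
def parse_nbrf_entry(str)
  str = str.sub(/\A[\r\n]+/, '')          # drop leading blank lines
  line1, line2, rest = str.split(/^/, 3)  # header, description, remainder
  rest = rest.to_s.dup
  overrun = rest.slice!(/^>.*/m)          # cut off any following entries
  rest.sub!(/\*\s*\z/, '')                # drop the terminating '*'
  result = { definition: line2.to_s.chomp, data: rest, entry_overrun: overrun }
  if /^>?([A-Za-z0-9]{2});(.*)/ =~ line1.to_s
    result[:seq_type] = $1
    result[:entry_id] = $2
  end
  result
end

# Two made-up entries; only the first one is parsed.
SAMPLE = ">P1;TEST1\nHypothetical test protein\nMKVLA\nGGH*\n" \
         ">DL;NEXT1\nNext entry\nACGT*\n"

parsed = parse_nbrf_entry(SAMPLE)
p parsed
```

The second entry ends up under `:entry_overrun`, which corresponds to the piece of the next entry that Bio::FlatFile uses.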
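Creates NBRF/PIR formatted text from a hash. The recognized keys are :seq_type, :seq, :entry_id, :definition and :width, and any of them can be omitted. When :seq_type is omitted it is guessed from the class of :seq, and when :width is omitted the sequence is wrapped at 70 characters per line.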
# File lib/bio/db/nbrf.rb, line 167
def self.to_nbrf(hash)
  seq_type = hash[:seq_type]
  seq = hash[:seq]
  unless seq_type
    if seq.is_a?(Bio::Sequence::AA) then
      seq_type = 'P1'
    elsif seq.is_a?(Bio::Sequence::NA) then
      seq_type = /u/i =~ seq ? 'RL' : 'DL'
    else
      seq_type = 'XX'
    end
  end
  width = hash.has_key?(:width) ? hash[:width] : 70
  if width then
    seq = seq.to_s + "*"
    seq.gsub!(Regexp.new(".{1,#{width}}"), "\\0\n")
  else
    seq = seq.to_s + "*\n"
  end
  ">#{seq_type};#{hash[:entry_id]}\n#{hash[:definition]}\n#{seq}"
end
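To see how the wrapping and the terminating `*` interact, here is a minimal standalone sketch of the same formatting logic, limited to plain String sequences and without the automatic type detection. The method name `format_nbrf` and its keyword arguments are hypothetical. They only imitate the hash keys read by `to_nbrf`.

```ruby
# Standalone sketch of the wrapping logic in Bio::NBRF.to_nbrf.
# format_nbrf is a hypothetical name; it does not guess the sequence type.
def format_nbrf(entry_id:, definition:, seq:, seq_type: 'XX', width: 70)
  body = seq.to_s + '*'                          # '*' terminates the sequence
  if width
    body = body.gsub(/.{1,#{width}}/, "\\0\n")   # break into lines of `width`
  else
    body += "\n"                                 # width: nil disables wrapping
  end
  ">#{seq_type};#{entry_id}\n#{definition}\n#{body}"
end

wrapped = format_nbrf(entry_id: 'TEST1', definition: 'Hypothetical test protein',
                      seq: 'MKVLAGGH', seq_type: 'P1', width: 5)
unwrapped = format_nbrf(entry_id: 'TEST1', definition: 'Hypothetical test protein',
                        seq: 'MKVLAGGH', seq_type: 'P1', width: nil)
puts wrapped
puts unwrapped
```

Because the `*` is appended before wrapping, it counts toward the line width and always sits at the end of the last sequence line.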
Public Instance Methods
Returns the length of the protein (amino acid) sequence. Calling aalen on a nucleic acid sequence raises RuntimeError. Use this method only when you already know whether the sequence is NA or AA.
# File lib/bio/db/nbrf.rb, line 157
def aalen
  aaseq.length
end
Returns the protein (amino acid) sequence. Calling aaseq on a nucleic acid sequence raises RuntimeError. Use this method only when you already know whether the sequence is NA or AA.
# File lib/bio/db/nbrf.rb, line 143
def aaseq
  if seq.is_a?(Bio::Sequence::NA) then
    raise 'not nucleic but protein sequence'
  elsif seq.is_a?(Bio::Sequence::AA) then
    seq
  else
    Bio::Sequence::AA.new(seq)
  end
end
Returns the stored entry as NBRF/PIR formatted text (same as to_s).
# File lib/bio/db/nbrf.rb, line 84
def entry
  @entry = ">#{@seq_type or 'XX'};#{@entry_id}\n#{definition}\n#{@data}*\n"
end
Returns sequence length.
# File lib/bio/db/nbrf.rb, line 115
def length
  seq.length
end
Returns the length of the nucleic acid sequence. Calling nalen on a protein sequence raises RuntimeError. Use this method only when you already know whether the sequence is NA or AA.
# File lib/bio/db/nbrf.rb, line 135
def nalen
  naseq.length
end
Returns the nucleic acid sequence. Calling naseq on a protein sequence raises RuntimeError. Use this method only when you already know whether the sequence is NA or AA.
# File lib/bio/db/nbrf.rb, line 122
def naseq
  if seq.is_a?(Bio::Sequence::AA) then
    raise 'not nucleic but protein sequence'
  elsif seq.is_a?(Bio::Sequence::NA) then
    seq
  else
    Bio::Sequence::NA.new(seq)
  end
end
Returns sequence data. Returns Bio::Sequence::NA, Bio::Sequence::AA or Bio::Sequence, according to the sequence type.
# File lib/bio/db/nbrf.rb, line 107
def seq
  unless defined?(@seq)
    @seq = seq_class.new(@data.tr(" \t\r\n0-9", '')) # lazy clean up
  end
  @seq
end
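The "lazy clean up" relies on `String#tr` with an empty replacement, which deletes the listed characters, and on `defined?(@seq)` so that the work happens only on first access. A minimal sketch of that pattern follows. `LazyEntry` is a hypothetical stand-in that returns a plain String rather than a Bio::Sequence object, and the sample data is made up.

```ruby
# Hypothetical stand-in for the lazy clean-up in Bio::NBRF#seq.
class LazyEntry
  def initialize(data)
    @data = data
  end

  def seq
    # Strip whitespace and position numbers on first access only,
    # then reuse the cached result.
    @seq = @data.tr(" \t\r\n0-9", '') unless defined?(@seq)
    @seq
  end
end

lazy = LazyEntry.new("    5 MKVLA GGHTT\r\n   15 PLWQE\n")
puts lazy.seq
```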
Returns Bio::Sequence::AA, Bio::Sequence::NA, or Bio::Sequence, depending on sequence type.
# File lib/bio/db/nbrf.rb, line 91
def seq_class
  case @seq_type
  when /[PF]1/
    # protein
    Sequence::AA
  when /[DR][LC]/, /N[13]/
    # nucleic
    Sequence::NA
  else
    Sequence
  end
end
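The mapping from type codes to sequence classes can be checked with a small standalone sketch that uses the same patterns but returns symbols instead of Bio::Sequence classes. The helper name `sequence_kind` and the symbols are hypothetical.

```ruby
# Hypothetical helper mirroring the case statement in Bio::NBRF#seq_class,
# returning symbols instead of Bio::Sequence classes.
def sequence_kind(seq_type)
  case seq_type
  when /[PF]1/ then :protein
  when /[DR][LC]/, /N[13]/ then :nucleic
  else :generic
  end
end

kinds = %w[P1 F1 DL DC RL RC N3 N1 XX].map { |code| [code, sequence_kind(code)] }
kinds.each { |code, kind| puts "#{code} -> #{kind}" }
```

Unknown codes such as XX, and a missing type (nil), fall through to the generic case.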