class Bio::GCG::Seq

Parser class for the GCG sequence file format (.seq or .pep).

References

www.accelrys.com/products/gcg_wisconsin_package

www.hgmp.mrc.ac.uk/Software/EMBOSS/Themes/SequenceFormats.html

docs.bioperl.org/releases/bioperl-1.2.3/Bio/SeqIO/gcg.html

Constants

DELIMITER

Delimiter used by Bio::FlatFile.

Attributes

checksum[R]

“Check:” field, which holds the checksum of the current sequence.

date[R]

Date field of this entry.

definition[R]

Description field.

entry_id[R]

ID field.

heading[R]

Heading line of the entry (for example ‘!!NA_SEQUENCE 1.0’).

length[R]

“Length:” field. Note that this value may differ from the actual sequence length.

seq_type[R]

“Type:” field, which indicates sequence type. “N” means nucleic acid sequence, “P” means protein sequence.

Public Class Methods

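calc_checksum(str)

Calculates the checksum of the given string. The string is upcased, and only letters, ‘.’ and ‘~’ contribute to the sum; the result is an integer from 0 to 9999.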

    # File lib/bio/appl/gcg/seq.rb
    def self.calc_checksum(str)
      # Reference: Bio::SeqIO::gcg of BioPerl-1.2.3
      idx = 0
      sum = 0
      str.upcase.tr('^A-Z.~', '').each_byte do |c|
        idx += 1
        sum += idx * c
        idx = 0 if idx >= 57
      end
      (sum % 10000)
    end
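
As a rough illustration of the weighting scheme in the listing above, the following standalone sketch repeats the same loop without requiring BioRuby. The method name gcg_checksum is hypothetical and is not part of the library.

```ruby
# Standalone copy of the checksum loop, for illustration only.
# gcg_checksum is a hypothetical name, not a BioRuby method.
def gcg_checksum(str)
  idx = 0
  sum = 0
  # Upcase, then delete everything except A-Z, '.' and '~'.
  str.upcase.tr('^A-Z.~', '').each_byte do |byte|
    idx += 1
    sum += idx * byte        # weight each byte by its position
    idx = 0 if idx >= 57     # the weight cycles through 1..57
  end
  sum % 10000
end

# 'ACGT' => 1*65 + 2*67 + 3*71 + 4*84 = 748
puts gcg_checksum('ACGT')
# Case, spaces and digits are ignored, so this is 748 as well.
puts gcg_checksum('ac gt 12')
```

Because characters other than letters, ‘.’ and ‘~’ are dropped before summing, layout whitespace and position numbers do not affect the result.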

new(str)

Creates a new instance of this class. str must be a string in GCG sequence format.

    # File lib/bio/appl/gcg/seq.rb
    def initialize(str)
      @heading = str[/.*/] # '!!NA_SEQUENCE 1.0' or like this
      str = str.sub(/.*/, '')
      str.sub!(/.*\.\.$/m, '')
      @definition = $&.to_s.sub(/^.*\.\.$/, '').to_s
      desc = $&.to_s
      if m = /(.+)\s+Length\:\s+(\d+)\s+(.+)\s+Type\:\s+(\w)\s+Check\:\s+(\d+)/.match(desc) then
        @entry_id = m[1].to_s.strip
        @length   = (m[2] ? m[2].to_i : nil)
        @date     = m[3].to_s.strip
        @seq_type = m[4]
        @checksum = (m[5] ? m[5].to_i : nil)
      end
      @data = str
      @seq = nil
      @definition.strip!
    end
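
The header-line match in the constructor can be tried in isolation. The sketch below reuses the regular expression from the listing; the helper name parse_gcg_header and the sample line are made up for illustration and are not part of BioRuby.

```ruby
# Regular expression copied from the initialize listing above.
GCG_HEADER = /(.+)\s+Length\:\s+(\d+)\s+(.+)\s+Type\:\s+(\w)\s+Check\:\s+(\d+)/

# Hypothetical helper (not part of Bio::GCG::Seq): returns the header
# fields as a Hash, or nil when the line does not match.
def parse_gcg_header(line)
  m = GCG_HEADER.match(line)
  return nil unless m
  {
    entry_id: m[1].strip,
    length:   m[2].to_i,
    date:     m[3].strip,
    seq_type: m[4],
    checksum: m[5].to_i
  }
end

# Made-up sample line in the layout that to_gcg writes.
sample = 'gi-1234567  Length: 12  January 01, 2000 12:00  Type: N  Check: 5688  ..'
p parse_gcg_header(sample)
```

When the line does not match, the helper returns nil; in the constructor the corresponding attributes are simply left unset.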
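
to_gcg(hash)

Creates new text in GCG sequence format from the given hash. All parameters except :seq can be omitted. If :seq is a Bio::Sequence::NA or Bio::Sequence::AA object, the sequence type is taken from its class; otherwise :seq_type is used, defaulting to ‘P’. :date defaults to the current time, and :entry_id and :definition default to empty strings.

Examples:

    Bio::GCG::Seq.to_gcg(:definition=>'H.sapiens DNA',
                         :seq_type=>'N', :entry_id=>'gi-1234567',
                         :seq=>seq, :date=>date)

    # File lib/bio/appl/gcg/seq.rb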
    def self.to_gcg(hash)
      seq = hash[:seq]
      if seq.is_a?(Bio::Sequence::NA) then
        seq_type = 'N'
      elsif seq.is_a?(Bio::Sequence::AA) then
        seq_type = 'P'
      else
        seq_type = (hash[:seq_type] or 'P')
      end
      if seq_type == 'N' then
        head = '!!NA_SEQUENCE 1.0'
      else
        head = '!!AA_SEQUENCE 1.0'
      end
      date = (hash[:date] or Time.now.strftime('%B %d, %Y %H:%M'))
      entry_id = hash[:entry_id].to_s.strip
      len = seq.length
      checksum = self.calc_checksum(seq)
      definition = hash[:definition].to_s.strip
      seq = seq.upcase.gsub(/.{1,50}/, "\\0\n")
      seq.gsub!(/.{10}/, "\\0 ")
      w = len.to_s.size + 1
      i = 1
      seq.gsub!(/^/) { |x| s = sprintf("\n%*d ", w, i); i += 50; s }

      [ head, "\n", definition, "\n\n",
        "#{entry_id}  Length: #{len}  #{date}  " \
        "Type: #{seq_type}  Check: #{checksum}  ..\n",
        seq, "\n" ].join('')
    end
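
The body layout produced by the three gsub calls can be hard to picture. The sketch below isolates those steps in a hypothetical helper, gcg_layout, which is not a BioRuby method and which takes a plain String rather than a Bio::Sequence object.

```ruby
# Hypothetical helper that repeats only the body-formatting steps
# of the to_gcg listing above.
def gcg_layout(seq)
  len = seq.length
  # Break the upcased sequence into lines of at most 50 residues.
  body = seq.upcase.gsub(/.{1,50}/, "\\0\n")
  # Append a space after every complete block of 10 residues.
  body.gsub!(/.{10}/, "\\0 ")
  # Width of the position column: digits of the length, plus one.
  w = len.to_s.size + 1
  i = 1
  # Prefix each line with an empty line and its right-aligned
  # start position (1, 51, 101, ...).
  body.gsub!(/^/) { s = sprintf("\n%*d ", w, i); i += 50; s }
  body
end

# Inspect the raw string for a 12-residue sequence.
p gcg_layout('ACGT' * 3)
# Print the layout for a 60-residue sequence, which spans two lines.
print gcg_layout('ACGT' * 15)
```

Each output line is preceded by an empty line, and the position column is sized from the total length, so numbers stay right-aligned across lines.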

Public Instance Methods

aaseq()

If you know the sequence is an amino acid sequence, use this method. Returns a Bio::Sequence::AA object.

Calling naseq on a protein sequence, or aaseq on a nucleic acid sequence, raises RuntimeError.

    # File lib/bio/appl/gcg/seq.rb
    def aaseq
      if seq.is_a?(Bio::Sequence::AA) then
        @seq
      else
        raise 'seq_type != \'P\''
      end
    end

naseq()

If you know the sequence is a nucleic acid sequence, use this method. Returns a Bio::Sequence::NA object.

Calling naseq on a protein sequence, or aaseq on a nucleic acid sequence, raises RuntimeError.

    # File lib/bio/appl/gcg/seq.rb
    def naseq
      if seq.is_a?(Bio::Sequence::NA) then
        @seq
      else
        raise 'seq_type != \'N\''
      end
    end

seq()

Sequence data. The class of the sequence is Bio::Sequence::NA, Bio::Sequence::AA or Bio::Sequence::Generic, according to the sequence type.

    # File lib/bio/appl/gcg/seq.rb
    def seq
      unless @seq then
        case @seq_type
        when 'N', 'n'
          k = Bio::Sequence::NA
        when 'P', 'p'
          k = Bio::Sequence::AA
        else
          k = Bio::Sequence
        end
        @seq = k.new(@data.tr('^-a-zA-Z.~', ''))
      end
      @seq
    end
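
The character filter used in the listing can be checked on its own. The sketch below applies the same tr call to a made-up body string; the helper name strip_gcg_layout is hypothetical and not part of BioRuby.

```ruby
# Hypothetical helper: same filter as in the seq listing above.
# Deletes everything except letters, '-', '.' and '~', so position
# numbers, spaces and newlines are removed.
def strip_gcg_layout(raw)
  raw.tr('^-a-zA-Z.~', '')
end

# Made-up body text in the numbered, space-separated GCG layout.
raw = "\n  1 ACGTACGTAC GT..~~acgt\n\n 51 AC-GT\n"
puts strip_gcg_layout(raw)   # ACGTACGTACGT..~~acgtAC-GT
```

As the two listings read, this filter keeps ‘-’ while the filter in calc_checksum removes it, so gap characters written as ‘-’ would not contribute to the checksum.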

validate_checksum()

Validates the checksum. Returns true if the “Check:” value of the entry equals the checksum calculated from the sequence; otherwise returns false.

    # File lib/bio/appl/gcg/seq.rb
    def validate_checksum
      checksum == self.class.calc_checksum(seq)
    end