module Bio::Sequence::Common

DESCRIPTION

Bio::Sequence::Common is a mixin that implements the methods common to Bio::Sequence::AA and Bio::Sequence::NA. All of these methods are available on both amino acid and nucleic acid sequences and, by encapsulation, on Bio::Sequence objects as well.

USAGE

# Create a sequence
dna = Bio::Sequence.auto('atgcatgcatgc')

# Splice out a subsequence using a GenBank-style location string
puts dna.splice('complement(1..4)')

# What is the base composition?
puts dna.composition

# Create a random sequence with the composition of a current sequence
puts dna.randomize

Public Instance Methods

+(*arg)

Create a new sequence by adding to an existing sequence. The existing sequence is not modified.

s = Bio::Sequence::NA.new('atgc')
s2 = s + 'atgc'
puts s2                                 #=> "atgcatgc"
puts s                                  #=> "atgc"

The new sequence is of the same class as the existing sequence if the new data was added to an existing sequence,

puts s2.class == s.class                #=> true

but if an existing sequence is added to a String, the result is a String

s3 = 'atgc' + s
puts s3.class                           #=> String
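
The asymmetry described above follows from which object receives the + call. A minimal sketch, assuming a hypothetical String subclass named MiniSeq (a stand-in, not part of BioRuby) that re-wraps the result the same way:

```ruby
# Hypothetical stand-in for a sequence class; not part of BioRuby.
class MiniSeq < String
  # Re-wrap the plain String returned by String#+ in the receiver's class.
  def +(*arg)
    self.class.new(super(*arg))
  end
end

s  = MiniSeq.new('atgc')
s2 = s + 'atgc'      # MiniSeq#+ runs, so the result is re-wrapped
s3 = 'atgc' + s      # String#+ runs, so the result is a plain String

puts s2.class        # MiniSeq
puts s3.class        # String
puts s               # atgc (the receiver is unchanged)
```

When the left operand is a plain String, the subclass method is never called, so nothing re-wraps the result.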

Returns

new Bio::Sequence::NA/AA or String object

Calls superclass method
# File lib/bio/sequence/common.rb
def +(*arg)
  self.class.new(super(*arg))
end

<<(*arg)
# File lib/bio/sequence/common.rb
def <<(*arg)
  concat(*arg)
end

composition()

Returns a hash of the occurrence counts for each residue or base.

s = Bio::Sequence::NA.new('atgc')
puts s.composition              #=> {"a"=>1, "c"=>1, "g"=>1, "t"=>1}
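
The counting idea can be tried on a plain String without BioRuby. This is a minimal sketch; composition_of is a hypothetical helper name, not a library method:

```ruby
# Hypothetical helper: count how often each character occurs in a String.
def composition_of(sequence)
  counts = Hash.new(0)          # residues not seen yet start at 0
  sequence.each_char { |residue| counts[residue] += 1 }
  counts
end

comp = composition_of('atgcatga')
puts comp['a']   # 3
puts comp['c']   # 1
puts comp['n']   # 0 (absent residues fall back to the default)
```

Because the hash is created with a default of 0, looking up a residue that never occurs returns 0 rather than nil.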

Returns

Hash object

# File lib/bio/sequence/common.rb
def composition
  count = Hash.new(0)
  self.scan(/./) do |x|
    count[x] += 1
  end
  return count
end

concat(*arg)

Add new data to the end of the current sequence. The original sequence is modified.

s = Bio::Sequence::NA.new('atgc')
s << 'atgc'
puts s                                  #=> "atgcatgc"
s << s
puts s                                  #=> "atgcatgcatgcatgc"

Returns

current Bio::Sequence::NA/AA object (modified)

Calls superclass method
# File lib/bio/sequence/common.rb
def concat(*arg)
  super(self.class.new(*arg))
end

normalize!()

Normalizes the current sequence in place: all whitespace is removed, and every position is converted to uppercase for AA sequences or to lowercase for NA sequences. The original sequence is modified.

s = Bio::Sequence::NA.new('atgc')
s.normalize!
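
The rule described above can be sketched on plain Strings. normalize_sequence! and its kind argument are hypothetical and only illustrate the rule; they are not how BioRuby implements it:

```ruby
# Hypothetical helper illustrating the normalization rule described above.
# kind is :na (lowercase) or :aa (uppercase); the string is changed in place.
def normalize_sequence!(str, kind)
  str.gsub!(/\s+/, '')                        # remove all whitespace
  kind == :aa ? str.upcase! : str.downcase!   # fix the letter case
  str
end

dna = String.new("ATG C\nat")
normalize_sequence!(dna, :na)
puts dna                                      # atgcat

protein = String.new("mk v\tl")
normalize_sequence!(protein, :aa)
puts protein                                  # MKVL
```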

Returns

current Bio::Sequence::NA/AA object (modified)

# File lib/bio/sequence/common.rb
def normalize!
  initialize(self)
  self
end
Also aliased as: seq!

randomize(hash = nil) { |seq| ... }

Returns a randomized sequence. By default the new sequence retains the base/residue composition of the original. If a hash of base/residue counts is given, the new sequence is built from that composition instead. If a block is given, each position of the shuffled sequence is passed to the block in turn and an empty sequence is returned. In all cases, the original sequence is not modified.

s = Bio::Sequence::NA.new('atgc')
puts s.randomize                        #=> "tcag"  (for example)

new_composition = {'a' => 2, 't' => 2}
puts s.randomize(new_composition)       #=> "ttaa"  (for example)

count = 0
s.randomize { |x| count += 1 }
puts count                              #=> 4
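
The shuffle can be sketched on a plain String with the Fisher-Yates algorithm. shuffled_sequence and its rng parameter are hypothetical names chosen for this illustration, and the block form is left out:

```ruby
# Hypothetical helper: shuffle the characters of a String with the
# Fisher-Yates algorithm, optionally building the pool from a count hash.
def shuffled_sequence(sequence, counts = nil, rng = Random.new)
  # With a hash, build the pool from it (e.g. {'a' => 2} gives "aa");
  # otherwise copy the input so the caller's string is left untouched.
  pool = counts ? counts.map { |residue, n| residue * n.to_i }.join : sequence.dup
  (pool.length - 1).downto(1) do |i|
    j = rng.rand(i + 1)                   # random index in 0..i
    pool[i], pool[j] = pool[j], pool[i]   # swap positions i and j
  end
  pool
end

original = 'atgcatgc'
shuffled = shuffled_sequence(original)
puts shuffled.chars.sort.join == original.chars.sort.join         # true
puts shuffled_sequence(original, { 'a' => 2, 't' => 2 }).length   # 4
```

Shuffling changes only the order of positions, so sorting the characters of the input and of the result gives the same string.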

Arguments:

  • (optional) hash: Hash object

Returns

new Bio::Sequence::NA/AA object

# File lib/bio/sequence/common.rb
def randomize(hash = nil)
  if hash
    tmp = ''
    hash.each {|k, v|
      tmp += k * v.to_i
    }
  else
    tmp = self
  end
  seq = self.class.new(tmp)
  # Reference: http://en.wikipedia.org/wiki/Fisher-Yates_shuffle
  seq.length.downto(2) do |n|
    k = rand(n)
    c = seq[n - 1]
    seq[n - 1] = seq[k]
    seq[k] = c
  end
  if block_given? then
    (0...seq.length).each do |i|
      yield seq[i, 1]
    end
    return self.class.new('')
  else
    return seq
  end
end

seq()

Create a new sequence based on the current sequence. The original sequence is unchanged.

s = Bio::Sequence::NA.new('atgc')
s2 = s.seq
puts s2                                 #=> 'atgc'

Returns

new Bio::Sequence::NA/AA object

# File lib/bio/sequence/common.rb
def seq
  self.class.new(self)
end

seq!()
Alias for: normalize!

splice(position)

Returns a new sequence extracted from the original using a GenBank-style position string. See also the documentation for the Bio::Location class.

s = Bio::Sequence::NA.new('atgcatgcatgcatgc')
puts s.splice('1..3')                           #=> "atg"
puts s.splice('join(1..3,8..10)')               #=> "atgcat"
puts s.splice('complement(1..3)')               #=> "cat"
puts s.splice('complement(join(1..3,8..10))')   #=> "atgcat"
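
The four results above can be reproduced on a plain String with a much-simplified parser. reverse_complement and simple_splice are hypothetical helpers that handle only 'a..b', flat 'join(...)' and 'complement(...)'; they are not a substitute for the real location parser:

```ruby
# Hypothetical helpers; a simplified illustration, not the BioRuby parser.
def reverse_complement(dna)
  dna.reverse.tr('atgc', 'tacg')
end

def simple_splice(dna, position)
  if (m = position.match(/\Acomplement\((.+)\)\z/))
    # Splice the inner location first, then reverse-complement the result.
    reverse_complement(simple_splice(dna, m[1]))
  elsif (m = position.match(/\Ajoin\((.+)\)\z/))
    # Flat joins only: the parts are split on commas and concatenated.
    m[1].split(',').map { |part| simple_splice(dna, part) }.join
  elsif (m = position.match(/\A(\d+)\.\.(\d+)\z/))
    dna[(m[1].to_i - 1)..(m[2].to_i - 1)]   # one-based, inclusive range
  else
    raise ArgumentError, "unsupported location: #{position}"
  end
end

dna = 'atgcatgcatgcatgc'
puts simple_splice(dna, '1..3')                          # atg
puts simple_splice(dna, 'join(1..3,8..10)')              # atgcat
puts simple_splice(dna, 'complement(1..3)')              # cat
puts simple_splice(dna, 'complement(join(1..3,8..10))')  # atgcat
```

The last result equals the joined sequence because "atgcat" is its own reverse complement.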

Note that 'complement' operators in GenBank position strings have no effect on Bio::Sequence::AA objects.

Arguments:

  • (required) position: GenBank-style location String or Bio::Locations object

Returns

Bio::Sequence::NA/AA object

# File lib/bio/sequence/common.rb
def splice(position)
  unless position.is_a?(Locations) then
    position = Locations.new(position)
  end
  s = ''
  position.each do |location|
    if location.sequence
      s << location.sequence
    else
      exon = self.subseq(location.from, location.to)
      begin
        exon.complement! if location.strand < 0
      rescue NameError
      end
      s << exon
    end
  end
  return self.class.new(s)
end
Also aliased as: splicing

splicing(position)
Alias for: splice

split(*arg)

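Acts almost the same as String#split. The difference is that, when no block is given, each element of the returned Array is an object of the same class as the receiver rather than a plain String.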
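
A minimal sketch of the re-wrapping idea, assuming a hypothetical String subclass named MiniSeq (a stand-in, not part of BioRuby). The block form of split is not covered here:

```ruby
# Hypothetical stand-in for a sequence class; not part of BioRuby.
class MiniSeq < String
  def split(*arg)
    # Re-wrap every piece so callers keep working with the sequence class.
    super(*arg).map { |piece| self.class.new(piece) }
  end
end

pieces = MiniSeq.new('atgnnatgnnatg').split('nn')
puts pieces.inspect                     # ["atg", "atg", "atg"]
puts pieces.map(&:class).uniq.inspect   # [MiniSeq]
```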

Calls superclass method
# File lib/bio/sequence/common.rb
def split(*arg)
  if block_given?
    super
  else
    ret = super(*arg)
    ret.collect! { |x| self.class.new('').replace(x) }
    ret
  end
end

subseq(s = 1, e = self.length)

Returns a new sequence containing the subsequence identified by the start and end numbers given as parameters. Important: biological sequence numbering conventions (one-based) are used rather than Ruby's (zero-based) numbering conventions.

s = Bio::Sequence::NA.new('atggaatga')
puts s.subseq(1,3)                      #=> "atg"

Start defaults to 1 and end defaults to the length of the sequence, so subseq called without any parameters simply returns a new sequence identical to the existing sequence.

puts s.subseq                           #=> "atggaatga"
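
The one-based convention amounts to shifting both positions down by one before indexing. A minimal sketch on a plain String, with one_based_subseq as a hypothetical helper name:

```ruby
# Hypothetical helper: one-based, inclusive subsequence of a plain String.
def one_based_subseq(sequence, first = 1, last = sequence.length)
  unless first > 0 && last > 0
    raise ArgumentError, 'start/end position must be a positive integer'
  end
  sequence[(first - 1)..(last - 1)]     # shift both ends to zero-based
end

gene = 'atggaatga'
puts one_based_subseq(gene, 1, 3)       # atg
puts one_based_subseq(gene, 7, 9)       # tga
puts one_based_subseq(gene)             # atggaatga
```

Position 0 is rejected because it has no meaning in one-based numbering.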

Arguments:

  • (optional) s(start): Integer (default 1)

  • (optional) e(end): Integer (default current sequence length)

Returns

new Bio::Sequence::NA/AA object

# File lib/bio/sequence/common.rb
def subseq(s = 1, e = self.length)
  raise "Error: start/end position must be a positive integer" unless s > 0 and e > 0
  s -= 1
  e -= 1
  self[s..e]
end

to_fasta(header = '', width = nil)

Bio::Sequence#to_fasta is DEPRECATED. Do not use Bio::Sequence#to_fasta; use Bio::Sequence#output instead. Note that Bio::Sequence::NA#to_fasta, Bio::Sequence::AA#to_fasta, and Bio::Sequence::Generic#to_fasta can still be used, because they have no alternative methods.

Returns the sequence as a FASTA-format string. The first argument is used as the comment (header) line. If the second argument (width) is given, the sequence is folded into lines of at most that many characters.
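
The folding can be sketched on a plain String. fasta_record is a hypothetical helper written for this illustration, not a BioRuby method:

```ruby
# Hypothetical helper: build a FASTA record from a plain String.
def fasta_record(sequence, header = '', width = nil)
  body = if width
           # Cut the sequence into lines of at most `width` characters.
           sequence.scan(/.{1,#{width}}/).map { |line| "#{line}\n" }.join
         else
           "#{sequence}\n"
         end
  ">#{header}\n" + body
end

puts fasta_record('atgcatgcat', 'demo sequence', 4)
# >demo sequence
# atgc
# atgc
# at
```

Without a width, the whole sequence is written on a single line after the header.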


Arguments:

  • (optional) header: String object

  • (optional) width: Fixnum object (default nil)

Returns

String

# File lib/bio/sequence/compat.rb
def to_fasta(header = '', width = nil)
  warn "Bio::Sequence#to_fasta is obsolete. Use Bio::Sequence#output(:fasta) instead" if $DEBUG
  ">#{header}\n" +
  if width
    self.to_s.gsub(Regexp.new(".{1,#{width}}"), "\\0\n")
  else
    self.to_s + "\n"
  end
end

to_s()

Return sequence as String. The original sequence is unchanged.

s = Bio::Sequence::NA.new('atgc')
puts s.to_s                             #=> 'atgc'
puts s.to_s.class                       #=> String
puts s                                  #=> 'atgc'
puts s.class                            #=> Bio::Sequence::NA

Returns

String object

# File lib/bio/sequence/common.rb
def to_s
  String.new(self)
end
Also aliased as: to_str

to_str()
Alias for: to_s

total(hash)

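Returns the total, as a Float, of the values that the given hash assigns to each base or residue in the sequence. Bases or residues missing from the hash count as 0.0 unless the hash has its own default.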

values = {'a' => 0.1, 't' => 0.2, 'g' => 0.3, 'c' => 0.4}
s = Bio::Sequence::NA.new('atgc')
puts s.total(values)                    #=> 1.0
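
The summation can be sketched on a plain String. sequence_total is a hypothetical helper; it uses Hash#fetch with a fallback of 0.0 so that the caller's hash is left untouched, which is a choice made for this sketch:

```ruby
# Hypothetical helper: sum per-residue values over a plain String.
def sequence_total(sequence, values)
  # Residues missing from the hash contribute 0.0 to the sum.
  sequence.each_char.sum(0.0) { |residue| values.fetch(residue, 0.0) }
end

weights = { 'a' => 0.5, 't' => 0.25, 'g' => 1.0, 'c' => 2.0 }
puts sequence_total('atgc', weights)    # 3.75
puts sequence_total('atgn', weights)    # 1.75 ('n' is not in the hash)
```

The weights are chosen to be exactly representable in binary floating point, so the printed totals are exact.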

Arguments:

  • (required) hash: Hash object

Returns

Float object

# File lib/bio/sequence/common.rb
def total(hash)
  hash.default = 0.0 unless hash.default
  sum = 0.0
  self.each_byte do |x|
    begin
      sum += hash[x.chr]
    end
  end
  return sum
end