module Bio::Sequence::Common

DESCRIPTION

Bio::Sequence::Common is a mixin implementing methods common to Bio::Sequence::AA and Bio::Sequence::NA. All of these methods are available to both amino acid and nucleic acid sequences and, by encapsulation, to Bio::Sequence objects as well.

USAGE

# Create a sequence
dna = Bio::Sequence.auto('atgcatgcatgc')

# Splice out a subsequence using a GenBank-style location string
puts dna.splice('complement(1..4)')

# What is the base composition?
puts dna.composition

# Create a random sequence with the composition of a current sequence
puts dna.randomize

Public Instance Methods

+(*arg)

Create a new sequence by adding to an existing sequence. The existing sequence is not modified.

s = Bio::Sequence::NA.new('atgc')
s2 = s + 'atgc'
puts s2                                 #=> "atgcatgc"
puts s                                  #=> "atgc"

The new sequence is of the same class as the existing sequence if the new data was added to an existing sequence,

puts s2.class == s.class                #=> true

but if an existing sequence is added to a String, the result is a String

s3 = 'atgc' + s
puts s3.class                           #=> String

Returns

new Bio::Sequence::NA/AA or String object

Calls superclass method
# File lib/bio/sequence/common.rb, line 121
def +(*arg)
  self.class.new(super(*arg))
end
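
String#+ returns a plain String even when the receiver is a String subclass, which is why the method above re-wraps the result with self.class.new. The following is a minimal sketch of that re-wrapping in plain Ruby (no BioRuby required). `ToySeq` is a hypothetical stand-in for Bio::Sequence::NA and is not part of BioRuby.

```ruby
# Hypothetical stand-in for Bio::Sequence::NA: a bare String subclass.
class ToySeq < String
  # Same idea as Bio::Sequence::Common#+: call String#+ (which returns a
  # plain String) and re-wrap the result in the receiver's class.
  def +(other)
    self.class.new(super(other))
  end
end

s  = ToySeq.new('atgc')
s2 = s + 'atgc'      # ToySeq#+ runs, so the result is re-wrapped
s3 = 'atgc' + s      # String#+ runs, so the result stays a String

puts s2              # atgcatgc
puts s2.class        # ToySeq
puts s3.class        # String
puts s               # atgc (receiver unchanged)
```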

<<(*arg)

Append data to the end of the current sequence by delegating to #concat. The original sequence is modified.

# File lib/bio/sequence/common.rb, line 98
def <<(*arg)
  concat(*arg)
end

composition()

Returns a hash of the occurrence counts for each residue or base.

s = Bio::Sequence::NA.new('atgc')
puts s.composition              #=> {"a"=>1, "c"=>1, "g"=>1, "t"=>1}

Returns

Hash object

# File lib/bio/sequence/common.rb, line 215
def composition
  count = Hash.new(0)
  self.scan(/./) do |x|
    count[x] += 1
  end
  return count
end
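
The counting done by composition can be reproduced on a plain Ruby String, as in the sketch below. The helper names are hypothetical and the tally variant assumes Ruby 2.7 or later. One difference is worth noting: composition returns a Hash built with Hash.new(0), so absent residues read as 0, whereas Enumerable#tally returns a Hash with no default. Also, scan(/./) skips newline characters while each_char does not; a normalized sequence contains none.

```ruby
# Count each character, as Bio::Sequence::Common#composition does with
# scan(/./) and a Hash whose default value is 0.
def composition_of(str)
  count = Hash.new(0)
  str.each_char { |ch| count[ch] += 1 }
  count
end

# Ruby 2.7+: Enumerable#tally gives the same counts, but with no default.
def composition_tally(str)
  str.each_char.tally
end

counts = composition_of('atgcaa')
puts counts['a']                                  # 3
puts counts['n']                                  # 0 (default from Hash.new(0))
puts composition_tally('atgcaa')['n'].inspect     # nil (tally sets no default)
```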

concat(*arg)

Add new data to the end of the current sequence. The original sequence is modified.

s = Bio::Sequence::NA.new('atgc')
s << 'atgc'
puts s                                  #=> "atgcatgc"
s << s
puts s                                  #=> "atgcatgcatgcatgc"

Returns

current Bio::Sequence::NA/AA object (modified)

Calls superclass method
# File lib/bio/sequence/common.rb, line 94
def concat(*arg)
  super(self.class.new(*arg))
end
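
Because concat passes its argument through self.class.new, appended data goes through the same constructor as the sequence itself, and << picks this up by delegating to concat. The sketch below illustrates this under one assumption: `ToyNA` is a hypothetical class whose constructor strips whitespace and downcases, mirroring the NA normalization described under normalize!. It is not BioRuby code.

```ruby
# Hypothetical NA-like String subclass; its constructor normalizes input.
class ToyNA < String
  def initialize(str = '')
    super(str.to_s.gsub(/\s+/, '').downcase)
  end

  # As in Bio::Sequence::Common#concat: normalize the argument first.
  def concat(*arg)
    super(self.class.new(*arg))
  end

  # As in Bio::Sequence::Common#<<: delegate to concat.
  def <<(*arg)
    concat(*arg)
  end
end

s = ToyNA.new('atgc')
s << " AT GC\n"      # normalized to "atgc" before being appended
puts s               # atgcatgc
s << s               # self.class.new(s) copies s before appending
puts s               # atgcatgcatgcatgc
```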

normalize!()

Normalize the current sequence in place: remove all whitespace and convert every position to uppercase if the sequence is AA, or to lowercase if the sequence is NA. The original sequence is modified.

s = Bio::Sequence::NA.new('atgc')
s.normalize!

Returns

current Bio::Sequence::NA/AA object (modified)

# File lib/bio/sequence/common.rb, line 78
def normalize!
  initialize(self)
  self
end
Also aliased as: seq!
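
normalize! works by re-running the class's own constructor on the receiver, so whatever cleanup initialize performs is applied in place. The following is a minimal sketch of that pattern. It assumes a hypothetical `ToyAA` class whose constructor strips whitespace and upcases, which is the AA behaviour described above; BioRuby's real constructors may do more.

```ruby
# Hypothetical AA-like String subclass with a normalizing constructor.
class ToyAA < String
  def initialize(str = '')
    super(str.to_s.gsub(/\s+/, '').upcase)
  end

  # Same pattern as Bio::Sequence::Common#normalize!: call initialize on
  # self, which replaces the receiver's contents with the cleaned string.
  def normalize!
    initialize(self)
    self
  end
end

aa = ToyAA.new('MKV')
aa << " lt\n"        # String#<< appends the raw, un-normalized text
puts aa.inspect      # "MKV lt\n"
aa.normalize!
puts aa              # MKVLT
```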

randomize(hash = nil) { |seq| ... }

Returns a randomized sequence. By default, the new sequence has the same base/residue composition as the original. If a hash of base/residue counts is given, the new sequence is built from that composition instead. If a block is given, each randomly selected position is passed to the block in turn, and an empty sequence of the same class is returned. In all cases, the original sequence is not modified.

s = Bio::Sequence::NA.new('atgc')
puts s.randomize                        #=> "tcag"  (for example)

new_composition = {'a' => 2, 't' => 2}
puts s.randomize(new_composition)       #=> "ttaa"  (for example)

count = 0
s.randomize { |x| count += 1 }
puts count                              #=> 4

Arguments:

  • (optional) hash: Hash object

Returns

new Bio::Sequence::NA/AA object

# File lib/bio/sequence/common.rb, line 243
def randomize(hash = nil)
  if hash
    tmp = ''
    hash.each {|k, v|
      tmp += k * v.to_i
    }
  else
    tmp = self
  end
  seq = self.class.new(tmp)
  # Reference: http://en.wikipedia.org/wiki/Fisher-Yates_shuffle
  seq.length.downto(2) do |n|
    k = rand(n)
    c = seq[n - 1]
    seq[n - 1] = seq[k]
    seq[k] = c
  end
  if block_given? then
    (0...seq.length).each do |i|
      yield seq[i, 1]
    end
    return self.class.new('')
  else
    return seq
  end
end
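
The two steps in randomize are to expand a composition hash into a string and then apply a Fisher-Yates shuffle. Both can be sketched in plain Ruby, as shown below. The helper names are hypothetical. Unlike the original, which calls Kernel#rand, this sketch takes a Random instance as a parameter so that a run can be reproduced from a seed.

```ruby
# Expand a composition hash into a string, as randomize does when it is
# given a hash: {'a' => 2, 't' => 2} becomes "aatt".
def expand_composition(hash)
  hash.map { |residue, count| residue * count.to_i }.join
end

# Fisher-Yates shuffle of a copy of +str+, with the same loop bounds as
# randomize. +rng+ is injectable so results can be reproduced.
def fisher_yates(str, rng = Random.new)
  seq = str.dup
  seq.length.downto(2) do |n|
    k = rng.rand(n)                            # 0 <= k <= n - 1
    seq[n - 1], seq[k] = seq[k], seq[n - 1]    # swap the two positions
  end
  seq
end

original = 'atgcatgc'
shuffled = fisher_yates(original, Random.new(42))
puts shuffled                                  # a permutation of original
puts original                                  # atgcatgc (unchanged)
puts fisher_yates(expand_composition({ 'a' => 2, 't' => 2 }))
```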

seq()

Create a new sequence based on the current sequence. The original sequence is unchanged.

s = Bio::Sequence::NA.new('atgc')
s2 = s.seq
puts s2                                 #=> 'atgc'

Returns

new Bio::Sequence::NA/AA object

# File lib/bio/sequence/common.rb, line 65
def seq
  self.class.new(self)
end

seq!()
Alias for: normalize!

splice(position)

Return a new sequence extracted from the original using a GenBank-style position string. See also the documentation for the Bio::Location class.

s = Bio::Sequence::NA.new('atgcatgcatgcatgc')
puts s.splice('1..3')                           #=> "atg"
puts s.splice('join(1..3,8..10)')               #=> "atgcat"
puts s.splice('complement(1..3)')               #=> "cat"
puts s.splice('complement(join(1..3,8..10))')   #=> "atgcat"

Note that complement() in a GenBank position string has no effect on Bio::Sequence::AA objects.

Arguments:

  • (required) position: String (GenBank-style position string) or Bio::Locations object

Returns

new Bio::Sequence::NA/AA object

# File lib/bio/sequence/common.rb, line 285
def splice(position)
  unless position.is_a?(Locations) then
    position = Locations.new(position)
  end
  s = ''
  position.each do |location|
    if location.sequence
      s << location.sequence
    else
      exon = self.subseq(location.from, location.to)
      begin
        exon.complement! if location.strand < 0
      rescue NameError
      end
      s << exon
    end
  end
  return self.class.new(s)
end
Also aliased as: splicing
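
For each location, splice takes the one-based subsequence, reverse-complements it when the strand is negative, and concatenates the pieces. The sketch below shows only that core. It takes pre-parsed segments as hypothetical [from, to, strand] triples instead of a GenBank string, because parsing is the job of Bio::Locations in BioRuby. It also models complement(join(1..3,8..10)) as the minus-strand segments in reverse order, which reproduces the documented result. The reverse complement handles only lowercase a, c, g, t, with no IUPAC ambiguity codes.

```ruby
# Reverse complement for lowercase a/c/g/t only (no ambiguity codes).
def revcomp(dna)
  dna.reverse.tr('acgt', 'tgca')
end

# Join one-based, inclusive [from, to, strand] segments of +seq+.
# Minus-strand segments are reverse-complemented before joining.
def splice_segments(seq, segments)
  segments.map { |from, to, strand|
    exon = seq[(from - 1)..(to - 1)]
    strand < 0 ? revcomp(exon) : exon
  }.join
end

s = 'atgcatgcatgcatgc'
puts splice_segments(s, [[1, 3, 1]])                  # atg
puts splice_segments(s, [[1, 3, 1], [8, 10, 1]])      # atgcat
puts splice_segments(s, [[1, 3, -1]])                 # cat
puts splice_segments(s, [[8, 10, -1], [1, 3, -1]])    # atgcat
```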

splicing(position)
Alias for: splice

subseq(s = 1, e = self.length)

Returns a new sequence containing the subsequence identified by the start and end positions given as parameters, both inclusive. Important: biological sequence numbering conventions (one-based) are used rather than Ruby's (zero-based) conventions.

s = Bio::Sequence::NA.new('atggaatga')
puts s.subseq(1,3)                      #=> "atg"

Start defaults to 1 and end defaults to the length of the sequence, so subseq called without any arguments simply returns a new sequence identical to the existing one.

puts s.subseq                           #=> "atggaatga"

Arguments:

  • (optional) s(start): Integer (default 1)

  • (optional) e(end): Integer (default current sequence length)

Returns

new Bio::Sequence::NA/AA object

# File lib/bio/sequence/common.rb, line 143
def subseq(s = 1, e = self.length)
  raise "Error: start/end position must be a positive integer" unless s > 0 and e > 0
  s -= 1
  e -= 1
  self[s..e]
end
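
Converting from one-based, inclusive coordinates to a Ruby Range means subtracting 1 from both ends. The sketch below does this with a hypothetical helper on a plain String. One difference from the original is intentional: the sketch raises ArgumentError, while subseq calls raise with a string and therefore raises a RuntimeError.

```ruby
# One-based, inclusive coordinates -> Ruby's zero-based Range, mirroring
# the arithmetic in Bio::Sequence::Common#subseq.
def one_based_subseq(str, s = 1, e = str.length)
  raise ArgumentError, 'start/end position must be a positive integer' unless s > 0 && e > 0
  str[(s - 1)..(e - 1)]
end

seq = 'atggaatga'
puts one_based_subseq(seq, 1, 3)   # atg        (Ruby: seq[0..2])
puts one_based_subseq(seq, 7)      # tga        (end defaults to the length)
puts one_based_subseq(seq)         # atggaatga
```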

to_fasta(header = '', width = nil)

Bio::Sequence#to_fasta is DEPRECATED. Do not use Bio::Sequence#to_fasta! Use Bio::Sequence::Format#output instead. Note that #to_fasta, Bio::Sequence::AA#to_fasta, and Bio::Sequence::Generic#to_fasta can still be used, because there are no alternative methods.

Return the sequence as a FASTA-format string. The first argument is used as the header (comment) line. If the second argument is given, the sequence is folded into lines of at most that many characters.

Arguments:

  • (optional) header: String object

  • (optional) width: Integer object (default nil)

Returns

String

# File lib/bio/sequence/compat.rb, line 49
def to_fasta(header = '', width = nil)
  warn "Bio::Sequence#to_fasta is obsolete. Use Bio::Sequence#output(:fasta) instead" if $DEBUG
  ">#{header}\n" +
  if width
    self.to_s.gsub(Regexp.new(".{1,#{width}}"), "\\0\n")
  else
    self.to_s + "\n"
  end
end
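
The folding step replaces each run of up to width characters with that run plus a newline. The sketch below shows this as a standalone helper so the output format is easy to see. `fasta_record` is a hypothetical name. For new code, the deprecation note above recommends #output instead.

```ruby
# Build a FASTA record the way to_fasta does: a ">" header line, then the
# sequence, optionally folded into lines of at most +width+ characters.
def fasta_record(seq, header = '', width = nil)
  body = if width
           seq.gsub(Regexp.new(".{1,#{width}}"), "\\0\n")
         else
           seq + "\n"
         end
  ">#{header}\n" + body
end

print fasta_record('atgcatgcat', 'example')
# >example
# atgcatgcat
print fasta_record('atgcatgcat', 'example', 4)
# >example
# atgc
# atgc
# at
```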

to_s()

Return sequence as String. The original sequence is unchanged.

s = Bio::Sequence::NA.new('atgc')
puts s.to_s                             #=> 'atgc'
puts s.to_s.class                       #=> String
puts s                                  #=> 'atgc'
puts s.class                            #=> Bio::Sequence::NA

Returns

String object

# File lib/bio/sequence/common.rb, line 52
def to_s
  String.new(self)
end
Also aliased as: to_str

to_str()
Alias for: to_s
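
String.new applied to an instance of a String subclass returns a new, plain String with the same characters, which is the whole technique behind to_s. The sketch below demonstrates this with a hypothetical `ToyString` class standing in for Bio::Sequence::NA.

```ruby
# Hypothetical String subclass standing in for Bio::Sequence::NA.
class ToyString < String; end

s = ToyString.new('atgc')
plain = String.new(s)      # same technique as Bio::Sequence::Common#to_s

puts plain.class           # String
puts s.class               # ToyString
puts plain == s            # true (same characters)
puts plain.equal?(s)       # false (a separate object)
```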

total(hash)

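Returns the sum, as a Float, of the values assigned to each base or residue in the sequence, looked up in the given hash. Bases or residues missing from the hash count as 0.0. Note that if the hash has no default value, this method sets its default to 0.0, which modifies the hash passed in.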

values = {'a' => 0.1, 't' => 0.2, 'g' => 0.3, 'c' => 0.4}
s = Bio::Sequence::NA.new('atgc')
puts s.total(values)                    #=> 1.0

Arguments:

  • (required) hash: Hash object

Returns

Float object

# File lib/bio/sequence/common.rb, line 198
def total(hash)
  hash.default = 0.0 unless hash.default
  sum = 0.0
  self.each_byte do |x|
    begin
      sum += hash[x.chr]
    end
  end
  return sum
end
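
Because total assigns a default to the caller's hash, callers may prefer a version without side effects. The sketch below is such an alternative, not BioRuby's implementation. It uses Hash#fetch with a 0.0 fallback and keeps a running total in the same left-to-right order as the original loop. `weighted_total` is a hypothetical name. The sketch iterates by character where total iterates by byte; the two agree for ASCII sequences.

```ruby
# Sum per-residue values over a sequence, treating residues missing from
# +values+ as 0.0. Unlike total, the caller's hash is not modified:
# fetch supplies the fallback instead of Hash#default=.
def weighted_total(seq, values)
  seq.each_char.inject(0.0) { |sum, ch| sum + values.fetch(ch, 0.0) }
end

values = { 'a' => 0.1, 't' => 0.2, 'g' => 0.3, 'c' => 0.4 }
puts weighted_total('atgc', values)     # 1.0
puts weighted_total('atgn', values)     # 'n' is not in values, so it adds 0.0
puts values.default.inspect             # nil (hash left unchanged)
```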