class Bio::Sequence::NA
DESCRIPTION
Bio::Sequence::NA represents a bare nucleic acid sequence in bioruby.
USAGE
  # Create a Nucleic Acid sequence.
  dna = Bio::Sequence.auto('atgcatgcATGCATGCAAAA')
  rna = Bio::Sequence.auto('augcaugcaugcaugcaaaa')

  # What are the names of all the bases?
  puts dna.names
  puts rna.names

  # What is the GC percentage?
  puts dna.gc_percent
  puts rna.gc_percent

  # What is the molecular weight?
  puts dna.molecular_weight
  puts rna.molecular_weight

  # What is the reverse complement?
  puts dna.reverse_complement
  puts dna.complement

  # Is this sequence DNA or RNA?
  puts dna.rna?

  # Translate my sequence (see method docs for many options)
  puts dna.translate
  puts rna.translate
Public Class Methods
Generate a nucleic acid sequence object from a string.
  s = Bio::Sequence::NA.new("aagcttggaccgttgaagt")
or, if the nucleic acid sequence is stored in a file:
  s = Bio::Sequence::NA.new(File.read('dna.txt'))
Nucleic acid sequences are always stored in lowercase in bioruby:
  s = Bio::Sequence::NA.new("AAGcTtGG")
  puts s   #=> "aagcttgg"
Whitespace is stripped from the sequence:
  s = Bio::Sequence::NA.new("atg\nggg\ttt\r  gc")
  puts s   #=> "atggggttgc"
Arguments:
- (required) str: String
Returns:
- Bio::Sequence::NA object

  # File lib/bio/sequence/na.rb, line 75
  def initialize(str)
    super
    self.downcase!
    self.tr!(" \t\n\r",'')
  end
Generate a new random sequence containing the given number of each base. The sequence length is the sum of the counts. (See also Bio::Sequence::Common#randomize, which creates a new randomized sequence object using the base composition of an existing sequence instance.)
  counts = {'a'=>1,'c'=>2,'g'=>3,'t'=>4}
  puts Bio::Sequence::NA.randomize(counts)   #=> "ggcttgttac" (for example)
You may also feed the output of randomize into a block:
  actual_counts = {'a'=>0, 'c'=>0, 'g'=>0, 't'=>0}
  Bio::Sequence::NA.randomize(counts) {|x| actual_counts[x] += 1}
  actual_counts   #=> {"a"=>1, "c"=>2, "g"=>3, "t"=>4}
Arguments:
- (optional) hash: Hash object
Returns:
- Bio::Sequence::NA object

  # File lib/bio/sequence/compat.rb, line 82
  def self.randomize(*arg, &block)
    self.new('').randomize(*arg, &block)
  end
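The relationship between the counts hash and the resulting sequence can be sketched in plain Ruby, without loading bioruby. The helper name `random_sequence_from_counts` is hypothetical, and the shuffle-based approach is an assumption made for illustration; it is not necessarily how Bio::Sequence::Common#randomize is implemented.

```ruby
# Plain-Ruby sketch (no bioruby required); the helper name is hypothetical.
# Builds an array holding each base the requested number of times,
# then shuffles the bases into random order and joins them.
def random_sequence_from_counts(counts)
  bases = counts.flat_map { |base, n| [base] * n }
  bases.shuffle.join
end

counts = { 'a' => 1, 'c' => 2, 'g' => 3, 't' => 4 }
seq = random_sequence_from_counts(counts)

puts seq         # a random arrangement of the ten bases
puts seq.length  # equals the sum of the counts
```

Whatever order the shuffle produces, the base composition of the result always matches the counts that were passed in.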
Public Instance Methods
Calculate the ratio of AT / ATGC bases. U is regarded as T.
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.at_content        #=> 4/9
  puts s.at_content.to_f   #=> 0.444444444444444
In older Ruby versions, Float is always returned.
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.at_content        #=> 0.444444444444444
Note that “u” is regarded as “t”. If there are no ATGC bases in the sequence, 0.0 is returned.
Returns:
- Rational or Float

  # File lib/bio/sequence/na.rb, line 346
  def at_content
    count = self.composition
    at = count['a'] + count['t'] + count['u']
    gc = count['g'] + count['c']
    total = at + gc
    return 0.0 if total == 0
    return at.quo(total)
  end
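The Rational-or-Float return type comes from the call to `quo` in the listing above. The following plain-Ruby sketch repeats the same arithmetic on an ordinary String, using `String#count` in place of bioruby's `composition` (that substitution is an assumption made so the example runs without bioruby).

```ruby
# Plain-Ruby sketch of the at_content arithmetic (no bioruby required).
seq = 'atggcgtga'

at    = seq.count('atu')    # a, t and u all count towards AT
total = seq.count('atgcu')  # ambiguous bases are ignored

# Integer#quo returns a Rational in current Ruby versions;
# the zero-total guard mirrors the listing above.
ratio = total.zero? ? 0.0 : at.quo(total)

puts ratio
puts ratio.to_f.round(3)
```

Calling `to_f` on the result gives a Float regardless of which type `quo` returned, which is the portable choice when the value is going to be printed or compared.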
Calculate the ratio of (A - T) / (A + T) bases. U is regarded as T.
  s = Bio::Sequence::NA.new('atgttgttgttc')
  puts s.at_skew        #=> -3/4
  puts s.at_skew.to_f   #=> -0.75
In older Ruby versions, Float is always returned.
  s = Bio::Sequence::NA.new('atgttgttgttc')
  puts s.at_skew        #=> -0.75
Note that “u” is regarded as “t”. If there are no AT bases in the sequence, 0.0 is returned.
Returns:
- Rational or Float

  # File lib/bio/sequence/na.rb, line 395
  def at_skew
    count = self.composition
    a = count['a']
    t = count['t'] + count['u']
    at = a + t
    return 0.0 if at == 0
    return (a - t).quo(at)
  end
Returns counts of each codon in the sequence in a hash.
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.codon_usage   #=> {"gcg"=>1, "tga"=>1, "atg"=>1}
This method does not validate codons! Any three-letter group is treated as a ‘codon’. So,
  s = Bio::Sequence::NA.new('atggNNtga')
  puts s.codon_usage   #=> {"tga"=>1, "gnn"=>1, "atg"=>1}

  s = Bio::Sequence::NA.new('atgg--tga')
  puts s.codon_usage   #=> {"tga"=>1, "g--"=>1, "atg"=>1}
Also, there is no option to work in any frame other than the first.
Returns:
- Hash object

  # File lib/bio/sequence/na.rb, line 273
  def codon_usage
    hash = Hash.new(0)
    self.window_search(3, 3) do |codon|
      hash[codon] += 1
    end
    return hash
  end
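Since codon_usage only works in the first frame, counting codons in another frame means slicing the sequence first. A minimal plain-Ruby sketch of that idea follows; `codon_counts` is a hypothetical helper, not part of bioruby, and it operates on an ordinary String.

```ruby
# Plain-Ruby sketch (no bioruby required); codon_counts is a hypothetical helper.
# offset 0, 1 or 2 selects the reading frame. A trailing group of fewer
# than three letters is dropped. Like codon_usage, nothing is validated.
def codon_counts(seq, offset = 0)
  counts = Hash.new(0)
  (seq[offset..-1] || '').scan(/.{3}/) { |codon| counts[codon] += 1 }
  counts
end

seq = 'atggcgtga'
p codon_counts(seq)     # first frame: atg, gcg, tga
p codon_counts(seq, 1)  # second frame: tgg, cgt
```

With a Bio::Sequence::NA object the same effect could presumably be had by taking a subsequence before calling codon_usage, but that is not shown in this documentation.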
Cuts the sequence with the given restriction enzyme by passing the sequence and all arguments to Bio::RestrictionEnzyme::Analysis.cut.
Example:
  seq = Bio::Sequence::NA.new('gaattc')
  cuts = seq.cut_with_enzyme('EcoRI')
or
  seq = Bio::Sequence::NA.new('gaattc')
  cuts = seq.cut_with_enzyme('g^aattc')
See Bio::RestrictionEnzyme::Analysis.cut

  # File lib/bio/sequence/na.rb, line 530
  def cut_with_enzyme(*args)
    Bio::RestrictionEnzyme::Analysis.cut(self, *args)
  end
Returns a new sequence object with any ‘u’ bases changed to ‘t’. The original sequence is not modified.
  s = Bio::Sequence::NA.new('augc')
  puts s.dna   #=> 'atgc'
  puts s       #=> 'augc'
Returns:
- new Bio::Sequence::NA object

  # File lib/bio/sequence/na.rb, line 474
  def dna
    self.tr('u', 't')
  end
Changes any ‘u’ bases in the original sequence to ‘t’. The original sequence is modified.
  s = Bio::Sequence::NA.new('augc')
  puts s.dna!   #=> 'atgc'
  puts s        #=> 'atgc'
Returns:
- current Bio::Sequence::NA object (modified)

  # File lib/bio/sequence/na.rb, line 486
  def dna!
    self.tr!('u', 't')
  end
Returns a new complementary sequence object (without reversing). The original sequence object is not modified.
  s = Bio::Sequence::NA.new('atgc')
  puts s.forward_complement   #=> 'tacg'
  puts s                      #=> 'atgc'
Returns:
- new Bio::Sequence::NA object

  # File lib/bio/sequence/na.rb, line 100
  def forward_complement
    s = self.class.new(self)
    s.forward_complement!
    s
  end
Converts the current sequence into its complement (without reversing). The original sequence object is modified.
  s = Bio::Sequence::NA.new('atgc')
  puts s.forward_complement!   #=> 'tacg'
  puts s                       #=> 'tacg'
Returns:
- current Bio::Sequence::NA object (modified)

  # File lib/bio/sequence/na.rb, line 114
  def forward_complement!
    if self.rna?
      self.tr!('augcrymkdhvbswn', 'uacgyrkmhdbvswn')
    else
      self.tr!('atgcrymkdhvbswn', 'tacgyrkmhdbvswn')
    end
    self
  end
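The two `tr!` calls in the listing above also map IUPAC ambiguity codes to their complements. The plain-Ruby sketch below applies the DNA mapping from that listing to an ordinary String; the constant names are introduced here for illustration only.

```ruby
# Plain-Ruby sketch (no bioruby required) using the DNA mapping from the
# listing above. Ambiguity codes swap in pairs (r <-> y, m <-> k,
# d <-> h, v <-> b), while s, w and n map to themselves.
DNA_FROM = 'atgcrymkdhvbswn'
DNA_TO   = 'tacgyrkmhdbvswn'

seq = 'atgcrn'
forward = seq.tr(DNA_FROM, DNA_TO)  # complement, same orientation
reverse = forward.reverse           # reverse complement

puts forward  # tacgyn
puts reverse  # nygcat
```

Complementing and then reversing gives the same string as reversing and then complementing, because `tr` maps each character independently of its position.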
Calculate the ratio of GC / ATGC bases. U is regarded as T.
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.gc_content        #=> 5/9
  puts s.gc_content.to_f   #=> 0.5555555555555556
In older Ruby versions, Float is always returned.
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.gc_content        #=> 0.555555555555556
Note that “u” is regarded as “t”. If there are no ATGC bases in the sequence, 0.0 is returned.
Returns:
- Rational or Float

  # File lib/bio/sequence/na.rb, line 321
  def gc_content
    count = self.composition
    at = count['a'] + count['t'] + count['u']
    gc = count['g'] + count['c']
    total = at + gc
    return 0.0 if total == 0
    return gc.quo(total)
  end
Calculate the ratio of GC / ATGC bases as a percentage, truncated to a whole number by integer division (not rounded to the nearest). U is regarded as T.
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.gc_percent   #=> 55
Note that this method only returns an integer value. When digits after the decimal point are needed, use gc_content and sprintf as in the following example:
  s = Bio::Sequence::NA.new('atggcgtga')
  puts sprintf("%3.2f", s.gc_content * 100)   #=> "55.56"
Returns:
- Integer (Fixnum in older Ruby versions)

  # File lib/bio/sequence/na.rb, line 296
  def gc_percent
    count = self.composition
    at = count['a'] + count['t'] + count['u']
    gc = count['g'] + count['c']
    return 0 if at + gc == 0
    gc = 100 * gc / (at + gc)
    return gc
  end
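The integer division in the listing above discards the fractional part, so the result can differ from a rounded percentage. A small plain-Ruby sketch of the arithmetic for 'atggcgtga' (5 GC bases out of 9) shows the difference; the variable names are chosen for illustration.

```ruby
# Plain-Ruby sketch of the gc_percent arithmetic (no bioruby required).
gc    = 5  # g + c count in 'atggcgtga'
total = 9  # a + t + g + c + u count

truncated = 100 * gc / total                  # integer division, as in gc_percent
rounded   = (100.0 * gc / total).round        # nearest whole number
precise   = format('%.2f', 100.0 * gc / total)

puts truncated  # 55
puts rounded    # 56
puts precise    # 55.56
```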
Calculate the ratio of (G - C) / (G + C) bases.
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.gc_skew        #=> 3/5
  puts s.gc_skew.to_f   #=> 0.6
In older Ruby versions, Float is always returned.
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.gc_skew        #=> 0.6
If there are no GC bases in the sequence, 0.0 is returned.
Returns:
- Rational or Float

  # File lib/bio/sequence/na.rb, line 370
  def gc_skew
    count = self.composition
    g = count['g']
    c = count['c']
    gc = g + c
    return 0.0 if gc == 0
    return (g - c).quo(gc)
  end
Returns an alphabetically sorted array of any non-standard bases (other than ‘atgcu’).
  s = Bio::Sequence::NA.new('atgStgQccR')
  puts s.illegal_bases   #=> ["q", "r", "s"]
Returns:
- Array object

  # File lib/bio/sequence/na.rb, line 411
  def illegal_bases
    self.scan(/[^atgcu]/).sort.uniq
  end
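Because the pattern in the listing above only accepts a, t, g, c and u, IUPAC ambiguity codes such as r or s are reported as illegal too. The plain-Ruby sketch below contrasts that with a wider pattern; the wider character set is an assumption taken from the forward_complement! listing plus 'u', and is not something illegal_bases itself offers.

```ruby
# Plain-Ruby sketch (no bioruby required). The second pattern also accepts
# the ambiguity codes that appear in the forward_complement! listing.
seq = 'atgstgqccr'

non_atgcu = seq.scan(/[^atgcu]/).sort.uniq
non_iupac = seq.scan(/[^atgcurymkdhvbswn]/).sort.uniq

p non_atgcu  # everything outside a, t, g, c, u
p non_iupac  # only characters that are not in the wider set
```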
Estimate molecular weight (using the values from BioPerl’s SeqStats.pm module).
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.molecular_weight   #=> 2841.00708
RNA and DNA do not have the same molecular weights:
  s = Bio::Sequence::NA.new('auggcguga')
  puts s.molecular_weight   #=> 2956.94708
Returns:
- Float object

  # File lib/bio/sequence/na.rb, line 427
  def molecular_weight
    if self.rna?
      Bio::NucleicAcid.weight(self, true)
    else
      Bio::NucleicAcid.weight(self)
    end
  end
Generate the list of full names of each nucleotide along the sequence. The names used in bioruby are looked up in the hash returned by Bio::NucleicAcid.names.
  s = Bio::Sequence::NA.new('atg')
  puts s.names   #=> ["Adenine", "Thymine", "Guanine"]
Returns:
- Array object

  # File lib/bio/sequence/na.rb, line 458
  def names
    array = []
    self.each_byte do |x|
      array.push(Bio::NucleicAcid.names[x.chr.upcase])
    end
    return array
  end
Returns a new sequence object with the reverse complement sequence to the original. The original sequence is not modified.
  s = Bio::Sequence::NA.new('atgc')
  puts s.reverse_complement   #=> 'gcat'
  puts s                      #=> 'atgc'
Returns:
- new Bio::Sequence::NA object

  # File lib/bio/sequence/na.rb, line 131
  def reverse_complement
    s = self.class.new(self)
    s.reverse_complement!
    s
  end
Converts the original sequence into its reverse complement. The original sequence is modified.
  s = Bio::Sequence::NA.new('atgc')
  puts s.reverse_complement!   #=> 'gcat'
  puts s                       #=> 'gcat'
Returns:
- current Bio::Sequence::NA object (modified)

  # File lib/bio/sequence/na.rb, line 145
  def reverse_complement!
    self.reverse!
    self.forward_complement!
  end
Returns a new sequence object with any ‘t’ bases changed to ‘u’. The original sequence is not modified.
  s = Bio::Sequence::NA.new('atgc')
  puts s.rna   #=> 'augc'
  puts s       #=> 'atgc'
Returns:
- new Bio::Sequence::NA object

  # File lib/bio/sequence/na.rb, line 498
  def rna
    self.tr('t', 'u')
  end
Changes any ‘t’ bases in the original sequence to ‘u’. The original sequence is modified.
  s = Bio::Sequence::NA.new('atgc')
  puts s.rna!   #=> 'augc'
  puts s        #=> 'augc'
Returns:
- current Bio::Sequence::NA object (modified)

  # File lib/bio/sequence/na.rb, line 510
  def rna!
    self.tr!('t', 'u')
  end
Create a Ruby regular expression instance (Regexp).
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.to_re   #=> /atggcgtga/
Returns:
- Regexp object

  # File lib/bio/sequence/na.rb, line 442
  def to_re
    if self.rna?
      Bio::NucleicAcid.to_re(self.dna, true)
    else
      Bio::NucleicAcid.to_re(self)
    end
  end
Translate into an amino acid sequence.
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.translate   #=> "MA*"
By default, translate starts in reading frame position 1, but you can start in either 2 or 3 as well:
  puts s.translate(2)   #=> "WR"
  puts s.translate(3)   #=> "GV"
You may also translate the reverse complement in one step by using frame values of -1, -2, and -3 (or 4, 5, and 6):
  puts s.translate(-1)                      #=> "SRH"
  puts s.translate(4)                       #=> "SRH"
  puts s.reverse_complement.translate(1)    #=> "SRH"
The default codon table in the translate function is the Standard Eukaryotic codon table. The translate function takes either a number or a Bio::CodonTable object for its table argument. The available tables are (NCBI):
   1. "Standard (Eukaryote)"
   2. "Vertebrate Mitochondrial"
   3. "Yeast Mitochondrial"
   4. "Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma"
   5. "Invertebrate Mitochondrial"
   6. "Ciliate Macronuclear and Dasycladacean"
   9. "Echinoderm Mitochondrial"
  10. "Euplotid Nuclear"
  11. "Bacteria"
  12. "Alternative Yeast Nuclear"
  13. "Ascidian Mitochondrial"
  14. "Flatworm Mitochondrial"
  15. "Blepharisma Macronuclear"
  16. "Chlorophycean Mitochondrial"
  21. "Trematode Mitochondrial"
  22. "Scenedesmus obliquus mitochondrial"
  23. "Thraustochytrium Mitochondrial"
If you are using anything other than the default table, you must specify frame in the translate method call:
  puts s.translate        #=> "MA*" (using defaults)
  puts s.translate(1,1)   #=> "MA*" (same as above, but explicit)
  puts s.translate(1,2)   #=> "MAW" (different codon table)
and using a Bio::CodonTable instance in the translate method call:
  mt_table = Bio::CodonTable[2]
  puts s.translate(1, mt_table)   #=> "MAW"
By default, any invalid or unknown codons (as could happen if the sequence contains ambiguities) will be represented by ‘X’ in the translated sequence. You may change this to any character of your choice.
  s = Bio::Sequence::NA.new('atgcNNtga')
  puts s.translate            #=> "MX*"
  puts s.translate(1,1,'9')   #=> "M9*"
The translate method considers gaps to be unknown characters and treats them as such (i.e. does not collapse sequences prior to translation), so
  s = Bio::Sequence::NA.new('atgc--tga')
  puts s.translate   #=> "MX*"
Arguments:
- (optional) frame: one of 1, 2, 3, 4, 5, 6, -1, -2, -3 (default 1)
- (optional) table: Fixnum in the range 1 to 23, or a Bio::CodonTable object (default 1)
- (optional) unknown: Character (default ‘X’)
Returns:
- Bio::Sequence::AA object

  # File lib/bio/sequence/na.rb, line 232
  def translate(frame = 1, table = 1, unknown = 'X')
    if table.is_a?(Bio::CodonTable)
      ct = table
    else
      ct = Bio::CodonTable[table]
    end
    naseq = self.dna
    case frame
    when 1, 2, 3
      from = frame - 1
    when 4, 5, 6
      from = frame - 4
      naseq.complement!
    when -1, -2, -3
      from = -1 - frame
      naseq.complement!
    else
      from = 0
    end
    nalen = naseq.length - from
    nalen -= nalen % 3
    aaseq = naseq[from, nalen].gsub(/.{3}/) {|codon| ct[codon] or unknown}
    return Bio::Sequence::AA.new(aaseq)
  end
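The frame handling in the listing above comes down to a start offset plus a flag for whether the reverse complement is used. The plain-Ruby sketch below isolates that arithmetic on an ordinary String; `frame_offset` and `codons` are hypothetical helpers, and no codon table lookup is attempted.

```ruby
# Plain-Ruby sketch (no bioruby required); both helpers are hypothetical.
# Returns [offset, use_reverse_complement] following the case statement
# in the listing above. Unrecognised frames fall back to offset 0.
def frame_offset(frame)
  case frame
  when 1, 2, 3    then [frame - 1, false]
  when 4, 5, 6    then [frame - 4, true]
  when -1, -2, -3 then [-1 - frame, true]
  else                 [0, false]
  end
end

# Splits a sequence into whole codons starting at the given offset.
# Assumes the sequence is longer than the offset.
def codons(seq, offset)
  usable = seq.length - offset
  usable -= usable % 3  # drop a trailing partial codon
  seq[offset, usable].scan(/.{3}/)
end

seq = 'atggcgtga'
p codons(seq, frame_offset(2).first)  # ["tgg", "cgt"]
p frame_offset(-2)                    # [1, true]
p frame_offset(5)                     # [1, true]
```

Frames -2 and 5 resolve to the same offset and flag, which matches the statement above that 4, 5 and 6 are alternatives to -1, -2 and -3.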
Protected Instance Methods
  # File lib/bio/sequence/na.rb, line 514
  def rna?
    self.index('u')
  end