class Bio::GFF::GFF3::Record::Gap
Bio::GFF::GFF3::Record::Gap is a class to store the data of a “Gap” attribute.
Constants
- Code: a class to store a single-letter code and its length.
Attributes
- data: internal data. Users must not use it.
Public Class Methods
Source
# File lib/bio/db/gff.rb, line 1283
def initialize(str = nil)
  if str then
    @data = str.split(/ +/).collect do |x|
      if /\A([A-Z])([0-9]+)\z/ =~ x.strip then
        Code.new($1.intern, $2.to_i)
      else
        warn "ignored unknown token: #{x}.inspect" if $VERBOSE
        nil
      end
    end
    @data.compact!
  else
    @data = []
  end
end
Creates a new Gap object.
Arguments:
- str: a formatted string, or nil.
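The tokenizing step in the source above can be tried without the library. The following is a minimal sketch, not the library's implementation: `GapCode` and `parse_gap_tokens` are hypothetical names standing in for `Code` and `initialize`, and only the regular expressions are taken from the listing.

```ruby
# Hypothetical stand-in for the library's Code class: a code symbol and a length.
GapCode = Struct.new(:code, :length) do
  def to_s
    "#{code}#{length}"
  end
end

# Splits a Gap string on runs of spaces and keeps only tokens of the form
# "<uppercase letter><digits>"; anything else is dropped.
def parse_gap_tokens(str)
  return [] if str.nil?
  tokens = str.split(/ +/).map do |token|
    if /\A([A-Z])([0-9]+)\z/ =~ token.strip
      GapCode.new($1.intern, $2.to_i)
    end
  end
  tokens.compact
end

codes = parse_gap_tokens("M8 D3  M6 I1 ?? M6")
puts codes.map(&:to_s).join(" ")
```

In this sketch the doubled space is absorbed by the split and the unrecognized token `??` is skipped, so the remaining tokens are joined back with single spaces.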
Source
# File lib/bio/db/gff.rb, line 1399
def self.new_from_sequences_na(reference, target,
                               gap_regexp = /[^a-zA-Z]/)
  gap = self.new
  gap.instance_eval {
    __initialize_from_sequences_na(reference, target,
                                   gap_regexp)
  }
  gap
end
Creates a new Gap object from the given sequence alignment.
Note that sites where both the reference and the target are gaps are silently removed.
Arguments:
- reference: reference sequence (nucleotide sequence)
- target: target sequence (nucleotide sequence)
- gap_regexp: regexp to identify gap
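To illustrate how a Gap string relates to a nucleotide alignment, here is a minimal sketch that does not call the library. The function name `gap_string_from_alignment` is hypothetical, the equal-length check is an assumption of this sketch, and the code letters follow the GFF3 convention (M for match, I for a gap in the reference, D for a gap in the target). The private `__initialize_from_sequences_na` method may work differently.

```ruby
# Builds a Gap-style string from two aligned nucleotide sequences.
# Assumption: both strings have the same number of alignment columns.
def gap_string_from_alignment(reference, target, gap_regexp = /[^a-zA-Z]/)
  unless reference.length == target.length
    raise ArgumentError, "aligned sequences must have equal length"
  end

  # Classify every column; columns where both sides are gaps are dropped.
  codes = reference.chars.zip(target.chars).map do |ref_ch, tgt_ch|
    ref_gap = gap_regexp.match?(ref_ch)
    tgt_gap = gap_regexp.match?(tgt_ch)
    if ref_gap && tgt_gap
      nil
    elsif ref_gap
      "I" # gap in the reference
    elsif tgt_gap
      "D" # gap in the target
    else
      "M" # both sides have a residue
    end
  end.compact

  # Run-length encode consecutive identical codes.
  codes.chunk_while { |a, b| a == b }
       .map { |run| "#{run.first}#{run.size}" }
       .join(" ")
end

puts gap_string_from_alignment("CAAGACCTAAACTGGAT-TCCAAT",
                               "CAAGACCT---CTGGATATCCAAT")
```

The sample alignment is the one used in the GFF3 specification, whose Gap attribute is `M8 D3 M6 I1 M6`.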
Source
# File lib/bio/db/gff.rb, line 1595
def self.new_from_sequences_na_aa(reference, target,
                                  gap_regexp = /[^a-zA-Z]/,
                                  space_regexp = /\s/,
                                  forward_frameshift_regexp = /\>/,
                                  reverse_frameshift_regexp = /\</)
  gap = self.new
  gap.instance_eval {
    __initialize_from_sequences_na_aa(reference, target,
                                      gap_regexp,
                                      space_regexp,
                                      forward_frameshift_regexp,
                                      reverse_frameshift_regexp)
  }
  gap
end
Creates a new Gap object from the given sequence alignment.
Note that sites where both the reference and the target are gaps are silently removed.
For incorrect alignments that break the 3:1 rule (three nucleotides per amino acid), gap positions will be moved inside codons, unwanted gaps will be removed, and forward or reverse frameshifts will be inserted.
For example,
    atgg-taagac-att
    M  V  K  -  I
is treated as:
    atggt<aagacatt
    M  V  K  >>I
An incorrect combination of a frameshift with another frameshift or with a gap may cause undefined behavior.
Forward frameshifts are recommended to be indicated in the target sequence. Reverse frameshifts can be indicated in either the reference sequence or the target sequence.
Priority of regular expressions:
space > forward/reverse frameshift > gap
Arguments:
- reference: reference sequence (nucleotide sequence)
- target: target sequence (amino acid sequence)
- gap_regexp: regexp to identify gap
- space_regexp: regexp to identify space character which is completely ignored
- forward_frameshift_regexp: regexp to identify forward frameshift
- reverse_frameshift_regexp: regexp to identify reverse frameshift
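The priority order matters because the default gap_regexp, /[^a-zA-Z]/, also matches whitespace, ">" and "<". The sketch below shows one way to apply the stated priority to a single character. The function `classify_symbol` is a hypothetical illustration, not part of the library, and the private initializer may apply the rule differently.

```ruby
# Classifies one alignment character using the documented priority:
# space > forward/reverse frameshift > gap. Anything else is a residue.
def classify_symbol(ch,
                    gap_regexp = /[^a-zA-Z]/,
                    space_regexp = /\s/,
                    forward_frameshift_regexp = />/,
                    reverse_frameshift_regexp = /</)
  return :space if space_regexp.match?(ch)
  return :forward_frameshift if forward_frameshift_regexp.match?(ch)
  return :reverse_frameshift if reverse_frameshift_regexp.match?(ch)
  return :gap if gap_regexp.match?(ch)
  :residue
end

[" ", ">", "<", "-", "a"].each do |ch|
  puts "#{ch.inspect} => #{classify_symbol(ch)}"
end
```

If the gap check ran first with the default patterns, every space and frameshift character would be reported as a gap.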
Source
# File lib/bio/db/gff.rb, line 1300
def self.parse(str)
  self.new(str)
end
Same as new(str).
Public Instance Methods
Source
# File lib/bio/db/gff.rb, line 1623
def ==(other)
  if other.class == self.class and
      @data == other.data then
    true
  else
    false
  end
end
Returns true if self == other; otherwise, returns false.
Source
# File lib/bio/db/gff.rb, line 1723
def process_sequences_na(reference, target, gap_char = '-')
  s_ref, s_tgt = dup_seqs(reference, target)

  s_ref, s_tgt = __process_sequences(s_ref, s_tgt,
                                     gap_char, gap_char,
                                     1, 1,
                                     gap_char, gap_char)

  if $VERBOSE and s_ref.length != s_tgt.length then
    warn "returned sequences not equal length"
  end
  return s_ref, s_tgt
end
Processes nucleotide sequences and returns gapped sequences as an array of sequences.
Note for forward/reverse frameshift: a forward frameshift is simply treated as a gap insertion in the target sequence, and a reverse frameshift as a gap insertion in the reference sequence.
Arguments:
- reference: reference sequence (nucleotide sequence)
- target: target sequence (nucleotide sequence)
- gap_char: gap character
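As a rough illustration of what "gapped sequences" means here, the sketch below applies M, I and D codes to two ungapped nucleotide strings. It is an assumption-laden stand-in, not the library's `__process_sequences`: the name `apply_gap_na` is hypothetical, frameshift codes are ignored, and the Gap string is assumed to be valid for the given sequences.

```ruby
# Inserts gap characters into copies of two ungapped nucleotide sequences
# according to a Gap string. Only M, I and D are handled in this sketch.
def apply_gap_na(gap_str, reference, target, gap_char = "-")
  ref = reference.dup
  tgt = target.dup
  pos = 0 # current alignment column, shared by both gapped strings

  gap_str.split(/ +/).each do |token|
    match = /\A([A-Z])([0-9]+)\z/.match(token)
    next unless match
    len = match[2].to_i
    case match[1]
    when "I" # gap in the reference
      ref.insert(pos, gap_char * len)
    when "D" # gap in the target
      tgt.insert(pos, gap_char * len)
    end
    # M, I and D each cover len alignment columns.
    pos += len
  end
  [ref, tgt]
end

ref, tgt = apply_gap_na("M8 D3 M6 I1 M6",
                        "CAAGACCTAAACTGGATTCCAAT",
                        "CAAGACCTCTGGATATCCAAT")
puts ref
puts tgt
```

With the example alignment from the GFF3 specification, both returned strings have 24 columns, with the reference gapped at the I1 position and the target gapped at the D3 position.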
Source
# File lib/bio/db/gff.rb, line 1760
def process_sequences_na_aa(reference, target,
                            gap_char = '-',
                            space_char = ' ',
                            forward_frameshift = '>',
                            reverse_frameshift = '<')
  s_ref, s_tgt = dup_seqs(reference, target)
  s_tgt = s_tgt.gsub(/./, "\\0#{space_char}#{space_char}")
  ref_increment = 3
  tgt_increment = 1 + space_char.length * 2
  ref_gap = gap_char * 3
  tgt_gap = "#{gap_char}#{space_char}#{space_char}"
  return __process_sequences(s_ref, s_tgt,
                             ref_gap, tgt_gap,
                             ref_increment, tgt_increment,
                             forward_frameshift,
                             reverse_frameshift)
end
Processes sequences and returns gapped sequences as an array of sequences. reference must be a nucleotide sequence, and target must be an amino acid sequence.
Note for reverse frameshift: reverse frameshift characters are inserted in the reference sequence. For example, the alignment of “Gap=M3 R1 M2” is:
    atgaagat<aatgtc
    M  K  I  N  V
The alignment of “Gap=M3 R3 M3” is:
    atgaag<<<attaatgtc
    M  K  I  I  N  V
Arguments:
- reference: reference sequence (nucleotide sequence)
- target: target sequence (amino acid sequence)
- gap_char: gap character
- space_char: space character inserted to amino sequence for matching na-aa alignment
- forward_frameshift: forward frameshift character
- reverse_frameshift: reverse frameshift character
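The role of space_char can be seen in isolation. The source above pads every amino acid with two space characters so that one residue occupies the same three columns as its codon. The sketch below reproduces only that padding step; `pad_amino_acids` is a hypothetical helper name, not a library method.

```ruby
# Pads each amino acid with two space characters so that one residue
# spans three columns, the width of a codon in the reference.
def pad_amino_acids(target, space_char = " ")
  target.gsub(/./, "\\0#{space_char}#{space_char}")
end

padded = pad_amino_acids("MKINV")
puts "atgaagataaatgtc"
puts padded
```

Each residue then lines up with the first base of its codon, which is the layout used in the alignment examples above.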
Source
# File lib/bio/db/gff.rb, line 1612
def to_s
  @data.collect { |x| x.to_s }.join(" ")
end
Returns the string representation, with codes separated by single spaces.