class Bio::GFF::GFF3::Record::Gap
Bio::GFF::GFF3::Record::Gap is a class that stores the data of a “Gap” attribute.
Constants
- Code
Code is a class that stores a single-letter code and its length.
Attributes
Internal data. Users must not use it.
Public Class Methods
new(str = nil)
Creates a new Gap object.
Arguments:
- str: a formatted string, or nil.
# File lib/bio/db/gff.rb
def initialize(str = nil)
  if str then
    @data = str.split(/ +/).collect do |x|
      if /\A([A-Z])([0-9]+)\z/ =~ x.strip then
        Code.new($1.intern, $2.to_i)
      else
        warn "ignored unknown token: #{x}.inspect" if $VERBOSE
        nil
      end
    end
    @data.compact!
  else
    @data = []
  end
end
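The tokenizing step in the listing above can be tried without BioRuby installed. The following is a minimal sketch, not the library's code: GapCode and parse_gap_tokens are hypothetical stand-ins for Code and the constructor, and the sample string is the usual M/D/I style of Gap value.

```ruby
# Hypothetical stand-in for Bio::GFF::GFF3::Record::Gap::Code.
GapCode = Struct.new(:code, :length)

# Hypothetical helper: splits a Gap string on spaces and keeps only
# tokens made of one capital letter followed by digits, using the same
# regular expressions as the listing above.
def parse_gap_tokens(str)
  return [] if str.nil?
  tokens = str.split(/ +/).collect do |x|
    if /\A([A-Z])([0-9]+)\z/ =~ x.strip
      GapCode.new($1.intern, $2.to_i)
    else
      warn "ignored unknown token: #{x.inspect}" if $VERBOSE
      nil
    end
  end
  tokens.compact
end

codes = parse_gap_tokens("M8 D3 M6 I1 M6")
codes.each { |c| puts "#{c.code} #{c.length}" }
```

Tokens that do not match the pattern, such as a lowercase letter or a bare number, are dropped rather than raising an error.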
new_from_sequences_na(reference, target, gap_regexp = /[^a-zA-Z]/)
Creates a new Gap object from the given sequence alignment.
Note that sites where both the reference and the target are gaps are silently removed.
Arguments:
- reference: reference sequence (nucleotide sequence)
- target: target sequence (nucleotide sequence)
- gap_regexp: regexp to identify gap
# File lib/bio/db/gff.rb
def self.new_from_sequences_na(reference, target,
                               gap_regexp = /[^a-zA-Z]/)
  gap = self.new
  gap.instance_eval {
    __initialize_from_sequences_na(reference, target,
                                   gap_regexp)
  }
  gap
end
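To make the idea of deriving a Gap value from an aligned pair concrete, here is a rough sketch. It is not the library's __initialize_from_sequences_na, whose body is not shown on this page. The helper name gap_string_from_alignment is hypothetical, and it assumes the GFF3 convention that I marks a gap in the reference and D marks a gap in the target, with every other column counted as M.

```ruby
# Hypothetical helper, not part of BioRuby. Assumes both aligned
# sequences have the same length.
def gap_string_from_alignment(reference, target, gap_regexp = /[^a-zA-Z]/)
  unless reference.length == target.length
    raise ArgumentError, "aligned sequences must have equal length"
  end
  runs = []
  reference.chars.zip(target.chars).each do |r, t|
    r_gap = gap_regexp.match?(r)
    t_gap = gap_regexp.match?(t)
    next if r_gap && t_gap            # both gaps: site silently removed
    code = if r_gap then :I elsif t_gap then :D else :M end
    if runs.last && runs.last[0] == code
      runs.last[1] += 1               # extend the current run
    else
      runs << [code, 1]               # start a new run
    end
  end
  runs.map { |code, len| "#{code}#{len}" }.join(" ")
end

puts gap_string_from_alignment("CAAGACCTAAACTGGAT-TCCAAT",
                               "CAAGACCT---CTGGATATCCAAT")
# prints "M8 D3 M6 I1 M6"
```

Columns where both sequences carry a gap character are skipped, which matches the note above about such sites being silently removed.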
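new_from_sequences_na_aa(reference, target, gap_regexp = /[^a-zA-Z]/, space_regexp = /\s/, forward_frameshift_regexp = /\>/, reverse_frameshift_regexp = /\</)
Creates a new Gap object from the given sequence alignment.
Note that sites where both the reference and the target are gaps are silently removed.
For incorrect alignments that break the 3:1 rule (three nucleotides per amino acid), gap positions are moved inside codons, unwanted gaps are removed, and forward or reverse frameshifts are inserted.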
For example,
  atgg-taagac-att
  M  V  K  -  I
is treated as:
  atggt<aagacatt
  M  V  K  >>I
An incorrect combination of a frameshift with another frameshift or with a gap may cause undefined behavior.
Forward frameshifts are recommended to be indicated in the target sequence. Reverse frameshifts can be indicated in either the reference sequence or the target sequence.
Priority of regular expressions:
space > forward/reverse frameshift > gap
Arguments:
- reference: reference sequence (nucleotide sequence)
- target: target sequence (amino acid sequence)
- gap_regexp: regexp to identify gap
- space_regexp: regexp to identify space character which is completely ignored
- forward_frameshift_regexp: regexp to identify forward frameshift
- reverse_frameshift_regexp: regexp to identify reverse frameshift
# File lib/bio/db/gff.rb
def self.new_from_sequences_na_aa(reference, target,
                                  gap_regexp = /[^a-zA-Z]/,
                                  space_regexp = /\s/,
                                  forward_frameshift_regexp = /\>/,
                                  reverse_frameshift_regexp = /\</)
  gap = self.new
  gap.instance_eval {
    __initialize_from_sequences_na_aa(reference, target,
                                      gap_regexp,
                                      space_regexp,
                                      forward_frameshift_regexp,
                                      reverse_frameshift_regexp)
  }
  gap
end
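The stated priority of regular expressions (space > forward/reverse frameshift > gap) matters because the default gap_regexp also matches spaces, ">" and "<". The sketch below only illustrates that ordering with the default patterns. classify_char is a hypothetical helper, and the real __initialize_from_sequences_na_aa may apply the patterns differently.

```ruby
# Default patterns copied from the method signature above.
GAP_REGEXP                = /[^a-zA-Z]/
SPACE_REGEXP              = /\s/
FORWARD_FRAMESHIFT_REGEXP = /\>/
REVERSE_FRAMESHIFT_REGEXP = /\</

# Hypothetical helper: classifies one character, testing the patterns
# in the documented priority order so that the broad gap pattern is
# consulted last.
def classify_char(ch)
  case ch
  when SPACE_REGEXP              then :space
  when FORWARD_FRAMESHIFT_REGEXP then :forward_frameshift
  when REVERSE_FRAMESHIFT_REGEXP then :reverse_frameshift
  when GAP_REGEXP                then :gap
  else :residue
  end
end

"M >-<".each_char { |ch| puts "#{ch.inspect} => #{classify_char(ch)}" }
```

If the gap pattern were tested first, every space and frameshift character would be reported as a gap.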
parse(str)
Same as new(str).
# File lib/bio/db/gff.rb
def self.parse(str)
  self.new(str)
end
Public Instance Methods
==(other)
Returns true if other is of the same class as self and holds the same data; otherwise, returns false.
# File lib/bio/db/gff.rb
def ==(other)
  if other.class == self.class and
      @data == other.data then
    true
  else
    false
  end
end
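process_sequences_na(reference, target, gap_char = '-')
Processes nucleotide sequences and returns gapped sequences as an array of sequences.
Note for forward/reverse frameshifts: a forward frameshift is simply treated as a gap insertion into the target sequence, and a reverse frameshift as a gap insertion into the reference sequence.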
Arguments:
- reference: reference sequence (nucleotide sequence)
- target: target sequence (nucleotide sequence)
- gap_char: gap character
# File lib/bio/db/gff.rb
def process_sequences_na(reference, target, gap_char = '-')
  s_ref, s_tgt = dup_seqs(reference, target)

  s_ref, s_tgt = __process_sequences(s_ref, s_tgt,
                                     gap_char, gap_char,
                                     1, 1,
                                     gap_char, gap_char)

  if $VERBOSE and s_ref.length != s_tgt.length then
    warn "returned sequences not equal length"
  end
  return s_ref, s_tgt
end
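As an illustration of what "returns gapped sequences" means for nucleotide pairs, the following sketch applies a Gap string to two ungapped sequences. It is a simplified stand-in, not the library's __process_sequences, whose body is not shown here. apply_gap_na is a hypothetical name, only M, I and D codes are handled, and the usual GFF3 meaning of those codes is assumed (I puts a gap in the reference, D puts a gap in the target).

```ruby
# Hypothetical helper, not part of BioRuby.
def apply_gap_na(gap_str, reference, target, gap_char = '-')
  s_ref = String.new
  s_tgt = String.new
  p_ref = 0   # read position in the ungapped reference
  p_tgt = 0   # read position in the ungapped target
  gap_str.split(/ +/).each do |token|
    m = /\A([A-Z])([0-9]+)\z/.match(token)
    raise ArgumentError, "unknown token: #{token.inspect}" unless m
    len = m[2].to_i
    case m[1]
    when 'M'  # match: copy from both sequences
      s_ref << reference[p_ref, len].to_s
      s_tgt << target[p_tgt, len].to_s
      p_ref += len
      p_tgt += len
    when 'I'  # gap in the reference, residues from the target
      s_ref << gap_char * len
      s_tgt << target[p_tgt, len].to_s
      p_tgt += len
    when 'D'  # residues from the reference, gap in the target
      s_ref << reference[p_ref, len].to_s
      s_tgt << gap_char * len
      p_ref += len
    else
      raise ArgumentError, "code not handled in this sketch: #{m[1]}"
    end
  end
  [s_ref, s_tgt]
end

ref, tgt = apply_gap_na("M8 D3 M6 I1 M6",
                        "CAAGACCTAAACTGGATTCCAAT",
                        "CAAGACCTCTGGATATCCAAT")
puts ref   # CAAGACCTAAACTGGAT-TCCAAT
puts tgt   # CAAGACCT---CTGGATATCCAAT
```

Both returned strings have the same length when the Gap string is consistent with the input sequences, which is the condition the warning in the listing above checks.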
process_sequences_na_aa(reference, target, gap_char = '-', space_char = ' ', forward_frameshift = '>', reverse_frameshift = '<')
Processes sequences and returns gapped sequences as an array of sequences. reference must be a nucleotide sequence, and target must be an amino acid sequence.
Note for reverse frameshifts: reverse frameshift characters are inserted in the reference sequence. For example, the alignment of “Gap=M3 R1 M2” is:
  atgaagat<aatgtc
  M  K  I  N  V
The alignment of “Gap=M3 R3 M3” is:
  atgaag<<<attaatgtc
  M  K  I  I  N  V
Arguments:
- reference: reference sequence (nucleotide sequence)
- target: target sequence (amino acid sequence)
- gap_char: gap character
- space_char: space character inserted into the amino acid sequence to match the na-aa alignment
- forward_frameshift: forward frameshift character
- reverse_frameshift: reverse frameshift character
# File lib/bio/db/gff.rb
def process_sequences_na_aa(reference, target,
                            gap_char = '-',
                            space_char = ' ',
                            forward_frameshift = '>',
                            reverse_frameshift = '<')
  s_ref, s_tgt = dup_seqs(reference, target)
  s_tgt = s_tgt.gsub(/./, "\\0#{space_char}#{space_char}")
  ref_increment = 3
  tgt_increment = 1 + space_char.length * 2
  ref_gap = gap_char * 3
  tgt_gap = "#{gap_char}#{space_char}#{space_char}"
  return __process_sequences(s_ref, s_tgt,
                             ref_gap, tgt_gap,
                             ref_increment, tgt_increment,
                             forward_frameshift,
                             reverse_frameshift)
end
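The padding step in the listing above can be checked in isolation. This small sketch reuses the same gsub call and increment arithmetic on a sample amino acid string; the sample sequence is only an illustration.

```ruby
# Illustration only: each amino acid letter is followed by two space
# characters so that one residue occupies the width of one codon.
space_char = ' '
target = "MKINV"

expanded = target.gsub(/./, "\\0#{space_char}#{space_char}")
tgt_increment = 1 + space_char.length * 2

puts expanded.inspect   # "M  K  I  N  V  "
puts tgt_increment      # 3
```

With the default single-character space_char, the padded target advances three characters per residue, the same step as the three-nucleotide ref_increment.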
to_s
Returns the string representation.
# File lib/bio/db/gff.rb
def to_s
  @data.collect { |x| x.to_s }.join(" ")
end