class Bio::GFF::GFF3::Record::Gap

Bio::GFF::GFF3::Record::Gap is a class that stores the data of a GFF3 “Gap” attribute.

Constants

Code

Code is a class that stores a single-letter code together with its length.

Attributes

data[R]

Internal data. Users must not use it.

Public Class Methods

new(str = nil)

Creates a new Gap object.


Arguments:

  • str: a formatted string, or nil.

    # File lib/bio/db/gff.rb
    def initialize(str = nil)
      if str then
        @data = str.split(/ +/).collect do |x|
          if /\A([A-Z])([0-9]+)\z/ =~ x.strip then
            Code.new($1.intern, $2.to_i)
          else
            warn "ignored unknown token: #{x.inspect}" if $VERBOSE
            nil
          end
        end
        @data.compact!
      else
        @data = []
      end
    end
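
The tokenization above can be tried without the library. The following is a minimal standalone sketch, not BioRuby code: `GapToken` and `parse_gap_tokens` are hypothetical names introduced here for illustration, with `GapToken` standing in for the library's Code class. It reuses the same regular expression as the listing and silently drops tokens that do not match.

```ruby
# Hypothetical stand-in for the library's Code class (illustration only).
GapToken = Struct.new(:code, :length) do
  def to_s
    "#{code}#{length}"
  end
end

# Split on runs of spaces and keep only tokens shaped like <LETTER><DIGITS>,
# mirroring the regular expression used in the listing above.
def parse_gap_tokens(str)
  return [] unless str
  tokens = str.split(/ +/).map do |x|
    GapToken.new($1.intern, $2.to_i) if /\A([A-Z])([0-9]+)\z/ =~ x.strip
  end
  tokens.compact
end

tokens = parse_gap_tokens("M8 D3 M6 I1 M6")
puts tokens.map(&:to_s).join(" ")
```

Joining the tokens with a single space, as in the last line, reproduces the input string for well-formed input.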

new_from_sequences_na(reference, target, gap_regexp = /[^a-zA-Z]/)

Creates a new Gap object from a given sequence alignment.

Note that sites where both the reference and the target are gaps are silently removed.


Arguments:

  • reference: reference sequence (nucleotide sequence)

  • target: target sequence (nucleotide sequence)

  • gap_regexp: regexp to identify gap

    # File lib/bio/db/gff.rb
    def self.new_from_sequences_na(reference, target,
                                   gap_regexp = /[^a-zA-Z]/)
      gap = self.new
      gap.instance_eval {
        __initialize_from_sequences_na(reference, target,
                                       gap_regexp)
      }
      gap
    end
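
The private helper that does the actual work is not shown on this page. As a rough illustration of the idea only, the sketch below derives a Gap-style string from two aligned nucleotide sequences of equal length. It assumes the GFF3 convention that M is a match, I is a gap in the reference, and D is a gap in the target. `gap_string_from_alignment` is a hypothetical name, and this is not the library's implementation.

```ruby
# Illustrative only: classify each alignment column and run-length encode it.
def gap_string_from_alignment(reference, target, gap_regexp = /[^a-zA-Z]/)
  unless reference.length == target.length
    raise ArgumentError, "aligned sequences must have equal length"
  end
  runs = []
  reference.chars.zip(target.chars).each do |r, t|
    r_gap = gap_regexp.match?(r)
    t_gap = gap_regexp.match?(t)
    next if r_gap && t_gap          # columns that are gaps in both are dropped
    code = if r_gap then :I elsif t_gap then :D else :M end
    if runs.last && runs.last[0] == code
      runs.last[1] += 1             # extend the current run
    else
      runs << [code, 1]             # start a new run
    end
  end
  runs.map { |code, len| "#{code}#{len}" }.join(" ")
end

puts gap_string_from_alignment("CAAGACCTAAACTGGAT-TCCAAT",
                               "CAAGACCT---CTGGATATCCAAT")
```

For this pair of sequences the sketch prints `M8 D3 M6 I1 M6`.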

new_from_sequences_na_aa(reference, target, gap_regexp = /[^a-zA-Z]/, space_regexp = /\s/, forward_frameshift_regexp = /\>/, reverse_frameshift_regexp = /\</)

Creates a new Gap object from a given sequence alignment.

Note that sites where both the reference and the target are gaps are silently removed.

For incorrect alignments that break the 3:1 rule (three nucleotides per amino acid), gap positions are moved inside codons, unwanted gaps are removed, and forward or reverse frameshifts are inserted where needed.

For example,

atgg-taagac-att
M  V  K  -  I

is treated as:

atggt<aagacatt
M  V  K  >>I

Incorrect combinations of a frameshift with another frameshift or with a gap may cause undefined behavior.

Forward frameshifts are recommended to be indicated in the target sequence. Reverse frameshifts can be indicated in either the reference sequence or the target sequence.

Priority of regular expressions:

space > forward/reverse frameshift > gap

Arguments:

  • reference: reference sequence (nucleotide sequence)

  • target: target sequence (amino acid sequence)

  • gap_regexp: regexp to identify gap

  • space_regexp: regexp to identify space characters, which are completely ignored

  • forward_frameshift_regexp: regexp to identify forward frameshift

  • reverse_frameshift_regexp: regexp to identify reverse frameshift

    # File lib/bio/db/gff.rb
    def self.new_from_sequences_na_aa(reference, target,
                                      gap_regexp = /[^a-zA-Z]/,
                                      space_regexp = /\s/,
                                      forward_frameshift_regexp = /\>/,
                                      reverse_frameshift_regexp = /\</)
      gap = self.new
      gap.instance_eval {
        __initialize_from_sequences_na_aa(reference, target,
                                          gap_regexp,
                                          space_regexp,
                                          forward_frameshift_regexp,
                                          reverse_frameshift_regexp)
      }
      gap
    end
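
The priority order matters because the default gap_regexp also matches whitespace, `>` and `<`. The sketch below shows one way the stated priority (space, then forward/reverse frameshift, then gap) could be applied to a single character. `classify_symbol` is a hypothetical helper written for this illustration; the library's internal logic is not shown on this page and may differ.

```ruby
# Illustrative only: test the regexps in the documented priority order.
def classify_symbol(ch,
                    gap_regexp = /[^a-zA-Z]/,
                    space_regexp = /\s/,
                    forward_frameshift_regexp = /\>/,
                    reverse_frameshift_regexp = /\</)
  return :space              if space_regexp.match?(ch)
  return :forward_frameshift if forward_frameshift_regexp.match?(ch)
  return :reverse_frameshift if reverse_frameshift_regexp.match?(ch)
  return :gap                if gap_regexp.match?(ch)
  :residue
end

"M >-<".each_char do |ch|
  puts "#{ch.inspect} is classified as #{classify_symbol(ch)}"
end
```

Checking gap_regexp first would classify every one of these non-letter characters as a gap, which is why it comes last.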

parse(str)

Same as new(str).

    # File lib/bio/db/gff.rb
    def self.parse(str)
      self.new(str)
    end

Public Instance Methods

==(other)

Returns true if other is an instance of the same class and holds the same data as self. Otherwise, returns false.

    # File lib/bio/db/gff.rb
    def ==(other)
      if other.class == self.class and
          @data == other.data then
        true
      else
        false
      end
    end

process_sequences_na(reference, target, gap_char = '-')

Processes nucleotide sequences and returns the gapped sequences as an array of two sequences (reference first, then target).

Note for forward/reverse frameshift: a forward frameshift is simply treated as a gap insertion into the target sequence, and a reverse frameshift as a gap insertion into the reference sequence.


Arguments:

  • reference: reference sequence (nucleotide sequence)

  • target: target sequence (nucleotide sequence)

  • gap_char: gap character

    # File lib/bio/db/gff.rb
    def process_sequences_na(reference, target, gap_char = '-')
      s_ref, s_tgt = dup_seqs(reference, target)

      s_ref, s_tgt = __process_sequences(s_ref, s_tgt,
                                         gap_char, gap_char,
                                         1, 1,
                                         gap_char, gap_char)

      if $VERBOSE and s_ref.length != s_tgt.length then
        warn "returned sequences not equal length"
      end
      return s_ref, s_tgt
    end
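
The private `__process_sequences` helper is not listed on this page. The sketch below is a simplified stand-in, assuming the GFF3 convention that I inserts a gap into the reference and D inserts a gap into the target. It handles only M, I and D codes, and `apply_gap_codes` is a hypothetical name, not part of the library.

```ruby
# Illustrative only: insert gap characters into copies of two ungapped
# nucleotide sequences according to a Gap-style string.
def apply_gap_codes(gap_str, reference, target, gap_char = "-")
  s_ref = reference.dup
  s_tgt = target.dup
  pos = 0                                   # current alignment column
  gap_str.split(/ +/).each do |token|
    m = /\A([A-Z])([0-9]+)\z/.match(token)
    next unless m                           # skip unknown tokens
    len = m[2].to_i
    case m[1]
    when "I" then s_ref[pos, 0] = gap_char * len   # gap in the reference
    when "D" then s_tgt[pos, 0] = gap_char * len   # gap in the target
    end
    pos += len                              # every code advances the column
  end
  [s_ref, s_tgt]
end

ref, tgt = apply_gap_codes("M8 D3 M6 I1 M6",
                           "CAAGACCTAAACTGGATTCCAAT",
                           "CAAGACCTCTGGATATCCAAT")
puts ref
puts tgt
```

With these inputs both returned strings are 24 characters long, which is the equal-length condition the real method warns about when it is violated.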

process_sequences_na_aa(reference, target, gap_char = '-', space_char = ' ', forward_frameshift = '>', reverse_frameshift = '<')

Processes sequences and returns gapped sequences as an array of sequences. reference must be a nucleotide sequence, and target must be an amino acid sequence.

Note for reverse frameshift: reverse frameshift characters are inserted into the reference sequence. For example, the alignment for “Gap=M3 R1 M2” is:

atgaagat<aatgtc
M  K  I  N  V

The alignment for “Gap=M3 R3 M3” is:

atgaag<<<attaatgtc
M  K  I  I  N  V

Arguments:

  • reference: reference sequence (nucleotide sequence)

  • target: target sequence (amino acid sequence)

  • gap_char: gap character

  • space_char: space character inserted into the amino acid sequence so that it lines up with the nucleotide sequence

  • forward_frameshift: forward frameshift character

  • reverse_frameshift: reverse frameshift character

    # File lib/bio/db/gff.rb
    def process_sequences_na_aa(reference, target,
                                gap_char = '-',
                                space_char = ' ',
                                forward_frameshift = '>',
                                reverse_frameshift = '<')
      s_ref, s_tgt = dup_seqs(reference, target)
      s_tgt = s_tgt.gsub(/./, "\\0#{space_char}#{space_char}")
      ref_increment = 3
      tgt_increment = 1 + space_char.length * 2
      ref_gap = gap_char * 3
      tgt_gap = "#{gap_char}#{space_char}#{space_char}"
      return __process_sequences(s_ref, s_tgt,
                                 ref_gap, tgt_gap,
                                 ref_increment, tgt_increment,
                                 forward_frameshift,
                                 reverse_frameshift)
    end
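
The padding step in the listing above is what makes one amino acid occupy the same width as one codon. The standalone snippet below repeats only that `gsub` call and the increment and gap-unit calculations with the default characters. The sample sequences are arbitrary illustration data.

```ruby
space_char = " "
gap_char   = "-"
reference  = "atgaagataaatgtc"   # 5 codons
target     = "MKINV"             # 5 amino acids

# Follow every residue with two space characters (same gsub as the listing).
padded = target.gsub(/./, "\\0#{space_char}#{space_char}")

tgt_increment = 1 + space_char.length * 2   # columns per amino acid
ref_gap = gap_char * 3                      # one codon-sized gap
tgt_gap = "#{gap_char}#{space_char}#{space_char}"

puts reference
puts padded
```

With the single-character default, each padded residue spans three columns, so the padded target has the same length as the reference.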

to_s()

Returns the string representation: the stored codes joined by single spaces.

    # File lib/bio/db/gff.rb
    def to_s
      @data.collect { |x| x.to_s }.join(" ")
    end