class Bio::GFF::GFF3::Record::Gap

Bio::GFF::GFF3::Record::Gap is a class to store the data of a “Gap” attribute.

Constants

Code: Code is a class to store a single-letter code and its length.

Attributes

data [R]

Internal data. Users must not use it.

Public Class Methods

new (str = nil)

Source

# File lib/bio/db/gff.rb
def initialize(str = nil)
  if str then
    @data = str.split(/ +/).collect do |x|
      # each token is one uppercase code letter followed by a length
      if /\A([A-Z])([0-9]+)\z/ =~ x.strip then
        Code.new($1.intern, $2.to_i)
      else
        warn "ignored unknown token: #{x.inspect}" if $VERBOSE
        nil
      end
    end
    @data.compact!
  else
    @data = []
  end
end

Creates a new Gap object.

Arguments:

str: a formatted string of space-separated codes, each an uppercase letter followed by a length (e.g. “M8 D3 M6”), or nil.
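
The token format accepted by new can be illustrated without loading the library. The following is a minimal standalone sketch, assuming only the token pattern visible in the source listing (one uppercase letter followed by digits, tokens separated by spaces). GapCode and parse_gap are hypothetical names used for illustration; they are not part of BioRuby.

```ruby
# Hypothetical stand-in for the Code constant: a code letter plus a length.
GapCode = Struct.new(:code, :length)

# Splits a Gap attribute value into GapCode entries.
# Tokens that do not match the pattern are dropped.
def parse_gap(str)
  return [] if str.nil?
  str.split(/ +/).map { |token|
    m = /\A([A-Z])([0-9]+)\z/.match(token.strip)
    m ? GapCode.new(m[1].to_sym, m[2].to_i) : nil
  }.compact
end

codes = parse_gap("M8 D3 M6 I1 M6")
codes.each { |c| puts "#{c.code} #{c.length}" }
```

A nil argument yields an empty list, mirroring the else branch of the listing.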

new_from_sequences_na (reference, target, gap_regexp = /[^a-zA-Z]/)

Source

# File lib/bio/db/gff.rb
def self.new_from_sequences_na(reference, target,
                               gap_regexp = /[^a-zA-Z]/)
  gap = self.new
  gap.instance_eval {
    __initialize_from_sequences_na(reference, target,
                                   gap_regexp)
  }
  gap
end

Creates a new Gap object from the given sequence alignment.

Note that sites where both the reference and the target are gaps are silently removed.

Arguments:

reference: reference sequence (nucleotide sequence)
target: target sequence (nucleotide sequence)
gap_regexp: regexp to identify gap
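
The idea of deriving a Gap string from an alignment can be sketched as follows. This is a standalone illustration and not the library's implementation (the private __initialize_from_sequences_na is not shown here). It assumes the GFF3 meanings M (match), I (gap in the reference), and D (gap in the target), and that both input strings have the same length. gap_string_from_alignment is a hypothetical name.

```ruby
# Derives a Gap-style string from two aligned sequences of equal length.
# Assumed codes: M = both columns are residues, I = gap in the reference,
# D = gap in the target. Columns where both are gaps are dropped.
def gap_string_from_alignment(reference, target, gap_regexp = /[^a-zA-Z]/)
  unless reference.length == target.length
    raise ArgumentError, "sequences differ in length"
  end
  runs = [] # array of [code, length] pairs
  reference.chars.zip(target.chars) do |r, t|
    ref_gap = gap_regexp.match?(r)
    tgt_gap = gap_regexp.match?(t)
    next if ref_gap && tgt_gap
    code = if ref_gap then :I elsif tgt_gap then :D else :M end
    if runs.last && runs.last[0] == code
      runs.last[1] += 1   # extend the current run
    else
      runs << [code, 1]   # start a new run
    end
  end
  runs.map { |code, len| "#{code}#{len}" }.join(" ")
end

puts gap_string_from_alignment("CAAGACCTAAACTGGAT-TCCAAT",
                               "CAAGACCT---CTGGATATCCAAT")
# prints "M8 D3 M6 I1 M6"
```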

new_from_sequences_na_aa (reference, target, gap_regexp = /[^a-zA-Z]/, space_regexp = /\s/, forward_frameshift_regexp = /\>/, reverse_frameshift_regexp = /\</)

Source

# File lib/bio/db/gff.rb
def self.new_from_sequences_na_aa(reference, target,
                                  gap_regexp = /[^a-zA-Z]/,
                                  space_regexp = /\s/,
                                  forward_frameshift_regexp = /\>/,
                                  reverse_frameshift_regexp = /\</)
  gap = self.new
  gap.instance_eval {
    __initialize_from_sequences_na_aa(reference, target,
                                      gap_regexp,
                                      space_regexp,
                                      forward_frameshift_regexp,
                                      reverse_frameshift_regexp)
  }
  gap
end

Creates a new Gap object from the given sequence alignment.

Note that sites where both the reference and the target are gaps are silently removed.

For incorrect alignments that break the 3:1 rule (three nucleotides per amino acid), gap positions will be moved inside codons, unwanted gaps will be removed, and forward or reverse frameshifts will be inserted where needed.

For example,

atgg-taagac-att
M  V  K  -  I

is treated as:

atggt<aagacatt
M  V  K  >>I

Incorrect combinations of a frameshift with another frameshift or with a gap may cause undefined behavior.

Forward frameshifts are recommended to be indicated in the target sequence. Reverse frameshifts can be indicated in the reference sequence or the target sequence.

Priority of regular expressions:

space > forward/reverse frameshift > gap

Arguments:

reference: reference sequence (nucleotide sequence)
target: target sequence (amino acid sequence)
gap_regexp: regexp to identify gap
space_regexp: regexp to identify space character which is completely ignored
forward_frameshift_regexp: regexp to identify forward frameshift
reverse_frameshift_regexp: regexp to identify reverse frameshift
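
The priority order matters because the default gap_regexp also matches whitespace, “>”, and “<”. A minimal sketch of the documented order, using the default regexps from the signature above, might look like this. classify_alignment_char is a hypothetical helper, and the relative order of the two frameshift tests is an arbitrary choice here, since the defaults do not overlap.

```ruby
# Classifies one alignment character by testing the regexps in the
# documented priority order: space, then frameshifts, then gap.
def classify_alignment_char(ch,
                            gap_regexp = /[^a-zA-Z]/,
                            space_regexp = /\s/,
                            forward_frameshift_regexp = /\>/,
                            reverse_frameshift_regexp = /\</)
  return :space if space_regexp.match?(ch)
  return :forward_frameshift if forward_frameshift_regexp.match?(ch)
  return :reverse_frameshift if reverse_frameshift_regexp.match?(ch)
  return :gap if gap_regexp.match?(ch)
  :residue
end

"at< M-".each_char do |c|
  puts "#{c.inspect} => #{classify_alignment_char(c)}"
end
```

Testing gap_regexp first would classify “<” as a plain gap, which is why it comes last.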

parse (str)

Source

# File lib/bio/db/gff.rb
def self.parse(str)
  self.new(str)
end

Same as new(str).

Public Instance Methods

== (other)

Source

# File lib/bio/db/gff.rb
def ==(other)
  if other.class == self.class and
      @data == other.data then
    true
  else
    false
  end
end

Returns true if other is of the same class as self and holds equal data. Otherwise, returns false.
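
One consequence of comparing classes with == is that an instance of a subclass is never equal to an instance of its parent, even with identical data. The following standalone sketch reproduces the comparison logic with hypothetical classes (SketchGap and SketchSubGap are illustrative names, not BioRuby classes).

```ruby
# Hypothetical class that mirrors the comparison shown in the listing.
class SketchGap
  attr_reader :data

  def initialize(data)
    @data = data
  end

  def ==(other)
    other.class == self.class && @data == other.data
  end
end

class SketchSubGap < SketchGap; end

a = SketchGap.new([[:M, 8], [:D, 3]])
b = SketchGap.new([[:M, 8], [:D, 3]])
c = SketchSubGap.new([[:M, 8], [:D, 3]])

puts a == b        # true: same class, equal data
puts a == c        # false: classes differ
puts a == "M8 D3"  # false: a String is not a SketchGap
```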

process_sequences_na (reference, target, gap_char = '-')

Source

# File lib/bio/db/gff.rb
def process_sequences_na(reference, target, gap_char = '-')
  s_ref, s_tgt = dup_seqs(reference, target)

  s_ref, s_tgt = __process_sequences(s_ref, s_tgt,
                                     gap_char, gap_char,
                                     1, 1,
                                     gap_char, gap_char)

  if $VERBOSE and s_ref.length != s_tgt.length then
    warn "returned sequences not equal length"
  end
  return s_ref, s_tgt
end

Processes nucleotide sequences and returns gapped sequences as an array of sequences.

Note for forward/reverse frameshifts: a forward frameshift is simply treated as a gap insertion into the target sequence, and a reverse frameshift as a gap insertion into the reference sequence.

Arguments:

reference: reference sequence (nucleotide sequence)
target: target sequence (nucleotide sequence)
gap_char: gap character
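
The effect of applying a Gap string to two ungapped nucleotide sequences can be sketched as below. This is a simplified standalone illustration, not the library's __process_sequences (which is not shown here): it handles only the M, I, and D codes, skips everything else including frameshifts, and ignores any residues left over after the last code. apply_gap_na is a hypothetical name.

```ruby
# Inserts gap characters into two ungapped sequences according to a
# Gap string. Only M, I and D are handled in this sketch.
def apply_gap_na(reference, target, gap_str, gap_char = "-")
  ref_out = String.new
  tgt_out = String.new
  ref_pos = 0
  tgt_pos = 0
  gap_str.split(/ +/).each do |token|
    m = /\A([MID])([0-9]+)\z/.match(token)
    next unless m # other codes are skipped in this sketch
    len = m[2].to_i
    case m[1]
    when "M" # aligned columns: copy from both sequences
      ref_out << reference[ref_pos, len].to_s
      tgt_out << target[tgt_pos, len].to_s
      ref_pos += len
      tgt_pos += len
    when "I" # gap in the reference
      ref_out << gap_char * len
      tgt_out << target[tgt_pos, len].to_s
      tgt_pos += len
    when "D" # gap in the target
      ref_out << reference[ref_pos, len].to_s
      tgt_out << gap_char * len
      ref_pos += len
    end
  end
  [ref_out, tgt_out]
end

ref, tgt = apply_gap_na("CAAGACCTAAACTGGATTCCAAT",
                        "CAAGACCTCTGGATATCCAAT",
                        "M8 D3 M6 I1 M6")
puts ref # prints "CAAGACCTAAACTGGAT-TCCAAT"
puts tgt # prints "CAAGACCT---CTGGATATCCAAT"
```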

process_sequences_na_aa (reference, target, gap_char = '-', space_char = ' ', forward_frameshift = '>', reverse_frameshift = '<')

Source

# File lib/bio/db/gff.rb
def process_sequences_na_aa(reference, target,
                            gap_char = '-',
                            space_char = ' ',
                            forward_frameshift = '>',
                            reverse_frameshift = '<')
  s_ref, s_tgt = dup_seqs(reference, target)
  s_tgt = s_tgt.gsub(/./, "\\0#{space_char}#{space_char}")
  ref_increment = 3
  tgt_increment = 1 + space_char.length * 2
  ref_gap = gap_char * 3
  tgt_gap = "#{gap_char}#{space_char}#{space_char}"
  return __process_sequences(s_ref, s_tgt,
                             ref_gap, tgt_gap,
                             ref_increment, tgt_increment,
                             forward_frameshift,
                             reverse_frameshift)
end

Processes sequences and returns gapped sequences as an array of sequences. reference must be a nucleotide sequence, and target must be an amino acid sequence.

Note for reverse frameshifts: reverse frameshift characters are inserted into the reference sequence. For example, the alignment of “Gap=M3 R1 M2” is:

atgaagat<aatgtc
M  K  I  N  V

Alignment of “Gap=M3 R3 M3” is:

atgaag<<<attaatgtc
M  K  I  I  N  V

Arguments:

reference: reference sequence (nucleotide sequence)
target: target sequence (amino acid sequence)
gap_char: gap character
space_char: space character inserted into the amino acid sequence so that each residue lines up with its codon in the na-aa alignment
forward_frameshift: forward frameshift character
reverse_frameshift: reverse frameshift character
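
The 3:1 column bookkeeping in the listing (each residue padded with two space characters, a reference gap of three gap characters, a target gap of one gap character plus two spaces) can be sketched in isolation. The code below is a simplified illustration, not the library's __process_sequences: it handles only M, I, and D, skips frameshift codes, and keeps the trailing padding that the gsub step produces. pad_residues and apply_gap_na_aa are hypothetical names.

```ruby
# Pads each residue with two space characters so that one amino acid
# occupies the same three columns as one codon.
def pad_residues(aa, space_char = " ")
  aa.gsub(/./, "\\0#{space_char}#{space_char}")
end

# Applies M, I and D codes, counted in codons/residues, to a nucleotide
# reference and an amino acid target. Frameshift codes are skipped.
def apply_gap_na_aa(reference, target, gap_str,
                    gap_char = "-", space_char = " ")
  ref_gap = gap_char * 3
  tgt_gap = "#{gap_char}#{space_char}#{space_char}"
  ref_out = String.new
  tgt_out = String.new
  ref_pos = 0
  tgt_pos = 0
  gap_str.split(/ +/).each do |token|
    m = /\A([MID])([0-9]+)\z/.match(token)
    next unless m
    len = m[2].to_i
    case m[1]
    when "M" # len codons aligned with len residues
      ref_out << reference[ref_pos, len * 3].to_s
      tgt_out << pad_residues(target[tgt_pos, len].to_s, space_char)
      ref_pos += len * 3
      tgt_pos += len
    when "I" # gap in the reference, three columns per residue
      ref_out << ref_gap * len
      tgt_out << pad_residues(target[tgt_pos, len].to_s, space_char)
      tgt_pos += len
    when "D" # gap in the target
      ref_out << reference[ref_pos, len * 3].to_s
      tgt_out << tgt_gap * len
      ref_pos += len * 3
    end
  end
  [ref_out, tgt_out]
end

ref, tgt = apply_gap_na_aa("atgaat", "MKN", "M1 I1 M1")
puts ref
puts tgt
```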

to_s ()

Source

# File lib/bio/db/gff.rb
def to_s
  @data.collect { |x| x.to_s }.join(" ")
end

Returns the string representation: the stored codes joined by single spaces.
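
The joining step can be tried on its own. The sketch below assumes that each stored element renders as its code letter immediately followed by its length; that rendering is not shown in the listing, so GapToken and its to_s are hypothetical stand-ins.

```ruby
# Hypothetical element type; to_s is assumed to render e.g. "M8".
GapToken = Struct.new(:code, :length) do
  def to_s
    "#{code}#{length}"
  end
end

tokens = [GapToken.new(:M, 8), GapToken.new(:D, 3), GapToken.new(:M, 6)]

# Same expression as in the listing, applied to a local array.
gap_value = tokens.collect { |x| x.to_s }.join(" ")
puts gap_value # prints "M8 D3 M6"
```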