module Bio::Alignment::EnumerableExtension

Bio::Alignment::EnumerableExtension is a set of useful methods for multiple sequence alignment. It can be included in any class or used to extend any object. The class or object must provide the methods defined in Enumerable and must have an each method that iterates over the sequences (or strings), yielding one sequence (or string) object at a time.

If an each_seq method is defined that iterates over the sequences (or strings) in the same way, it is used instead of each.

Note that each or each_seq may be called multiple times, so the module is not suitable for IO objects. In addition, the module may use break inside the blocks it passes to these methods, and it may call destructive methods on the sequences.

For Array or Hash objects, use the ArrayExtension or HashExtension module instead. They provide a built-in each_seq method and/or redefine some methods.
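
The requirements above can be sketched in plain Ruby without loading BioRuby. The class name TinyAlignment is hypothetical, and its each_seq and alignment_length methods are simplified stand-ins written for illustration, not the library's own code.

```ruby
# Hypothetical container used only for illustration; not part of BioRuby.
class TinyAlignment
  include Enumerable

  def initialize(*seqs)
    @seqs = seqs
  end

  # Required: yields each sequence (or string) in turn.
  def each(&block)
    @seqs.each(&block)
    self
  end

  # Stand-in for the default each_seq, which simply delegates to each.
  def each_seq(&block)
    each(&block)
  end

  # Stand-in for alignment_length: the length of the longest sequence.
  def alignment_length
    maxlen = 0
    each_seq { |s| maxlen = s.length if s.length > maxlen }
    maxlen
  end
end

aln = TinyAlignment.new('ATGC--', 'ATG', 'A-GCTT')
puts aln.alignment_length # → 6
puts aln.alignment_length # → 6 again; each can be re-run, unlike a consumed IO stream
```

With BioRuby loaded, such a class would include Bio::Alignment::EnumerableExtension after defining each, instead of writing these stand-in methods by hand.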

Public Instance Methods

alignment_collect() { |str| ... }

Iterates over each sequence, collects the results of running the block, and returns them as a new alignment in a Bio::Alignment::SequenceArray object.

Redefine this method if you want to change the class of the return value.

    # File lib/bio/alignment.rb
    def alignment_collect
      a = SequenceArray.new
      a.set_all_property(get_all_property)
      each_seq do |str|
        a << yield(str)
      end
      a
    end

alignment_concat(align)

Concatenates the given alignment to this one, appending each sequence of align to the corresponding sequence of self. align must respond to each_seq or each.

Returns self.

Note that this is a destructive method.

For Hash, use it carefully, because the order of the sequences is not guaranteed and key information is ignored entirely.

    # File lib/bio/alignment.rb
    def alignment_concat(align)
      flag = nil
      a = []
      each_seq { |s| a << s }
      i = 0
      begin
        align.each_seq do |seq|
          flag = true
          a[i].concat(seq) if a[i] and seq
          i += 1
        end
        return self
      rescue NoMethodError, ArgumentError => evar
        raise evar if flag
      end
      align.each do |seq|
        a[i].concat(seq) if a[i] and seq
        i += 1
      end
      self
    end

alignment_length()

Returns the alignment length, that is, the length of the longest sequence in the alignment.

    # File lib/bio/alignment.rb
    def alignment_length
      maxlen = 0
      each_seq do |s|
        x = s.length
        maxlen = x if x > maxlen
      end
      maxlen
    end
Also aliased as: seq_length

alignment_lstrip!()

Removes excess gaps (leading sites that consist entirely of gaps) at the head of the sequences. Returns nil if nothing was removed; otherwise returns self.

Note that this is a destructive method.

    # File lib/bio/alignment.rb
    def alignment_lstrip!
      #(String-like)
      pos = 0
      each_site do |a|
        a.remove_gaps!
        if a.empty?
          pos += 1
        else
          break
        end
      end
      return nil if pos <= 0
      each_seq { |s| s[0, pos] = '' }
      self
    end
Also aliased as: lstrip!

alignment_normalize!()

Pads the tail of each sequence with gap characters if the sequence is shorter than the alignment length. Returns self.

Note that this is a destructive method.

    # File lib/bio/alignment.rb
    def alignment_normalize!
      #(original)
      len = alignment_length
      each_seq do |s|
        s << (gap_char * (len - s.length)) if s.length < len
      end
      self
    end
Also aliased as: normalize!
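
The padding step can be tried on plain strings. In this minimal sketch the helper name pad_to_alignment_length! is hypothetical, and '-' is an assumed gap character; the real method takes its gap character from gap_char.

```ruby
# Pads every string in seqs, in place, to the length of the longest one.
# The default gap character '-' is an assumption made for this sketch.
def pad_to_alignment_length!(seqs, gap_char = '-')
  len = seqs.map(&:length).max || 0
  seqs.each do |s|
    s << (gap_char * (len - s.length)) if s.length < len
  end
  seqs
end

seqs = ['ATGCA'.dup, 'AT'.dup, 'ATG-'.dup]
pad_to_alignment_length!(seqs)
p seqs # → ["ATGCA", "AT---", "ATG--"]
```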

alignment_rstrip!()

Removes excess gaps (trailing sites that consist entirely of gaps) at the tail of the sequences. Returns nil if nothing was removed; otherwise returns self.

Note that this is a destructive method.

    # File lib/bio/alignment.rb
    def alignment_rstrip!
      #(String-like)
      len = alignment_length
      newlen = len
      each_site_step(len - 1, 0, -1) do |a|
        a.remove_gaps!
        if a.empty? then
          newlen -= 1
        else
          break
        end
      end
      return nil if newlen >= len
      each_seq do |s|
        s[newlen..-1] = '' if s.length > newlen
      end
      self
    end
Also aliased as: rstrip!

alignment_site(position)

Returns the site at the given position as a Bio::Alignment::Site object.

If the position is out of range, returns a site consisting entirely of gaps.

    # File lib/bio/alignment.rb
    def alignment_site(position)
      site = _alignment_site(position)
      site.set_all_property(get_all_property)
      site
    end

alignment_slice(*arg)

Returns the specified range of the alignment. The ‘slice’ method (typically String#slice, which is the same as String#[]) is called on each sequence, and the results are returned as a new alignment in a Bio::Alignment::SequenceArray object.

Unlike alignment_window, the resulting alignment may contain nil.

To change the class of the return value, redefine the alignment_collect method.

    # File lib/bio/alignment.rb
    def alignment_slice(*arg)
      #(String-like)
      #(BioPerl) AlignI::slice like method
      alignment_collect do |s|
        s.slice(*arg)
      end
    end
Also aliased as: slice

alignment_strip!()

Removes excess gaps at both the head and the tail of the sequences. Returns nil if nothing was removed; otherwise returns self.

Note that this is a destructive method.

    # File lib/bio/alignment.rb
    def alignment_strip!
      #(String-like)
      r = alignment_rstrip!
      l = alignment_lstrip!
      (r or l)
    end
Also aliased as: strip!
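
The following plain-Ruby sketch mirrors the strip logic on an array of strings: a column is removed from either end only when every sequence has a gap there. The helper names and the '-' gap character are assumptions for illustration; positions past the end of a short sequence are treated as gaps.

```ruby
GAP = '-' # assumed gap character for this sketch

# True when every sequence has a gap (or nothing) at column i.
def all_gap_column?(seqs, i)
  seqs.all? { |s| s[i].nil? || s[i] == GAP }
end

# Removes trailing all-gap columns in place; returns nil if nothing changed.
def rstrip_gap_columns!(seqs)
  len = seqs.map(&:length).max || 0
  newlen = len
  newlen -= 1 while newlen > 0 && all_gap_column?(seqs, newlen - 1)
  return nil if newlen >= len
  seqs.each { |s| s[newlen..-1] = '' if s.length > newlen }
  seqs
end

# Removes leading all-gap columns in place; returns nil if nothing changed.
def lstrip_gap_columns!(seqs)
  len = seqs.map(&:length).max || 0
  pos = 0
  pos += 1 while pos < len && all_gap_column?(seqs, pos)
  return nil if pos <= 0
  seqs.each { |s| s[0, pos] = '' }
  seqs
end

# Strips both ends; returns nil only if neither end changed.
def strip_gap_columns!(seqs)
  r = rstrip_gap_columns!(seqs)
  l = lstrip_gap_columns!(seqs)
  r || l
end

seqs = ['--ATG-'.dup, '-CATG-'.dup, '--AT--'.dup]
strip_gap_columns!(seqs)
p seqs # → ["-ATG", "CATG", "-AT-"]
```

Only the outermost all-gap columns are dropped. The second column stays because one sequence has a residue there.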

alignment_subseq(*arg)

The ‘subseq’ method (Bio::Sequence::Common#subseq is expected) is called on each sequence, and the results are returned as a new alignment in a Bio::Alignment::SequenceArray object.

All sequences in the alignment are expected to be Bio::Sequence::NA or Bio::Sequence::AA objects.

Unlike alignment_window, the resulting alignment may contain nil.

To change the class of the return value, redefine the alignment_collect method.

    # File lib/bio/alignment.rb
    def alignment_subseq(*arg)
      #(original)
      alignment_collect do |s|
        s.subseq(*arg)
      end
    end
Also aliased as: subseq

alignment_window(*arg)

Returns the specified range of the alignment. The ‘[]’ method (typically String#[]) is called on each sequence, and the results are returned as a new alignment in a Bio::Alignment::SequenceArray object.

Unlike alignment_slice, the resulting alignment never contains nil: if the specified range is out of range for a sequence, an empty sequence object (seqclass.new('')) is stored instead.

To change the class of the return value, redefine the alignment_collect method.

    # File lib/bio/alignment.rb
    def alignment_window(*arg)
      alignment_collect do |s|
        s[*arg] or seqclass.new('')
      end
    end
Also aliased as: window
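
The difference between the slice-style and window-style results can be seen with plain strings. The helper names slice_like and window_like are hypothetical, and a plain empty String stands in for seqclass.new('').

```ruby
# Slice-style: String#slice returns nil when the start index lies beyond
# the end of the string, so the result may contain nil.
def slice_like(seqs, *arg)
  seqs.map { |s| s.slice(*arg) }
end

# Window-style: fall back to an empty string instead of nil.
def window_like(seqs, *arg)
  seqs.map { |s| s[*arg] || '' }
end

seqs = ['ATGCATGC', 'ATG']
p slice_like(seqs, 4, 3)  # → ["ATG", nil]
p window_like(seqs, 4, 3) # → ["ATG", ""]
```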

collect_each_site() { |site| ... }

Iterates over each site of the alignment, yielding a Bio::Alignment::Site object, and returns an array of the block's results.

    # File lib/bio/alignment.rb
    def collect_each_site
      ary = []
      each_site do |site|
        ary << yield(site)
      end
      ary
    end

consensus_each_site(opt = {}) { |a| ... }

Helper method for calculating a consensus sequence. It iterates over each site of the alignment, handling gaps as specified by opt, and yields a Bio::Alignment::Site object. The results of the block (String objects are expected) are joined into a single string, which is returned. When the block returns nil or false for a site, opt[:missing_char] (or the alignment's missing_char) is used for that site.

opt[:gap_mode] ==>  0 -- gaps are regarded as normal characters
                    1 -- a site containing any gap is regarded as a gap
                   -1 -- gaps are eliminated from consensus calculation
                   default: 0

    # File lib/bio/alignment.rb
    def consensus_each_site(opt = {})
      mchar = (opt[:missing_char] or self.missing_char)
      gap_mode = opt[:gap_mode]
      case gap_mode
      when 0, nil
        collect_each_site do |a|
          yield(a) or mchar
        end.join('')
      when 1
        collect_each_site do |a|
          a.has_gap? ? gap_char : (yield(a) or mchar)
        end.join('')
      when -1
        collect_each_site do |a|
          a.remove_gaps!
          a.empty? ? gap_char : (yield(a) or mchar)
        end.join('')
      else
        raise ':gap_mode must be 0, 1 or -1'
      end
    end
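
The three gap modes can be compared side by side with a plain-Ruby sketch. The names columns, consensus_sketch and unanimous are hypothetical, and the '-' gap character and '?' missing character are assumptions made for this example only.

```ruby
GAP_CHAR = '-'     # assumed gap character
MISSING_CHAR = '?' # assumed missing character

# Transposes sequences into columns; short sequences are padded with gaps.
def columns(seqs)
  len = seqs.map(&:length).max || 0
  (0...len).map { |i| seqs.map { |s| s[i] || GAP_CHAR } }
end

# Applies the block to each column under the given gap mode and joins the
# per-column results into one string.
def consensus_sketch(seqs, gap_mode = 0)
  columns(seqs).map do |col|
    case gap_mode
    when 0, nil
      yield(col) || MISSING_CHAR
    when 1
      col.include?(GAP_CHAR) ? GAP_CHAR : (yield(col) || MISSING_CHAR)
    when -1
      residues = col.reject { |c| c == GAP_CHAR }
      residues.empty? ? GAP_CHAR : (yield(residues) || MISSING_CHAR)
    else
      raise ArgumentError, ':gap_mode must be 0, 1 or -1'
    end
  end.join
end

# Example block body: the character if the whole column agrees, else nil.
def unanimous(col)
  col.uniq.size == 1 ? col.first : nil
end

seqs = ['AT-C', 'ATGC', 'A-GT']
puts consensus_sketch(seqs, 0)  { |col| unanimous(col) } # → A???
puts consensus_sketch(seqs, 1)  { |col| unanimous(col) } # → A--?
puts consensus_sketch(seqs, -1) { |col| unanimous(col) } # → ATG?
```

In mode 0 a gap counts as an ordinary character, so the mixed columns disagree. Mode 1 marks any column with a gap as a gap, and mode -1 ignores gaps before the block sees the column.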

consensus_iupac(opt = {})

Returns the IUPAC consensus string of an alignment of nucleic-acid sequences.

It resembles BioPerl’s AlignI::consensus_iupac method.

See the consensus_each_site method for opt.

    # File lib/bio/alignment.rb
    def consensus_iupac(opt = {})
      consensus_each_site(opt) do |a|
        a.consensus_iupac
      end
    end

consensus_string(threshold = 1.0, opt = {})

Returns the consensus string of the alignment. threshold is expected to satisfy 0.0 <= threshold <= 1.0.

It resembles BioPerl’s AlignI::consensus_string method.

See the consensus_each_site method for opt.

    # File lib/bio/alignment.rb
    def consensus_string(threshold = 1.0, opt = {})
      consensus_each_site(opt) do |a|
        a.consensus_string(threshold)
      end
    end

convert_match(match_char = '.')

This method is similar to BioPerl’s AlignI::match.

In the second and subsequent sequences, replaces each site with match_char (default: ‘.’) when the site is equal to the corresponding site of the first sequence.

Note that this is a destructive method.

For Hash, use it carefully, because the order of the sequences is not guaranteed.

    # File lib/bio/alignment.rb
    def convert_match(match_char = '.')
      #(BioPerl) AlignI::match like method
      len = alignment_length
      firstseq = nil
      each_seq do |s|
        unless firstseq then
          firstseq = s
        else
          (0...len).each do |i|
            if s[i] and firstseq[i] == s[i] and !is_gap?(firstseq[i..i])
              s[i..i] = match_char
            end
          end
        end
      end
      self
    end

convert_unmatch(match_char = '.')

This method is similar to BioPerl’s AlignI::unmatch.

In the second and subsequent sequences, replaces each match_char (default: ‘.’) with the character at the corresponding site of the first sequence.

Note that this is a destructive method.

For Hash, use it carefully, because the order of the sequences is not guaranteed.

    # File lib/bio/alignment.rb
    def convert_unmatch(match_char = '.')
      #(BioPerl) AlignI::unmatch like method
      len = alignment_length
      firstseq = nil
      each_seq do |s|
        unless firstseq then
          firstseq = s
        else
          (0...len).each do |i|
            if s[i..i] == match_char then
              s[i..i] = (firstseq[i..i] or match_char)
            end
          end
        end
      end
      self
    end
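
The match and unmatch conversions are inverses of each other as long as the sequences do not already contain match_char. A plain-Ruby sketch of the round trip follows; the helper names mark_matches! and unmark_matches! are hypothetical, and '-' is an assumed gap character.

```ruby
# Replaces residues identical to the first sequence with match_char, in place.
# Gap positions are left untouched.
def mark_matches!(seqs, match_char = '.', gap_char = '-')
  first, *rest = seqs
  rest.each do |s|
    (0...s.length).each do |i|
      s[i] = match_char if first[i] == s[i] && s[i] != gap_char
    end
  end
  seqs
end

# Restores match_char positions from the first sequence, in place.
def unmark_matches!(seqs, match_char = '.')
  first, *rest = seqs
  rest.each do |s|
    (0...s.length).each do |i|
      s[i] = (first[i] || match_char) if s[i] == match_char
    end
  end
  seqs
end

seqs = ['ATG-C'.dup, 'ATC-C'.dup, 'TTGAC'.dup]
mark_matches!(seqs)
p seqs # → ["ATG-C", "..C-.", "T..A."]
unmark_matches!(seqs)
p seqs # → ["ATG-C", "ATC-C", "TTGAC"]
```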

each_seq() { |seq| ... }

Iterates over each sequence, yielding it. By default it acts the same as each.

Redefine this method as appropriate for the class or object.

    # File lib/bio/alignment.rb
    def each_seq(&block) #:yields: seq
      each(&block)
    end

each_site() { |site| ... }

Iterates over each site of the alignment, yielding a Bio::Alignment::Site object (which inherits from Array). Returns self.

    # File lib/bio/alignment.rb
    def each_site
      cp = get_all_property
      (0...alignment_length).each do |i|
        site = _alignment_site(i)
        site.set_all_property(cp)
        yield(site)
      end
      self
    end

each_site_step(start, stop, step = 1) { |site| ... }

Iterates over the sites of the alignment from start to stop in increments of step, yielding a Bio::Alignment::Site object (which inherits from Array). Returns self. It is equivalent to start.step(stop, step) { |i| yield alignment_site(i) }.

    # File lib/bio/alignment.rb
    def each_site_step(start, stop, step = 1)
      cp = get_all_property
      start.step(stop, step) do |i|
        site = _alignment_site(i)
        site.set_all_property(cp)
        yield(site)
      end
      self
    end

each_window(window_size, step_size = 1) { |alignment_window(i, window_size)| ... }

Iterates over each sliding window of the alignment. window_size is the size of the sliding window, and step_size is the distance the window advances each time. It yields a Bio::Alignment::SequenceArray object containing each window, and returns a Bio::Alignment::SequenceArray object containing the remainder of the alignment at the terminal end. If window_size is smaller than 0, it returns nil.

    # File lib/bio/alignment.rb
    def each_window(window_size, step_size = 1)
      return nil if window_size < 0
      if step_size >= 0 then
        last_step = nil
        0.step(alignment_length - window_size, step_size) do |i|
          yield alignment_window(i, window_size)
          last_step = i
        end
        alignment_window((last_step + window_size)..-1)
      else
        i = alignment_length - window_size
        while i >= 0
          yield alignment_window(i, window_size)
          i += step_size
        end
        alignment_window(0...(i-step_size))
      end
    end
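
A plain-Ruby sketch of the forward (positive step) case shows how the windows and the returned remainder relate. The helper name each_window_sketch is hypothetical, it works on an array of strings rather than a SequenceArray, and the guard for the case where no full window fits is an addition of this sketch.

```ruby
# Yields each window of window_size columns, advancing by step_size, and
# returns the leftover columns after the last window. Assumes step_size >= 1.
def each_window_sketch(seqs, window_size, step_size = 1)
  return nil if window_size < 0
  len = seqs.map(&:length).max || 0
  last_start = nil
  0.step(len - window_size, step_size) do |i|
    yield seqs.map { |s| s[i, window_size] || '' }
    last_start = i
  end
  # Guard added in this sketch: no full window fits, so everything remains.
  return seqs.map(&:dup) if last_start.nil?
  seqs.map { |s| s[(last_start + window_size)..-1] || '' }
end

windows = []
rest = each_window_sketch(['ATGCATGCA', 'ATGCATGC-'], 4, 4) { |w| windows << w }
p windows # → [["ATGC", "ATGC"], ["ATGC", "ATGC"]]
p rest    # → ["A", "-"]
```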
lstrip!()
Alias for: alignment_lstrip!

match_line(opt = {})

Returns the match line string of an alignment of nucleic- or amino-acid sequences. The type of the sequences is determined automatically unless you specify it with opt.

It resembles BioPerl’s AlignI::match_line method.

opt[:type] ==> :na (or :dna, :rna) or :aa (or determined by sequence class)
opt[:match_line_char]   ==> 100% equal    default: '*'
opt[:strong_match_char] ==> strong match  default: ':'
opt[:weak_match_char]   ==> weak match    default: '.'
opt[:mismatch_char]     ==> mismatch      default: ' '
  :strong_ and :weak_match_char are used only in amino mode (:aa)

More options are accepted. See the consensus_each_site method for opt.

    # File lib/bio/alignment.rb
    def match_line(opt = {})
      case opt[:type]
      when :aa
        amino = true
      when :na, :dna, :rna
        amino = false
      else
        if seqclass == Bio::Sequence::AA then
          amino = true
        elsif seqclass == Bio::Sequence::NA then
          amino = false
        else
          amino = nil
          self.each_seq do |x|
            if /[EFILPQ]/i =~ x
              amino = true
              break
            end
          end
        end
      end
      if amino then
        match_line_amino(opt)
      else
        match_line_nuc(opt)
      end
    end
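
When neither opt[:type] nor the sequence class settles the type, the source above falls back to scanning for letters that occur in amino-acid sequences. That last step can be sketched on plain strings; the helper name amino_like? is hypothetical.

```ruby
# True if any sequence contains one of the letters E, F, I, L, P or Q
# (case-insensitive), the same pattern used in the source listing.
def amino_like?(seqs)
  seqs.any? { |s| /[EFILPQ]/i =~ s }
end

p amino_like?(['ATGC', 'ATGN-']) # → false
p amino_like?(['MKLV', 'MK-V'])  # → true
```

This is a heuristic: an amino-acid alignment that happens to contain none of these letters would be treated as nucleic acid.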

match_line_amino(opt = {})

Returns the match line string of an alignment of amino-acid sequences.

It resembles BioPerl’s AlignI::match_line method.

opt[:match_line_char]   ==> 100% equal    default: '*'
opt[:strong_match_char] ==> strong match  default: ':'
opt[:weak_match_char]   ==> weak match    default: '.'
opt[:mismatch_char]     ==> mismatch      default: ' '

More options are accepted. See the consensus_each_site method for opt.

    # File lib/bio/alignment.rb
    def match_line_amino(opt = {})
      collect_each_site do |a|
        a.match_line_amino(opt)
      end.join('')
    end

match_line_nuc(opt = {})

Returns the match line string of an alignment of nucleic-acid sequences.

It resembles BioPerl’s AlignI::match_line method.

opt[:match_line_char]   ==> 100% equal    default: '*'
opt[:mismatch_char]     ==> mismatch      default: ' '

More options are accepted. See the consensus_each_site method for opt.

    # File lib/bio/alignment.rb
    def match_line_nuc(opt = {})
      collect_each_site do |a|
        a.match_line_nuc(opt)
      end.join('')
    end
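
The shape of a nucleotide match line can be illustrated with plain strings. The helper name match_line_sketch is hypothetical. Treating any column that contains a gap as a mismatch is an assumption of this sketch, since the per-site logic lives in Bio::Alignment::Site and is not shown here.

```ruby
# Builds one character per column: match_char when every sequence has the
# same residue there, mismatch_char otherwise.
# Assumption: a column containing a gap counts as a mismatch.
def match_line_sketch(seqs, match_char = '*', mismatch_char = ' ', gap_char = '-')
  len = seqs.map(&:length).max || 0
  (0...len).map do |i|
    col = seqs.map { |s| s[i] || gap_char }
    if !col.include?(gap_char) && col.uniq.size == 1
      match_char
    else
      mismatch_char
    end
  end.join
end

seqs = ['ATGC', 'ATGA', 'AT-C']
puts seqs
puts match_line_sketch(seqs) # → "**  " (two matches, then two mismatches)
```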
normalize!()
Alias for: alignment_normalize!
number_of_sequences()

Returns the number of sequences in this alignment.

    # File lib/bio/alignment.rb
    def number_of_sequences
      i = 0
      self.each_seq { |s| i += 1 }
      i
    end

remove_all_gaps!()

Completely removes ALL gaps from the sequences. Returns nil if nothing was removed; otherwise returns self.

Note that this is a destructive method.

    # File lib/bio/alignment.rb
    def remove_all_gaps!
      ret = nil
      each_seq do |s|
        x = s.gsub!(gap_regexp, '')
        ret ||= x
      end
      ret ? self : nil
    end
rstrip!()
Alias for: alignment_rstrip!
seq_length()
Alias for: alignment_length
seqclass()

Returns the class of the sequences. If the instance variable @seqclass (which can be set with the ‘seqclass=’ method) is set, returns its value. Otherwise, returns the class of the first sequence. If no sequences are found, returns String.

    # File lib/bio/alignment.rb
    def seqclass
      if (defined? @seqclass) and @seqclass then
        @seqclass
      else
        klass = nil
        each_seq do |s|
          if s then
            klass = s.class
            break if klass
          end
        end
        (klass or String)
      end
    end

sequence_names()

Returns an array of sequence names. The order of the names must be the same as the order in which each_seq yields the sequences. In this default implementation, the names are the integers from 0 to number_of_sequences - 1.

    # File lib/bio/alignment.rb
    def sequence_names
      (0...(self.number_of_sequences)).to_a
    end
slice(*arg)
Alias for: alignment_slice
strip!()
Alias for: alignment_strip!
subseq(*arg)
Alias for: alignment_subseq
window(*arg)
Alias for: alignment_window