module Bio::Alignment::EnumerableExtension
The module Bio::Alignment::EnumerableExtension is a set of useful methods for multiple sequence alignment. It can be included in any class or used to extend any object. The class or object must provide the methods defined in Enumerable, and its each method must iterate over the sequences (or strings), yielding one sequence (or string) object at a time.
Optionally, if an each_seq method is defined that iterates over the sequences (or strings) and yields each sequence (or string) object, it is used instead of each.
Note that the each or each_seq method may be called multiple times. This means that the module is not suitable for IO objects. In addition, break may be used inside the given block, and destructive methods may be applied to the sequences.
For Array or Hash objects, it is better to use the ArrayExtension or HashExtension module, respectively. They have a built-in each_seq method and/or redefine some methods.
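The each / each_seq contract described above can be illustrated without BioRuby. The following is a minimal sketch in plain Ruby: MiniAlignment and NamedSeqs are hypothetical names invented for this example, and only the delegation pattern is modelled, not the real module.

```ruby
# Hypothetical stand-in for the real module; only the each/each_seq
# contract is modelled here.
module MiniAlignment
  # Default behaviour: delegate to each. A host class may override this.
  def each_seq(&block)
    each(&block)
  end

  # Length of the longest sequence; walks the sequences once per call.
  def alignment_length
    maxlen = 0
    each_seq { |s| maxlen = s.length if s.length > maxlen }
    maxlen
  end
end

# Array already provides each, so extending a single object is enough.
seqs = ['ATGC-', 'AT']
seqs.extend(MiniAlignment)
array_length = seqs.alignment_length

# A Hash-backed host overrides each_seq so that only the sequences
# (not the [name, sequence] pairs) are yielded.
class NamedSeqs
  include Enumerable
  include MiniAlignment

  def initialize(hash)
    @hash = hash
  end

  def each(&block)
    @hash.each(&block) # yields [name, seq] pairs
  end

  def each_seq(&block)
    @hash.each_value(&block)
  end
end

named_length = NamedSeqs.new('a' => 'ATG', 'b' => 'ATGCAT').alignment_length
```

Because every helper goes through each_seq, a source that can only be read once (such as an IO object) would be exhausted after the first call.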
Public Instance Methods
alignment_collect()
Iterates over each sequence, collects the results of running the block, and returns them as a new alignment, a Bio::Alignment::SequenceArray object.
Note that it should be redefined if you want to change the class of the return value.
# File lib/bio/alignment.rb, line 445
def alignment_collect
  a = SequenceArray.new
  a.set_all_property(get_all_property)
  each_seq do |str|
    a << yield(str)
  end
  a
end
alignment_concat(align)
Concatenates the given alignment to this one, sequence by sequence. align must have an each_seq or each method.
Returns self.
Note that it is a destructive method.
For Hash, use it carefully, because the order of the sequences is not fixed and key information is completely ignored.
# File lib/bio/alignment.rb, line 849
def alignment_concat(align)
  flag = nil
  a = []
  each_seq { |s| a << s }
  i = 0
  begin
    align.each_seq do |seq|
      flag = true
      a[i].concat(seq) if a[i] and seq
      i += 1
    end
    return self
  rescue NoMethodError, ArgumentError => evar
    raise evar if flag
  end
  align.each do |seq|
    a[i].concat(seq) if a[i] and seq
    i += 1
  end
  self
end
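As a rough illustration of the positional pairing used above, here is a sketch with plain arrays of strings standing in for the two alignments. The variable names are made up for the example, and nothing here calls BioRuby.

```ruby
# Plain Ruby stand-ins, not BioRuby objects.
left  = ['ATG'.dup, 'A-G'.dup]
right = ['CC', 'CT', 'GG'] # one more sequence than left

right.each_with_index do |seq, i|
  # Mirrors `a[i].concat(seq) if a[i] and seq`: a sequence with no
  # partner at the same position is silently skipped.
  left[i].concat(seq) if left[i] && seq
end
```

The i-th sequence of the argument is appended to the i-th sequence of the receiver, which is why an unordered container makes the result hard to predict.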
alignment_length()
Returns the alignment length, that is, the length of the longest sequence in the alignment.
# File lib/bio/alignment.rb, line 366
def alignment_length
  maxlen = 0
  each_seq do |s|
    x = s.length
    maxlen = x if x > maxlen
  end
  maxlen
end
alignment_lstrip!()
Removes excess gaps at the head of the sequences. Returns nil if nothing was removed; otherwise returns self.
Note that it is a destructive method.
# File lib/bio/alignment.rb, line 752
def alignment_lstrip!
  #(String-like)
  pos = 0
  each_site do |a|
    a.remove_gaps!
    if a.empty?
      pos += 1
    else
      break
    end
  end
  return nil if pos <= 0
  each_seq { |s| s[0, pos] = '' }
  self
end
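The idea of "excess gaps at the head" can be sketched on plain strings: only leading columns in which every sequence has a gap are removed. The helper name lstrip_columns! and the '-' gap character are assumptions made for this example.

```ruby
# Sketch only: works on an array of mutable strings, with '-' assumed
# to be the gap character.
def lstrip_columns!(seqs, gap = '-')
  width = seqs.map(&:length).max || 0
  pos = 0
  # Count leading columns where every sequence has a gap (or no character).
  while pos < width && seqs.all? { |s| s[pos].nil? || s[pos] == gap }
    pos += 1
  end
  return nil if pos.zero?
  seqs.each { |s| s[0, pos] = '' }
  seqs
end

aln = ['--ATG'.dup, '-CATG'.dup, '---TG'.dup]
stripped  = lstrip_columns!(aln)
unchanged = lstrip_columns!(['ATG'.dup, '-TG'.dup])
```

Only the first column is all gaps in aln, so one character is removed from each sequence; the second call removes nothing and returns nil.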
alignment_normalize!()
Appends gaps to the tail of each sequence that is shorter than the alignment length.
Note that it is a destructive method.
# File lib/bio/alignment.rb, line 712
def alignment_normalize!
  #(original)
  len = alignment_length
  each_seq do |s|
    s << (gap_char * (len - s.length)) if s.length < len
  end
  self
end
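The padding step can be tried on plain strings. This is a small sketch that assumes '-' as the gap character; the variable names are illustrative only.

```ruby
gap_char = '-' # assumed gap character for this sketch
seqs = ['ATGCA'.dup, 'AT'.dup, 'ATG'.dup]

len = seqs.map(&:length).max
# Same expression as in the listing: pad only the shorter sequences.
seqs.each { |s| s << (gap_char * (len - s.length)) if s.length < len }
```

After the loop every sequence has the alignment length, here 5.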
alignment_rstrip!()
Removes excess gaps at the tail of the sequences. Returns nil if nothing was removed; otherwise returns self.
Note that it is a destructive method.
# File lib/bio/alignment.rb, line 727
def alignment_rstrip!
  #(String-like)
  len = alignment_length
  newlen = len
  each_site_step(len - 1, 0, -1) do |a|
    a.remove_gaps!
    if a.empty? then
      newlen -= 1
    else
      break
    end
  end
  return nil if newlen >= len
  each_seq do |s|
    s[newlen..-1] = '' if s.length > newlen
  end
  self
end
alignment_site(position)
Gets the site at the given position. Returns a Bio::Alignment::Site object.
If the position is out of range, it returns a site consisting entirely of gaps.
# File lib/bio/alignment.rb, line 403
def alignment_site(position)
  site = _alignment_site(position)
  site.set_all_property(get_all_property)
  site
end
alignment_slice(*arg)
Returns the specified range of the alignment. For each sequence, the slice method (which may be String#slice, the same as String#[]) is called, and the results are returned as a new alignment, a Bio::Alignment::SequenceArray object.
Unlike the alignment_window method, the resulting alignment might contain nil.
If you want to change the class of the return value, redefine the alignment_collect method.
# File lib/bio/alignment.rb, line 807
def alignment_slice(*arg)
  #(String-like)
  #(BioPerl) AlignI::slice like method
  alignment_collect do |s|
    s.slice(*arg)
  end
end
alignment_strip!()
Removes excess gaps at both the head and the tail of the sequences. Returns nil if nothing was removed; otherwise returns self.
Note that it is a destructive method.
# File lib/bio/alignment.rb, line 774
def alignment_strip!
  #(String-like)
  r = alignment_rstrip!
  l = alignment_lstrip!
  (r or l)
end
alignment_subseq(*arg)
For each sequence, the subseq method (Bio::Sequence::Common#subseq is expected) is called, and the results are returned as a new alignment, a Bio::Alignment::SequenceArray object.
All sequences in the alignment are expected to be Bio::Sequence::NA or Bio::Sequence::AA objects.
Unlike the alignment_window method, the resulting alignment might contain nil.
If you want to change the class of the return value, redefine the alignment_collect method.
# File lib/bio/alignment.rb, line 829
def alignment_subseq(*arg)
  #(original)
  alignment_collect do |s|
    s.subseq(*arg)
  end
end
alignment_window(*arg)
Returns the specified range of the alignment. For each sequence, the [] method (which may be String#[]) is called, and the results are returned as a new alignment, a Bio::Alignment::SequenceArray object.
Unlike the alignment_slice method, the resulting alignment is guaranteed to contain String objects even if the specified range is out of range.
If you want to change the class of the return value, redefine the alignment_collect method.
# File lib/bio/alignment.rb, line 466
def alignment_window(*arg)
  alignment_collect do |s|
    s[*arg] or seqclass.new('')
  end
end
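The difference between the slice-style and window-style results comes from how String#[] treats an out-of-range start index. The sketch below uses plain strings and Array#map in place of alignment_collect; it does not call BioRuby.

```ruby
seqs = ['ATGCAT', 'ATG']

# Like alignment_slice: a start index past the end of a string gives nil.
sliced = seqs.map { |s| s.slice(4, 2) }

# Like alignment_window: nil is replaced by an empty string.
windowed = seqs.map { |s| s[4, 2] || String.new('') }
```

Here sliced holds a nil for the short sequence, while windowed holds an empty string in the same position.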
collect_each_site()
Iterates over each site of the alignment, collects the results of running the block, and returns them as an array. It yields a Bio::Alignment::Site object.
# File lib/bio/alignment.rb, line 503
def collect_each_site
  ary = []
  each_site do |site|
    ary << yield(site)
  end
  ary
end
consensus_each_site(opt = {})
Helper method for calculating a consensus sequence. It iterates over each site of the alignment. In each site, gaps are removed if so specified with opt. It yields a Bio::Alignment::Site object. The results of running the block (String objects are expected) are joined into a single string, which is returned.
opt[:gap_mode] ==>
   0 -- gaps are regarded as normal characters
   1 -- a site containing gaps is regarded as a gap
  -1 -- gaps are eliminated from consensus calculation
  default: 0
# File lib/bio/alignment.rb, line 523
def consensus_each_site(opt = {})
  mchar = (opt[:missing_char] or self.missing_char)
  gap_mode = opt[:gap_mode]
  case gap_mode
  when 0, nil
    collect_each_site do |a|
      yield(a) or mchar
    end.join('')
  when 1
    collect_each_site do |a|
      a.has_gap? ? gap_char : (yield(a) or mchar)
    end.join('')
  when -1
    collect_each_site do |a|
      a.remove_gaps!
      a.empty? ? gap_char : (yield(a) or mchar)
    end.join('')
  else
    raise ':gap_mode must be 0, 1 or -1'
  end
end
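To see how the three :gap_mode values change the result, the sketch below reimplements the same branching on plain strings. The name consensus_sketch, the '-' gap character, the '?' missing character, and the "unanimous column" block are all assumptions made for this example; real sites are Bio::Alignment::Site objects, not plain arrays.

```ruby
# Sketch of the gap_mode branching on an array of plain strings.
def consensus_sketch(seqs, gap_mode = 0, gap_char: '-', missing_char: '?')
  width = seqs.map(&:length).max || 0
  (0...width).map do |i|
    # One column of the alignment; short sequences count as gaps.
    site = seqs.map { |s| s[i] || gap_char }
    case gap_mode
    when 0, nil
      yield(site) || missing_char
    when 1
      site.include?(gap_char) ? gap_char : (yield(site) || missing_char)
    when -1
      residues = site.reject { |c| c == gap_char }
      residues.empty? ? gap_char : (yield(residues) || missing_char)
    else
      raise ArgumentError, ':gap_mode must be 0, 1 or -1'
    end
  end.join
end

# Block used for the demo: the character if the column is unanimous, else nil.
unanimous = ->(site) { site.uniq.size == 1 ? site.first : nil }

seqs = ['AT-G', 'ATCG', 'A-CC']
mode_zero      = consensus_sketch(seqs, 0, &unanimous)
mode_one       = consensus_sketch(seqs, 1, &unanimous)
mode_minus_one = consensus_sketch(seqs, -1, &unanimous)
```

With mode 0 the gap in columns 2 and 3 breaks unanimity, with mode 1 those columns become gaps, and with mode -1 the gaps are ignored so the remaining residues decide.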
consensus_iupac(opt = {})
Returns the IUPAC consensus string of an alignment of nucleic-acid sequences.
It resembles BioPerl's AlignI::consensus_iupac method.
Please refer to the consensus_each_site method for opt.
# File lib/bio/alignment.rb, line 565
def consensus_iupac(opt = {})
  consensus_each_site(opt) do |a|
    a.consensus_iupac
  end
end
consensus_string(threshold = 1.0, opt = {})
Returns the consensus string of the alignment. threshold is expected to satisfy 0.0 <= threshold <= 1.0.
It resembles BioPerl's AlignI::consensus_string method.
Please refer to the consensus_each_site method for opt.
# File lib/bio/alignment.rb, line 552
def consensus_string(threshold = 1.0, opt = {})
  consensus_each_site(opt) do |a|
    a.consensus_string(threshold)
  end
end
convert_match(match_char = '.')
This method is similar to BioPerl's AlignI::match.
In the second through last sequences, changes each site to match_char (default: '.') when it is equal to the corresponding site of the first sequence.
Note that it is a destructive method.
For Hash, use it carefully, because the order of the sequences is not fixed.
# File lib/bio/alignment.rb, line 662
def convert_match(match_char = '.')
  #(BioPerl) AlignI::match like method
  len = alignment_length
  firstseq = nil
  each_seq do |s|
    unless firstseq then
      firstseq = s
    else
      (0...len).each do |i|
        if s[i] and firstseq[i] == s[i] and !is_gap?(firstseq[i..i])
          s[i..i] = match_char
        end
      end
    end
  end
  self
end
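The effect of the match conversion is easiest to see on a small example. The following sketch works on plain strings; the helper name convert_match_sketch! is hypothetical, and a single '-' stands in for whatever is_gap? accepts in the real module.

```ruby
# Sketch: replace residues identical to the first sequence with match_char.
def convert_match_sketch!(seqs, match_char = '.', gap_char = '-')
  first = seqs.first
  seqs.drop(1).each do |s|
    (0...s.length).each do |i|
      # Gap positions in the first sequence are never marked as matches.
      s[i] = match_char if first[i] && first[i] == s[i] && first[i] != gap_char
    end
  end
  seqs
end

aln = ['AT-GC'.dup, 'ATCGA'.dup, 'A--GC'.dup]
convert_match_sketch!(aln)
```

The first sequence is left untouched and acts as the reference for all the others.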
convert_unmatch(match_char = '.')
This method is similar to BioPerl's AlignI::unmatch.
In the second through last sequences, changes each match_char (default: '.') back to the character at the corresponding site of the first sequence.
Note that it is a destructive method.
For Hash, use it carefully, because the order of the sequences is not fixed.
# File lib/bio/alignment.rb, line 690
def convert_unmatch(match_char = '.')
  #(BioPerl) AlignI::unmatch like method
  len = alignment_length
  firstseq = nil
  each_seq do |s|
    unless firstseq then
      firstseq = s
    else
      (0...len).each do |i|
        if s[i..i] == match_char then
          s[i..i] = (firstseq[i..i] or match_char)
        end
      end
    end
  end
  self
end
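The reverse conversion can be sketched the same way on plain strings. The helper name convert_unmatch_sketch! is hypothetical and the code is an illustration of the rule, not the library implementation.

```ruby
# Sketch: restore match_char positions from the first sequence.
def convert_unmatch_sketch!(seqs, match_char = '.')
  first = seqs.first
  seqs.drop(1).each do |s|
    (0...s.length).each do |i|
      # Keep match_char when the first sequence has nothing at this position.
      s[i] = (first[i] || match_char) if s[i] == match_char
    end
  end
  seqs
end

aln = ['AT-GC'.dup, '..C.A'.dup, '.--..'.dup]
convert_unmatch_sketch!(aln)
```

Every '.' is replaced by the reference residue at the same position, so the input shown here turns back into fully spelled-out sequences.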
each_seq(&block)
Iterates over each sequence, yielding it. By default it simply calls each.
Redefine this method as appropriate for the class or object.
# File lib/bio/alignment.rb, line 340
def each_seq(&block) #:yields: seq
  each(&block)
end
each_site()
Iterates over each site of the alignment. It yields a Bio::Alignment::Site object (which inherits from Array). It returns self.
# File lib/bio/alignment.rb, line 412
def each_site
  cp = get_all_property
  (0...alignment_length).each do |i|
    site = _alignment_site(i)
    site.set_all_property(cp)
    yield(site)
  end
  self
end
each_site_step(start, stop, step = 1)
Iterates over the sites of the alignment from start to stop in increments of step. It yields a Bio::Alignment::Site object (which inherits from Array). It returns self. It is the same as start.step(stop, step) { |i| yield alignment_site(i) }.
# File lib/bio/alignment.rb, line 428
def each_site_step(start, stop, step = 1)
  cp = get_all_property
  start.step(stop, step) do |i|
    site = _alignment_site(i)
    site.set_all_property(cp)
    yield(site)
  end
  self
end
each_window(window_size, step_size = 1)
Iterates over each sliding window of the alignment. window_size is the size of the sliding window, and step_size is the distance the window moves at each step. It yields a Bio::Alignment::SequenceArray object containing the current window. It returns a Bio::Alignment::SequenceArray object containing the remainder of the alignment at the terminal end. If window_size is smaller than 0, it returns nil.
# File lib/bio/alignment.rb, line 481
def each_window(window_size, step_size = 1)
  return nil if window_size < 0
  if step_size >= 0 then
    last_step = nil
    0.step(alignment_length - window_size, step_size) do |i|
      yield alignment_window(i, window_size)
      last_step = i
    end
    alignment_window((last_step + window_size)..-1)
  else
    i = alignment_length - window_size
    while i >= 0
      yield alignment_window(i, window_size)
      i += step_size
    end
    alignment_window(0...(i-step_size))
  end
end
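The forward-stepping case can be sketched on a single plain string to show which windows are yielded and what is returned as the remainder. The helper name each_window_sketch is hypothetical, and returning the whole string when no window fits is a choice made for this sketch only.

```ruby
# Sketch of the positive-step branch on one string instead of an alignment.
def each_window_sketch(seq, window_size, step_size = 1)
  last_start = nil
  0.step(seq.length - window_size, step_size) do |i|
    yield seq[i, window_size]
    last_start = i
  end
  # Remainder after the last full window.
  last_start ? seq[(last_start + window_size)..-1] : seq.dup
end

windows = []
rest = each_window_sketch('ATGCATGC', 3, 2) { |w| windows << w }
```

Windows start at positions 0, 2 and 4; the single character left after the last window is returned rather than yielded.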
match_line(opt = {})
Returns the match line string of an alignment of nucleic-acid or amino-acid sequences. The type of the sequences is determined automatically, or you can specify it with opt.
It resembles BioPerl's AlignI::match_line method.
opt[:type]              ==> :na (or :dna, :rna) or :aa (or determined by sequence class)
opt[:match_line_char]   ==> 100% equal    default: '*'
opt[:strong_match_char] ==> strong match  default: ':'
opt[:weak_match_char]   ==> weak match    default: '.'
opt[:mismatch_char]     ==> mismatch      default: ' '
:strong_match_char and :weak_match_char are used only in amino mode (:aa)
More options can be accepted. Please refer to the consensus_each_site method for opt.
# File lib/bio/alignment.rb, line 624
def match_line(opt = {})
  case opt[:type]
  when :aa
    amino = true
  when :na, :dna, :rna
    amino = false
  else
    if seqclass == Bio::Sequence::AA then
      amino = true
    elsif seqclass == Bio::Sequence::NA then
      amino = false
    else
      amino = nil
      self.each_seq do |x|
        if /[EFILPQ]/i =~ x
          amino = true
          break
        end
      end
    end
  end
  if amino then
    match_line_amino(opt)
  else
    match_line_nuc(opt)
  end
end
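When neither opt[:type] nor the sequence class settles the question, the listing falls back to a regular-expression test on the sequence text. The check can be tried on its own; looks_like_amino? is a hypothetical helper name used only in this sketch.

```ruby
# Sketch of the fallback test: any of the letters E, F, I, L, P or Q
# (case-insensitive) is taken as evidence of an amino-acid sequence.
def looks_like_amino?(seqs)
  seqs.any? { |s| /[EFILPQ]/i =~ s }
end

nucleotide_guess = looks_like_amino?(['ATGC-N', 'atgcry'])
amino_guess      = looks_like_amino?(['MKV-LA', 'MKVQLA'])
```

A short peptide that happens to contain none of these six letters would be treated as nucleic acid, so passing opt[:type] explicitly is the safer choice for such input.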
match_line_amino(opt = {})
Returns the match line string of an alignment of amino-acid sequences.
It resembles BioPerl's AlignI::match_line method.
opt[:match_line_char]   ==> 100% equal    default: '*'
opt[:strong_match_char] ==> strong match  default: ':'
opt[:weak_match_char]   ==> weak match    default: '.'
opt[:mismatch_char]     ==> mismatch      default: ' '
More options can be accepted. Please refer to the consensus_each_site method for opt.
# File lib/bio/alignment.rb, line 584
def match_line_amino(opt = {})
  collect_each_site do |a|
    a.match_line_amino(opt)
  end.join('')
end
match_line_nuc(opt = {})
Returns the match line string of an alignment of nucleic-acid sequences.
It resembles BioPerl's AlignI::match_line method.
opt[:match_line_char]   ==> 100% equal    default: '*'
opt[:mismatch_char]     ==> mismatch      default: ' '
More options can be accepted. Please refer to the consensus_each_site method for opt.
# File lib/bio/alignment.rb, line 601
def match_line_nuc(opt = {})
  collect_each_site do |a|
    a.match_line_nuc(opt)
  end.join('')
end
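The per-site decision is delegated to Bio::Alignment::Site#match_line_nuc, which is not shown on this page. As a simplified illustration of what a nucleotide match line looks like, the sketch below assumes that a column counts as "100% equal" when all characters are identical (ignoring case) and none of them is a gap; the real per-site rule may differ. The helper name and the '-' gap character are assumptions.

```ruby
# Simplified sketch of a nucleotide match line over plain strings.
def match_line_nuc_sketch(seqs, match_char = '*', mismatch_char = ' ', gap_char = '-')
  width = seqs.map(&:length).max || 0
  (0...width).map do |i|
    column = seqs.map { |s| (s[i] || gap_char).downcase }
    all_equal = (column.uniq.size == 1) && (column.first != gap_char)
    all_equal ? match_char : mismatch_char
  end.join
end

line = match_line_nuc_sketch(['ATG-C', 'atGAC', 'ACG-C'])
```

The result has one character per alignment column, so it can be printed directly under the sequences.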
number_of_sequences()
Returns the number of sequences in this alignment.
# File lib/bio/alignment.rb, line 1315
def number_of_sequences
  i = 0
  self.each_seq { |s| i += 1 }
  i
end
remove_all_gaps!()
Completely removes ALL gaps in the sequences. Returns nil if nothing was removed; otherwise returns self.
Note that it is a destructive method.
# File lib/bio/alignment.rb, line 787
def remove_all_gaps!
  ret = nil
  each_seq do |s|
    x = s.gsub!(gap_regexp, '')
    ret ||= x
  end
  ret ? self : nil
end
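The nil-or-self return value follows from String#gsub!, which returns nil when it makes no substitution. A small sketch on plain strings, with a hypothetical helper name and /-/ assumed as the gap pattern (the module's own gap_regexp may match more characters):

```ruby
# Sketch: report whether any sequence actually changed.
def remove_all_gaps_sketch!(seqs, gap_regexp = /-/)
  changed = false
  # gsub! returns nil when nothing was substituted.
  seqs.each { |s| changed = true if s.gsub!(gap_regexp, '') }
  changed ? seqs : nil
end

gapped   = ['A-T-G'.dup, 'ATG'.dup]
with_gap = remove_all_gaps_sketch!(gapped)
no_gap   = remove_all_gaps_sketch!(['ATG'.dup, 'CCC'.dup])
```

Callers can therefore use the return value to tell whether the alignment was modified.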
seqclass()
Returns the class of the sequences. If the instance variable @seqclass (which can be set by the seqclass= method) is set, simply returns its value. Otherwise, returns the class of the first non-nil sequence. If no sequences are found, returns String.
# File lib/bio/alignment.rb, line 349
def seqclass
  if (defined? @seqclass) and @seqclass then
    @seqclass
  else
    klass = nil
    each_seq do |s|
      if s then
        klass = s.class
        break if klass
      end
    end
    (klass or String)
  end
end
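sequence_names()
Returns an array of sequence names. The order of the names must be the same as the order of each_seq. The default implementation returns the integer indices from 0 to number_of_sequences - 1.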
# File lib/bio/alignment.rb, line 1324
def sequence_names
  (0...(self.number_of_sequences)).to_a
end