class Bio::RestrictionEnzyme::Analysis

Public Class Methods

cut( sequence, *args )

See cut instance method

# File lib/bio/util/restriction_enzyme/analysis.rb, line 23
def self.cut( sequence, *args )
  self.new.cut( sequence, *args )
end

cut_without_permutations( sequence, *args )

See cut_without_permutations instance method

# File lib/bio/util/restriction_enzyme/analysis_basic.rb, line 21
def self.cut_without_permutations( sequence, *args )
  self.new.cut_without_permutations( sequence, *args )
end

Public Instance Methods

cut( sequence, *args )

See main documentation for Bio::RestrictionEnzyme

cut takes into account permutations of cut variations: when enzymes compete for a cut site or bind site on a sequence, the order in which they act can change the resulting fragments, so each order is evaluated.

Example:

FIXME add output

Bio::RestrictionEnzyme::Analysis.cut('gaattc', 'EcoRI')

Same as:

Bio::RestrictionEnzyme::Analysis.cut('gaattc', 'g^aattc')

Arguments

  • sequence: String (or String subclass) that will be used as a nucleic acid sequence.

  • args: Series of enzyme names, enzyme sequences with cut marks, or RestrictionEnzyme objects. A Hash may also be supplied with the key :view_ranges (true or false) or :max_permutations.

Returns

Bio::RestrictionEnzyme::Fragments object populated with Bio::RestrictionEnzyme::Fragment objects (note: unrelated to Bio::RestrictionEnzyme::Range::SequenceRange::Fragments), or a Symbol containing an error code (:sequence_empty, :no_cuts_found, or :too_many_permutations).

# File lib/bio/util/restriction_enzyme/analysis.rb, line 48
def cut( sequence, *args )
  view_ranges = false
  
  args.select { |i| i.class == Hash }.each do |hsh|
    hsh.each do |key, value|
      if key == :view_ranges
        unless ( value.kind_of?(TrueClass) or value.kind_of?(FalseClass) )
          raise ArgumentError, "view_ranges must be set to true or false, currently #{value.inspect}."
        end
        view_ranges = value
      end
    end
  end
  
  res = cut_and_return_by_permutations( sequence, *args )
  return res if res.class == Symbol
  # Format the fragments for the user
  fragments_for_display( res, view_ranges )
end
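
The option handling in cut can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper named split_options that is not part of BioRuby; it separates Hash options from enzyme arguments and validates :view_ranges in the same spirit as the listing above.

```ruby
# Hypothetical helper (not part of BioRuby): separate option hashes from
# enzyme arguments and validate :view_ranges the way cut does.
def split_options(args)
  options, enzymes = args.partition { |arg| arg.is_a?(Hash) }
  view_ranges = false

  options.each do |opts|
    next unless opts.key?(:view_ranges)
    value = opts[:view_ranges]
    # Only the literal values true and false are accepted
    unless value == true || value == false
      raise ArgumentError, "view_ranges must be set to true or false, currently #{value.inspect}."
    end
    view_ranges = value
  end

  [enzymes, view_ranges]
end

enzymes, view_ranges = split_options(['EcoRI', 'g^aattc', { view_ranges: true }])
p enzymes      # => ["EcoRI", "g^aattc"]
p view_ranges  # => true
```

A value such as 'yes' or nil raises ArgumentError rather than being coerced, which matches the strict check in cut.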

cut_without_permutations( sequence, *args )

See main documentation for Bio::RestrictionEnzyme

Bio::RestrictionEnzyme.cut is preferred over this!

USE AT YOUR OWN RISK

This is a simpler version of the cut method. cut takes into account permutations of cut variations based on the competitiveness of enzymes for an enzyme cut site or bind site on a sequence. This method ignores those possibilities and is therefore faster, but less likely to be accurate.

This code is included mainly as an academic example that can be read without wading through the extra layer of complexity added by the permutations.

Example:

FIXME add output

Bio::RestrictionEnzyme::Analysis.cut_without_permutations('gaattc', 'EcoRI')

Same as:

Bio::RestrictionEnzyme::Analysis.cut_without_permutations('gaattc', 'g^aattc')

Arguments

  • sequence: String (or String subclass) that will be used as a nucleic acid sequence.

  • args: Series of enzyme names, enzyme sequences with cut marks, or RestrictionEnzyme objects.

Returns

Bio::RestrictionEnzyme::Fragments object populated with Bio::RestrictionEnzyme::Fragment objects. (Note: unrelated to Bio::RestrictionEnzyme::Range::SequenceRange::Fragments)

# File lib/bio/util/restriction_enzyme/analysis_basic.rb, line 55
def cut_without_permutations( sequence, *args )
  return fragments_for_display( {} ) if !sequence.kind_of?(String) or sequence.empty?
  sequence = Bio::Sequence::NA.new( sequence )

  # create_enzyme_actions returns two separate array elements; they do not
  # need to be kept separate here, so we put them into one array
  enzyme_actions = create_enzyme_actions( sequence, *args ).flatten
  return fragments_for_display( {} ) if enzyme_actions.empty?
  
  # Primary and complement strands are both measured from '0' to 'sequence.size-1' here
  sequence_range = Bio::RestrictionEnzyme::Range::SequenceRange.new( 0, 0, sequence.size-1, sequence.size-1 )
  
  # Add the cuts to the sequence_range from each enzyme_action
  enzyme_actions.each do |enzyme_action|
    enzyme_action.cut_ranges.each do |cut_range|
      sequence_range.add_cut_range(cut_range)
    end
  end

  # Fill in the source sequence for sequence_range so it knows what bases
  # to use
  sequence_range.fragments.primary = sequence
  sequence_range.fragments.complement = sequence.forward_complement
  
  # Format the fragments for the user
  fragments_for_display( {0 => sequence_range} )
end
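
To make the cut indexing concrete, here is a minimal sketch that splits a single strand at vertical cut positions. It assumes a hypothetical helper split_at_cuts and deliberately ignores the complement strand and horizontal cuts, which SequenceRange handles in the real library.

```ruby
# Hypothetical helper (not part of BioRuby): split one strand at vertical
# cut positions. A cut index refers to the gap immediately to the right of
# the base at that index, e.g. g^aattc has its cut at index 0.
def split_at_cuts(sequence, cut_indexes)
  fragments = []
  start = 0

  cut_indexes.uniq.sort.each do |cut|
    # A cut before the first base or after the last base splits nothing
    next if cut < 0 || cut >= sequence.size - 1
    fragments << sequence[start..cut]
    start = cut + 1
  end

  fragments << sequence[start..-1]
  fragments
end

p split_at_cuts('gaattc', [0])            # => ["g", "aattc"]
p split_at_cuts('gaattcgaattc', [0, 6])   # => ["g", "aattcg", "aattc"]
```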

Protected Instance Methods

create_enzyme_actions( sequence, *args )

Creates an array of EnzymeActions based on the DNA sequence and supplied enzymes.


Arguments

  • sequence: The string of DNA to match the enzyme recognition sites against

  • args: The enzymes to use.

Returns

Array of two elements. The first is an array of EnzymeAction objects that sometimes cut (sometimes_cut) and are subject to competition. The second is an array of EnzymeAction objects that always cut (always_cut) and are not subject to competition.

# File lib/bio/util/restriction_enzyme/analysis_basic.rb, line 120
def create_enzyme_actions( sequence, *args )
  all_enzyme_actions = []
  
  args.each do |enzyme|
    enzyme = Bio::RestrictionEnzyme.new(enzyme) unless enzyme.class == Bio::RestrictionEnzyme::DoubleStranded

    # make sure pattern is the proper size
    # for more info see the internal documentation of 
    # Bio::RestrictionEnzyme::DoubleStranded.create_action_at
    pattern = Bio::Sequence::NA.new(
      Bio::RestrictionEnzyme::DoubleStranded::AlignedStrands.align(
        enzyme.primary, enzyme.complement
      ).primary
    ).to_re
    
    find_match_locations( sequence, pattern ).each do |offset|
      all_enzyme_actions << enzyme.create_action_at( offset )
    end
  end
  
  # FIXME VerticalCutRange should really be called VerticalAndHorizontalCutRange
  
  # * all_enzyme_actions is now full of EnzymeActions at specific locations across 
  #   the sequence.
  # * all_enzyme_actions will now be examined to see if any EnzymeActions may
  #   conflict with one another, and if they do they'll be noted in
  #   competition_indexes.  They are then moved out of always_cut and into
  #   sometimes_cut.
  # * a conflict occurs if one enzyme's bind site is compromised due
  #   to another enzyme's cut.  Enzymes' bind sites may overlap and not be
  #   competitive, however neither bind site may be part of the other
  #   enzyme's cut or else they do become competitive.
  #
  # Take the current EnzymeAction's entire bind site and compare it to all other
  # EnzymeActions' cut ranges.  Only look for vertical cuts as boundaries
  # since trailing horizontal cuts would have no influence on the bind site.
  #
  # If example Enzyme A makes this cut pattern (cut range 2..5):
  #
  # 0 1 2|3 4 5 6 7
  #      +-----+
  # 0 1 2 3 4 5|6 7
  #
  # Then the bind site (and EnzymeAction range) for Enzyme B would need its
  # right side to be at index 2 or less, or its left side to be 6 or greater.
  
  competition_indexes = Set.new

  all_enzyme_actions[0..-2].each_with_index do |current_enzyme_action, i|
    next if competition_indexes.include? i
    next if current_enzyme_action.cut_ranges.empty?  # no cuts, some enzymes are like this (ex. CjuI)
    
    all_enzyme_actions[i+1..-1].each_with_index do |comparison_enzyme_action, j|
      j += (i + 1)
      next if competition_indexes.include? j
      next if comparison_enzyme_action.cut_ranges.empty?  # no cuts
      
      if (current_enzyme_action.right <= comparison_enzyme_action.cut_ranges.min_vertical) or
         (current_enzyme_action.left > comparison_enzyme_action.cut_ranges.max_vertical)
        # no conflict
      else
        competition_indexes += [i, j] # merge both indexes into the flat set
      end
    end
  end
      
  sometimes_cut = all_enzyme_actions.values_at( *competition_indexes )
  always_cut = all_enzyme_actions
  always_cut.delete_if {|x| sometimes_cut.include? x }

  [sometimes_cut, always_cut]
end
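
The boundary test used in the listing above can be exercised on its own. This is a minimal sketch, assuming a hypothetical Struct named Action in place of the real EnzymeAction class; it mirrors only the single comparison shown in the source.

```ruby
# Hypothetical stand-in for an EnzymeAction: the bind site spans left..right
# and the vertical cuts span min_vertical..max_vertical.
Action = Struct.new(:name, :left, :right, :min_vertical, :max_vertical)

# True when binder's bind site lies entirely to one side of cutter's
# vertical cuts, so the two actions do not compete.
def bind_site_intact?(binder, cutter)
  binder.right <= cutter.min_vertical || binder.left > cutter.max_vertical
end

# Enzyme A with cut range 2..5, as in the diagram in the source comments
enzyme_a = Action.new('A', 0, 7, 2, 5)

left_of_cut  = Action.new('B1', 0, 2, 1, 1)  # right side at index 2
right_of_cut = Action.new('B2', 6, 9, 7, 7)  # left side at index 6
overlapping  = Action.new('B3', 3, 8, 4, 4)  # sits across the cut range

p bind_site_intact?(left_of_cut, enzyme_a)   # => true
p bind_site_intact?(right_of_cut, enzyme_a)  # => true
p bind_site_intact?(overlapping, enzyme_a)   # => false
```

Any pair for which this test fails would have both indexes added to competition_indexes and end up in sometimes_cut.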

cut_and_return_by_permutations( sequence, *args )

See cut instance method


Arguments

  • sequence: String kind of object that will be used as a nucleic acid sequence.

  • args: Series of enzyme names, enzymes sequences with cut marks, or RestrictionEnzyme objects.

May also supply a Hash with the key :max_permutations to specify how many permutations are allowed; a value of 0 indicates no permutations are allowed.

Returns

Hash whose keys are permutation IDs and whose values are SequenceRange objects that have cuts applied.

May also return the Symbol :sequence_empty, :no_cuts_found, or :too_many_permutations.

# File lib/bio/util/restriction_enzyme/analysis.rb, line 81
def cut_and_return_by_permutations( sequence, *args )
  my_hash = {}
  maximum_permutations = nil

  hashes_in_args = args.select { |i| i.class == Hash }
  args.delete_if { |i| i.class == Hash }
  hashes_in_args.each do |hsh|
    hsh.each do |key, value|
      case key
      when :max_permutations, 'max_permutations', :maximum_permutations, 'maximum_permutations'
        maximum_permutations = value.to_i unless value == nil
      when :view_ranges
      else
        raise ArgumentError, "Received key #{key.inspect} in argument - I only know the key ':max_permutations' and ':view_ranges' currently.  Hash passed: #{hsh.inspect}"
      end
    end
  end
  
  if !sequence.kind_of?(String) or sequence.empty?
    logger.warn "The supplied sequence is empty." if defined?(logger)
    return :sequence_empty
  end
  sequence = Bio::Sequence::NA.new( sequence )
  
  enzyme_actions, initial_cuts = create_enzyme_actions( sequence, *args )

  if enzyme_actions.empty? and initial_cuts.empty?
    logger.warn "This enzyme does not make any cuts on this sequence." if defined?(logger)
    return :no_cuts_found
  end

  # * When enzyme_actions.size is equal to '1' that means there are no permutations.
  # * If enzyme_actions.size is equal to '2' there is one
  #   permutation ("[0, 1]")
  # * If enzyme_actions.size is equal to '3' there are two
  #   permutations ("[0, 1, 2]")
  # * and so on..
  if maximum_permutations and enzyme_actions.size > 1
    if (enzyme_actions.size - 1) > maximum_permutations.to_i
      logger.warn "More permutations than maximum, skipping.  Found: #{enzyme_actions.size-1}  Max: #{maximum_permutations.to_i}" if defined?(logger)
      return :too_many_permutations
    end
  end
  
  if enzyme_actions.size > 1
    permutations = permute(enzyme_actions.size)
    
    permutations.each do |permutation|
      previous_cut_ranges = []
      # Primary and complement strands are both measured from '0' to 'sequence.size-1' here
      sequence_range = Bio::RestrictionEnzyme::Range::SequenceRange.new( 0, 0, sequence.size-1, sequence.size-1 )

      # Add the cuts to the sequence_range from each enzyme_action contained
      # in initial_cuts.  These are the cuts that have no competition so are
      # not subject to permutations.
      initial_cuts.each do |enzyme_action|
        enzyme_action.cut_ranges.each do |cut_range|
          sequence_range.add_cut_range(cut_range)
        end
      end

      permutation.each do |id|
        enzyme_action = enzyme_actions[id]

        # conflict is false if the current enzyme action may cut in its range.
        # conflict is true if it cannot due to a previous enzyme action making
        # a cut where this enzyme action needs a whole recognition site.
        conflict = false

        # If current size of enzyme_action overlaps with previous cut_range, don't cut
        # note that the enzyme action may fall in the middle of a previous enzyme action
        # so all cut locations must be checked that would fall underneath.
        previous_cut_ranges.each do |cut_range|
          next unless cut_range.class == Bio::RestrictionEnzyme::Range::VerticalCutRange  # we aren't concerned with horizontal cuts
          previous_cut_left = cut_range.range.first 
          previous_cut_right = cut_range.range.last

          # Keep in mind: 
          # * The cut location is to the immediate right of the base located at the index.
          #   ex: at^gc -- the cut location is at index 1
          # * The enzyme action location is located at the base of the index.
          #   ex: atgc -- 0 => 'a', 1 => 't', 2 => 'g', 3 => 'c'
          # method create_enzyme_actions has similar commentary if interested
          if (enzyme_action.right <= previous_cut_left) or
             (enzyme_action.left > previous_cut_right) or
             (enzyme_action.left > previous_cut_left and enzyme_action.right <= previous_cut_right) # in between cuts
            # no conflict
          else
            conflict = true
          end
        end

        next if conflict == true
        enzyme_action.cut_ranges.each { |cut_range| sequence_range.add_cut_range(cut_range) }
        previous_cut_ranges += enzyme_action.cut_ranges        
      end # permutation.each

      # Fill in the source sequence for sequence_range so it knows what bases
      # to use
      sequence_range.fragments.primary = sequence
      sequence_range.fragments.complement = sequence.forward_complement
      my_hash[permutation] = sequence_range
    end # permutations.each
    
  else # enzyme_actions.size <= 1
    # no permutations, just do it
    sequence_range = Bio::RestrictionEnzyme::Range::SequenceRange.new( 0, 0, sequence.size-1, sequence.size-1 )
    initial_cuts.each { |enzyme_action| enzyme_action.cut_ranges.each { |cut_range| sequence_range.add_cut_range(cut_range) } }
    sequence_range.fragments.primary = sequence
    sequence_range.fragments.complement = sequence.forward_complement
    my_hash[0] = sequence_range
  end

  my_hash
end
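
Why the order of competing enzymes matters can be seen in a much smaller model. The following is a minimal sketch, assuming a hypothetical Struct named CompetingAction with a single vertical cut per action; the real method works with cut ranges, both strands, and an extra "in between cuts" case that this simplification omits.

```ruby
# Hypothetical stand-in for an EnzymeAction: the bind site spans left..right
# and cut is a single vertical cut index (to the right of that base).
CompetingAction = Struct.new(:name, :left, :right, :cut)

# Apply actions in the given order, skipping any action whose bind site was
# already broken by an earlier cut. Returns the names of actions that cut.
def apply_in_order(actions, order)
  previous_cuts = []
  applied = []

  order.each do |id|
    action = actions[id]
    # Conflict when an earlier cut falls inside this action's bind site
    conflict = previous_cuts.any? do |cut|
      !(action.right <= cut || action.left > cut)
    end
    next if conflict

    previous_cuts << action.cut
    applied << action.name
  end

  applied
end

actions = [
  CompetingAction.new('X', 0, 5, 2),
  CompetingAction.new('Y', 2, 7, 4)
]

[[0, 1], [1, 0]].each do |order|
  puts "#{order.inspect} -> #{apply_in_order(actions, order).inspect}"
end
# [0, 1] -> ["X"]
# [1, 0] -> ["Y"]
```

Each order yields a different set of applied cuts, which is why the method stores one SequenceRange per permutation.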

find_match_locations( string, re )

Returns an Array of the indices at which a Regexp matches a string, including overlapping matches.

Example:

find_match_locations('abccdefeg', /[ce]/) # => [2,3,5,7]

Arguments

  • string: The string to scan

  • re: The Regexp to match

Returns

Array with indices of match locations

# File lib/bio/util/restriction_enzyme/analysis_basic.rb, line 203
def find_match_locations( string, re )
  md = string.match( re )
  locations = []
  counter = 0
  while md
    # save the match index relative to the original string
    locations << (counter += md.begin(0))
    # find the next match
    md = string[ (counter += 1)..-1 ].match( re )
  end
  locations
end
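
Because the search restarts one character after each match, overlapping recognition sites are all reported. The sketch below expresses the same idea with String#index and a start offset; the name match_locations_with_index is hypothetical, and the equivalence is only claimed for plain patterns without anchors or lookbehind, such as the character-class patterns used here.

```ruby
# Hypothetical alternative (not part of BioRuby): collect every match start,
# including overlapping ones, by resuming the search one character after
# the previous match start.
def match_locations_with_index(string, re)
  locations = []
  pos = 0
  while (found = string.index(re, pos))
    locations << found
    pos = found + 1
  end
  locations
end

p match_locations_with_index('abccdefeg', /[ce]/)  # => [2, 3, 5, 7]
p match_locations_with_index('aaaa', /aa/)         # => [0, 1, 2]
```

String#scan would report only the non-overlapping matches in the second call, so it would miss a site that begins inside another site.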

fragments_for_display( hsh, view_ranges=false )

Takes the fragments from SequenceRange objects generated by add_cut_range and returns the unique results as a Bio::RestrictionEnzyme::Fragments object.


Arguments

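  • hsh: Hash whose keys are permutation IDs, if any, and whose values are SequenceRange objects that have cuts applied.

  • view_ranges: When true, each Fragment also carries its p_left, p_right, c_left, and c_right positions, and duplicate fragments are kept.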

Returns

Bio::RestrictionEnzyme::Fragments object populated with Bio::RestrictionEnzyme::Fragment objects.

# File lib/bio/util/restriction_enzyme/analysis_basic.rb, line 94
def fragments_for_display( hsh, view_ranges=false )
  ary = Fragments.new
  return ary unless hsh

  hsh.each do |permutation_id, sequence_range|
    sequence_range.fragments.for_display.each do |fragment|
      if view_ranges
        ary << Bio::RestrictionEnzyme::Fragment.new(fragment.primary, fragment.complement, fragment.p_left, fragment.p_right, fragment.c_left, fragment.c_right)
      else
        ary << Bio::RestrictionEnzyme::Fragment.new(fragment.primary, fragment.complement)
      end
    end
  end
  
  ary.uniq! unless view_ranges
  
  ary
end

permute(count, permutations = [[0]])

Returns permutation orders for a given number of elements.

Examples:

permute(0) # => [[0]]
permute(1) # => [[0]]
permute(2) # => [[1, 0], [0, 1]]
permute(3) # => [[2, 1, 0], [2, 0, 1], [1, 2, 0], [0, 2, 1], [1, 0, 2], [0, 1, 2]]
permute(4) # => [[3, 2, 1, 0],
                 [3, 2, 0, 1],
                 [3, 1, 2, 0],
                 [3, 0, 2, 1],
                 [3, 1, 0, 2],
                 [3, 0, 1, 2],
                 [2, 3, 1, 0],
                 [2, 3, 0, 1],
                 [1, 3, 2, 0],
                 [0, 3, 2, 1],
                 [1, 3, 0, 2],
                 [0, 3, 1, 2],
                 [2, 1, 3, 0],
                 [2, 0, 3, 1],
                 [1, 2, 3, 0],
                 [0, 2, 3, 1],
                 [1, 0, 3, 2],
                 [0, 1, 3, 2],
                 [2, 1, 0, 3],
                 [2, 0, 1, 3],
                 [1, 2, 0, 3],
                 [0, 2, 1, 3],
                 [1, 0, 2, 3],
                 [0, 1, 2, 3]]

Arguments

  • count: Number of different elements to be permuted

  • permutations: ignore - for the recursive algorithm

Returns

Array of Array objects with different possible permutation orders. See examples.

# File lib/bio/util/restriction_enzyme/analysis.rb, line 235
def permute(count, permutations = [[0]])
  return permutations if count <= 1
  new_arrays = []
  new_array = []

  (permutations[0].size + 1).times do |n|
    new_array.clear
    permutations.each { |a| new_array << a.dup }
    new_array.each { |e| e.insert(n, permutations[0].size) }
    new_arrays += new_array
  end

  permute(count-1, new_arrays)
end
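
As a cross-check, the orders produced by permute can be compared with Ruby's built-in Array#permutation. The sketch below repeats the method from the listing above so that it runs on its own; the two approaches return the same set of orders, though not in the same sequence.

```ruby
# Copy of permute from the listing above, repeated so the example is
# self-contained.
def permute(count, permutations = [[0]])
  return permutations if count <= 1
  new_arrays = []
  new_array = []

  (permutations[0].size + 1).times do |n|
    new_array.clear
    permutations.each { |a| new_array << a.dup }
    new_array.each { |e| e.insert(n, permutations[0].size) }
    new_arrays += new_array
  end

  permute(count - 1, new_arrays)
end

# Array#permutation enumerates the same orders
built_in = (0...3).to_a.permutation.to_a

p permute(3).sort == built_in.sort  # => true
p permute(4).size                   # => 24
```

The number of orders grows factorially with the number of competing enzyme actions, which is the reason the :max_permutations option exists.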