class Bio::FlatFileIndex

Bio::FlatFileIndex is a class for creating, opening, and searching OBDA flatfile indexes.

Constants

MAGIC_BDB

magic string for BerkeleyDB/1 index

MAGIC_FLAT

magic string for flat/1 index

Public Class Methods

formatstring2class(format_string)
# File lib/bio/io/flatfile/indexer.rb, line 734
def self.formatstring2class(format_string)
  case format_string
  when /genbank/i
    dbclass = Bio::GenBank
  when /genpept/i
    dbclass = Bio::GenPept
  when /embl/i
    dbclass = Bio::EMBL
  when /sptr/i
    dbclass = Bio::SPTR
  when /fasta/i
    dbclass = Bio::FastaFormat
  else
    raise "Unsupported format : #{format_string}"
  end
end
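The dispatch above can be sketched without BioRuby installed. This is a minimal illustration, not the library's code: `format_to_symbol` is a hypothetical name, and symbols stand in for the `Bio::*` parser classes. It shows that matching is case-insensitive and substring-based, and that an unknown name raises a RuntimeError.

```ruby
# Hypothetical stand-in for formatstring2class.
# Symbols replace the Bio::* classes so the sketch runs without BioRuby.
def format_to_symbol(format_string)
  case format_string
  when /genbank/i then :genbank
  when /genpept/i then :genpept
  when /embl/i    then :embl
  when /sptr/i    then :sptr
  when /fasta/i   then :fasta
  else
    # A plain raise with a message produces a RuntimeError, as in the source.
    raise "Unsupported format : #{format_string}"
  end
end

puts format_to_symbol('GenBank')        # matched regardless of case
puts format_to_symbol('my-fasta-files') # matched anywhere in the string
```

Because the patterns are unanchored, any string that merely contains a known format name is accepted.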
makeindex(is_bdb, dbname, format, options, *files)
# File lib/bio/io/flatfile/indexer.rb, line 751
def self.makeindex(is_bdb, dbname, format, options, *files)
  if format then
    dbclass = formatstring2class(format)
  else
    dbclass = Bio::FlatFile.autodetect_file(files[0])
    raise "Cannot determine format" unless dbclass
    DEBUG.print "file format is #{dbclass}\n"
  end

  options = {} unless options
  pns = options['primary_namespace']
  sns = options['secondary_namespaces']

  parser = Indexer::Parser.new(dbclass, pns, sns)

  #if /(EMBL|SPTR)/ =~ dbclass.to_s then
    #a = [ 'DR' ]
    #parser.add_secondary_namespaces(*a)
  #end
  if sns = options['additional_secondary_namespaces'] then
    parser.add_secondary_namespaces(*sns)
  end

  if is_bdb then
    Indexer::makeindexBDB(dbname, parser, options, *files)
  else
    Indexer::makeindexFlat(dbname, parser, options, *files)
  end
end
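A hedged sketch of the options hash that makeindex reads. The three string keys come from the source above; the namespace values, the databank path, and the input file name are assumptions chosen for illustration only.

```ruby
# The keys are the string keys that makeindex looks up.
# The values are illustrative assumptions, not library defaults.
options = {
  'primary_namespace'               => 'ACCESSION',
  'secondary_namespaces'            => ['LOCUS'],
  'additional_secondary_namespaces' => ['VERSION']
}

# With BioRuby loaded, a flat (non-BerkeleyDB) index might then be built as
# follows. Both paths are hypothetical.
#   Bio::FlatFileIndex.makeindex(false, '/path/to/databank', 'genbank',
#                                options, '/path/to/entries.gb')

options.each { |key, value| puts "#{key}: #{Array(value).join(', ')}" }
```

Passing nil for options is also accepted, since the method replaces a nil options argument with an empty hash.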
new(name)

Opens an existing databank. A databank is a directory which contains indexed files and configuration files. The type of the databank (flat or BerkeleyDB) is determined automatically.

Unlike FlatFileIndex.open, this method does not accept a block.

# File lib/bio/io/flatfile/index.rb, line 113
def initialize(name)
  @db = DataBank.open(name)
end
open(name) { |i| ... }

Opens an existing databank. A databank is a directory which contains indexed files and configuration files. The type of the databank (flat or BerkeleyDB) is determined automatically.

If a block is given, the databank object is passed to the block, and the value of the block is returned. The databank is closed automatically when the block terminates.

# File lib/bio/io/flatfile/index.rb, line 88
def self.open(name)
  if block_given? then
    begin
      i = self.new(name)
      r = yield i
    ensure
      if i then
        begin
          i.close
        rescue IOError
        end
      end
    end
  else
    r = self.new(name)
  end
  r
end
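The block form follows the usual Ruby open-with-ensure pattern. The sketch below reproduces that pattern with `DemoIndex`, a hypothetical stand-in that only tracks open/closed state, so it runs without BioRuby or a real databank.

```ruby
# Hypothetical stand-in for Bio::FlatFileIndex; it only tracks open/closed state.
class DemoIndex
  def initialize(name)
    @name = name
    @open = true
  end

  def close
    raise IOError, 'already closed' unless @open
    @open = false
    nil
  end

  def closed?
    !@open
  end

  # Same shape as FlatFileIndex.open: yield the object, then always close it.
  def self.open(name)
    return new(name) unless block_given?

    index = new(name)
    begin
      yield index
    ensure
      begin
        index.close
      rescue IOError
        # The block may already have closed the index; ignore that case.
      end
    end
  end
end

seen = nil
result = DemoIndex.open('demo') { |i| seen = i; :block_value }
puts seen.closed?   # the index was closed when the block finished
puts result         # the block's value is what open returns
```

The ensure clause also runs when the block raises, so the databank does not stay open after an error.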
update_index(dbname, format, options, *files)
# File lib/bio/io/flatfile/indexer.rb, line 781
def self.update_index(dbname, format, options, *files)
  if format then
    # resolve the parser class from the format name
    dbclass = formatstring2class(format)
    parser = Indexer::Parser.new(dbclass)
  else
    parser = nil
  end
  Indexer::update_index(dbname, parser, options, *files)
end

Public Instance Methods

always_check_consistency(bool)

Returns the current setting. If true, consistency checks are performed every time flatfiles are accessed. If nil/false, no checks are performed.

By default, #always_check_consistency is true.

# File lib/bio/io/flatfile/index.rb, line 297
def always_check_consistency(bool)
  @db.always_check
end
always_check_consistency=(bool)

If true is given, consistency checks are performed every time flatfiles are accessed. If nil/false is given, no checks are performed.

By default, #always_check_consistency is true.

# File lib/bio/io/flatfile/index.rb, line 288
def always_check_consistency=(bool)
  @db.always_check=(bool)
end
check_consistency()

Checks consistency between the databank (index) and the original flat files.

Raises RuntimeError if the original flat files have changed since the databank was created.

Note that this check only compares file sizes as described in the OBDA specification.

# File lib/bio/io/flatfile/index.rb, line 278
def check_consistency
  check_closed?
  @db.check_consistency
end
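A size-only check of this kind can be sketched as follows. This is an illustration of the idea, not the library's implementation: `record_sizes` and `check_sizes` are hypothetical helpers, and a temporary file stands in for a flat file.

```ruby
require 'tempfile'

# Hypothetical helper: remember each file's size at "index time".
def record_sizes(paths)
  paths.to_h { |path| [path, File.size(path)] }
end

# Hypothetical helper: raise RuntimeError if any file's size has changed.
def check_sizes(recorded)
  recorded.each do |path, size|
    next if File.size(path) == size
    raise "file size mismatch: #{path}"
  end
  true
end

flatfile = Tempfile.new('entries')
flatfile.write(">seq1\nACGT\n")
flatfile.flush

recorded = record_sizes([flatfile.path])
check_sizes(recorded)               # passes: nothing has changed yet

flatfile.write(">seq2\nTTGA\n")     # the file grows after "indexing"
flatfile.flush

begin
  check_sizes(recorded)
rescue RuntimeError => e
  puts e.message
end
```

An edit that leaves the file size unchanged passes a check like this, which is the limitation the note above describes.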
close()

Closes the databank. Returns nil.

# File lib/bio/io/flatfile/index.rb, line 132
def close
  check_closed?
  @db.close
  @db = nil
end
closed?()

Returns true if already closed. Otherwise, returns false.

# File lib/bio/io/flatfile/index.rb, line 139
def closed?
  if @db then
    false
  else
    true
  end
end
default_namespaces()

Returns default namespaces. Returns an array of strings or nil. nil means all namespaces.

# File lib/bio/io/flatfile/index.rb, line 172
def default_namespaces
  @names
end
default_namespaces=(names)

Sets the default namespaces. default_namespaces = nil means all namespaces in the databank.

default_namespaces = [ str1, str2, ... ] sets the default namespaces to str1, str2, ...

The default namespaces set by this method only affect the get_by_id, search, and include? methods.

The default value is nil (that is, all namespaces are searched by default).

# File lib/bio/io/flatfile/index.rb, line 160
def default_namespaces=(names)
  if names then
    @names = []
    names.each { |x| @names.push(x.dup) }
  else
    @names = nil
  end
end
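The setter stores a copy of each name rather than the caller's own strings. A minimal sketch of that behaviour, using `NamespaceHolder` as a hypothetical stand-in for the index object:

```ruby
# Hypothetical holder that copies names the way default_namespaces= does.
class NamespaceHolder
  attr_reader :names

  def names=(names)
    # Duplicate every string so later changes by the caller are not seen here.
    @names = names ? names.map(&:dup) : nil
  end
end

holder = NamespaceHolder.new
mine = [String.new('ACCESSION'), String.new('VERSION')]
holder.names = mine

mine[0] << '_EDITED'   # mutate one of the caller's strings in place
mine << 'LOCUS'        # grow the caller's array

puts holder.names.inspect   # the stored copy is unaffected by both changes
```

Assigning nil clears the stored list, which corresponds to searching all namespaces.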
get_by_id(key)

Common interface defined in registry.rb. Searches the databank and returns the entry (or entries) as a string. Multiple entries (concatenated into one string) may be returned. Returns an empty string if nothing is found.

# File lib/bio/io/flatfile/index.rb, line 122
def get_by_id(key)
  search(key).to_s
end
include?(key)

Searches the databank. If entries are found, returns an array of unique IDs (primary identifiers). If nothing is found, returns nil.

This method is useful when the search result is very large and the search method would be very slow.

# File lib/bio/io/flatfile/index.rb, line 210
def include?(key)
  check_closed?
  if @names then
    r = @db.search_namespaces_get_unique_id(key, *@names)
  else
    r = @db.search_all_get_unique_id(key)
  end
  if r.empty? then
    nil
  else
    r
  end
end
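Because include? returns either an array or nil, its result can be used directly in a condition. The sketch below shows that idiom with `DemoLookup`, a hypothetical in-memory stand-in; the key and IDs are made up for illustration.

```ruby
# Hypothetical in-memory stand-in: maps a search key to primary IDs.
class DemoLookup
  def initialize(table)
    @table = table
  end

  # Mirrors include?: an array of unique IDs, or nil when nothing matches.
  def include?(key)
    ids = @table.fetch(key, []).uniq
    ids.empty? ? nil : ids
  end
end

lookup = DemoLookup.new('P12345' => ['ID_A', 'ID_A', 'ID_B'])

if (ids = lookup.include?('P12345'))
  puts "found #{ids.size} entries: #{ids.join(', ')}"
else
  puts 'not found'
end
```

Returning nil rather than an empty array is what makes the condition work, since an empty array is truthy in Ruby.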
include_in_namespaces?(key, *names)

Same as include?, but searches only the specified namespaces.

# File lib/bio/io/flatfile/index.rb, line 226
def include_in_namespaces?(key, *names)
  check_closed?
  r = @db.search_namespaces_get_unique_id(key, *names)
  if r.empty? then
    nil
  else
    r
  end
end
include_in_primary?(key)

Same as include?, but searches only the primary namespace.

# File lib/bio/io/flatfile/index.rb, line 238
def include_in_primary?(key)
  check_closed?
  r = @db.search_primary_get_unique_id(key)
  if r.empty? then
    nil
  else
    r
  end
end
namespaces()

Returns the names of the namespaces defined in the databank, with the primary namespace first (example: [ 'LOCUS', 'ACCESSION', 'VERSION' ]).

# File lib/bio/io/flatfile/index.rb, line 251
def namespaces
  check_closed?
  r = secondary_namespaces
  r.unshift primary_namespace
  r
end
primary_namespace()

Returns the name of the primary namespace as a string.

# File lib/bio/io/flatfile/index.rb, line 259
def primary_namespace
  check_closed?
  @db.primary.name
end
search_namespaces(key, *names)

Searches only the specified namespaces. Returns a Bio::FlatFileIndex::Results object.

# File lib/bio/io/flatfile/index.rb, line 189
def search_namespaces(key, *names)
  check_closed?
  @db.search_namespaces(key, *names)
end
search_primary(key)

Searches only the primary namespace. Returns a Bio::FlatFileIndex::Results object.

# File lib/bio/io/flatfile/index.rb, line 197
def search_primary(key)
  check_closed?
  @db.search_primary(key)
end
secondary_namespaces()

Returns the names of the secondary namespaces as an array of strings.

# File lib/bio/io/flatfile/index.rb, line 265
def secondary_namespaces
  check_closed?
  @db.secondary.names
end