class Bio::FlatFileIndex

Bio::FlatFileIndex is a class for creating and searching OBDA (Open Bio Database Access) flat-file indexes.

Constants

MAGIC_BDB

magic string for BerkeleyDB/1 index

MAGIC_FLAT

magic string for flat/1 index

Public Class Methods

formatstring2class(format_string)
    # File lib/bio/io/flatfile/indexer.rb
    def self.formatstring2class(format_string)
      case format_string
      when /genbank/i
        dbclass = Bio::GenBank
      when /genpept/i
        dbclass = Bio::GenPept
      when /embl/i
        dbclass = Bio::EMBL
      when /sptr/i
        dbclass = Bio::SPTR
      when /fasta/i
        dbclass = Bio::FastaFormat
      else
        # Report the argument itself; the bare name `format` is not this method's parameter.
        raise "Unsupported format : #{format_string}"
      end
    end
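
The patterns above are unanchored and case-insensitive, so any string that contains a known format name is accepted. A minimal sketch of that dispatch follows; format_to_symbol is a hypothetical name, and symbols stand in for the Bio::* classes so that it runs without BioRuby.

```ruby
# Stand-in for the case/when dispatch in formatstring2class.
# Symbols replace the real Bio::* classes (an assumption made for illustration).
def format_to_symbol(format_string)
  case format_string
  when /genbank/i then :genbank
  when /genpept/i then :genpept
  when /embl/i    then :embl
  when /sptr/i    then :sptr
  when /fasta/i   then :fasta
  else
    raise "Unsupported format : #{format_string}"
  end
end

upper  = format_to_symbol('GenBank')      # case is ignored
nested = format_to_symbol('my-fasta-db')  # a substring match is enough
```

Because the first matching branch wins, a string that contains two format names resolves to whichever pattern is listed first.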

makeindex(is_bdb, dbname, format, options, *files)
    # File lib/bio/io/flatfile/indexer.rb
    def self.makeindex(is_bdb, dbname, format, options, *files)
      if format then
        dbclass = formatstring2class(format)
      else
        dbclass = Bio::FlatFile.autodetect_file(files[0])
        raise "Cannot determine format" unless dbclass
        DEBUG.print "file format is #{dbclass}\n"
      end

      options = {} unless options
      pns = options['primary_namespace']
      sns = options['secondary_namespaces']

      parser = Indexer::Parser.new(dbclass, pns, sns)

      #if /(EMBL|SPTR)/ =~ dbclass.to_s then
        #a = [ 'DR' ]
        #parser.add_secondary_namespaces(*a)
      #end
      if sns = options['additional_secondary_namespaces'] then
        parser.add_secondary_namespaces(*sns)
      end

      if is_bdb then
        Indexer::makeindexBDB(dbname, parser, options, *files)
      else
        Indexer::makeindexFlat(dbname, parser, options, *files)
      end
    end
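
As the listing shows, makeindex tolerates a nil options argument and looks up three string keys. The sketch below isolates that lookup; read_namespace_options is a hypothetical helper, not part of BioRuby, and the namespace names are only sample values.

```ruby
# Hypothetical helper mirroring how makeindex reads its options hash.
def read_namespace_options(options)
  options = {} unless options # nil is treated as "no options"
  {
    primary:    options['primary_namespace'],
    secondary:  options['secondary_namespaces'],
    additional: options['additional_secondary_namespaces']
  }
end

from_nil     = read_namespace_options(nil)
from_strings = read_namespace_options('primary_namespace' => 'ACCESSION',
                                      'secondary_namespaces' => ['LOCUS'])
# Symbol keys are not looked up, so this setting is silently ignored.
from_symbols = read_namespace_options(primary_namespace: 'ACCESSION')
```

The keys are plain strings, so a hash written with symbol keys would leave every namespace setting at nil.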

new(name)

Opens an existing databank. A databank is a directory that contains index files and configuration files. The type of the databank (flat or BerkeleyDB) is determined automatically.

Unlike FlatFileIndex.open, this method does not accept a block.

    # File lib/bio/io/flatfile/index.rb
    def initialize(name)
      @db = DataBank.open(name)
    end

open(name) { |i| ... }

Opens an existing databank. A databank is a directory that contains index files and configuration files. The type of the databank (flat or BerkeleyDB) is determined automatically.

If a block is given, the databank object is passed to the block and the value of the block is returned. The databank is closed automatically when the block terminates. Without a block, the opened databank object is returned.

    # File lib/bio/io/flatfile/index.rb
    def self.open(name)
      if block_given? then
        begin
          i = self.new(name)
          r = yield i
        ensure
          if i then
            begin
              i.close
            rescue IOError
            end
          end
        end
      else
        r = self.new(name)
      end
      r
    end
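
The block form follows the usual open/yield/ensure pattern. Below is a minimal sketch of that pattern with a hypothetical FakeIndex class standing in for the databank. It assumes that close raises IOError on an already closed object, which the rescue clause in the listing suggests but this page does not state.

```ruby
# Hypothetical stand-in for a databank handle; not a BioRuby class.
class FakeIndex
  def initialize(name)
    @name = name
    @open = true
  end

  def closed?
    !@open
  end

  def close
    raise IOError, 'closed databank' if closed?
    @open = false
    nil
  end

  # Same shape as FlatFileIndex.open: yield the object, then always close it.
  def self.open(name)
    index = new(name)
    return index unless block_given?

    begin
      yield index
    ensure
      begin
        index.close
      rescue IOError
        # The block already closed the object; nothing left to do.
      end
    end
  end
end

leaked = nil
result = FakeIndex.open('demo') { |i| leaked = i; :block_value }
plain  = FakeIndex.open('demo') # no block: the caller must close it
```

After the block returns, result holds the block's value and the object that was passed to the block is closed, while the object returned by the block-less call stays open until close is called.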

update_index(dbname, format, options, *files)
    # File lib/bio/io/flatfile/indexer.rb
    def self.update_index(dbname, format, options, *files)
      if format then
        # Resolve the parser class from the format string; dbclass is otherwise undefined here.
        dbclass = formatstring2class(format)
        parser = Indexer::Parser.new(dbclass)
      else
        parser = nil
      end
      Indexer::update_index(dbname, parser, options, *files)
    end

Public Instance Methods

always_check_consistency(bool)

Returns the current setting. If true, a consistency check is performed every time a flat file is accessed. If nil or false, no checks are performed. The bool argument is not used by the method body.

By default, always_check_consistency is true.

    # File lib/bio/io/flatfile/index.rb
    def always_check_consistency(bool)
      @db.always_check
    end

always_check_consistency=(bool)

If true is given, a consistency check is performed every time a flat file is accessed. If nil or false is given, no checks are performed.

By default, always_check_consistency is true.

    # File lib/bio/io/flatfile/index.rb
    def always_check_consistency=(bool)
      @db.always_check=(bool)
    end

check_consistency()

Checks consistency between the databank (index) and the original flat files.

Raises RuntimeError if the original flat files have changed since the databank was created.

Note that this check only compares file sizes, as described in the OBDA specification.

    # File lib/bio/io/flatfile/index.rb
    def check_consistency
      check_closed?
      @db.check_consistency
    end
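
To illustrate what a size-only comparison can and cannot detect, here is a minimal sketch of the idea. It is not BioRuby's implementation; snapshot_sizes and check_sizes! are hypothetical helpers, and the temporary file stands in for a flat file.

```ruby
require 'tempfile'

# Hypothetical sketch: remember each file's size at index time,
# then raise RuntimeError if any size differs later.
def snapshot_sizes(paths)
  paths.to_h { |path| [path, File.size(path)] }
end

def check_sizes!(snapshot)
  snapshot.each do |path, size|
    next if File.size(path) == size

    raise "flat file changed since indexing: #{path}"
  end
  true
end

file = Tempfile.new('flat')
file.write(">seq1\nACGT\n")
file.flush
snapshot  = snapshot_sizes([file.path])
unchanged = check_sizes!(snapshot) # size is the same, so the check passes

file.write(">seq2\nTTTT\n") # appending changes the size
file.flush
changed = begin
  check_sizes!(snapshot)
  false
rescue RuntimeError
  true
end
file.close!
```

An edit that leaves the file size unchanged would pass a check of this kind, which is the limitation the note above points out.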

close()

Closes the databank. Returns nil.

    # File lib/bio/io/flatfile/index.rb
    def close
      check_closed?
      @db.close
      @db = nil
    end

closed?()

Returns true if already closed. Otherwise, returns false.

    # File lib/bio/io/flatfile/index.rb
    def closed?
      if @db then
        false
      else
        true
      end
    end

default_namespaces()

Returns the default namespaces as an array of strings, or nil. nil means all namespaces.

    # File lib/bio/io/flatfile/index.rb
    def default_namespaces
      @names
    end

default_namespaces=(names)

Sets the default namespaces. default_namespaces = nil means all namespaces in the databank.

default_namespaces = [ str1, str2, ... ] sets the default namespaces to str1, str2, ...

Default namespaces specified with this method only affect the get_by_id, search, and include? methods.

The initial value is nil (that is, all namespaces are searched by default).

    # File lib/bio/io/flatfile/index.rb
    def default_namespaces=(names)
      if names then
        @names = []
        names.each { |x| @names.push(x.dup) }
      else
        @names = nil
      end
    end
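
The setter copies every name with dup, so later changes to the caller's array or strings do not reach the stored list. A minimal sketch of that behaviour follows; NamespaceHolder is a hypothetical class that reproduces only the copying logic.

```ruby
# Hypothetical holder reproducing only the setter's copying behaviour.
class NamespaceHolder
  attr_reader :names

  def names=(names)
    @names = names ? names.map(&:dup) : nil
  end
end

holder = NamespaceHolder.new
mine = [String.new('ACCESSION'), String.new('LOCUS')]
holder.names = mine

mine[0] << '_X'   # mutate the caller's string in place
mine << 'VERSION' # grow the caller's array
stored = holder.names # unaffected by both changes
```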

get_by_id(key)

Common interface defined in registry.rb. Searches the databank and returns the entry (or entries) as a string. Multiple entries may be returned, concatenated into one string. Returns an empty string if nothing is found.

    # File lib/bio/io/flatfile/index.rb
    def get_by_id(key)
      search(key).to_s
    end

include?(key)

Searches the databank. If any entries are found, returns an array of unique IDs (primary identifiers). If nothing is found, returns nil.

This method is useful when the search result would be very large and the search method would be too slow.

    # File lib/bio/io/flatfile/index.rb
    def include?(key)
      check_closed?
      if @names then
        r = @db.search_namespaces_get_unique_id(key, *@names)
      else
        r = @db.search_all_get_unique_id(key)
      end
      if r.empty? then
        nil
      else
        r
      end
    end
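
Despite the trailing question mark, include? returns an array or nil rather than true or false, so the result can be tested and used in one step. The sketch below shows that calling convention with a hash standing in for the databank; include_like, LOOKUP, and the identifiers are made up for illustration.

```ruby
# Hypothetical stand-in: a hash plays the role of the databank lookup.
LOOKUP = { 'AB000100' => ['AB000100.1'] }.freeze

# Same return convention as include?: an array of IDs, or nil when empty.
def include_like(key)
  ids = LOOKUP.fetch(key, [])
  ids.empty? ? nil : ids
end

messages = []
if (ids = include_like('AB000100'))
  messages << "found: #{ids.join(', ')}"
end
messages << 'missing' unless include_like('NOPE')
```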

include_in_namespaces?(key, *names)

Same as include?, but searches only the specified namespaces.

    # File lib/bio/io/flatfile/index.rb
    def include_in_namespaces?(key, *names)
      check_closed?
      r = @db.search_namespaces_get_unique_id(key, *names)
      if r.empty? then
        nil
      else
        r
      end
    end

include_in_primary?(key)

Same as include?, but searches only the primary namespace.

    # File lib/bio/io/flatfile/index.rb
    def include_in_primary?(key)
      check_closed?
      r = @db.search_primary_get_unique_id(key)
      if r.empty? then
        nil
      else
        r
      end
    end

namespaces()

Returns the names of the namespaces defined in the databank, with the primary namespace first. Example: [ 'LOCUS', 'ACCESSION', 'VERSION' ]

    # File lib/bio/io/flatfile/index.rb
    def namespaces
      check_closed?
      r = secondary_namespaces
      r.unshift primary_namespace
      r
    end

primary_namespace()

Returns the name of the primary namespace as a string.

    # File lib/bio/io/flatfile/index.rb
    def primary_namespace
      check_closed?
      @db.primary.name
    end

search_namespaces(key, *names)

Searches only the specified namespaces. Returns a Bio::FlatFileIndex::Results object.

    # File lib/bio/io/flatfile/index.rb
    def search_namespaces(key, *names)
      check_closed?
      @db.search_namespaces(key, *names)
    end

search_primary(key)

Searches only the primary namespace. Returns a Bio::FlatFileIndex::Results object.

    # File lib/bio/io/flatfile/index.rb
    def search_primary(key)
      check_closed?
      @db.search_primary(key)
    end

secondary_namespaces()

Returns the names of the secondary namespaces as an array of strings.

    # File lib/bio/io/flatfile/index.rb
    def secondary_namespaces
      check_closed?
      @db.secondary.names
    end