class Bio::FlatFile

Bio::FlatFile is a helper and wrapper class for reading biological data files. It acts like an IO object and can detect the data format automatically, so users do not need to tell the class what the data is.

Attributes

dbclass[R]

Returns the database class, which is either detected automatically or given to FlatFile#initialize.

entry[R]

Returns the entry most recently parsed by next_entry (not updated in raw mode).

raw[R]

True if the object is in raw mode.

skip_leader_mode[RW]

Controls when the leader of the data is skipped.

:firsttime

(DEFAULT) Skips only at the head of the file (the first time an entry is read).

:everytime

Skips every time an entry is read.

nil

Never skips.
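The three modes can be read as a small decision rule, consistent with the next_entry source shown further down this page. The following is a minimal sketch only: skip_leader? is a hypothetical helper written for illustration and is not part of Bio::FlatFile.

```ruby
# Hypothetical helper (not part of Bio::FlatFile): returns true when the
# leader should be skipped before reading the next entry.
#   mode       - :firsttime, :everytime, or nil
#   first_read - true if no entry has been read since open or rewind
def skip_leader?(mode, first_read)
  case mode
  when :firsttime then first_read ? true : false  # only before the first entry
  when :everytime then true                       # before every entry
  else false                                      # nil: never skip
  end
end

# Print the decision for each mode, before the first read and afterwards.
[:firsttime, :everytime, nil].each do |mode|
  first = skip_leader?(mode, true)
  later = skip_leader?(mode, false)
  puts "#{mode.inspect}: first read=#{first}, later reads=#{later}"
end
```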

Public Class Methods

auto(*arg, &block)

Same as Bio::FlatFile.open(nil, filename_or_stream, mode, perm, options).

  • Example 1

    Bio::FlatFile.auto(ARGF)
    
  • Example 2

    Bio::FlatFile.auto("embl/est_hum17.dat")
    
  • Example 3

    Bio::FlatFile.auto(IO.popen("gzip -dc nc1101.flat.gz"))
    
    # File lib/bio/io/flatfile.rb
    def self.auto(*arg, &block)
      self.open(nil, *arg, &block)
    end

autodetect(text)

Detects the database class (== file format) of the given string. Returns false or nil if the format cannot be determined.

    # File lib/bio/io/flatfile.rb
    def self.autodetect(text)
      AutoDetect.default.autodetect(text)
    end

autodetect_file(filename)

Detects the database class (== file format) of the given file. Returns nil if the format cannot be determined.

    # File lib/bio/io/flatfile.rb
    def self.autodetect_file(filename)
      self.open_file(filename).dbclass
    end

autodetect_io(io)

Detects the database class (== file format) of the given input stream. Returns nil if the format cannot be determined. Caution: the method reads some data from the input stream, and that data is lost to the caller.
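The caution above applies to any detection that has to read from a stream. A minimal sketch, using StringIO and a hypothetical guess_format helper (a stand-in for illustration, not the detection logic of Bio::FlatFile), shows how the consumed data disappears from the caller's point of view:

```ruby
require 'stringio'

# Hypothetical stand-in for format detection: reads the first line of the
# stream and guesses the format from it. NOT the logic used by Bio::FlatFile.
def guess_format(io)
  first = io.gets
  return nil unless first
  first.start_with?('>') ? :fasta : nil
end

io = StringIO.new(">seq1\nACGT\n>seq2\nTTGA\n")
format = guess_format(io)

# The line consumed during detection is no longer returned by the stream.
remaining = io.read
puts "format:    #{format.inspect}"
puts "remaining: #{remaining.inspect}"

# A seekable stream can be rewound to recover the consumed data.
io.rewind
puts "rewound:   #{io.read.inspect}"
```

A StringIO or a regular File can be rewound as shown; a pipe or a socket cannot, so in that case the data read during detection is gone.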

    # File lib/bio/io/flatfile.rb
    def self.autodetect_io(io)
      self.new(nil, io).dbclass
    end

autodetect_stream(io)

This method is OBSOLETE. Please use autodetect_io(io) instead.

    # File lib/bio/io/flatfile.rb
    def self.autodetect_stream(io)
      $stderr.print "Bio::FlatFile.autodetect_stream will be deprecated." if $VERBOSE
      self.autodetect_io(io)
    end

foreach(*arg) { |entry| ... }

Executes the block for every entry in the stream. Same as FlatFile.open(*arg) { |ff| ff.each { |entry| … }}.

  • Example

    Bio::FlatFile.foreach('test.fst') { |e| puts e.definition }
    
    # File lib/bio/io/flatfile.rb
    def self.foreach(*arg)
      self.open(*arg) do |flatfileobj|
        flatfileobj.each do |entry|
          yield entry
        end
      end
    end

new(dbclass, stream)

Same as FlatFile.open, except that ‘stream’ must be an already opened stream object (IO, File, or any other object that has a ‘gets’ method).

  • Example 1

    Bio::FlatFile.new(Bio::GenBank, ARGF)
    
  • Example 2

    Bio::FlatFile.new(Bio::GenBank, IO.popen("gzip -dc nc1101.flat.gz"))
    

Compatibility Note: You can no longer specify “:raw => true” or “:raw => false”. The styles below are DEPRECATED.

  • Example 3 (deprecated)

    # Bio::FlatFile.new(nil, $stdin, :raw=>true) # => ERROR
    # Please rewrite as below.
    ff = Bio::FlatFile.new(nil, $stdin)
    ff.raw = true
    
  • Example 3 in old style (deprecated)

    # Bio::FlatFile.new(nil, $stdin, true) # => ERROR
    # Please rewrite as below.
    ff = Bio::FlatFile.new(nil, $stdin)
    ff.raw = true
    
    # File lib/bio/io/flatfile.rb
    def initialize(dbclass, stream)
      # 2nd arg: IO object
      if stream.kind_of?(BufferedInputStream)
        @stream = stream
      else
        @stream = BufferedInputStream.for_io(stream)
      end
      # 1st arg: database class (or file format autodetection)
      if dbclass then
        self.dbclass = dbclass
      else
        autodetect
      end
      #
      @skip_leader_mode = :firsttime
      @firsttime_flag = true
      # default raw mode is false
      self.raw = false
    end

open(*arg) { |ff| ... }
Bio::FlatFile.open(file, *arg)
Bio::FlatFile.open(dbclass, file, *arg)

Creates a new Bio::FlatFile object to read a file or a stream which contains dbclass data.

dbclass should be a class (or module) such as Bio::GenBank or Bio::FastaFormat, or nil.

If file is a filename (that is, an object without a gets method), the method opens the local file of that name with File.open(filename, *arg).

When dbclass is omitted or nil, the method tries to determine the database class (file format) automatically. If detection fails, dbclass is set to nil and FlatFile#next_entry will fail. You can still set dbclass afterwards with the FlatFile#dbclass= method.

  • Example 1

    Bio::FlatFile.open(Bio::GenBank, "genbank/gbest40.seq")
    
  • Example 2

    Bio::FlatFile.open(nil, "embl/est_hum17.dat")
    
  • Example 3

    Bio::FlatFile.open("genbank/gbest40.seq")
    
  • Example 4

    Bio::FlatFile.open(Bio::GenBank, $stdin)
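
The argument rules behind Examples 1 to 4 can be sketched as a small dispatcher. This is an illustration only: classify_open_args and FakeFormat are hypothetical names, and the sketch stops before any file is opened or any format is detected.

```ruby
require 'stringio'

# Hypothetical helper mirroring the documented argument rules.
# Returns [dbclass, source, kind, rest], where kind is :io or :filename.
def classify_open_args(*arg)
  raise ArgumentError, 'wrong number of arguments (0 for 1)' if arg.empty?
  first = arg.shift
  if first.is_a?(Module)
    dbclass = first        # open(dbclass, file, ...)
  elsif first.nil?
    dbclass = nil          # open(nil, file, ...): format is autodetected
  else
    dbclass = nil          # open(file, ...): format is autodetected
    arg.unshift(first)     # the first argument was the file itself
  end
  raise ArgumentError, 'wrong number of arguments (1 for 2)' if arg.empty?
  source = arg.shift
  # Anything with a public gets method is treated as an already opened stream.
  kind = source.respond_to?(:gets) ? :io : :filename
  [dbclass, source, kind, arg]
end

FakeFormat = Class.new   # hypothetical stand-in for a class such as Bio::GenBank

puts classify_open_args(FakeFormat, 'genbank/gbest40.seq')[2]
puts classify_open_args(nil, StringIO.new(''))[2]
puts classify_open_args('embl/est_hum17.dat', 'r')[2]
```

Note that a String does not count as a stream here, because Kernel#gets is private and respond_to? ignores private methods by default.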
    

If it is called with a block, the block is executed with a new Bio::FlatFile object. If a filename is given, the file is automatically closed when the block exits.

  • Example 5

    Bio::FlatFile.open(nil, 'test4.fst') do |ff|
        ff.each { |e| print e.definition, "\n" }
    end
    
  • Example 6

    Bio::FlatFile.open('test4.fst') do |ff|
        ff.each { |e| print e.definition, "\n" }
    end
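
The block form follows the familiar File.open pattern: the block's value is returned and the file is closed on the way out. A minimal sketch of that pattern in plain Ruby, where open_reader and demo_block_form are hypothetical helpers and not the library's implementation:

```ruby
require 'tempfile'

# Hypothetical helper: without a block it returns the open file; with a
# block it yields the file, closes it afterwards, and returns the block's value.
def open_reader(path)
  file = File.open(path)
  return file unless block_given?
  begin
    yield file
  ensure
    file.close   # runs even if the block raises
  end
end

# Writes a small temporary file, reads it through open_reader, and reports
# the block's return value and whether the handle was closed afterwards.
def demo_block_form
  Tempfile.create('flatfile_demo') do |tmp|
    tmp.write(">seq1\nACGT\n")
    tmp.flush
    handle = nil
    line_count = open_reader(tmp.path) do |f|
      handle = f
      f.each_line.count
    end
    [line_count, handle.closed?]
  end
end

line_count, closed = demo_block_form
puts "lines read: #{line_count}, closed after block: #{closed}"
```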
    

Compatibility Note: *arg is passed unchanged to File.open, so you cannot specify “:raw => true” or “:raw => false”.

    # File lib/bio/io/flatfile.rb
    def self.open(*arg, &block)
      # FlatFile.open(dbclass, file, mode, perm)
      # FlatFile.open(file, mode, perm)
      if arg.size <= 0
        raise ArgumentError, 'wrong number of arguments (0 for 1)'
      end
      x = arg.shift
      if x.is_a?(Module) then
        # FlatFile.open(dbclass, filename_or_io, ...)
        dbclass = x
      elsif x.nil? then
        # FlatFile.open(nil, filename_or_io, ...)
        dbclass = nil
      else
        # FlatFile.open(filename, ...)
        dbclass = nil
        arg.unshift(x)
      end
      if arg.size <= 0
        raise ArgumentError, 'wrong number of arguments (1 for 2)'
      end
      file = arg.shift
      # check if file is filename or IO object
      unless file.respond_to?(:gets)
        # 'file' is a filename
        _open_file(dbclass, file, *arg, &block)
      else
        # 'file' is a IO object
        ff = self.new(dbclass, file)
        block_given? ? (yield ff) : ff
      end
    end

open_file(filename, *arg)

Same as FlatFile.auto(filename, *arg), except that it accepts only a filename and does not accept an IO object. The file format is determined automatically.

It can accept a block. If a block is given, it returns the block’s return value. Otherwise, it returns a new FlatFile object.

    # File lib/bio/io/flatfile.rb
    def self.open_file(filename, *arg)
      _open_file(nil, filename, *arg)
    end

open_uri(uri, *arg) { |self| ... }

Opens URI specified as uri. uri must be a String or URI object. *arg is passed to OpenURI.open_uri or URI#open.

Like FlatFile.open, it can accept a block.

Note that you MUST explicitly require ‘open-uri’. Because open-uri.rb modifies existing classes, it is not required by default.

    # File lib/bio/io/flatfile.rb
    def self.open_uri(uri, *arg)
      if block_given? then
        BufferedInputStream.open_uri(uri, *arg) do |stream|
          yield self.new(nil, stream)
        end
      else
        stream = BufferedInputStream.open_uri(uri, *arg)
        self.new(nil, stream)
      end
    end

to_a(*arg)

Same as FlatFile.auto(filename_or_stream, *arg).to_a

(This method might be OBSOLETED in the future.)

    # File lib/bio/io/flatfile.rb
    def self.to_a(*arg)
      self.auto(*arg) do |ff|
        raise 'cannot determine file format' unless ff.dbclass
        ff.to_a
      end
    end

Public Instance Methods

autodetect(lines = 31, ad = AutoDetect.default)

Determines the database class (file format). Pre-reads the number of lines given by the lines argument (default 31) for format determination. Returns nil or false on failure; otherwise returns the database class.

The method can be called at any time, although this is not recommended. It might be useful if the input file is a mixture of data in multiple formats.

    # File lib/bio/io/flatfile.rb
    def autodetect(lines = 31, ad = AutoDetect.default)
      if r = ad.autodetect_flatfile(self, lines)
        self.dbclass = r
      else
        self.dbclass = nil unless self.dbclass
      end
      r
    end

close()

Closes input stream. (similar to IO#close)

    # File lib/bio/io/flatfile.rb
    def close
      @stream.close
    end

dbclass=(klass)

Sets the database class. Please use it only if autodetect fails.

    # File lib/bio/io/flatfile.rb
    def dbclass=(klass)
      if klass then
        @dbclass = klass
        begin
          @splitter = @dbclass.flatfile_splitter(@dbclass, @stream)
        rescue NameError, NoMethodError
          begin
            splitter_class = @dbclass::FLATFILE_SPLITTER
          rescue NameError
            splitter_class = Splitter::Default
          end
          @splitter = splitter_class.new(klass, @stream)
        end
      else
        @dbclass = nil
        @splitter = nil
      end
    end

each()

Alias for: each_entry

each_entry() { |e| ... }

Iterates over each entry in the flatfile.

  • Example

    include Bio
    ff = FlatFile.open(GenBank, "genbank/gbhtg14.seq")
    ff.each_entry do |x|
      puts x.definition
    end
    
    # File lib/bio/io/flatfile.rb
    def each_entry
      while e = self.next_entry
        yield e
      end
    end

Also aliased as: each

entry_ended_pos()

Returns (end position of the last entry) + 1.

    # File lib/bio/io/flatfile.rb
    def entry_ended_pos
      @splitter.entry_ended_pos
    end

entry_pos_flag()

A flag that controls whether entry start and end positions are recorded.

    # File lib/bio/io/flatfile.rb
    def entry_pos_flag
      @splitter.entry_pos_flag
    end

entry_pos_flag=(x)

Sets the flag that controls whether entry start and end positions are recorded.

    # File lib/bio/io/flatfile.rb
    def entry_pos_flag=(x)
      @splitter.entry_pos_flag = x
    end

entry_raw()

Returns the last raw entry as a string.

    # File lib/bio/io/flatfile.rb
    def entry_raw
      @splitter.entry
    end

entry_start_pos()

Returns the start position of the last entry.
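Because entry_ended_pos is one past the last byte of the entry, the pair (entry_start_pos, entry_ended_pos) describes a half-open range that can be used to slice the entry out of the original data. A minimal sketch of that convention, assuming entries terminated by a "//" line; MiniSplitter and entry_ranges are hypothetical names and this is not the library's splitter:

```ruby
require 'stringio'

# Hypothetical splitter: reads entries terminated by a "//" line and
# records where each entry starts and where it ends (exclusive).
class MiniSplitter
  attr_reader :entry, :entry_start_pos, :entry_ended_pos

  def initialize(io)
    @io = io
  end

  # Returns the next raw entry as a String, or nil at end of data.
  def get_entry
    start = @io.pos
    text = +''
    while (line = @io.gets)
      text << line
      break if line.chomp == '//'
    end
    return nil if text.empty?
    @entry_start_pos = start
    @entry_ended_pos = @io.pos   # (position of the last byte) + 1
    @entry = text
  end
end

# Collects [start, ended] for every entry in the given data.
def entry_ranges(data)
  splitter = MiniSplitter.new(StringIO.new(data))
  ranges = []
  while splitter.get_entry
    ranges << [splitter.entry_start_pos, splitter.entry_ended_pos]
  end
  ranges
end

data = "ID A\n//\nID B\n//\n"
entry_ranges(data).each do |start, ended|
  # An exclusive range recovers exactly the bytes of the entry.
  puts "#{start}...#{ended}: #{data[start...ended].inspect}"
end
```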

    # File lib/bio/io/flatfile.rb
    def entry_start_pos
      @splitter.entry_start_pos
    end

eof?()

Returns true if the input stream is at end-of-file; otherwise returns false. (Similar to IO#eof?, but the result may differ from io.eof?, because FlatFile has its own internal buffer.)

    # File lib/bio/io/flatfile.rb
    def eof?
      @stream.eof?
    end

gets(*arg)

Similar to IO#gets. Internal use only. Users should not call it directly.

    # File lib/bio/io/flatfile.rb
    def gets(*arg)
      @stream.gets(*arg)
    end

io()

(DEPRECATED) IO object in the flatfile object.

Compatibility Note: Bio::FlatFile#io is deprecated. Please use Bio::FlatFile#to_io instead.

    # File lib/bio/io/flatfile.rb
    def io
      warn "Bio::FlatFile#io is deprecated."
      @stream.to_io
    end

next_entry()

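Gets the next entry. Returns nil when no entry can be read. Raises UnknownDataFormatError if the database class has not been determined.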

    # File lib/bio/io/flatfile.rb
    def next_entry
      raise UnknownDataFormatError,
      'file format auto-detection failed?' unless @dbclass
      if @skip_leader_mode and
          ((@firsttime_flag and @skip_leader_mode == :firsttime) or
             @skip_leader_mode == :everytime)
        @splitter.skip_leader
      end
      if raw then
        r = @splitter.get_entry
      else
        r = @splitter.get_parsed_entry
      end
      @firsttime_flag = false
      return nil unless r
      if raw then
        r
      else
        @entry = r
        @entry
      end
    end

path()

Pathname, filename or URI (or nil).

    # File lib/bio/io/flatfile.rb
    def path
      @stream.path
    end

pos()

Returns the current position of the input stream. If the input stream is not a normal file, the result is not guaranteed. It is similar to IO#pos, but the value may differ from io.pos, because FlatFile has its own internal buffer.
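The gap between the two positions comes from read-ahead: the wrapper has already pulled data from the underlying IO that the caller has not consumed yet. A minimal sketch of that effect, assuming a simple chunked read-ahead buffer; MiniBufferedReader is a hypothetical class for illustration and not the buffering code used by Bio::FlatFile:

```ruby
require 'stringio'

# Hypothetical buffered reader that reads ahead in fixed-size chunks.
class MiniBufferedReader
  def initialize(io, chunk = 16)
    @io = io
    @chunk = chunk
    @buffer = +''
  end

  # Returns the next line, or nil when both buffer and source are exhausted.
  def gets
    # Refill until the buffer holds a full line or the source is exhausted.
    until @buffer.include?("\n")
      more = @io.read(@chunk)
      break unless more
      @buffer << more
    end
    return nil if @buffer.empty?
    line_end = @buffer.index("\n") || @buffer.size - 1
    @buffer.slice!(0, line_end + 1)
  end

  # Logical position: bytes handed to the caller, not bytes read from @io.
  def pos
    @io.pos - @buffer.bytesize
  end
end

io = StringIO.new("AAAA\nBBBB\nCCCC\n")
reader = MiniBufferedReader.new(io)
reader.gets
puts "reader.pos = #{reader.pos}, io.pos = #{io.pos}"
```

After one line has been read, the underlying stream has advanced by a whole chunk while the reader's logical position has advanced by one line only.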

    # File lib/bio/io/flatfile.rb
    def pos
      @stream.pos
    end

pos=(p)

(Using this method is not recommended.) Sets the position of the input stream. If the input stream is not a normal file, the result is not guaranteed. It is similar to IO#pos=, but the effect may differ from io.pos=, because FlatFile has its own internal buffer.

    # File lib/bio/io/flatfile.rb
    def pos=(p)
      @stream.pos=(p)
    end

raw=(bool)

If true is given, the next_entry method returns each entry as text; if false, it returns each entry as a parsed object.

    # File lib/bio/io/flatfile.rb
    def raw=(bool)
      @raw = (bool ? true : false)
    end

rewind()

Resets file pointer to the start of the flatfile. (similar to IO#rewind)

    # File lib/bio/io/flatfile.rb
    def rewind
      r = (@splitter || @stream).rewind
      @firsttime_flag = true
      r
    end

to_io()

IO object in the flatfile object.

Compatibility Note: Bio::FlatFile#io is deprecated.

    # File lib/bio/io/flatfile.rb
    def to_io
      @stream.to_io
    end