class Bio::FlatFile

Bio::FlatFile is a helper and wrapper class for reading biological data files. It acts like an IO object and can detect the data format automatically, so users do not need to tell the class what the data is.

Attributes

dbclass[R]

Returns the database class, which is either detected automatically or given in FlatFile#initialize.

entry[R]
raw[R]

If true, raw mode.

skip_leader_mode[RW]

Controls when the leader of the data is skipped.

:firsttime

(DEFAULT) Skips only at the head of the file (the first read).

:everytime

Skips every time an entry is read.

nil

Never skips.
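
The three modes correspond to the condition checked in #next_entry. The following is a minimal sketch of that decision, not library code; the helper name skip_leader? and its arguments are hypothetical.

```ruby
# Hypothetical helper mirroring the condition used in #next_entry.
# mode       : :firsttime, :everytime, or nil
# first_time : true until the first entry has been read (or after rewind)
def skip_leader?(mode, first_time)
  return false unless mode
  (first_time && mode == :firsttime) || mode == :everytime
end

skip_leader?(:firsttime, true)   # => true
skip_leader?(:firsttime, false)  # => false
skip_leader?(:everytime, false)  # => true
skip_leader?(nil, true)          # => false
```

In the real class the "first time" state is reset by #rewind, so a :firsttime reader skips the leader again after rewinding.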

Public Class Methods

auto(*arg, &block)

Same as ::open(nil, filename_or_stream, mode, perm, options).

  • Example 1

    Bio::FlatFile.auto(ARGF)
    
  • Example 2

    Bio::FlatFile.auto("embl/est_hum17.dat")
    
  • Example 3

    Bio::FlatFile.auto(IO.popen("gzip -dc nc1101.flat.gz"))
    
# File lib/bio/io/flatfile.rb, line 122
def self.auto(*arg, &block)
  self.open(nil, *arg, &block)
end
autodetect(text)

Detects the database class (== file format) of the given string. Returns false or nil if the format cannot be determined.

# File lib/bio/io/flatfile.rb, line 460
def self.autodetect(text)
  AutoDetect.default.autodetect(text)
end
autodetect_file(filename)

Detects the database class (== file format) of the given file. Returns nil if the format cannot be determined.

# File lib/bio/io/flatfile.rb, line 440
def self.autodetect_file(filename)
  self.open_file(filename).dbclass
end
autodetect_io(io)

Detects the database class (== file format) of the given input stream. Returns nil if the format cannot be determined. Caution: the method reads some data from the input stream, and that data will be lost.
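
The caution matters most for streams that cannot be rewound, such as pipes. The sketch below uses only StringIO from the standard library to show the effect of pre-reading; peek_lines is a hypothetical stand-in for the detection step and is not a BioRuby method.

```ruby
require 'stringio'

# Hypothetical stand-in for format detection: reads up to n lines.
def peek_lines(io, n)
  lines = []
  n.times do
    line = io.gets
    break unless line
    lines << line
  end
  lines
end

io = StringIO.new(">seq1\nACGT\n>seq2\nTTGA\n")
peeked = peek_lines(io, 2)
consumed = io.pos   # the stream has advanced past the peeked lines
rest = io.read      # only the remainder is still available to later readers

# A seekable stream can be rewound; a pipe or socket cannot.
io.rewind
```

If the entries are needed after detection, it is probably simpler to create one FlatFile object with ::new(nil, io) and keep reading from it, rather than calling ::autodetect_io and then opening the stream again.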

# File lib/bio/io/flatfile.rb, line 448
def self.autodetect_io(io)
  self.new(nil, io).dbclass
end
autodetect_stream(io)

This method is obsolete. Please use ::autodetect_io(io) instead.

# File lib/bio/io/flatfile.rb, line 453
def self.autodetect_stream(io)
  $stderr.print "Bio::FlatFile.autodetect_stream will be deprecated." if $VERBOSE
  self.autodetect_io(io)
end
foreach(*arg) { |entry| ... }

Executes the block for every entry in the stream. Same as ::open { |ff| ff.each { |entry| … }}.

  • Example

    Bio::FlatFile.foreach('test.fst') { |e| puts e.definition }
    
# File lib/bio/io/flatfile.rb, line 194
def self.foreach(*arg)
  self.open(*arg) do |flatfileobj|
    flatfileobj.each do |entry|
      yield entry
    end
  end
end
new(dbclass, stream)

Same as ::open, except that 'stream' must be an already opened stream object (IO, File, …, anything that has the 'gets' method).

  • Example 1

    Bio::FlatFile.new(Bio::GenBank, ARGF)
    
  • Example 2

    Bio::FlatFile.new(Bio::GenBank, IO.popen("gzip -dc nc1101.flat.gz"))
    

Compatibility Note: You can no longer specify “:raw => true” or “:raw => false”. The styles below are DEPRECATED and now raise an error.

  • Example 3 (deprecated)

    # Bio::FlatFile.new(nil, $stdin, :raw=>true) # => ERROR
    # Please rewrite as below.
    ff = Bio::FlatFile.new(nil, $stdin)
    ff.raw = true
    
  • Example 3 in old style (deprecated)

    # Bio::FlatFile.new(nil, $stdin, true) # => ERROR
    # Please rewrite as below.
    ff = Bio::FlatFile.new(nil, $stdin)
    ff.raw = true
    
# File lib/bio/io/flatfile.rb, line 225
def initialize(dbclass, stream)
  # 2nd arg: IO object
  if stream.kind_of?(BufferedInputStream)
    @stream = stream
  else
    @stream = BufferedInputStream.for_io(stream)
  end
  # 1st arg: database class (or file format autodetection)
  if dbclass then
    self.dbclass = dbclass
  else
    autodetect
  end
  #
  @skip_leader_mode = :firsttime
  @firsttime_flag = true
  # default raw mode is false
  self.raw = false
end
open(*arg) { |ff| ... }
Bio::FlatFile.open(file, *arg)
Bio::FlatFile.open(dbclass, file, *arg)

Creates a new Bio::FlatFile object to read a file or a stream which contains dbclass data.

dbclass should be a class (or module) or nil. e.g. Bio::GenBank, Bio::FastaFormat.

If file is a filename (that is, an object without a gets method), the method opens a local file of that name with File.open(filename, *arg).

When dbclass is omitted or nil, the method tries to determine the database class (file format) automatically. If detection fails, dbclass is set to nil and #next_entry will fail. You can still set dbclass afterwards with the #dbclass= method.

  • Example 1

    Bio::FlatFile.open(Bio::GenBank, "genbank/gbest40.seq")
    
  • Example 2

    Bio::FlatFile.open(nil, "embl/est_hum17.dat")
    
  • Example 3

    Bio::FlatFile.open("genbank/gbest40.seq")
    
  • Example 4

    Bio::FlatFile.open(Bio::GenBank, $stdin)
    

If it is called with a block, the block will be executed with a new Bio::FlatFile object. If filename is given, the file is automatically closed when leaving the block.

  • Example 5

    Bio::FlatFile.open(nil, 'test4.fst') do |ff|
        ff.each { |e| print e.definition, "\n" }
    end
    
  • Example 6

    Bio::FlatFile.open('test4.fst') do |ff|
        ff.each { |e| print e.definition, "\n" }
    end
    

Compatibility Note: *arg is passed unchanged to File.open, so you cannot specify “:raw => true” or “:raw => false”.
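
The way ::open tells its calling forms apart can be sketched as follows. This is a simplified illustration that uses only the standard library; classify_open_args and fake_format are hypothetical names, and the method only reports what it would do instead of opening anything.

```ruby
require 'stringio'

# Simplified sketch of the argument handling in ::open.
# Returns [dbclass, file, kind, rest], where kind is :io or :filename.
def classify_open_args(*arg)
  raise ArgumentError, 'wrong number of arguments (0 for 1)' if arg.empty?
  first = arg.shift
  dbclass = first.is_a?(Module) ? first : nil
  # Anything that is neither a Module nor nil is the file argument itself.
  arg.unshift(first) unless first.is_a?(Module) || first.nil?
  raise ArgumentError, 'wrong number of arguments (1 for 2)' if arg.empty?
  file = arg.shift
  # Objects that respond to gets are treated as streams, all others as filenames.
  kind = file.respond_to?(:gets) ? :io : :filename
  [dbclass, file, kind, arg]
end

fake_format = Class.new  # hypothetical stand-in for a class such as Bio::GenBank

classify_open_args(fake_format, 'a.seq')       # dbclass given, filename
classify_open_args(nil, 'a.seq', 'r')          # autodetect, filename, 'r' passed on
classify_open_args('a.seq')                    # autodetect, filename
classify_open_args(fake_format, StringIO.new)  # dbclass given, IO-like object
```

Because the first argument is classified by is_a?(Module), a filename can never be mistaken for a database class, and a String is treated as a filename because it does not publicly respond to gets.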

# File lib/bio/io/flatfile.rb, line 80
def self.open(*arg, &block)
  # FlatFile.open(dbclass, file, mode, perm)
  # FlatFile.open(file, mode, perm)
  if arg.size <= 0
    raise ArgumentError, 'wrong number of arguments (0 for 1)'
  end
  x = arg.shift
  if x.is_a?(Module) then
    # FlatFile.open(dbclass, filename_or_io, ...)
    dbclass = x
  elsif x.nil? then
    # FlatFile.open(nil, filename_or_io, ...)
    dbclass = nil
  else
    # FlatFile.open(filename, ...)
    dbclass = nil
    arg.unshift(x)
  end
  if arg.size <= 0
    raise ArgumentError, 'wrong number of arguments (1 for 2)'
  end
  file = arg.shift
  # check if file is filename or IO object
  unless file.respond_to?(:gets)
    # 'file' is a filename
    _open_file(dbclass, file, *arg, &block)
  else
    # 'file' is a IO object
    ff = self.new(dbclass, file)
    block_given? ? (yield ff) : ff
  end
end
open_file(filename, *arg)

Same as ::auto(filename, *arg), except that it accepts only a filename and does not accept an IO object. The file format is determined automatically.

It can accept a block. If a block is given, it returns the block's return value. Otherwise, it returns a new FlatFile object.

# File lib/bio/io/flatfile.rb, line 144
def self.open_file(filename, *arg)
  _open_file(nil, filename, *arg)
end
open_uri(uri, *arg) { |ff| ... }

Opens the URI specified as uri. uri must be a String or a URI object. *arg is passed to OpenURI.open_uri or URI#open.

Like ::open, it can accept a block.

Note that you MUST explicitly require 'open-uri'. Because open-uri.rb modifies existing classes, it is not required by default.

# File lib/bio/io/flatfile.rb, line 177
def self.open_uri(uri, *arg)
  if block_given? then
    BufferedInputStream.open_uri(uri, *arg) do |stream|
      yield self.new(nil, stream)
    end
  else
    stream = BufferedInputStream.open_uri(uri, *arg)
    self.new(nil, stream)
  end
end
to_a(*arg)

Same as ::auto(filename_or_stream, *arg).to_a. Raises an exception if the file format cannot be determined.

(This method might become obsolete in the future.)

# File lib/bio/io/flatfile.rb, line 129
def self.to_a(*arg)
  self.auto(*arg) do |ff|
    raise 'cannot determine file format' unless ff.dbclass
    ff.to_a
  end
end

Public Instance Methods

autodetect(lines = 31, ad = AutoDetect.default)

Determines the database class (file format). Pre-reads the number of lines given by lines (default 31) for format determination. Returns the database class on success, or nil or false on failure.

The method can be called at any time, although this is not recommended. It might be useful if the input file is a mixture of data in multiple formats.

# File lib/bio/io/flatfile.rb, line 429
def autodetect(lines = 31, ad = AutoDetect.default)
  if r = ad.autodetect_flatfile(self, lines)
    self.dbclass = r
  else
    self.dbclass = nil unless self.dbclass
  end
  r
end
close()

Closes input stream. (similar to IO#close)

# File lib/bio/io/flatfile.rb, line 351
def close
  @stream.close
end
dbclass=(klass)

Sets the database class. Please use it only if autodetection fails.

# File lib/bio/io/flatfile.rb, line 400
def dbclass=(klass)
  if klass then
    @dbclass = klass
    begin
      @splitter = @dbclass.flatfile_splitter(@dbclass, @stream)
    rescue NameError, NoMethodError
      begin
        splitter_class = @dbclass::FLATFILE_SPLITTER
      rescue NameError
        splitter_class = Splitter::Default
      end
      @splitter = splitter_class.new(klass, @stream)
    end
  else
    @dbclass = nil
    @splitter = nil
  end
end
each()
Alias for: each_entry
each_entry() { |e| ... }

Iterates over each entry in the flatfile.

  • Example

    include Bio
    ff = FlatFile.open(GenBank, "genbank/gbhtg14.seq")
    ff.each_entry do |x|
      puts x.definition
    end
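
The relationship between #next_entry and #each_entry can be sketched with a small stand-alone reader. TinyEntryReader is hypothetical and only splits on a fixed separator; it is not how Bio::FlatFile splits or parses entries, and including Enumerable is an assumption of this sketch.

```ruby
require 'stringio'

# Hypothetical minimal reader illustrating the next_entry / each_entry pattern.
class TinyEntryReader
  include Enumerable

  def initialize(io, separator = "//\n")
    @io = io
    @separator = separator
  end

  # Returns the next entry as a String, or nil at end of input.
  def next_entry
    @io.gets(@separator)
  end

  # Yields entries until next_entry returns nil.
  def each_entry
    while (e = next_entry)
      yield e
    end
  end
  alias each each_entry
end

reader = TinyEntryReader.new(StringIO.new("ID a\n//\nID b\n//\n"))
ids = reader.map { |e| e[/^ID (\S+)/, 1] }
```

Because iteration stops as soon as next_entry returns nil, a second pass over the same reader yields nothing unless the underlying stream is rewound first.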
    
# File lib/bio/io/flatfile.rb, line 334
def each_entry
  while e = self.next_entry
    yield e
  end
end
Also aliased as: each
entry_ended_pos()

Returns (end position of the last entry) + 1.

# File lib/bio/io/flatfile.rb, line 322
def entry_ended_pos
  @splitter.entry_ended_pos
end
entry_pos_flag()

Returns the flag that controls whether entry start and end positions are recorded.

# File lib/bio/io/flatfile.rb, line 307
def entry_pos_flag
  @splitter.entry_pos_flag
end
entry_pos_flag=(x)

Sets the flag that controls whether entry start and end positions are recorded.

# File lib/bio/io/flatfile.rb, line 312
def entry_pos_flag=(x)
  @splitter.entry_pos_flag = x
end
entry_raw()

Returns the last raw entry as a string.

# File lib/bio/io/flatfile.rb, line 302
def entry_raw
  @splitter.entry
end
entry_start_pos()

Returns the start position of the last entry.

# File lib/bio/io/flatfile.rb, line 317
def entry_start_pos
  @splitter.entry_start_pos
end
eof?()

Returns true if the input stream is at end-of-file; otherwise returns false. (Similar to IO#eof?, but the result may differ from io.eof? because FlatFile has its own internal buffer.)
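
Why the two answers can differ is easiest to see with a toy buffered reader. TinyBufferedReader below is a hypothetical illustration built on StringIO; it is not BufferedInputStream and makes no claim about how that class is implemented.

```ruby
require 'stringio'

# Hypothetical wrapper that prefetches lines into its own buffer.
class TinyBufferedReader
  def initialize(io, prefetch = 2)
    @io = io
    @buffer = []
    prefetch.times do
      line = @io.gets
      @buffer << line if line
    end
  end

  # Serves buffered lines first, then falls back to the underlying IO.
  def gets
    @buffer.shift || @io.gets
  end

  # End-of-file only when both the buffer and the underlying IO are exhausted.
  def eof?
    @buffer.empty? && @io.eof?
  end
end

io = StringIO.new("line1\nline2\n")
reader = TinyBufferedReader.new(io)
io.eof?      # => true: the wrapper already pulled everything into its buffer
reader.eof?  # => false: two buffered lines are still unread
```

The same reasoning applies to #pos: the underlying IO may be further ahead than the position the wrapper reports.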

# File lib/bio/io/flatfile.rb, line 380
def eof?
  @stream.eof?
end
gets(*arg)

Similar to IO#gets. Internal use only. Users should not call it directly.

# File lib/bio/io/flatfile.rb, line 395
def gets(*arg)
  @stream.gets(*arg)
end
io()

(DEPRECATED) IO object in the flatfile object.

Compatibility Note: #io is deprecated. Please use #to_io instead.

# File lib/bio/io/flatfile.rb, line 255
def io
  warn "Bio::FlatFile#io is deprecated."
  @stream.to_io
end
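next_entry()

Gets the next entry. Returns nil when there are no more entries, and raises UnknownDataFormatError if the database class has not been set. In raw mode the entry is returned as text; otherwise it is returned as a parsed object.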

# File lib/bio/io/flatfile.rb, line 277
def next_entry
  raise UnknownDataFormatError, 
  'file format auto-detection failed?' unless @dbclass
  if @skip_leader_mode and
      ((@firsttime_flag and @skip_leader_mode == :firsttime) or
         @skip_leader_mode == :everytime)
    @splitter.skip_leader
  end
  if raw then
    r = @splitter.get_entry
  else
    r = @splitter.get_parsed_entry
  end
  @firsttime_flag = false
  return nil unless r
  if raw then
    r
  else
    @entry = r
    @entry
  end
end
path()

Pathname, filename or URI (or nil).

# File lib/bio/io/flatfile.rb, line 268
def path
  @stream.path
end
pos()

Returns the current position of the input stream. If the input stream is not a normal file, the result is not guaranteed. It is similar to IO#pos, but the value may differ from io.pos because FlatFile has its own internal buffer.

# File lib/bio/io/flatfile.rb, line 361
def pos
  @stream.pos
end
pos=(p)

(Not recommended.) Sets the position of the input stream. If the input stream is not a normal file, the result is not guaranteed. It is similar to IO#pos=, but it is not equivalent to io.pos= because FlatFile has its own internal buffer.

# File lib/bio/io/flatfile.rb, line 372
def pos=(p)
  @stream.pos=(p)
end
raw=(bool)

If true is given, the #next_entry method returns each entry as text; if false, it returns each entry as a parsed object.

# File lib/bio/io/flatfile.rb, line 386
def raw=(bool)
  @raw = (bool ? true : false)
end
rewind()

Resets file pointer to the start of the flatfile. (similar to IO#rewind)

# File lib/bio/io/flatfile.rb, line 343
def rewind
  r = (@splitter || @stream).rewind
  @firsttime_flag = true
  r
end
to_io()

IO object in the flatfile object.

Compatibility Note: #io is deprecated.

# File lib/bio/io/flatfile.rb, line 263
def to_io
  @stream.to_io
end