class Bio::FlatFile
Bio::FlatFile is a helper and wrapper class for reading a biological data file. It acts like an IO object. It can detect the data format automatically, so users do not need to tell the class what the data is.
Attributes
dbclass: returns the database class, which is either detected automatically or given in FlatFile#initialize.
raw: if true, raw mode is on.
skip_leader_mode: the mode that controls how the leader of the data is skipped.
- :firsttime: (DEFAULT) skip only at the head of the file (the first time an entry is read)
- :everytime: skip every time an entry is read
- nil: never skip
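The three modes can be read as a small decision rule. The standalone sketch below mirrors the condition used in next_entry (listed later on this page). The helper name skip_leader? is hypothetical and not part of BioRuby; it only illustrates the logic.

```ruby
# Hypothetical helper (not part of BioRuby): decides whether the leader
# should be skipped before reading an entry.
#   mode       - :firsttime, :everytime, or nil
#   first_time - true until the first entry has been read
def skip_leader?(mode, first_time)
  return false unless mode
  (first_time && mode == :firsttime) || mode == :everytime
end

p skip_leader?(:firsttime, true)
p skip_leader?(:firsttime, false)
p skip_leader?(:everytime, false)
p skip_leader?(nil, true)
```

With :firsttime the leader is skipped only while the first-time flag is still set, which is why rewind sets that flag again.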
Public Class Methods
Same as Bio::FlatFile.open(nil, filename_or_stream, mode, perm, options).
- Example 1: Bio::FlatFile.auto(ARGF)
- Example 2: Bio::FlatFile.auto("embl/est_hum17.dat")
- Example 3: Bio::FlatFile.auto(IO.popen("gzip -dc nc1101.flat.gz"))
# File lib/bio/io/flatfile.rb, line 122
def self.auto(*arg, &block)
  self.open(nil, *arg, &block)
end
Detects the database class (i.e. the file format) of the given string. Returns false or nil if the format cannot be determined.
# File lib/bio/io/flatfile.rb, line 460
def self.autodetect(text)
  AutoDetect.default.autodetect(text)
end
Detects the database class (i.e. the file format) of the given file. Returns nil if the format cannot be determined.
# File lib/bio/io/flatfile.rb, line 440
def self.autodetect_file(filename)
  self.open_file(filename).dbclass
end
Detects the database class (i.e. the file format) of the given input stream. Returns nil if the format cannot be determined. Caution: the method reads some data from the input stream, and that data is lost.
# File lib/bio/io/flatfile.rb, line 448
def self.autodetect_io(io)
  self.new(nil, io).dbclass
end
This method is OBSOLETE. Please use autodetect_io(io) instead.
# File lib/bio/io/flatfile.rb, line 453
def self.autodetect_stream(io)
  $stderr.print "Bio::FlatFile.autodetect_stream will be deprecated." if $VERBOSE
  self.autodetect_io(io)
end
Executes the block for every entry in the stream. Same as FlatFile.open(*arg) { |ff| ff.each { |entry| … }}.
- Example: Bio::FlatFile.foreach('test.fst') { |e| puts e.definition }
# File lib/bio/io/flatfile.rb, line 194
def self.foreach(*arg)
  self.open(*arg) do |flatfileobj|
    flatfileobj.each do |entry|
      yield entry
    end
  end
end
Same as FlatFile.open, except that ‘stream’ must be an already opened stream object (IO, File, …, anything that has a ‘gets’ method).
- Example 1: Bio::FlatFile.new(Bio::GenBank, ARGF)
- Example 2: Bio::FlatFile.new(Bio::GenBank, IO.popen("gzip -dc nc1101.flat.gz"))
Compatibility Note: “:raw => true” and “:raw => false” can no longer be specified. The styles below are DEPRECATED.
- Example 3 (deprecated):
  # Bio::FlatFile.new(nil, $stdin, :raw=>true) # => ERROR
  # Please rewrite as below.
  ff = Bio::FlatFile.new(nil, $stdin)
  ff.raw = true
- Example 3 in old style (deprecated):
  # Bio::FlatFile.new(nil, $stdin, true) # => ERROR
  # Please rewrite as below.
  ff = Bio::FlatFile.new(nil, $stdin)
  ff.raw = true
# File lib/bio/io/flatfile.rb, line 225
def initialize(dbclass, stream)
  # 2nd arg: IO object
  if stream.kind_of?(BufferedInputStream)
    @stream = stream
  else
    @stream = BufferedInputStream.for_io(stream)
  end
  # 1st arg: database class (or file format autodetection)
  if dbclass then
    self.dbclass = dbclass
  else
    autodetect
  end
  #
  @skip_leader_mode = :firsttime
  @firsttime_flag = true
  # default raw mode is false
  self.raw = false
end
Bio::FlatFile.open(file, *arg)
Bio::FlatFile.open(dbclass, file, *arg)
Creates a new Bio::FlatFile object to read a file or a stream which contains dbclass data.
dbclass should be a class (or module) or nil, e.g. Bio::GenBank, Bio::FastaFormat.
If file is a filename (i.e. an object that does not have a gets method), the method opens the local file named file with File.open(filename, *arg).
When dbclass is omitted or nil, the method tries to determine the database class (file format) automatically. If detection fails, dbclass is set to nil and FlatFile#next_entry will fail. You can still set dbclass afterwards with the FlatFile#dbclass= method.
- Example 1: Bio::FlatFile.open(Bio::GenBank, "genbank/gbest40.seq")
- Example 2: Bio::FlatFile.open(nil, "embl/est_hum17.dat")
- Example 3: Bio::FlatFile.open("genbank/gbest40.seq")
- Example 4: Bio::FlatFile.open(Bio::GenBank, $stdin)
If it is called with a block, the block is executed with a new Bio::FlatFile object. If a filename is given, the file is closed automatically when the block exits.
- Example 5:
  Bio::FlatFile.open(nil, 'test4.fst') do |ff|
    ff.each { |e| print e.definition, "\n" }
  end
- Example 6:
  Bio::FlatFile.open('test4.fst') do |ff|
    ff.each { |e| print e.definition, "\n" }
  end
Compatibility Note: *arg is passed to File.open unchanged, so you cannot specify “:raw => true” or “:raw => false”.
# File lib/bio/io/flatfile.rb, line 80
def self.open(*arg, &block)
  # FlatFile.open(dbclass, file, mode, perm)
  # FlatFile.open(file, mode, perm)
  if arg.size <= 0
    raise ArgumentError, 'wrong number of arguments (0 for 1)'
  end
  x = arg.shift
  if x.is_a?(Module) then
    # FlatFile.open(dbclass, filename_or_io, ...)
    dbclass = x
  elsif x.nil? then
    # FlatFile.open(nil, filename_or_io, ...)
    dbclass = nil
  else
    # FlatFile.open(filename, ...)
    dbclass = nil
    arg.unshift(x)
  end
  if arg.size <= 0
    raise ArgumentError, 'wrong number of arguments (1 for 2)'
  end
  file = arg.shift
  # check if file is filename or IO object
  unless file.respond_to?(:gets)
    # 'file' is a filename
    _open_file(dbclass, file, *arg, &block)
  else
    # 'file' is a IO object
    ff = self.new(dbclass, file)
    block_given? ? (yield ff) : ff
  end
end
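The two call forms are told apart purely by inspecting the arguments. The sketch below reproduces that dispatch in isolation so it can be run without BioRuby. The helper classify_open_args is hypothetical, and Comparable is used only as a stand-in for a database class such as Bio::GenBank.

```ruby
require 'stringio'

# Hypothetical helper (not part of BioRuby): classifies the arguments the
# same way FlatFile.open does and reports the result instead of opening
# anything.
def classify_open_args(*arg)
  raise ArgumentError, 'wrong number of arguments (0 for 1)' if arg.empty?
  first = arg.shift
  if first.is_a?(Module) || first.nil?
    dbclass = first          # explicit class/module, or nil for autodetect
  else
    dbclass = nil            # first argument was already the file
    arg.unshift(first)
  end
  raise ArgumentError, 'wrong number of arguments (1 for 2)' if arg.empty?
  file = arg.shift
  # Anything that responds to gets is treated as an IO, the rest as a filename.
  kind = file.respond_to?(:gets) ? :io : :filename
  { dbclass: dbclass, kind: kind, rest: arg }
end

p classify_open_args('test4.fst')
p classify_open_args(nil, 'test4.fst', 'r')
p classify_open_args(Comparable, StringIO.new(">seq1\nACGT\n"))
```

A String counts as a filename here because Kernel#gets is private, so respond_to?(:gets) is false for it.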
Same as FlatFile.auto(filename, *arg), except that it accepts only a filename and not an IO object. The file format is determined automatically.
It can accept a block. If a block is given, it returns the block’s return value. Otherwise, it returns a new FlatFile object.
# File lib/bio/io/flatfile.rb, line 144
def self.open_file(filename, *arg)
  _open_file(nil, filename, *arg)
end
Opens the URI specified as uri. uri must be a String or a URI object. *arg is passed to OpenURI.open_uri or URI#open.
Like FlatFile.open, it can accept a block.
Note that you MUST explicitly require ‘open-uri’. Because open-uri.rb modifies existing classes, it is not required by default.
# File lib/bio/io/flatfile.rb, line 177
def self.open_uri(uri, *arg)
  if block_given? then
    BufferedInputStream.open_uri(uri, *arg) do |stream|
      yield self.new(nil, stream)
    end
  else
    stream = BufferedInputStream.open_uri(uri, *arg)
    self.new(nil, stream)
  end
end
Same as FlatFile.auto(filename_or_stream, *arg).to_a
(This method might become OBSOLETE in the future.)
# File lib/bio/io/flatfile.rb, line 129
def self.to_a(*arg)
  self.auto(*arg) do |ff|
    raise 'cannot determine file format' unless ff.dbclass
    ff.to_a
  end
end
Public Instance Methods
Determines the database class (file format). Pre-reads the number of lines given by the lines argument (default 31) for format determination. Returns the database class on success, or nil or false on failure.
The method can be called at any time (although this is not recommended). It might be useful if the input file is a mixture of data in multiple formats.
# File lib/bio/io/flatfile.rb, line 429
def autodetect(lines = 31, ad = AutoDetect.default)
  if r = ad.autodetect_flatfile(self, lines)
    self.dbclass = r
  else
    self.dbclass = nil unless self.dbclass
  end
  r
end
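As a rough illustration of what pre-reading a bounded number of lines for detection can look like, here is a toy detector. It is not BioRuby's AutoDetect: the rule table TOY_RULES and the function toy_autodetect are hypothetical, and the two patterns rely only on the well-known facts that FASTA entries start with ">" and GenBank entries start with a LOCUS line.

```ruby
require 'stringio'

# Toy rules for illustration only; BioRuby's AutoDetect uses its own,
# much richer rule set.
TOY_RULES = {
  fasta:   /^>/,          # FASTA entries start with ">"
  genbank: /^LOCUS\s/     # GenBank entries start with a LOCUS line
}.freeze

# Reads up to max_lines lines and returns the first matching format name,
# or nil when nothing matches.
def toy_autodetect(io, max_lines = 31)
  text = +''
  max_lines.times do
    line = io.gets
    break unless line
    text << line
  end
  found = TOY_RULES.find { |_name, pattern| text =~ pattern }
  found && found.first
end

p toy_autodetect(StringIO.new(">seq1\nACGT\n"))
p toy_autodetect(StringIO.new("LOCUS       AB000001\n"))
p toy_autodetect(StringIO.new("no known header\n"))
```

Unlike this toy, which consumes the lines it reads, FlatFile keeps its own internal buffer in front of the stream.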
Closes input stream. (similar to IO#close)
# File lib/bio/io/flatfile.rb, line 351
def close
  @stream.close
end
Sets the database class. Please use this only if autodetect fails.
# File lib/bio/io/flatfile.rb, line 400
def dbclass=(klass)
  if klass then
    @dbclass = klass
    begin
      @splitter = @dbclass.flatfile_splitter(@dbclass, @stream)
    rescue NameError, NoMethodError
      begin
        splitter_class = @dbclass::FLATFILE_SPLITTER
      rescue NameError
        splitter_class = Splitter::Default
      end
      @splitter = splitter_class.new(klass, @stream)
    end
  else
    @dbclass = nil
    @splitter = nil
  end
end
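The listing above chooses a splitter in three steps: a flatfile_splitter class method if the database class defines one, otherwise a FLATFILE_SPLITTER constant, otherwise a default. The sketch below replays that lookup order with stand-in classes. Every class name in it, and the helper splitter_for, is hypothetical and exists only for this illustration.

```ruby
# Stand-ins for illustration only; none of these classes exist in BioRuby.
class DefaultSplitter
  def initialize(klass, stream); end
end
class FactorySplitter < DefaultSplitter; end
class ConstantSplitter < DefaultSplitter; end

class FormatWithFactory
  # Preferred hook: the format builds its own splitter.
  def self.flatfile_splitter(klass, stream)
    FactorySplitter.new(klass, stream)
  end
end

class FormatWithConstant
  FLATFILE_SPLITTER = ConstantSplitter
end

class PlainFormat; end

# Mirrors the lookup order in dbclass=: factory method first, then the
# FLATFILE_SPLITTER constant, then a default splitter.
def splitter_for(klass, stream)
  klass.flatfile_splitter(klass, stream)
rescue NameError, NoMethodError
  splitter_class =
    begin
      klass::FLATFILE_SPLITTER
    rescue NameError
      DefaultSplitter
    end
  splitter_class.new(klass, stream)
end

[FormatWithFactory, FormatWithConstant, PlainFormat].each do |fmt|
  puts "#{fmt} -> #{splitter_for(fmt, nil).class}"
end
```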
Iterates over each entry in the flatfile.
- Example:
  include Bio
  ff = FlatFile.open(GenBank, "genbank/gbhtg14.seq")
  ff.each_entry do |x|
    puts x.definition
  end
# File lib/bio/io/flatfile.rb, line 334
def each_entry
  while e = self.next_entry
    yield e
  end
end
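each_entry relies on next_entry returning nil once the input is exhausted. The toy reader below shows the same loop shape without BioRuby. ToyFlatFile is a hypothetical class, and the assumption that entries end with a line containing only "//" is borrowed from GenBank and EMBL style flat files.

```ruby
require 'stringio'

# Toy reader (not BioRuby): entries end with a line containing only "//".
class ToyFlatFile
  def initialize(io)
    @io = io
  end

  # Returns the next entry as a String, or nil at end of input.
  def next_entry
    lines = []
    while (line = @io.gets)
      lines << line
      break if line.chomp == '//'
    end
    lines.empty? ? nil : lines.join
  end

  # Same shape as FlatFile#each_entry: loop until next_entry returns nil.
  def each_entry
    while (e = next_entry)
      yield e
    end
  end
end

data = "ID   A\n//\nID   B\n//\n"
ToyFlatFile.new(StringIO.new(data)).each_entry { |e| puts e.lines.first }
```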
(end position of the last entry) + 1
# File lib/bio/io/flatfile.rb, line 322
def entry_ended_pos
  @splitter.entry_ended_pos
end
A flag that controls whether entry start and end positions are recorded.
# File lib/bio/io/flatfile.rb, line 307
def entry_pos_flag
  @splitter.entry_pos_flag
end
Sets the flag that controls whether entry start and end positions are recorded.
# File lib/bio/io/flatfile.rb, line 312
def entry_pos_flag=(x)
  @splitter.entry_pos_flag = x
end
Returns the last raw entry as a string.
# File lib/bio/io/flatfile.rb, line 302
def entry_raw
  @splitter.entry
end
Start position of the last entry.
# File lib/bio/io/flatfile.rb, line 317
def entry_start_pos
  @splitter.entry_start_pos
end
Returns true if the input stream is at end-of-file. Otherwise, returns false. (Similar to IO#eof?, but the result may differ from io.eof?, because FlatFile has its own internal buffer.)
# File lib/bio/io/flatfile.rb, line 380
def eof?
  @stream.eof?
end
Similar to IO#gets. Internal use only. Users should not call it directly.
# File lib/bio/io/flatfile.rb, line 395
def gets(*arg)
  @stream.gets(*arg)
end
(DEPRECATED) IO object in the flatfile object.
Compatibility Note: Bio::FlatFile#io is deprecated. Please use Bio::FlatFile#to_io instead.
# File lib/bio/io/flatfile.rb, line 255
def io
  warn "Bio::FlatFile#io is deprecated."
  @stream.to_io
end
Gets the next entry. Returns nil when there are no more entries. Raises UnknownDataFormatError if the database class has not been set (for example, because auto-detection failed).
# File lib/bio/io/flatfile.rb, line 277
def next_entry
  raise UnknownDataFormatError,
  'file format auto-detection failed?' unless @dbclass
  if @skip_leader_mode and
      ((@firsttime_flag and @skip_leader_mode == :firsttime) or
         @skip_leader_mode == :everytime)
    @splitter.skip_leader
  end
  if raw then
    r = @splitter.get_entry
  else
    r = @splitter.get_parsed_entry
  end
  @firsttime_flag = false
  return nil unless r
  if raw then
    r
  else
    @entry = r
    @entry
  end
end
Pathname, filename or URI (or nil).
# File lib/bio/io/flatfile.rb, line 268
def path
  @stream.path
end
Returns the current position of the input stream. If the input stream is not a normal file, the result is not guaranteed. It is similar to IO#pos. Note that it will not be equal to io.pos, because FlatFile has its own internal buffer.
# File lib/bio/io/flatfile.rb, line 361
def pos
  @stream.pos
end
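Why a buffered wrapper reports a different position from the IO underneath it can be seen with a small toy. ToyBufferedReader below is hypothetical and much simpler than BioRuby's BufferedInputStream; it only demonstrates the general effect of reading ahead.

```ruby
require 'stringio'

# Toy buffered reader (not BioRuby's BufferedInputStream): it reads ahead
# from the IO and serves lines from its own buffer.
class ToyBufferedReader
  def initialize(io)
    @io = io
    @buffer = []
  end

  # Pre-reads up to n lines into the internal buffer.
  def prefetch(n)
    n.times do
      line = @io.gets
      break unless line
      @buffer << line
    end
  end

  def gets
    @buffer.shift || @io.gets
  end

  # Logical position: the IO position minus the bytes still buffered.
  def pos
    @io.pos - @buffer.sum(&:bytesize)
  end
end

io = StringIO.new("line1\nline2\nline3\n")
reader = ToyBufferedReader.new(io)
reader.prefetch(2)
reader.gets
puts reader.pos  # logical position after consuming one line
puts io.pos      # the underlying IO has already advanced past two lines
```

The two numbers differ by exactly the number of bytes that are still waiting in the buffer.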
(Not recommended.) Sets the position of the input stream. If the input stream is not a normal file, the result is not guaranteed. It is similar to IO#pos=. Note that it is not equivalent to io.pos=, because FlatFile has its own internal buffer.
# File lib/bio/io/flatfile.rb, line 372
def pos=(p)
  @stream.pos=(p)
end
If true is given, the next_entry method returns an entry as text; if false is given, it returns a parsed object.
# File lib/bio/io/flatfile.rb, line 386
def raw=(bool)
  @raw = (bool ? true : false)
end
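The effect of the raw switch can be sketched with a toy reader. ToyReader is hypothetical and not part of BioRuby, and "parsing" here simply means splitting a string into fields; the setter coerces any truthy value to true in the same way as the listing above.

```ruby
# Toy reader (not BioRuby) with a raw switch.
class ToyReader
  attr_reader :raw

  def initialize(entries)
    @entries = entries.dup
    self.raw = false        # parsed output by default
  end

  # Any truthy value turns raw mode on; nil or false turns it off.
  def raw=(bool)
    @raw = (bool ? true : false)
  end

  # Returns the next entry, or nil when none are left.
  def next_entry
    text = @entries.shift
    return nil unless text
    raw ? text : text.split  # "parsed" here just means split into fields
  end
end

reader = ToyReader.new(["ID A", "ID B"])
p reader.next_entry   # parsed form
reader.raw = true
p reader.next_entry   # raw text
```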
Resets file pointer to the start of the flatfile. (similar to IO#rewind)
# File lib/bio/io/flatfile.rb, line 343
def rewind
  r = (@splitter || @stream).rewind
  @firsttime_flag = true
  r
end
IO object in the flatfile object.
Compatibility Note: Bio::FlatFile#io is deprecated.
# File lib/bio/io/flatfile.rb, line 263
def to_io
  @stream.to_io
end