Incompatible and important changes since the BioRuby 1.2.1 release

Many changes have been made to BioRuby since version 1.2.1 was released.

New features

Support for sequence output with improvements of Bio::Sequence

Output of EMBL- and GenBank-formatted text is now supported in the Bio::Sequence class. See the documentation of Bio::Sequence#output for details. You can also create Bio::Sequence objects from many kinds of data, such as Bio::GenBank, Bio::EMBL, and Bio::FastaFormat, by using the to_biosequence method.

BioSQL support

BioSQL support has been completely rewritten using ActiveRecord.

Bio::Blast

Bio::Blast.reports can now parse NCBI default (-m 0) format and tabular (-m 8) format, in addition to XML (-m 7) format.

Bio::Blast::Report now supports XML format with multiple query sequences generated by blastall 2.2.14 or later.

Bio::Blast.remote now supports DDBJ in addition to GenomeNet. A list of the BLAST databases available on each remote site can be obtained with the Bio::Blast::Remote::DDBJ.databases and Bio::Blast::Remote::GenomeNet.databases methods. Note that these remote BLAST methods may change in the future to support NCBI.

Bio::Blast::RPSBlast::Report, a parser for NCBI RPS-BLAST (Reverse Position-Specific BLAST) default (-m 0 option) results, has been added.

Bio::GFF::GFF2 and Bio::GFF::GFF3

Output of GFF2- and GFF3-formatted text is now supported. However, many incompatible changes have been made (see below for details).

Bio::Hinv

A client class for the H-Invitational Database web service (REST) has been added.

Bio::NCBI::REST

A client class for NCBI E-Utilities has been added.

Bio::PAML::Codeml and Bio::PAML::Codeml::Report

Bio::PAML::Codeml, a wrapper for the PAML codeml program, and Bio::PAML::Codeml::Report, a parser for codeml results, have been added. Parts of them are still under construction and are too specific to particular use cases.

Bio::Locations

The new method Bio::Locations#to_s has been added to support output of features.

Bio::TogoWS::REST

A TogoWS REST client class has been added. Information about the TogoWS REST service can be found at togows.dbcls.jp/site/en/rest.html.

Deprecated classes

Bio::Features

Bio::Features is obsolete and has been changed to an array of Bio::Feature objects with some backward-compatibility methods. The backward-compatibility methods will be removed in a future release.

Bio::References

Bio::References is obsolete and has been changed to an array of Bio::Reference objects with some backward-compatibility methods. The backward-compatibility methods will be removed in a future release.

Incompatible changes

Bio::BIORUBY_VERSION

The definition of the constant Bio::BIORUBY_VERSION has moved from lib/bio.rb to lib/bio/version.rb. Normally, Ruby's autoload mechanism loads version.rb correctly, but special scripts that use bio.rb directly may need to be changed.

Bio::BIORUBY_VERSION is now frozen.

New constants Bio::BIORUBY_EXTRA_VERSION and Bio::BIORUBY_VERSION_ID are added. See their RDoc for details.

Bio::Sequence

Bio::Sequence#date has been removed. Use date_created or date_modified instead.

Bio::Sequence#taxonomy is now an alias of classification, and its data type has changed to an array of strings.

Bio::Locations and Bio::Location

A caret in a location (e.g. “123^124”) is now parsed, instead of being replaced by “..”. To distinguish it from a normal “..”, the new attribute Bio::Location#carat is used.

“order(…)” and “group(…)” are also parsed, instead of being regarded as “join(…)”. To distinguish them from “join(…)”, the new attribute Bio::Locations#operator is used. For “order(…)” or “group(…)”, the attribute is set to :order or :group, respectively. Note that “group(…)” is already deprecated in EMBL/GenBank/DDBJ.
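The distinction described above can be pictured with a small plain-Ruby sketch. It is not BioRuby's parser: parse_location_sketch is a hypothetical helper, it only handles simple “from..to” and “from^to” spans, and returning nil when no operator is present is an assumption of the sketch.

```ruby
# Plain-Ruby illustration only; NOT the Bio::Locations implementation.
# parse_location_sketch is a hypothetical name used for this example.
def parse_location_sketch(text)
  operator = nil
  body = text.strip

  # Detect an enclosing join(...), order(...), or group(...) operator.
  if (m = body.match(/\A(join|order|group)\((.*)\)\z/))
    operator = m[1].to_sym
    body = m[2]
  end

  # Parse each span, remembering whether it used "^" or "..".
  spans = body.split(',').map do |part|
    m = part.strip.match(/\A(\d+)(\.\.|\^)(\d+)\z/)
    raise ArgumentError, "unsupported span: #{part}" unless m
    { from: m[1].to_i, to: m[3].to_i, caret: m[2] == '^' }
  end

  { operator: operator, spans: spans }
end

p parse_location_sketch('order(10..20,123^124)')
```

The point is only that “^” and the operator name are kept as separate pieces of information rather than being folded into “..” and “join(…)”.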

Bio::Blast

The return value of Bio::Blast#exec_* is now a String instead of a Report object. Parsing of the string is now done in the Bio::Blast#query method.

Bio::Blast#exec_genomenet_tab and Bio::Blast#server=“genomenet_tab” are deprecated.

Bio::Blast#options=() can now change the following attributes: program, db, format, matrix, and filter.

Bio::Blast.reports now supports default (-m 0) and tabular (-m 8) formats. The old implementation, which supports only XML, is renamed to Bio::Blast.reports_xml. It is kept for compatibility with older BLAST XML documents that might not be parsed by the new Bio::Blast.reports or by Bio::FlatFile, although we are not sure whether such documents really exist.

Bio::Blast::Default::Report and Bio::Blast::WU::Report

Iteration#lambda, kappa, entropy, gapped_lambda, gapped_kappa, and gapped_entropy, and the same methods in the Report class, now return Float or nil instead of String or nil.

Bio::Blat

When reading BLAT psl (or pslx) data by using Bio::FlatFile, it checks each query name and returns a new entry object whenever the query name differs from that of the previous line. That is, the data is stored in two or more Bio::Blat::Report objects, unlike the previous version, which always read all data at once and stored it in a single Bio::Blat::Report object.
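The grouping rule can be sketched in plain Ruby as follows. This is an illustration, not the Bio::FlatFile code: group_psl_lines_by_query is a hypothetical helper, and it assumes tab-separated psl lines without header lines, where the query name (qName) is the 10th column.

```ruby
# Plain-Ruby illustration only; NOT how Bio::FlatFile is implemented.
# 0-based index of the qName column in a tab-separated psl line.
PSL_QNAME_COLUMN = 9

# Splits consecutive psl lines into groups, starting a new group each
# time the query name differs from that of the previous line.
def group_psl_lines_by_query(lines)
  lines.slice_when do |prev, cur|
    prev.split("\t")[PSL_QNAME_COLUMN] != cur.split("\t")[PSL_QNAME_COLUMN]
  end.to_a
end
```

Each resulting group corresponds to what would become one report entry. Lines for the same query that are not adjacent end up in separate groups, because only neighbouring lines are compared.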

Bio::GFF, Bio::GFF::GFF2 and Bio::GFF::GFF3

Bio::GFF::Record#comments is renamed to comment, and comments= is renamed to comment=, because they only allow a single String (or nil) and the plural form “comments” may be confusing. The “comments” and “comments=” methods can still be used, but warning messages are shown when they are used on GFF2::Record and GFF3::Record objects.

See below for GFF2- and GFF3-specific changes.

Bio::GFF::GFF2 and Bio::GFF::GFF3

Bio::GFF::GFF2::Record.new and Bio::GFF::GFF3::Record.new can also take 9 arguments corresponding to the GFF columns, which makes it possible to create a Record object directly, without formatted text.
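As a reminder of what the 9 columns are, the following plain-Ruby sketch turns nine values into one tab-separated GFF line. It does not call BioRuby: gff_line_sketch and GFF_COLUMNS_SKETCH are hypothetical names, the attributes column is assumed to be passed as an already formatted string, and no escaping is performed.

```ruby
# Plain-Ruby illustration only; NOT the Record.new or Record#to_s code.
# Column order of a GFF line (GFF2 names).
GFF_COLUMNS_SKETCH = %i[
  seqname source feature start end score strand frame attributes
].freeze

# Joins nine column values into one GFF line. A nil value is written
# as ".", the GFF placeholder for an undefined field.
def gff_line_sketch(*columns)
  unless columns.size == GFF_COLUMNS_SKETCH.size
    raise ArgumentError, "expected #{GFF_COLUMNS_SKETCH.size} columns"
  end
  columns.map { |c| c.nil? ? '.' : c.to_s }.join("\t")
end

puts gff_line_sketch('chr1', 'src', 'exon', 1, 10, nil, '+', nil, 'ID 1')
```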

Bio::GFF::GFF2::Record#start, end, and frame return Integer or nil, and score returns Float or nil, instead of String or nil. The same changes are also made to Bio::GFF::GFF3::Record.

Bio::GFF::GFF2::Record#attributes and Bio::GFF::GFF3::Record#attributes now return a nested Array containing [ tag, value ] pairs, in order to support multiple attributes with the same tag name. If you want a Hash, use the Record#attributes_to_hash method, though some tag-value pairs sharing a tag name may be lost. Note that Bio::GFF::Record#attribute still returns a Hash for compatibility.
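Why a Hash can lose information is easy to see in plain Ruby. The sketch below does not use BioRuby: the helper names are hypothetical, and the choice that the last value wins in the Hash is an assumption of the sketch, since which duplicate survives in attributes_to_hash is not stated here.

```ruby
# Plain-Ruby illustration only; the helpers below are hypothetical.

# Converts [tag, value] pairs to a Hash. With Array#to_h, a later pair
# overwrites an earlier pair that has the same tag.
def pairs_to_hash_sketch(pairs)
  pairs.to_h
end

# Collects every value stored under the given tag, preserving order.
def values_for_tag_sketch(pairs, tag)
  pairs.select { |t, _| t == tag }.map { |_, v| v }
end

pairs = [%w[ID gene1], %w[Note first], %w[Note second]]
p pairs_to_hash_sketch(pairs)          # only one "Note" value remains
p values_for_tag_sketch(pairs, 'Note') # both "Note" values remain
```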

New methods for getting, setting and manipulating attributes are added to Bio::GFF::GFF2::Record and Bio::GFF::GFF3::Record classes: attribute, get_attribute, get_attributes, set_attribute, replace_attributes, add_attribute, delete_attribute, delete_attributes, sort_attributes_by_tag!. It is recommended to use these methods instead of directly manipulating the array returned by Record#attributes.

Bio::GFF::GFF2#to_s, Bio::GFF::GFF3#to_s, Bio::GFF::GFF2::Record#to_s, and Bio::GFF::GFF3::Record#to_s are added to support output of GFF2/GFF3 data.

Bio::GFF::GFF2

GFF2 attribute values are now automatically unescaped. In addition, if the value of an attribute consists of two or more tokens delimited by spaces, an object of the new class Bio::GFF::GFF2::Record::Value is returned instead of a String. The new class Bio::GFF::GFF2::Record::Value stores the parsed value of an attribute. If you really want the unparsed string, Bio::GFF::GFF2::Record::Value#to_s can be used.

The metadata (lines beginning with “##”) are parsed into Bio::GFF::GFF2::MetaData objects and stored in Bio::GFF::GFF2#metadata as an array, except for the “##gff-version” line. The “##gff-version” version string is stored in Bio::GFF::GFF2#gff_version as a string.
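The separation of “##gff-version” from the other metadata lines can be sketched in plain Ruby. This is not the Bio::GFF::GFF2 parser: split_gff2_metadata_sketch is a hypothetical helper, and it represents each metadata line as a simple [directive, data] pair instead of a MetaData object.

```ruby
# Plain-Ruby illustration only; NOT the Bio::GFF::GFF2 parser.
# Separates the version line, other "##" metadata lines, and records.
def split_gff2_metadata_sketch(text)
  gff_version = nil
  metadata = []
  records = []

  text.each_line do |raw|
    line = raw.chomp
    if (m = line.match(/\A##gff-version\s+(\S+)/))
      gff_version = m[1]                  # kept apart, as a String
    elsif line.start_with?('##')
      directive, data = line[2..-1].split(/\s+/, 2)
      metadata << [directive, data]       # e.g. ["date", "2008-09-22"]
    elsif !line.empty? && !line.start_with?('#')
      records << line                     # ordinary record line
    end
  end

  { gff_version: gff_version, metadata: metadata, records: records }
end
```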

Bio::GFF::GFF3

Aliases of columns which are renamed in the GFF3 specification are added to the Bio::GFF::GFF3::Record class: seqid (column 1; alias of “seqname”), feature_type (column 3; alias of “feature”; in the GFF3 spec, it is called “type”, but because “type” is already used by Ruby, we use “feature_type”), phase (column 8; formerly “frame”). Original names can still be used because they are only aliases.

Sequences bundled within GFF3 after “##FASTA” are now supported (Bio::GFF::GFF3#sequences).

GFF3 attribute keys and values are automatically unescaped. Each attribute value is stored as a string, except for special attributes listed below:

The metadata (lines beginning with “##”) are parsed into Bio::GFF::GFF3::MetaData objects and stored in Bio::GFF::GFF3#metadata as an array, except for the “##gff-version”, “##sequence-region”, “###”, and “##FASTA” lines.
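For the unescaping of GFF3 attribute keys and values mentioned above, GFF3 uses URL-style percent-encoding (for example “%3B” for “;”). The plain-Ruby sketch below shows the idea. It is not BioRuby's implementation: both helper names are hypothetical, and multiple comma-separated values within one attribute are not handled.

```ruby
# Plain-Ruby illustration only; NOT the Bio::GFF::GFF3 implementation.

# Decodes runs of %XX escapes back into the bytes they stand for.
def gff3_unescape_sketch(str)
  str.gsub(/((?:%[0-9A-Fa-f]{2})+)/) do
    hex = Regexp.last_match(1).delete('%')
    [hex].pack('H*').force_encoding(str.encoding)
  end
end

# Splits column 9 into [tag, value] pairs. Splitting happens BEFORE
# unescaping, so an escaped ";" or "=" never acts as a delimiter.
def parse_gff3_attributes_sketch(column9)
  column9.split(';').map do |pair|
    tag, value = pair.split('=', 2)
    [gff3_unescape_sketch(tag), gff3_unescape_sketch(value.to_s)]
  end
end

p parse_gff3_attributes_sketch('ID=gene%3B1;Note=50%25 done')
```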

Bio::Pathway

Bio::Pathway#cliquishness now calculates cliquishness (clustering coefficient) for directed graphs as well as undirected graphs.
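As background, one common definition of the local clustering coefficient is sketched below in plain Ruby. This is an assumption for illustration and not necessarily the exact formula used by Bio::Pathway#cliquishness: for a node with k neighbors, it divides the number of directed links among those neighbors by k(k-1). The helper name and the adjacency-list input format are hypothetical, and “neighbors” here means the targets of outgoing edges.

```ruby
# Plain-Ruby illustration only; NOT the Bio::Pathway implementation.
# adjacency maps each node to an Array of nodes it points to.
def clustering_coefficient_sketch(adjacency, node)
  neighbors = (adjacency[node] || []).uniq - [node]
  k = neighbors.size
  return 0.0 if k < 2

  # Count ordered pairs (a, b) of distinct neighbors with an edge a -> b.
  links = neighbors.sum do |a|
    ((adjacency[a] || []) & neighbors).count { |b| b != a }
  end

  links.to_f / (k * (k - 1))
end

directed = { 'a' => %w[b c], 'b' => %w[c], 'c' => [] }
triangle = { 'a' => %w[b c], 'b' => %w[a c], 'c' => %w[a b] }
p clustering_coefficient_sketch(directed, 'a')
p clustering_coefficient_sketch(triangle, 'a')
```

If an undirected graph is stored with each edge listed in both directions, every edge among the neighbors is counted twice, so the same expression equals the usual undirected form 2E / (k(k-1)).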

In the Bio::Pathway#to_matrix, dump_matrix, dump_list, and depth_first_search methods, Bio::Pathway#index is now used to determine the order of nodes in a graph, to avoid depending on the order of objects in Hash#each (and each_key etc.).
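The effect of an explicit index can be sketched in plain Ruby. This is not the Bio::Pathway code: adjacency_matrix_sketch is a hypothetical helper, the graph is assumed to be a Hash of Hashes (from-node to to-node to weight), and the index is assumed to map each node to an integer position.

```ruby
# Plain-Ruby illustration only; NOT the Bio::Pathway implementation.
# Builds a 0/1 adjacency matrix whose row and column order comes from
# the explicit index, not from the iteration order of the graph Hash.
def adjacency_matrix_sketch(graph, index)
  nodes = index.keys.sort_by { |n| index[n] }
  nodes.map do |from|
    nodes.map { |to| graph.fetch(from, {}).key?(to) ? 1 : 0 }
  end
end

# The graph lists "b" first, but the index places "a" first.
graph = { 'b' => { 'a' => 1 }, 'a' => { 'b' => 1, 'c' => 1 }, 'c' => {} }
index = { 'a' => 0, 'b' => 1, 'c' => 2 }
adjacency_matrix_sketch(graph, index).each { |row| puts row.join(' ') }
```

The same graph built with its keys inserted in a different order yields the same matrix, as long as the index is unchanged.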

Bio::SQL and BioSQL related classes

BioSQL support has been completely rewritten using ActiveRecord. See the documents in lib/bio/io/sql.rb, lib/bio/io/biosql, and lib/bio/db/biosql for details of the changes and the usage of the classes and modules.