Treats a FASTA formatted entry, such as:
>id and/or some comments                    <== comment line
ATGCATGCATGCATGCATGCATGCATGCATGCATGC        <== sequence lines
ATGCATGCATGCATGCATGCATGCATGCATGCATGC
ATGCATGCATGC

The leading ‘>’ may be omitted, and a trailing ‘>’ is removed automatically.
require 'bio'

f_str = <<END_OF_STRING
>sce:YBR160W CDC28, SRM5; cyclin-dependent protein kinase catalytic subunit [EC:2.7.1.-] [SP:CC28_YEAST]
MSGELANYKRLEKVGEGTYGVVYKALDLRPGQGQRVVALKKIRLESEDEG
VPSTAIREISLLKELKDDNIVRLYDIVHSDAHKLYLVFEFLDLDLKRYME
GIPKDQPLGADIVKKFMMQLCKGIAYCHSHRILHRDLKPQNLLINKDGNL
KLGDFGLARAFGVPLRAYTHEIVTLWYRAPEVLLGGKQYSTGVDTWSIGC
IFAEMCNRKPIFSGDSEIDQIFKIFRVLGTPNEAIWPDIVYLPDFKPSFP
QWRRKDLSQVVPSLDPRGIDLLDKLLAYDPINRISARRAAIHPYFQES
>sce:YBR274W CHK1; probable serine/threonine-protein kinase [EC:2.7.1.-] [SP:KB9S_YEAST]
MSLSQVSPLPHIKDVVLGDTVGQGAFACVKNAHLQMDPSIILAVKFIHVP
TCKKMGLSDKDITKEVVLQSKCSKHPNVLRLIDCNVSKEYMWIILEMADG
GDLFDKIEPDVGVDSDVAQFYFQQLVSAINYLHVECGVAHRDIKPENILL
DKNGNLKLADFGLASQFRRKDGTLRVSMDQRGSPPYMAPEVLYSEEGYYA
DRTDIWSIGILLFVLLTGQTPWELPSLENEDFVFFIENDGNLNWGPWSKI
EFTHLNLLRKILQPDPNKRVTLKALKLHPWVLRRASFSGDDGLCNDPELL
AKKLFSHLKVSLSNENYLKFTQDTNSNNRYISTQPIGNELAELEHDSMHF
QTVSNTQRAFTSYDSNTNYNSGTGMTQEAKWTQFISYDIAALQFHSDEND
CNELVKRHLQFNPNKLTKFYTLQPMDVLLPILEKALNLSQIRVKPDLFAN
FERLCELLGYDNVFPLIINIKTKSNGGYQLCGSISIIKIEEELKSVGFER
KTGDPLEWRRLFKKISTICRDIILIPN
END_OF_STRING

# Only the first entry of f_str is used.
f = Bio::FastaFormat.new(f_str)
puts "### FastaFormat"
puts "# entry"
puts f.entry
puts "# entry_id"
p f.entry_id
puts "# definition"
p f.definition
puts "# data"
p f.data
puts "# seq"
p f.seq
puts "# seq.class"
p f.seq.class    # Object#type no longer exists in Ruby 1.9 and later
puts "# length"
p f.length
puts "# aaseq"
p f.aaseq
puts "# aaseq.class"
p f.aaseq.class
puts "# aaseq.composition"
p f.aaseq.composition
puts "# aalen"
p f.aalen
FASTA format (Wikipedia): en.wikipedia.org/wiki/FASTA_format
Entry delimiter in flatfile text.
(Integer) excess read size included in DELIMITER.
Stores the comment and sequence information from one entry of the FASTA format string. If the argument contains more than one entry, only the first entry is used.
# File lib/bio/db/fasta.rb, line 119
def initialize(str)
  @definition = str[/.*/].sub(/^>/, '').strip  # 1st line
  @data = str.sub(/.*/, '')                    # rests
  @data.sub!(/^>.*/, '')  # remove trailing entries for sure
  @entry_overrun = $&
end
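The splitting described above can be tried out in plain Ruby without loading the library. The following is a minimal sketch, not the library's code: `split_first_entry` is a hypothetical helper name, and the `/m` flag on the last pattern is this sketch's own choice so that everything from the next ‘>’ line onward is dropped.

```ruby
# Standalone sketch in plain Ruby; it does not load the bio gem.
# split_first_entry is a hypothetical helper, not part of Bio::FastaFormat.
def split_first_entry(str)
  definition = str[/.*/].sub(/^>/, '').strip  # first line, leading '>' optional
  data = str.sub(/.*/, '')                    # everything after the first line
  data = data.sub(/^>.*/m, '')                # assumption: /m drops all later entries
  [definition, data]
end

text = ">id1 first entry\nATGC\nATGC\n>id2 second entry\nGGGG\n"
definition, data = split_first_entry(text)
p definition   # the defline without the leading '>'
p data.strip   # sequence lines of the first entry only
```

Because the leading ‘>’ is stripped with an optional match, the same helper accepts input whose first line has no ‘>’ at all.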
Returns the length of the sequence as a Bio::Sequence::AA.

# File lib/bio/db/fasta.rb, line 209
def aalen
  self.aaseq.length
end

Returns the sequence as a Bio::Sequence::AA.

# File lib/bio/db/fasta.rb, line 204
def aaseq
  Sequence::AA.new(seq)
end
Returns accession number with version.
# File lib/bio/db/fasta.rb, line 265
def acc_version
  identifiers.acc_version
end
Returns an accession number.
# File lib/bio/db/fasta.rb, line 253
def accession
  identifiers.accession
end

Parses the FASTA defline (using the identifiers method) and returns the accession numbers as an array of strings.

# File lib/bio/db/fasta.rb, line 260
def accessions
  identifiers.accessions
end

Returns the stored entry in FASTA format (same as to_s).

# File lib/bio/db/fasta.rb, line 127
def entry
  @entry = ">#{@definition}\n#{@data.strip}\n"
end
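As a rough illustration of the normalization performed when the entry is rebuilt, the sketch below reproduces the same string interpolation in plain Ruby. `format_entry` is a hypothetical helper name used only here. Blank lines surrounding the sequence data are stripped and exactly one trailing newline is emitted.

```ruby
# Standalone sketch; format_entry is a hypothetical helper, not a BioRuby method.
def format_entry(definition, data)
  # A '>' is always prepended, and surrounding whitespace in the data is stripped.
  ">#{definition}\n#{data.strip}\n"
end

puts format_entry("id1 some comment", "\nATGCATGC\nATGC\n\n")
```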
Parses the FASTA defline (using the identifiers method) and returns a possibly unique identifier as a string.

# File lib/bio/db/fasta.rb, line 239
def entry_id
  identifiers.entry_id
end

Parses the FASTA defline (using the identifiers method) and returns the GI, locus, accession, or accession with version number. If an entry has more than one such ID, only the first is returned. Returns a string or nil.

# File lib/bio/db/fasta.rb, line 248
def gi
  identifiers.gi
end

Parses the FASTA defline and extracts the IDs. IDs are NSIDs (NCBI standard FASTA sequence identifiers) or “:”-separated IDs. Returns a Bio::FastaDefline instance.

# File lib/bio/db/fasta.rb, line 229
def identifiers
  unless defined?(@ids) then
    @ids = FastaDefline.new(@definition)
  end
  @ids
end
Returns sequence length.
# File lib/bio/db/fasta.rb, line 189
def length
  seq.length
end
Returns locus.
# File lib/bio/db/fasta.rb, line 270
def locus
  identifiers.locus
end

Returns the length of the sequence as a Bio::Sequence::NA.

# File lib/bio/db/fasta.rb, line 199
def nalen
  self.naseq.length
end

Returns the sequence as a Bio::Sequence::NA.

# File lib/bio/db/fasta.rb, line 194
def naseq
  Sequence::NA.new(seq)
end
Executes FASTA/BLAST search by using a Bio::Fasta or a Bio::Blast factory object.
#!/usr/bin/env ruby
require 'bio'
factory = Bio::Fasta.local('fasta34', 'db/swissprot.f')
flatfile = Bio::FlatFile.open(Bio::FastaFormat, 'queries.f')
flatfile.each do |entry|
  p entry.definition
  result = entry.fasta(factory)
  result.each do |hit|
    print "#{hit.query_id} : #{hit.evalue}\t#{hit.target_id} at "
    p hit.lap_at
  end
end

# File lib/bio/db/fasta.rb, line 150
def query(factory)
  factory.query(entry)
end
Returns a joined sequence line as a String.
# File lib/bio/db/fasta.rb, line 157
def seq
  unless defined?(@seq)
    unless /\A\s*^\#/ =~ @data then
      @seq = Sequence::Generic.new(@data.tr(" \t\r\n0-9", '')) # lazy clean up
    else
      a = @data.split(/(^\#.*$)/)
      i = 0
      cmnt = {}
      s = []
      a.each do |x|
        if /^# ?(.*)$/ =~ x then
          cmnt[i] ? cmnt[i] << "\n" << $1 : cmnt[i] = $1
        else
          x.tr!(" \t\r\n0-9", '') # lazy clean up
          i += x.length
          s << x
        end
      end
      @comment = cmnt
      @seq = Bio::Sequence::Generic.new(s.join(''))
    end
  end
  @seq
end
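The "lazy clean up" step in the listing above joins the sequence lines by deleting whitespace and digits. A minimal plain-Ruby sketch of just that step follows; `clean_sequence` is a hypothetical helper name, and the handling of ‘#’ comment lines is deliberately left out.

```ruby
# Standalone sketch; clean_sequence is a hypothetical helper, not a BioRuby method.
def clean_sequence(data)
  # With an empty replacement string, tr deletes every listed character:
  # spaces, tabs, CR, LF, and the digits 0-9 (e.g. position numbers).
  data.tr(" \t\r\n0-9", '')
end

raw = "\n  1 ATGCATGC ATGC\n 13 GGCC\r\n"
p clean_sequence(raw)
```

Because digits are removed along with whitespace, numbered sequence lines are joined into one string, but any other stray characters are kept as they are.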
Returns sequence as a Bio::Sequence object.
Note: For efficiency, the returned Bio::Sequence object may share data with this FastaFormat object, so modifying it might (but will not always) also change the sequence or definition stored here.

# File lib/bio/db/fasta.rb, line 220
def to_biosequence
  Bio::Sequence.adapter(self, Bio::Sequence::Adapter::FastaFormat)
end
Returns comments.