class Bio::PAML::Codeml::Report
Description¶ ↑
Run PAML
codeml and get the results from the output file. The Codeml::Report
object is returned by Bio::PAML::Codeml.query
. For example
codeml = Bio::PAML::Codeml.new('codeml', :runmode => 0, :RateAncestor => 1, :alpha => 0.5, :fix_alpha => 0) result = codeml.query(alignment, tree)
where alignment and tree are Bioruby objects. This class assumes we have a buffer containing the output of codeml.
References
¶ ↑
Phylogenetic Analysis by Maximum Likelihood (PAML
) is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. It is maintained and distributed for academic use free of charge by Ziheng Yang. Suggestion citation
Yang, Z. 1997 PAML: a program package for phylogenetic analysis by maximum likelihood CABIOS 13:555-556
abacus.gene.ucl.ac.uk/software/paml.html
Examples¶ ↑
Invoke Bioruby’s PAML
codeml parser, after having read the contents of the codeml result file into buf (for example using File.read)
>> c = Bio::PAML::Codeml::Report.new(buf)
Do we have two models?
>> c.models.size => 2 >> c.models[0].name => "M0" >> c.models[1].name => "M3"
Check the general information
>> c.num_sequences => 6 >> c.num_codons => 134 >> c.descr => "M0-3"
Test whether the second model M3 is significant over M0
>> c.significant => true
Now fetch the results of the first model M0, and check its values
>> m0 = c.models[0] >> m0.tree_length => 1.90227 >> m0.lnL => -1125.800375 >> m0.omega => 0.58589 >> m0.dN_dS => 0.58589 >> m0.kappa => 2.14311 >> m0.alpha => nil
We also have a tree (as a string)
>> m0.tree => "((((PITG_23265T0: 0.000004, PITG_23253T0: 0.400074): 0.000004, PITG_23257T0: 0.952614): 0.000004, PITG_23264T0: 0.445507): 0.000004, PITG_23267T0: 0.011814, PITG_23293T0: 0.092242);"
Check the M3 and its specific values
>> m3 = c.models[1] >> m3.lnL => -1070.964046 >> m3.classes.size => 3 >> m3.classes[0] => {:w=>0.00928, :p=>0.56413}
And the tree
>> m3.tree => "((((PITG_23265T0: 0.000004, PITG_23253T0: 0.762597): 0.000004, PITG_23257T0: 2.721710): 0.000004, PITG_23264T0: 0.924326): 0.014562, PITG_23267T0: 0.000004, PITG_23293T0: 0.237433);"
Next take the overall posterior analysis
>> c.nb_sites.size => 44 >> c.nb_sites[0].to_a => [17, "I", 0.988, 3.293]
or by field
>> codon = c.nb_sites[0] >> codon.position => 17 >> codon.probability => 0.988 >> codon.dN_dS => 3.293
with aliases
>> codon.p => 0.988 >> codon.w => 3.293
Now we generate special string ‘graph’ for positive selection. The following returns a string the length of the input alignment and shows the locations of positive selection:
>> c.nb_sites.graph[0..32] => " ** * * *"
And with dN/dS (high values are still an asterisk *)
>> c.nb_sites.graph_omega[0..32] => " 3* 6 6 2"
We also provide the raw buffers to adhere to the principle of unexpected use. Test the raw buffers for content:
>> c.header.to_s =~ /seed/ => 1 >> m0.to_s =~ /one-ratio/ => 3 >> m3.to_s =~ /discrete/ => 3 >> c.footer.to_s =~ /Bayes/ => 16
Finally we do a test on an M7+M8 run. Again, after loading the results file into buf
Invoke Bioruby’s PAML
codeml parser
>> c = Bio::PAML::Codeml::Report.new(buf78)
Do we have two models?
>> c.models.size => 2 >> c.models[0].name => "M7" >> c.models[1].name => "M8"
Assert the results are significant
>> c.significant => true
Compared to M0/M3 there are some differences. The important ones are the parameters and the full Bayesian result available for M7/M8. This is the naive Bayesian result:
>> c.nb_sites.size => 10
And this is the full Bayesian result:
>> c.sites.size => 30 >> c.sites[0].to_a => [17, "I", 0.672, 2.847] >> c.sites.graph[0..32] => " ** * * *"
Note the differences of omega with earlier M0-M3 naive Bayesian analysis:
>> c.sites.graph_omega[0..32] => " 24 3 3 2"
The locations are the same, but the omega differs.
Attributes
Public Class Methods
Parse codeml output file passed with buf
, where buf contains the content of a codeml result file
# File lib/bio/appl/paml/codeml/report.rb 222 def initialize buf 223 # split the main buffer into sections for each model, header and footer. 224 sections = buf.split("\nModel ") 225 model_num = sections.size-1 226 raise ReportError,"Incorrect codeml data models=#{model_num}" if model_num > 2 227 foot2 = sections[model_num].split("\nNaive ") 228 if foot2.size == 2 229 # We have a dual model 230 sections[model_num] = foot2[0] 231 @footer = 'Naive '+foot2[1] 232 @models = [] 233 sections[1..-1].each do | model_buf | 234 @models.push Model.new(model_buf) 235 end 236 else 237 # A single model is run 238 sections = buf.split("\nTREE #") 239 model_num = sections.size-1 240 raise ReportError,"Can not parse single model file" if model_num != 1 241 @models = [] 242 @models.push sections[1] 243 @footer = sections[1][/Time used/,1] 244 @single = ReportSingle.new(buf) 245 end 246 @header = sections[0] 247 end
Public Instance Methods
Give a short description of the models, for example ‘M0-3’
# File lib/bio/appl/paml/codeml/report.rb 250 def descr 251 num = @models.size 252 case num 253 when 0 254 'No model' 255 when 1 256 @models[0].name 257 else 258 @models[0].name + '-' + @models[1].modelnum.to_s 259 end 260 end
Return a PositiveSites
(naive empirical bayesian) object
# File lib/bio/appl/paml/codeml/report.rb 273 def nb_sites 274 PositiveSites.new("Naive Empirical Bayes (NEB)",@footer,num_codons) 275 end
Return the number of condons in the codeml alignment
# File lib/bio/appl/paml/codeml/report.rb 263 def num_codons 264 @header.scan(/seed used = \d+\n\s+\d+\s+\d+/).to_s.split[5].to_i/3 265 end
Return the number of sequences in the codeml alignment
# File lib/bio/appl/paml/codeml/report.rb 268 def num_sequences 269 @header.scan(/seed used = \d+\n\s+\d+\s+\d+/).to_s.split[4].to_i 270 end
If the number of models is two we can calculate whether the result is statistically significant, or not, at the 1% significance level. For example, for M7-8 the LRT statistic, or twice the log likelihood difference between the two compared models, may be compared against chi-square, with critical value 9.21 at the 1% significance level.
Here we support a few likely combinations, M0-3, M1-2 and M7-8, used most often in literature. For other combinations, or a different significance level, you’ll have to calculate chi-square yourself.
Returns true or false. If no result is calculated this method raises an error
# File lib/bio/appl/paml/codeml/report.rb 294 def significant 295 raise ReportError,"Wrong number of models #{@models.size}" if @models.size != 2 296 lnL1 = @models[0].lnL 297 model1 = @models[0].modelnum 298 lnL2 = @models[1].lnL 299 model2 = @models[1].modelnum 300 case [model1, model2] 301 when [0,3] 302 2*(lnL2-lnL1) > 13.2767 # chi2: p=0.01, df=4 303 when [1,2] 304 2*(lnL2-lnL1) > 9.2103 # chi2: p=0.01, df=2 305 when [7,8] 306 2*(lnL2-lnL1) > 9.2103 # chi2: p=0.01, df=2 307 else 308 raise ReportError,"Significance calculation for #{descr} not supported" 309 end 310 end
Return a PositiveSites
Bayes Empirical Bayes (BEB) analysis
# File lib/bio/appl/paml/codeml/report.rb 278 def sites 279 PositiveSites.new("Bayes Empirical Bayes (BEB)",@footer,num_codons) 280 end