
Table 5 Comparison of the alignment-free distances and the benchmark MSA distance for 70 Gammaproteobacteria genomes

From: Comparison of next-generation sequencing samples using compression-based distances and its application to phylogenetic reconstruction

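| Data set | Measure | d_NCD | d | d_CDM | CVTree | d2S | co-phylog |
|---|---|---|---|---|---|---|---|
| 16S rRNA sequences | parsimony score | 17 | 18 | 18 | 17 | 18 | 25 |
| 16S rRNA sequences | tree symmetric difference | 50 | 52 | 52 | 50 | 62 | 108 |
| 16S rRNA sequences | distance correlation | 0.93 | 0.90 | 0.93 | 0.92 | 0.92 | 0.65 |
| Genome sequences | parsimony score | 22 | 22 | 21 | 21 | 31 | 26 |
| Genome sequences | tree symmetric difference | 80 | 78 | 76 | 84 | 110 | 110 |
| Genome sequences | distance correlation | 0.47 | 0.46 | 0.47 | 0.67 | 0.50 | 0.45 |
| NGS short reads | parsimony score | 21 | 19 | 23 | 24 | 32 | 28 |
| NGS short reads | tree symmetric difference | 90 | 70 | 84 | 88 | 114 | 116 |
| NGS short reads | distance correlation | 0.60 | 0.58 | 0.53 | 0.63 | 0.48 | 0.42 |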
1. The NGS short reads were simulated from the whole genome sequences using the Exact model of MetaSim at 1× sampling depth. The two smallest parsimony scores, the two smallest tree symmetric differences and the two highest correlation coefficients are highlighted in boldface. For CVTree, we used k = 7 for the 16S rRNA data set and k = 12 for the whole genome and NGS data sets. For d2S, we used k = 6 for the 16S rRNA data set and k = 8 for the whole genome and NGS data sets.
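
The "distance correlation" rows compare each alignment-free distance matrix with the benchmark MSA distance matrix. The table does not say how the coefficient is computed, so the following is only a minimal sketch under one plausible reading: a Pearson correlation over the off-diagonal (upper-triangle) entries of two symmetric distance matrices. The function names and the toy matrices are hypothetical and are not taken from the paper.

```python
from math import sqrt


def upper_triangle(matrix):
    """Return the entries strictly above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def matrix_correlation(dist_a, dist_b):
    """Correlate two symmetric distance matrices over their pairwise entries.

    Assumption: Pearson correlation on the upper triangle; the paper may
    define its coefficient differently.
    """
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Toy, made-up 3x3 distance matrices (not data from the table).
alignment_free = [
    [0.0, 0.2, 0.5],
    [0.2, 0.0, 0.4],
    [0.5, 0.4, 0.0],
]
benchmark = [
    [0.0, 0.1, 0.6],
    [0.1, 0.0, 0.3],
    [0.6, 0.3, 0.0],
]

r = matrix_correlation(alignment_free, benchmark)
print(f"correlation = {r:.2f}")
```

A coefficient near 1 means the alignment-free distances rank genome pairs much as the benchmark does; values such as 0.45 to 0.67 in the genome rows indicate only a moderate linear relationship.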