Figure 1 | BMC Research Notes

Figure 1

From: BMX: a tool for computing bacterial phyletic composition from orthologous maps

A flow chart describing the pipeline implemented in BMX. (A) The pipeline consists of three main steps: (i) reorganisation and quality assessment of the data, (ii) random sampling of genomes for each computation event, and (iii) iterative computation of the core and accessory genome size and composition. These are marked as steps 1 through 3, respectively. (B) A quality assessment heatmap is generated at step 1, with the presence of a gene shown in red and its absence shown in blue. The core genome appears as solid red columns of genes. Corrupted or poor-quality genomes can be identified visually, for instance by missing genes in the core genome. (C) During the core genome analysis, genomes are selected at random from the dataset, and each genome (represented by a number) is selected only once. The core genome size for N genomes is calculated N times, starting with one genome and ending with N genomes, with an increment of one genome in each subsequent calculation. This series of calculations is known as an event and is implemented as an arithmetic progression. 100 such events are performed, and the average value is used to establish the core genome size of N genomes. (D) A graph of the core genome size is generated. Files listing the core genome sizes (“CG.txt”) and the accessory clusters (“not_CG.txt”), which do not form part of the core genome, are also generated.
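
The sampling procedure in panel (C) can be illustrated with a minimal Python sketch. This is not the BMX source code: the function name `core_genome_curve`, the `presence` mapping (genome name to the set of orthologous clusters it contains), and the fixed random seed are assumptions made for illustration only. Each event draws the genomes in a random order without replacement, intersects their cluster sets one genome at a time, and the sizes are averaged over all events.

```python
import random


def core_genome_curve(presence, n_events=100, seed=0):
    """Average core genome size for 1..N genomes over random sampling events.

    presence: dict mapping a genome name to the set of cluster IDs it
    contains (a hypothetical input format, not the BMX file layout).
    """
    rng = random.Random(seed)
    genomes = list(presence)
    n = len(genomes)
    totals = [0] * n

    for _ in range(n_events):
        # One event: every genome is drawn exactly once, in random order.
        order = rng.sample(genomes, n)
        core = None
        for k, genome in enumerate(order):
            clusters = set(presence[genome])
            # The core after k+1 genomes is the intersection of their clusters.
            core = clusters if core is None else core & clusters
            totals[k] += len(core)

    # Mean core size for 1, 2, ..., N genomes across all events.
    return [total / n_events for total in totals]


# Toy data: only cluster "a" is present in all three genomes.
toy = {
    "g1": {"a", "b", "c"},
    "g2": {"a", "b", "d"},
    "g3": {"a", "c", "d"},
}
curve = core_genome_curve(toy)
print(curve)  # [3.0, 2.0, 1.0]
```

In this toy dataset every genome has three clusters and every pair shares exactly two, so the curve is the same for any sampling order. With real data the intermediate values depend on the order drawn, which is why the averaging over many events matters.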
