In general, one will find only those SNPs that exist among the genomic samples used in the comparisons, and novel SNPs will remain undiscovered [21]. This discovery bias can strongly affect taxonomic interpretation of results [22, 23].
Although discovery bias is often less consequential for genotyping efforts, the effects of our choice of strains for SNP discovery are clearly apparent in our phylogenetic tree. The discovery strains are distinguished by their positions at terminal branches in the phylogeny. There is greater diversity observed in B. abortus simply because two strains were part of the discovery panel. Furthermore, although isolates on a branch will be grouped by the SNPs they share (or do not share), additional structure exists in the “true” phylogeny that is not apparent in the
genotype tree. Branch lengths are also highly affected by the SNP discovery process. Species that are basal within this phylogeny, such as B. ceti, B. pinnipedialis, B. ovis, and B. neotomae, have short branch lengths merely because these genomes were not part of SNP discovery. It must also be noted that B. suis biovar 5 is part of this basal group. SNPs that should group it with the rest of the B. suis clade were not present in our MIP assay, which is not surprising since this branch is extremely short, even with whole genome analysis [JTF unpubl. data, 24]. We did not observe differentiation of these and the other Brucella species, nor did we expect it, because genomes from these groups were not a part of SNP discovery. Whole genome resequencing at the Broad Institute of MIT/Harvard recently generated genomes for over 100 additional Brucella strains, and these genomes should provide a broad basis for future genotyping efforts, with canonical SNPs developed for each of the important isolates and clades. Future genotyping
efforts should include SNPs from all of the recognized species and biovars. Comparative work using some of these genomes has already been fruitful, demonstrating the emergence of the marine Brucella from within the terrestrial Brucella and showing a methodology for whole genome analysis [24]. A trade-off exists in current genotyping efforts between throughput and genomic sampling. Does one aim for the maximum number of potentially informative loci through approaches such as whole genome sequencing, at the cost of the number of isolates that can be evaluated? Or does one aim for more complete sampling of large numbers of isolates but with a limited set of loci, using individual SNP assays such as CUMA? Of course, the ultimate answer depends on the research interest or clinical application as well as the resources at hand. MIP assays provide phylogenetic resolution for an intermediate number of samples and an intermediate number of SNPs.
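The effect of discovery bias on branch lengths described above can be illustrated with a small toy calculation. The sketch below is purely illustrative: the strain names, the eight-site alleles, and the helper functions are invented for this example and are not drawn from the Brucella data or the MIP assay. It assumes that SNPs are "discovered" only at sites that vary within a chosen discovery panel, and then counts the private alleles of each strain (a rough stand-in for terminal branch length) using either all variable sites or only the discovered ones.

```python
# Hypothetical toy alignment: one string of alleles per strain, 8 sites each.
# Strains A1, A2 and B1 form the discovery panel; C1 and C2 are genotyped only.
genomes = {
    "A1": "AAAAAAAA",
    "A2": "TAAAAAAA",
    "B1": "ATTAAAAA",
    "C1": "AAATTAAA",
    "C2": "AAATATTA",
}

def variable_sites(names):
    """Return the site indices at which the named strains differ from one another."""
    length = len(genomes[names[0]])
    return [i for i in range(length) if len({genomes[n][i] for n in names}) > 1]

def distance(a, b, sites):
    """Count allele differences between strains a and b at the given sites."""
    return sum(genomes[a][i] != genomes[b][i] for i in sites)

def private_snps(name, sites):
    """Count sites where `name` carries an allele found in no other strain."""
    others = [n for n in genomes if n != name]
    return sum(all(genomes[name][i] != genomes[o][i] for o in others) for i in sites)

all_sites = variable_sites(list(genomes))           # what whole genome comparison would see
panel_sites = variable_sites(["A1", "A2", "B1"])    # what the discovery panel reveals

for name in genomes:
    print(name, private_snps(name, all_sites), private_snps(name, panel_sites))
print("C1 vs C2:", distance("C1", "C2", all_sites), distance("C1", "C2", panel_sites))
```

In this toy data set the panel strains keep their private SNPs under either scheme, whereas C1 and C2 lose all of theirs and become indistinguishable from each other once only panel-derived sites are scored, which mirrors the shortened branches of the species that were not part of SNP discovery.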