Reproducibility and reliability of SNP analysis using human cellular DNA at or near nanogram levels
© Okitsu et al.; licensee BioMed Central Ltd. 2013
Received: 20 September 2013
Accepted: 2 December 2013
Published: 6 December 2013
Illumina SNP arrays have been routinely used for genome-wide association studies to identify potential biomarkers for various diseases. The recommended 200 ng of DNA for high-quality results is a roadblock to utilizing this assay when such quantities of DNA are not available. The goal of this study is to determine the reproducibility and reliability of the assay when reduced amounts of DNA are used for the SNP arrays.
A serial 3-fold reduction of DNA from 200 ng to 0.8 ng was used for an Illumina SNP array in duplicate (200 ng, 66.7 ng, 22.2 ng, and 7.4 ng) or triplicate (2.47 ng and 0.8 ng). The reproducibility of the assay was determined by comparing allele calls (genotypes) at each locus within the duplicates or triplicates. The reliability of samples of reduced quantity was determined by comparing allele calls from samples of different quantities. As expected, both reproducibility and reliability decrease with decreasing amounts of DNA used for the arrays. However, results of comparable quality to those from the 200 ng of DNA recommended by Illumina can be obtained with much reduced amounts of DNA.
Reasonably reproducible and reliable results can be obtained with quantities of DNA as low as 0.8 ng (equivalent to 133 human cells), well below the manufacturer’s recommendation. Results of nearly equal quality to those obtained with 200 ng of DNA can be obtained reliably with 22.2 ng of DNA, and clearly acceptable data can be obtained using 7.4 ng of DNA for Illumina SNP arrays.
Illumina SNP arrays have been routinely used for genome-wide association studies (GWAS) to identify potential biomarkers for various diseases. The recommended 200 ng of DNA for high-quality results prohibits the application of this assay when such quantities of DNA are not available. Under conditions where only about 2000 cells (equivalent to 12 ng of DNA) were available for each SNP array analysis, we carried out a study of controls to assess the reproducibility and reliability of the results using decreasing amounts of DNA from the 200 ng recommended by Illumina. The study included six different quantities of DNA, a serial 3-fold reduction from 200 ng to 0.8 ng, each analyzed on an Illumina SNP array with a total of 730,525 SNPs. The four higher quantities were run in duplicate and the two lowest quantities in triplicate. Allele calls among the duplicate or triplicate samples of the same DNA quantity were compared to determine the reproducibility of the assay. The reliability of allele calls from samples of reduced quantity was evaluated by comparing calls from samples of lower quantities with calls from samples of higher quantities. The goals of this study were to determine 1) whether decreased quantities of DNA reduce the reproducibility of the assay; 2) how reliable the results are when reduced amounts of DNA are used for the assay; and 3) the lowest amount of DNA that gives results as reliable as those from the recommended amount.
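The input amounts and their approximate cell equivalents can be reproduced with a short calculation. The sketch below is illustrative only: the figure of about 6 pg of DNA per cell is an assumption inferred from the equivalences stated in the text (0.8 ng for 133 cells, 12 ng for 2000 cells), and the function names are hypothetical.

```python
# Assumption: about 6 pg of DNA per human cell, inferred from the
# equivalences quoted in the text (0.8 ng ~ 133 cells, 12 ng ~ 2000 cells).
PG_PER_CELL = 6.0


def dilution_series(start_ng, fold, steps):
    """Return the input amounts (ng) of a serial dilution with `steps` points."""
    return [start_ng / fold**i for i in range(steps)]


def cell_equivalents(ng):
    """Approximate number of cells that would yield `ng` nanograms of DNA."""
    return ng * 1000.0 / PG_PER_CELL


# Six quantities, each one third of the previous, starting at 200 ng.
series = dilution_series(200.0, 3, 6)

for ng in series:
    print(f"{ng:7.2f} ng  ~ {cell_equivalents(ng):6.0f} cells")
```

An exact 3-fold series ends at about 0.82 ng, which the text reports as 0.8 ng; the cell counts are therefore rough estimates rather than measured values.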
Genomic DNA from Nalm6, a human pre-B cell line, was purified using the proteinase K/phenol/chloroform extraction method. Two samples each with 200 ng, 66.7 ng, 22.2 ng, and 7.4 ng, and three samples each with 2.47 ng and 0.8 ng of DNA from the same extraction of Nalm6 cells were processed and hybridized to the same lot of HumanOmniExpress SNP microarrays using the standard Illumina protocol. All current Illumina SNP arrays utilize Infinium genotyping chemistry.
The SNPs with no calls or discordant calls among the duplicate or triplicate samples were identified for analysis.
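For a pair of duplicate arrays, the identification of problem SNPs can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the genotype strings, the "NC" marker, and the function names are assumptions, and the category labels follow the duplicate-sample categories used in the summary of problem calls.

```python
from collections import Counter

NC = "NC"  # hypothetical marker for a no-call


def classify_duplicate(call_a, call_b):
    """Return the problem category for one SNP, or None if the calls agree."""
    if call_a == NC and call_b == NC:
        return "NC in both samples"
    if call_a == NC or call_b == NC:
        return "NC in one sample, call in the other"
    if call_a != call_b:
        return "Different calls in two samples"
    return None


def summarize_duplicates(calls_a, calls_b):
    """Count problem calls per category across two equally long call lists."""
    if len(calls_a) != len(calls_b):
        raise ValueError("both replicates must cover the same SNPs")
    counts = Counter()
    for a, b in zip(calls_a, calls_b):
        category = classify_duplicate(a, b)
        if category is not None:
            counts[category] += 1
    total = sum(counts.values())
    percent = 100.0 * total / len(calls_a) if calls_a else 0.0
    return counts, total, percent


# Toy data for six SNPs; these are not calls from the study.
rep1 = ["AA", "AB", "NC", "BB", "NC", "AB"]
rep2 = ["AA", "BB", "NC", "BB", "AA", "AB"]
counts, total, percent = summarize_duplicates(rep1, rep2)
print(dict(counts), total, percent)
```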
Failure rate of the assay increases only slightly with decreasing quantity of input DNA
Overall call rate of Illumina SNP arrays utilizing different quantities of starting DNA
Integrity of the allelic intensity ratio remains high even with much decreased quantities of DNA used
Summary of allele intensity ratio from arrays utilizing various quantities of starting DNA
Analysis of failed calls
The no-call (NC) SNPs may result from intrinsically low-quality assays in the array, from inadequate amplification of DNA in the whole-genome amplification step of the assay due to the low amount of DNA used, or from randomly occurring poor hybridization. Intrinsically low-quality assays would most likely lead to consistent failure of genotype calls in all or most of the arrays in the current experiment and would give rise to similar numbers of NC in each array, regardless of the quantity of DNA used. NC resulting from inadequate amplification of DNA would lead to an increased number of NC in the arrays utilizing lower quantities of DNA. In addition, random poor hybridization can occur and lead to differences even between duplicate or triplicate arrays using the same quantity of DNA.
Summary of problem allele calls in arrays utilizing different quantities of starting DNA
A. duplicate samples
Type of problems
DNA quantity (ng)
NC in both samples
NC in one sample, call in the other
Different calls in two samples
Total problem calls
Problem call %
B. triplicate samples
Type of problems
DNA quantity (ng)
NC in all samples
NC in two samples, call in the third one
NC in one sample, different calls in the other two
NC in one sample, same calls in the other two
Calls in all three, 1 incorrect call*
Calls in all three, 2 incorrect calls*
Calls in all three, correctness uncertain*
Total problem calls
Problem call %
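The triplicate categories above can be assigned from the pattern of no-calls and agreement among the three replicates. The sketch below is a hedged illustration with hypothetical names. It deliberately groups the three starred categories into one, because deciding which call is incorrect requires a reference genotype whose definition is not given in the text available here.

```python
NC = "NC"  # hypothetical marker for a no-call


def classify_triplicate(calls):
    """Return the problem category for one SNP in three replicates, or None."""
    if len(calls) != 3:
        raise ValueError("expected exactly three replicate calls")
    called = [c for c in calls if c != NC]
    n_nc = 3 - len(called)
    if n_nc == 3:
        return "NC in all samples"
    if n_nc == 2:
        return "NC in two samples, call in the third one"
    if n_nc == 1:
        if called[0] != called[1]:
            return "NC in one sample, different calls in the other two"
        return "NC in one sample, same calls in the other two"
    if len(set(called)) > 1:
        # Splitting this group into "1 incorrect", "2 incorrect" and
        # "uncertain" needs a reference call, which is not modeled here.
        return "Calls in all three, not all identical"
    return None


# Toy examples; these are not calls from the study.
examples = [
    ("AA", "AA", "AA"),
    ("NC", "NC", "NC"),
    ("NC", "AB", "NC"),
    ("AA", "NC", "AB"),
    ("BB", "BB", "NC"),
    ("AA", "AB", "AA"),
]
for calls in examples:
    print(calls, "->", classify_triplicate(calls))
```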
A total of 587 SNPs (0.08%) had consistent NC results in all 14 samples. These SNPs are not contiguous SNPs and are not likely to be sites of homozygous deletion. These consistently failed SNPs are most likely the result of the very small percentage of intrinsically poor quality assays, even though we cannot completely rule out sporadic homozygous deletion events in the cells.
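The two checks described here, finding SNPs that fail in every sample and asking whether they are contiguous, can be sketched as below. The data layout (one list of calls per sample, in array order) and the function names are assumptions, and treating adjacent positions in the list as adjacent SNPs in the genome is a simplification. The final lines recompute the reported percentage from the counts given in the text.

```python
def consistent_no_calls(call_matrix, nc="NC"):
    """Indices of SNPs that are no-calls in every sample.

    `call_matrix[s][i]` is the call for SNP i in sample s (hypothetical layout).
    """
    if not call_matrix:
        return []
    n_snps = len(call_matrix[0])
    return [i for i in range(n_snps)
            if all(sample[i] == nc for sample in call_matrix)]


def longest_adjacent_run(indices):
    """Length of the longest run of consecutive indices (0 if empty)."""
    longest = current = 0
    previous = None
    for i in sorted(indices):
        current = current + 1 if previous is not None and i == previous + 1 else 1
        longest = max(longest, current)
        previous = i
    return longest


# Toy data: three samples, six SNPs; these are not calls from the study.
samples = [
    ["NC", "AA", "NC", "AB", "BB", "NC"],
    ["NC", "AA", "AB", "AB", "BB", "NC"],
    ["NC", "AA", "NC", "AB", "NC", "NC"],
]
failed_everywhere = consistent_no_calls(samples)
print(failed_everywhere, longest_adjacent_run(failed_everywhere))

# Counts taken from the text: 4 quantities x 2 plus 2 quantities x 3 arrays.
N_SAMPLES = 4 * 2 + 2 * 3
TOTAL_SNPS = 730_525
CONSISTENT_NC = 587
consistent_nc_percent = 100.0 * CONSISTENT_NC / TOTAL_SNPS
print(N_SAMPLES, round(consistent_nc_percent, 2))
```

A long run of adjacent failed SNPs would point toward a homozygous deletion, whereas isolated failures are more in line with poorly performing individual assays.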
Analysis of inconsistent calls
While the rate of NC reflects the overall data quality, the lack of information at these NC SNPs would not lead to any conclusion. However, incorrect allele calls would potentially lead to misleading conclusions. Examining inconsistent calls of alleles from different arrays of the same DNA sample can reveal the reproducibility of the assay as well as how reliable the assay is when samples with suboptimal quantities of DNA are used. Therefore, it is important to know how the rate of false calls changes with the quantity of DNA used. There were fewer than 50 inconsistent calls when 7.4 ng or more of DNA was used for the array (0.002% to 0.007% of total SNPs), and the number of inconsistent calls increased to 543 (0.07%) when 0.8 ng of DNA was used (Table 3). The rate of discordant calls within the duplicates is consistent with the 0.1% to 0.15% reported previously using 500 ng DNA for the Illumina 1MDuo chip. These false calls are not at the same SNPs across all arrays utilizing different quantities of DNA, even though a few occurred in more than one array. This finding indicates that the false calls are more likely random in nature and not due to a hybridization artifact intrinsic to the specific SNP assay. The total number of inconsistent calls remains low even though it increased more than 10-fold in DNA samples of 0.8 ng compared with DNA samples of 7.4 ng. While the total number of problem calls, including NC in any samples and inconsistent calls, increases from 3,030 in DNA samples of 200 ng to 47,675 in DNA samples of 0.8 ng (Table 3), it is clear that this increase is mostly due to NCs (41,256 SNPs involve an NC readout in at least one of the three 0.8 ng samples). These findings suggest that the reproducibility within the duplicates using 7.4 ng of DNA is nearly as good as using 200 ng of DNA.
Also, the reliability of the results remains high in 0.8 ng DNA samples because the rapid rise in problem calls is the consequence of failure to make an allele call and not the consequence of making incorrect or inconsistent calls.
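The proportions behind this argument can be recomputed directly from the counts quoted above. The snippet below is a simple arithmetic check using only numbers stated in the text; the variable names are illustrative.

```python
TOTAL_SNPS = 730_525

# Counts quoted in the text (Table 3).
problem_calls_200ng = 3_030
problem_calls_0_8ng = 47_675
nc_involved_0_8ng = 41_256
inconsistent_0_8ng = 543


def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


problem_pct_200ng = percent(problem_calls_200ng, TOTAL_SNPS)
problem_pct_0_8ng = percent(problem_calls_0_8ng, TOTAL_SNPS)
inconsistent_pct_0_8ng = percent(inconsistent_0_8ng, TOTAL_SNPS)
nc_share_of_problems = percent(nc_involved_0_8ng, problem_calls_0_8ng)

print(f"problem calls, 200 ng: {problem_pct_200ng:.2f}% of SNPs")
print(f"problem calls, 0.8 ng: {problem_pct_0_8ng:.2f}% of SNPs")
print(f"inconsistent calls, 0.8 ng: {inconsistent_pct_0_8ng:.2f}% of SNPs")
print(f"NC-involved share of 0.8 ng problem calls: {nc_share_of_problems:.1f}%")
```

On these figures, SNPs with at least one no-call account for the large majority of problem calls at 0.8 ng, while inconsistent calls remain well below 0.1% of all SNPs.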
Capability of detecting large deletions is not compromised by a decreased quantity of input DNA
Based on conventional karyotypic analysis, the Nalm-6 cells have a known deletion that spans several chromosomal bands of more than 8 Mb DNA in the long arm of a chromosome 5. This deletion is clearly detected in all of the arrays in the current study, and the number of SNPs with NC or conflicting calls in the region of the deletion is very small and not any higher than other parts of the genome. This would indicate that the quality of data and the capability of detecting allele loss are not compromised even at the lowest quantity of 0.8 ng of DNA used (equivalent to 133 cells).
The goal of this study was to explore the feasibility, reproducibility, and reliability of SNP analysis using much less input DNA than the recommended 200 ng for an Illumina SNP array. The findings in our study will facilitate studies in which only very small quantities of DNA are available for analysis. In general, the quality of data from the Illumina SNP array remains high even when the quantity of DNA used is reduced to as low as 0.8 ng from the 200 ng recommended by the manufacturer. The overall rate of failed calls increased from just over 0.3% for 200 ng of DNA to just above 3.7% for the lowest quantity of 0.8 ng. Although the variation in the intensity ratio of the allele detection increases (more scatter) with lower quantities of DNA used, the overall reproducibility and reliability of the data do not appear to be compromised. In addition to some SNPs with intrinsic assay problems and some randomly occurring failed allele calls, a small number of incorrect calls occur at random loci, even when the recommended 200 ng of DNA is used. We conclude that data of nearly equal quality can be obtained using 22.2 ng of DNA as using 200 ng of DNA, and that the reproducibility and reliability of the data are clearly acceptable when 7.4 ng or more of DNA is used for Illumina SNP arrays. A small deletion of 1.8 Mb was detected consistently in all 14 arrays, indicating that suboptimal quantities of DNA, as low as 0.8 ng, are reliable for detecting small deletions. However, regardless of the quantity of DNA used, caution should still be exercised, and confirmatory studies using independent assays and other types of assays should be done when important associations with a SNP are believed to occur.
We would like to thank R. Mosteller for critical reading of the manuscript.
- Steemers FJ, Chang W, Lee G, Barker DL, Shen R, Gunderson KL: Whole-genome genotyping with the single-base extension assay. Nat Methods. 2006, 3: 31-33. 10.1038/nmeth842.
- Staaf J, Vallon-Christersson J, Lindgren D, Juliusson G, Rosenquist R, Höglund M, Borg A, Ringnér M: Normalization of Illumina Infinium whole-genome SNP data improves copy number estimates and allelic intensity ratios. BMC Bioinforma. 2008, 9: 409. 10.1186/1471-2105-9-409.
- Wang K, Bucan M: Copy number variation detection via high-density SNP genotyping. Cold Spring Harb Protoc. 2008, doi:10.1101/pdb.top46.
- Hong H, Xu L, Liu J, Jones WD, Su Z, Ning B, Perkins R, Ge W, Miclaus K, Zhang L, Park K, Green B, Han T, Fang H, Lambert CG, Vega SC, Lin SM, Jafari N, Czika W, Wolfinger RD, Goodsaid F, Tong W, Shi L: Technical reproducibility of genotyping SNP arrays used in genome-wide association studies. PLoS One. 2012, 7: e44483. 10.1371/journal.pone.0044483.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.