
Whole genome genotyping technologies on the BeadArray™ platform



Technical Report

Frank J. Steemers and Kevin L. Gunderson

Illumina, Inc., San Diego, CA, USA

The ability to simultaneously genotype hundreds of thousands of single-nucleotide polymorphisms (SNPs) in a single assay has recently become feasible due to innovative combinations of assay and array platform multiplexing. In this review, we describe the development of the Infinium® whole genome genotyping technology and the BeadArray™ platform. We discuss the automated use and performance of a series of genotyping BeadChips, including data quality, technology scalability, and flexibility in designing array content. We describe high-density tag SNP-based BeadChips and various multi-sample BeadChip configurations with their respective applications. These technologies are enabling large-scale whole genome association studies that have the potential to revolutionize our ability to detect common genetic variants with a significant role in identifying disease-associated loci, proteins, biomarkers, and pharmacogenomic responses.

Keywords: Whole genome genotyping · Bead Array · Association studies · Infinium assay · Illumina

Received 13 October 2006
Revised 22 November 2006
Accepted 23 November 2006

Correspondence: Dr. Frank J. Steemers, Illumina, Inc., 9885 Towne Centre Drive, San Diego, CA 92121, USA
E-mail: [email protected]
Fax: +1-858-202-4680

Abbreviations: ASPE, allele-specific primer extension; CGH, comparative genomic hybridization; LD, linkage disequilibrium; LOH, loss of heterozygosity; MAF, minor allele frequency; SBE, single-base extension; SNP, single-nucleotide polymorphism; WGA, whole genome-amplified; WGG, whole genome genotyping

Biotechnol. J. 2007, 2, 41–49
DOI 10.1002/biot.200600213
www.biotechnology-journal.com
© 2007 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim

1 Introduction

Advances in the field of genetics are highly dependent on enabling technologies to perform accurate high-resolution genomic analysis. These technologies have the potential to revolutionize our ability to identify disease-associated loci and loci involved in mediating clinical response and potential toxicity to drug therapy [1].

Whole genome genotyping (WGG) technologies have recently emerged as attractive tools to genotype hundreds of thousands of SNP markers on a genome-wide scale [2]. These SNP markers can be used in linkage disequilibrium-based (LD) association studies to find genomic regions harboring variants associated with increased disease incidence. LD is the nonrandom association between two or more alleles such that certain combinations of alleles are more likely to occur together on a chromosome than other combinations of alleles.

The WGG approach is sensitive in detecting multiple small gene effects often found in complex diseases and does not rely on prior identification of candidate genes or regions. Currently, a fixed set of genome-wide SNP markers appears attractive. SNP content for genome-wide association studies can be categorized based on random selection [3], tag SNPs, and functional (focused sets of genes or non-synonymous SNPs) marker sets. The International HapMap project has provided the haplotype block structure of the human genome, and enabled selection of tag SNPs for several populations [4]. Tag SNPs serve as proxies for a much larger set of genetically redundant SNPs, and essentially capture a major fraction of the “variation” present within a population. The haplotype map and corresponding tag SNPs provide a framework for discovering associations between genes and disease, and may enable SNP characterization to play a role in personalized medicine. Other findings from the HapMap project are: variation in local recombination rates is a major determinant of LD, breakdown of LD is variable and can appear as “block-like” structure, and a typical SNP is highly correlated with many neighboring SNPs in any given population.

Our recent data indicate that approximately 550 000 tag SNPs capture about 90% of the variation (r² > 0.7) present in the European and Asian populations, and 650 000 SNPs capture 75% of the variation present in the Yoruban populations (unpublished data). r² is a measure of LD between two SNP markers within a population. An r² of zero indicates that the phase of one SNP marker is completely independent of the other; in contrast, an r² of 1 indicates complete linkage between two markers (only two haplotypes present in the population). Cordon et al. [5] showed that the content of the HumanHap300 has comparable power to the Affymetrix 500K, providing substantial LD-based coverage of common variation in non-African populations; the precise extent is strongly dependent on the frequencies of alleles of interest and on specific considerations of study design.

From a WGG technology perspective, tag SNPs can be used to reduce the number of genotyped loci while producing the same amount of information as a larger number of randomly chosen loci. Hinds et al. [6] conclude that tag SNPs in WGG genotyping products economize on genotyping demands by providing power equivalent to that of threefold more randomly chosen SNPs (up to one million SNPs).
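
The r² statistic described above can be illustrated with a short sketch. The text does not give a formula; the one used here is the standard population-genetics definition, r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA·pB, and the function and parameter names are chosen only for illustration.

```python
def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Pairwise LD r^2 between two bi-allelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    (Standard textbook definition; assumed here, not taken from the article.)
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # deviation from independent assortment
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


# Independent markers: the A-B haplotype occurs at the product of the allele frequencies.
independent = r_squared(p_ab=0.25, p_a=0.5, p_b=0.5)

# Complete linkage: only two haplotypes exist, so allele A always travels with allele B.
complete = r_squared(p_ab=0.3, p_a=0.3, p_b=0.3)

print(f"independent: {independent:.2f}, complete: {complete:.2f}")
```

The two calls correspond to the two limiting cases in the text: an r² of zero for independent markers and an r² of 1 when only two haplotypes are present.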
The use of fewer redundant SNPs also minimizes data handling and computation time and reduces the false-positive errors from multiple hypothesis testing.

Based on recent HapMap data, the key technology needs for genome-wide association studies are: (i) the ability to accurately and economically genotype hundreds of thousands of loci across thousands of well-phenotyped case and matched-control population samples [6, 7], (ii) a robust means of processing these samples easily and efficiently, (iii) a technician-friendly automatable process that reduces sample tracking errors, and (iv) a genotyping platform enabling unconstrained SNP selection, allowing access to tag SNPs throughout the genome. This last requirement is essential since around 48% of the HapMap blocks are singletons (an SNP allele that is observed only once in the sample being analyzed), and these are not well covered if the SNPs are randomly chosen (CEU, r² threshold of 0.7, HapMap release 20). In this review, Illumina’s WGG approach and associated technologies are described, along with several product examples to illustrate the versatility and reach of the technology.

2 Infinium I and II WGG assays

Array technology has been successfully applied to the analysis of the entire transcriptome [8]. However, the successful implementation of an array-based WGG assay faces several fundamental challenges: (i) the much greater complexity of the human genome, (ii) the low partial concentration of any given locus, and (iii) the requirement for an accurate, single-base readout of the SNP locus.

The concept of the Infinium WGG assay is based on direct hybridization of whole genome-amplified (WGA) genomic DNA to a bead array of 50-mer locus-specific primers [9–13]. After locus-specific hybridization capture of each individual target to its cognate bead, each SNP locus is “scored” by an enzymatic-based extension assay using labeled nucleotides.
After extension, these labels are visualized by staining with a sandwich-based immunohistochemistry assay that increases the overall sensitivity of the assay (Fig. 1). Inherent to the WGG design is the ability for virtually unlimited scalability, dependent only on the physical constraint of the number of array elements, and not on loci multiplexing constraints. Additionally, the uniformity of locus representation in the WGA samples enables access to almost any SNP in the genome, and the high DNA yield increases the partial concentration of any given locus. We observed high correlation (r² > 0.98) of SNP signal intensities between amplification replicates, and between the same loci of different samples, exemplifying the robustness of the amplification reaction (Steemers et al., unpublished results). The relatively high target concentrations generated drive the hybridization capture of the target loci to the probes on the array.

Specificity of the WGG assay was achieved by combining three different design elements into the assay: (i) highly specific hybridization capture using long full-length 50-mer oligonucleotide probes, (ii) an enzymatic array-based allelic scoring step, and (iii) removal of non-specific target-target primer extension products before array readout. Enzymatic SNP scoring assays have been extensively reviewed [14]. We employ two different primer extension assays: allele-specific primer extension (ASPE) for our Infinium I assay, and single-base extension (SBE) for our Infinium II assay (Figs. 1b, 1c). The ASPE assay, a one-color format, is specifically designed to allow the detection of all SNP classes by employing two probes that are identical except at their 3’ base. One probe is the perfect match hybrid for allele A, and the other is the perfect match hybrid for allele B. The ability to detect all SNP classes is useful for genotyping custom SNPs. The SBE assay format uses a single probe per SNP with a two-color readout.
This characteristic reduces the required number of oligonucleotides by half as compared to the ASPE assay, allowing WGG probe sets to be made more economically. The limitation of the two-color SBE assay design is that only 83% of common bi-allelic SNPs can be scored on a single slide. The A and T nucleotides are stained in one color and C and G in another; therefore A/T and G/C polymorphisms cannot be detected. However, the remaining 17% of SNPs can be simultaneously genotyped on the same slide using a two-probe ASPE design with SBE biochemical scoring [10].

3 High-density array platforms; BeadChips

WGG technology is currently deployed on the Sentrix® BeadChip platform. BeadChips are micro-electro-mechanical systems in which wells are created through a combination of photolithography and plasma etching on silicon wafers. The main concept of array manufacturing is that DNA-immobilized beads are randomly dispersed and assembled into wells of a slide [15, 16]. A decoding process maps the location and identity of each bead on the array [17]. Currently, 3-µm-diameter beads (with ~5-µm center-to-center spacing) are utilized, allowing packing of over ~13 million beads on a single slide (Fig. 2). With current bead density, and a redundancy of ~20 beads per bead type, this yields over 700 000 possible unique bead types per slide.

In addition to array technology, Illumina has implemented Oligator® technology to complement its array and assay technologies [18]. This high-throughput oligo synthesis facility is instrumental to the production of high-quality, low-cost, genome-wide probe sets. Oligos are immobilized to chemically activated beads in a high-throughput fashion using 384-well microtiter plates [19]. After oligo attachment, individual bead types are pooled and arrayed.
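
The relationship between bead count, bead redundancy, and the number of unique bead types quoted above is simple division. The sketch below works it through with the rounded figures from the text. Because those figures are approximate ("over ~13 million", "~20"), the results are order-of-magnitude checks rather than exact specifications, and the function names are illustrative only.

```python
def bead_types_supported(total_beads: int, redundancy: int) -> int:
    """Unique bead types that fit when each type is present `redundancy` times."""
    return total_beads // redundancy


def beads_required(bead_types: int, redundancy: int) -> int:
    """Beads needed to represent `bead_types` types at the given redundancy."""
    return bead_types * redundancy


REDUNDANCY = 20  # "~20 beads per bead type" (approximate figure from the text)

# 13 million beads at ~20-fold redundancy
print(bead_types_supported(13_000_000, REDUNDANCY))  # 650000

# Beads needed for 700 000 bead types at the same redundancy
print(beads_required(700_000, REDUNDANCY))  # 14000000
```

At ~20-fold redundancy, 700 000 bead types correspond to about 14 million beads, which is consistent with a slide holding "over ~13 million" beads.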
Beads as array elements have several advantages: (i) monodisperse beads yield uniform “spot sizes”, and (ii) oligos are immobilized in bulk surface reactions on millions of beads, as compared to functionalization as an independent event, e.g., spotting or synthesizing each oligo on arrays. This feature contributes to array-to-array feature consistency. Additionally, the ability to immobilize oligos through their 5’-end attachment moiety generates 3’ full-length probes.

Figure 1. Infinium assays. (a) A schematic of the Infinium assay, (b) Infinium I probe design (ASPE), and (c) Infinium II probe design (SBE). Two-color SBE assay biochemistry can be used to simultaneously score both assay designs.

The BeadChip substrate provides a versatile array platform that can be formatted into various sample layouts. In Fig. 3 several exemplar layouts are shown: a single sample and two multi-sample iSelect™ configurations. The modularity of the layout is enabled by the use of individual bead pools and separate loading stripes on the BeadChip. Given 12 stripes, one can load 12 different bead pools for up to 720 000 assays across a single sample, or alternatively one can load a single bead pool 12 times for up to 60 000 assays across 12 different samples. These multi-sample offerings are typically used for fine mapping studies by researchers who have already performed WGG and/or have particular genomic regions with SNPs of interest. Simple gasket layouts allow different samples to be applied to the different stripes on the BeadChip. Additionally, these removable chambered plastic gaskets, with inlet and outlet ports, create a low-volume gap (~10 µL per stripe), minimizing the amount of sample required (see Fig. 3).
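
The stripe arithmetic above (12 stripes, up to 60 000 assays per bead pool) can be sketched as a small helper. The text only gives the two extreme layouts, so treating intermediate layouts such as two samples with six bead pools each as valid is an assumption made for illustration, as are all names in the code.

```python
from typing import Tuple

STRIPES_PER_BEADCHIP = 12      # from the text
ASSAYS_PER_BEAD_POOL = 60_000  # upper bound implied by 720 000 assays over 12 stripes


def layout(bead_pools_per_sample: int) -> Tuple[int, int]:
    """Return (samples per BeadChip, maximum assays per sample).

    Each sample occupies `bead_pools_per_sample` stripes, one distinct
    bead pool per stripe. Layouts other than 1 and 12 pools per sample
    are hypothetical extrapolations of the two cases in the text.
    """
    if bead_pools_per_sample < 1 or STRIPES_PER_BEADCHIP % bead_pools_per_sample:
        raise ValueError("bead pools per sample must divide the stripe count")
    samples = STRIPES_PER_BEADCHIP // bead_pools_per_sample
    assays = bead_pools_per_sample * ASSAYS_PER_BEAD_POOL
    return samples, assays


print(layout(12))  # (1, 720000): one sample, 12 different bead pools
print(layout(1))   # (12, 60000): 12 samples, one bead pool each
```

The trade-off is fixed by the stripe count: every stripe given to an additional sample is a stripe that cannot carry additional SNP content.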
An additional advantage of this customizable approach is that standard products can be configured with several stripes reserved for custom SNPs. For example, the iSelect Infinium concept allows a focused set of markers relevant to a particular disease, candidate gene set, or population to be added to the BeadChip. Products that employ this concept are the HumanHap550+, the HumanHap650Y, and the HumanHap300-Duo. The HumanHap550+ product design consists of 10 stripes loaded with standard product bead pools, with two stripes reserved for two bead pools containing up to 120 000 custom SNP markers. The HumanHap650Y genotyping BeadChip uses the base HumanHap550 BeadChip and includes two additional bead pools encompassing over 100 000 Yoruban SNPs to extend the coverage for this population. The recent introduction of the HumanHap300-Duo BeadChip enables researchers to analyze two samples across 317 000 tag SNP markers on a single BeadChip. This configuration allows the paired analysis of two matched DNA samples, for example in DNA copy/loss of heterozygosity (LOH) analysis of tumor genomic DNA and matched normal genomic DNA in cancer studies [20].

4 Infinium workflow and assay automation

The Infinium workflow can be divided into three main segments: (i) sample preparation, (ii) sample fragmentation and hybridization, and (iii) extension, staining, and scanning (Scheme 1). On day 1, the WGA process generates hundreds of micrograms of amplified genomic DNA starting from an initial input of several hundred nanograms (250–750 ng) of relatively intact, quantified DNA (>1 kb fragment size). For a BeadChip with over 650 000 SNPs, this translates into approximately 1 pg genomic DNA per SNP. The quality of the DNA sample (average length, degree of cross-linking, and lesion frequency) can affect the amplification efficiency. For example, we have seen poor amplification from low-quality DNA harvested from formalin-fixed, paraffin-embedded samples.
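
The "approximately 1 pg genomic DNA per SNP" figure above can be checked with a one-line calculation. The sketch assumes the figure means input genomic DNA divided by the number of SNPs on the BeadChip; this is one reading of the sentence, and the function name is illustrative.

```python
def input_dna_per_snp_pg(input_ng: float, n_snps: int) -> float:
    """Input genomic DNA per assayed SNP, in picograms (1 ng = 1000 pg)."""
    return input_ng * 1000.0 / n_snps


N_SNPS = 650_000  # "a BeadChip with over 650 000 SNPs"

for input_ng in (250, 750):  # input range quoted in the text
    per_snp = input_dna_per_snp_pg(input_ng, N_SNPS)
    print(f"{input_ng} ng input -> {per_snp:.2f} pg per SNP")
```

Under this reading, the quoted input range gives roughly 0.4 to 1.2 pg per SNP, which brackets the "approximately 1 pg" stated in the text.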
On day 2, the amplified genomic DNA is fragmented to an average size of around 300 bp using an endpoint enzymatic fragmentation protocol. DNA fragmentation improves the hybridization of the DNA sample. To facilitate plate-based amplification and isopropanol precipitation, the genomic DNA sample is amplified in four separate wells. After fragmentation, precipitation, and resuspension in hybridization buffer, the four wells are recombined. This material is hybridized overnight to a BeadChip placed in a humidified chamber. On day 3, the BeadChips undergo primer extension and immunohistochemical staining, and are imaged using a two-color confocal laser system with 0.8-µm resolution. The bead intensities are extracted, and genotypes are calculated using an Illumina-supplied cluster file, which is based on a set of reference samples. The normalization algorithm adjusts for nominal offset, cross-talk, and intensity variations observed in the two color channels. The behavior of each locus is modeled using a custom-made clustering algorithm that incorporates several biological heuristics for calling SNP genotypes. In cases where fewer than three clusters are observed, locations and shapes of the missing clusters are estimated using artificial neural networks.

Figure 2. BeadArray technology: beads in wells concept. Beads are randomly dispersed in wells on silicon wafers.

Figure 3. Examples of (a) HumanHap300-Duo+ with gasket technology, (b) iSelect BeadChips with gasket technology (the sample access ports are the alternating left and right areas adjacent to each stripe), and (c) HumanHap550+.

The Infinium WGG assay was automated using the Tecan GenePaint™ system, which combines robotics with a simple capillary-gap-based fluidics system (Te-Flow chambers) (Fig. 4). In this system, BeadChips are assembled in Te-Flow chambers, creating a ~70-µm capillary gap that enables easy fluidics processing. Gravity flow allows reagent exchange (wash solutions, blocking mixes, extension reagents), and capillary action retains reagents within the gap after the reservoir is emptied. Reagents are delivered robotically by dispensing them into the flow cell reservoir. For single-sample BeadChips, amplified DNA samples are hybridized in the Te-Flow chambers. For the multi-sample BeadChip configuration, the individual samples are robotically dispensed to the inlet ports of the gasket seals (Fig. 3) and hybridized. After hybridization, the seals are removed and the BeadChips are assembled in the Te-Flow chambers. Subsequently, the workflow proceeds as for a single-sample BeadChip. We have formulated the reagents in aliquoted, single-use tubes, ensuring ease of use and consistency and minimizing reagent preparation errors. Ninety-six BeadChips can be processed in parallel on a temperature-controlled Tecan chamber rack. The automation contributes to the ease of use, robustness of the system, and the laboratory information management system tracking capability of the DNA samples in the Infinium assay.

5 High-density Infinium BeadChips for genotyping and SNP-comparative genomic hybridization applications

The ability to select almost any SNP of interest, especially tag SNPs, allows for the design of information-rich WGG products as exemplified in the HumanHap550 BeadChip product. The >550 000 tag SNPs were selected from over 2 million common SNPs discovered in the recently completed Phase I and II International HapMap Project using an algorithm based on the LD statistic “r²” to select tag SNPs in all HapMap populations [1, 12, 21]. The tag SNP content was supplemented with additional SNPs to achieve even spacing across the genome. On average, there is one common SNP [minor allele frequency (MAF) >0.05] every 5.5, 6.5 and 6.3 kb on the autosomes in the

Scheme 1. Infinium assay workflow. Details of the assay protocols over 3 days are displayed.

Figure 4. (a) BeadChip in slide holder. (b) Capillary gap allows the flow of reagents over the BeadChip surface. (c) A 96-slide processing station with temperature control.
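
Section 5 mentions that tag SNPs were chosen with an r²-based algorithm but does not describe it. As a rough illustration of how r²-based tagging can work in general, the sketch below uses a simple greedy strategy: repeatedly pick the SNP that tags the most still-untagged SNPs at a given r² threshold. This is a commonly used approach and is not claimed to be the authors' algorithm. The SNP names and r² values are made up, and using the threshold 0.7 inclusively is an assumption.

```python
from typing import Dict, List, Tuple


def greedy_tag_snps(
    snps: List[str],
    r2: Dict[Tuple[str, str], float],
    threshold: float = 0.7,
) -> List[str]:
    """Pick tag SNPs so that every SNP is a tag or has r^2 >= threshold with one.

    `r2` holds pairwise r^2 values keyed by unordered SNP pairs; pairs that
    are absent are treated as unlinked. Illustrative greedy sketch only.
    """
    # Each SNP always tags itself, plus any SNP it is strongly linked to.
    covers = {s: {s} for s in snps}
    for (a, b), value in r2.items():
        if value >= threshold:
            covers[a].add(b)
            covers[b].add(a)

    untagged = set(snps)
    tags: List[str] = []
    while untagged:
        # Choose the SNP covering the most untagged SNPs (ties: input order).
        best = max(snps, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Hypothetical example: three SNPs in one LD block and two unlinked SNPs.
example_snps = ["snp1", "snp2", "snp3", "snp4", "snp5"]
example_r2 = {
    ("snp1", "snp2"): 0.95,
    ("snp1", "snp3"): 0.80,
    ("snp2", "snp3"): 0.75,
    ("snp4", "snp5"): 0.40,
}
print(greedy_tag_snps(example_snps, example_r2))  # ['snp1', 'snp4', 'snp5']
```

In this toy example one tag covers the whole three-SNP block, while the two weakly linked SNPs can only be captured by genotyping them directly. This illustrates why unconstrained SNP selection matters for coverage.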