European consortia building integrated resources for Arabidopsis functional genomics

Pierre Hilson, Ian Small† and Martin TR Kuiper‡

European laboratories specializing in functional genomics technologies collaborate in several consortia to build resources that facilitate gene function discovery in Arabidopsis thaliana. These resources include CATMA (a repertoire of gene-specific sequence tags), CAGE (a compendium of transcript profiles), AGRIKOLA (which consists of plasmids and mutant lines for gene silencing), ORFEUS (a collection of open reading frames) and SAP (a collection of promoter regions).

Addresses
‡Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology, Ghent University, Technologiepark 927, B-9052 Zwijnaarde, Belgium
e-mail: [email protected]
‡e-mail: [email protected]
†Unité de Recherches en Génomique Végétale (INRA-URGV), 2 rue Gaston Crémieux, CP 5708, 91057 Evry cedex, France
e-mail: [email protected]

Current Opinion in Plant Biology 2003, 6:426–429

This review comes from a themed issue on Cell signalling and gene regulation
Edited by Kazuo Shinozaki and Elizabeth Dennis

1369-5266/$ – see front matter © 2003 Elsevier Ltd. All rights reserved.

DOI 10.1016/S1369-5266(03)00086-4

Abbreviations
FP5 5th Framework Program
GST gene sequence tag
MASC Multinational Arabidopsis Steering Committee
ORF open reading frame
PTGS post-transcriptional gene silencing
RNAi RNA interference
TF transcription factor

Introduction
With the publication of the Arabidopsis genome sequence, focus has shifted from structural to functional annotation. It is estimated that the functions of some 2500 genes have been successfully deduced from experimental data to date. The goal set by the scientific community and documented by the Multinational Arabidopsis Steering Committee (MASC) [1] is to unambiguously determine the function of the >25 000 Arabidopsis genes by the year 2010. Traditional genetic approaches will not suffice in this endeavour, because mutants often display no distinct phenotype or phenotypes are difficult to interpret, and therefore phenotypical analyses must increasingly be complemented with exhaustive molecular characterization of biological samples. In addition, functional genomics tools applied on a wide scale enable the construction of local relationship maps (revealing e.g. protein–protein interactions, subcellular localization and co-expression clusters), which are important for inferring functional links and a prerequisite for developing a systems-biology approach to understanding how plants operate.

Function validation will be greatly facilitated by the availability of genomics resources; this will benefit a broad community of Arabidopsis researchers and be useful for genome-wide studies as well as for in-depth ‘gene-per-gene’ analyses.
Such resources should be versatile and together should include genome and cDNA sequences, sequence repertoires (open reading frames [ORFs], promoters, and vectors designed for specific functional assays), related-clone repositories and transgenic plants, as well as repositories for functional genomics data such as transcription-profiling datasets.

Resource building is time-consuming and costly, and is therefore best carried out by coordinated multinational consortia; such initiatives help to spread the workload and provide the necessary critical mass to establish optimized protocols and to standardize their use. Appropriate MASC subcommittees track progress in resource building (see http://www.Arabidopsis.org/info/2010_projects/MASC_Info.html). This short review focuses on resource building efforts currently ongoing in European laboratories specializing in high-throughput functional genomics approaches. For all the initiatives discussed here, please see Table 1 for the relevant databases.

Transcriptomics
In early 2000, Unité de Recherches en Génomique Végétale (INRA-URGV) and Flanders Interuniversity Institute for Biotechnology/Department of Plant Systems Biology, Ghent University (VIB-PSB/RUG) launched a joint initiative to build a versatile sequence repertoire for transcription-profiling experiments: CATMA. Its main aim is to design and amplify (using PCR) gene-sequence tags (GSTs) for all the annotated Arabidopsis genes [2]. The GSTs are designed to be as specific as possible in order to avoid cross-hybridization between related genes in microarray experiments, taking into consideration the entire genome sequence. With more groups joining over time, the current consortium unites laboratories in eight countries.
The present GST fragment collection covers >20 000 genes, with 3500 additional amplicons in the pipeline and a new GST set being designed that takes advantage of the latest genome annotations. As foreseen originally, the initiative offers the power and flexibility to regularly update the GST collection according to evolving knowledge about the gene repertoire. Also, the GST amplicons can easily be reamplified and shared, subsets can be picked at will to print dedicated arrays, and the GSTs can be cloned and used for other functional studies (see below). All GST information is available through the public CATMA database [3] and cloned GST amplicons will be distributed via international stock centers.

Transcription-profiling datasets generated by separate laboratories are difficult to integrate; microarray format and protocols may differ, growing conditions and samples are rarely appropriately documented, and biological variability may significantly affect the data. To investigate to what extent cross-laboratory comparisons are relevant and as an attempt to provide a solid basis for such comparisons, the CAGE project was initiated. It is funded as a demonstration project of the European Commission, as part of the 5th Framework Program (FP5). 4000 CATMA microarrays will be used to analyze 2000 independently generated biological samples from Arabidopsis wild-type ecotypes and mutant lines, including samples obtained under a variety of stress conditions, with different laboratories duplicating samples. A key objective in the project is the standardization of the data-production procedure, with particular attention being paid to sample ontology, RNA extraction, hybridization protocols and data processing. Samples are harvested at specified growth stages rather than at fixed times [4]. The chosen experimental set-up is the ‘reference design’, in which each sample is hybridized to a common standard [5,6].
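The arithmetic behind a reference design can be sketched briefly. In the minimal Python example below, each array is assumed to yield, for a given gene, a log2 ratio of the sample signal to the common-standard signal; subtracting two such ratios compares the two samples indirectly, because the reference term cancels. The function names and intensity values are hypothetical and chosen for illustration only; they are not taken from the CAGE protocol, and a real analysis would also include background correction and normalization.

```python
import math


def log_ratio(sample_intensity: float, reference_intensity: float) -> float:
    """Return log2(sample / reference) for one gene on one two-colour array."""
    if sample_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(sample_intensity / reference_intensity)


def indirect_comparison(m_a: float, m_b: float) -> float:
    """Compare samples A and B through the shared reference.

    log2(A/R) - log2(B/R) = log2(A/B), so the reference term cancels.
    """
    return m_a - m_b


# Hypothetical intensities for one gene (arbitrary units).
reference = 500.0   # common standard, hybridized on every array
sample_a = 2000.0   # e.g. a stressed plant, measured in one laboratory
sample_b = 1000.0   # e.g. a control plant, measured in another laboratory

m_a = log_ratio(sample_a, reference)     # array 1: sample A versus reference
m_b = log_ratio(sample_b, reference)     # array 2: sample B versus reference
a_vs_b = indirect_comparison(m_a, m_b)   # equals log2(sample_a / sample_b)
```

Because every sample is measured against the same standard, any two arrays in such a compendium can in principle be compared in this way, provided the standard itself is consistent across laboratories.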
The resulting MIAME-compliant data compendium (MIAME stands for ‘minimum information about a microarray experiment’ and is a set of guidelines to facilitate the archiving and exploitation of microarray data) will be made available through the microarray database ArrayExpress (www.ebi.ac.uk/arrayexpress). To broaden CAGE’s applicability, the necessary benchmarking will be performed to make the CATMA-array results comparable with those obtained with other microarray platforms.

Reverse genetics
Transcriptome studies give valuable information on gene expression, but do not provide direct data on gene function. Other high-throughput methods for determining gene function are therefore crucial. In many model organisms it is possible to experimentally knock out a gene by homologous recombination. The resulting mutant lines give essential clues to the function of the disrupted gene. In flowering plants, routine targeted gene disruption has never been achieved and ‘reverse genetics’ approaches currently rely on large collections of insertion mutants containing randomly inserted transgenes or transposons [7–12]. Several large European initiatives in this field are providing very valuable resources for the Arabidopsis community; these include Exotic, GABI-KAT (www.mpiz-koeln.mpg.de/GABI-Kat/), Génoplante (www.genoplante-info.infobiogen.fr/FLAGdb/), and ATIS.
MASC estimates that to date 60 000 sequenced insertion sites out of 230 000 worldwide are publicly available from the European projects. These insertional-mutagenesis programs are now familiar and widely used but the approach has limitations that make it difficult or impossible to obtain or study insertion mutants for certain types of genes; these include genes that are essential or whose mutation leads to major pleiotropic effects, genes that are members of multigene families with overlapping or redundant functions, genes or alleles not present in the genotypes used for insertion collections, and small, compact genes that are rarely mutated in insertional mutagenesis programs.

Table 1. Resources for Arabidopsis functional genomics.

| Resource | Meaning of acronym | Content of resource | Website |
|---|---|---|---|
| REGIA | Regulatory Gene Initiative in Arabidopsis | Functional characterization of Arabidopsis TFs | Not yet available |
| ORFEUS | Open Reading Frames for EU Scientists | Collection of ORFs leading to complete ORFeome | www.orfeome.org |
| CATMA | Complete Arabidopsis Transcriptome MicroArray | Repertoire of GSTs | www.catma.org |
| CAGE | Compendium of Arabidopsis Gene Expression | Compendium of transcript profiles | www.psb.ugent.be/CAGE/ |
| EXOTIC | EXOn Trapping Insert Consortium | Collections of insertion mutants containing randomly inserted transgenes or transposons | www.jic.bbsrc.ac.uk/science/cdb/exotic/ |
| ATIS | Arabidopsis Transposon Insertion Sequence | | www.jicgenomelab.co.uk/atis/ |
| AGRIKOLA | Arabidopsis Genomic RNAi Knock-Out Line Analysis | Collection of plasmids for targeted gene silencing | www.agrikola.org/ |
| SAP | Specific Analysis of Promoters | Genome-scale promoter amplicon collection | www.psb.ugent.be/SAP/ |

Most of these limitations can be overcome by using the relatively new technique of post-transcriptional gene silencing (PTGS) triggered by RNA interference (RNAi). Several groups have demonstrated that it is possible to artificially trigger PTGS with carefully designed transgenes, the most efficient being those that produce ‘hairpin’ RNA [reviewed in 13]. The FP5 AGRIKOLA project is creating a set of 50 000 binary plasmids for performing constitutive or inducible targeted gene-silencing using RNAi on almost every Arabidopsis gene. The plasmids contain CATMA GSTs to provide the gene-targeting specificity. About 5000 of these plasmids will be used to transform Arabidopsis plants to produce lines in which specific genes have been post-transcriptionally silenced. Several hundred of these silenced lines will be analyzed in detail to demonstrate the usefulness of this approach and to gain information on the function of important Arabidopsis genes.
The plasmids and mutant lines generated during the course of the project will be made available to the scientific community for use in other research projects. Such silenced lines will be an invaluable resource for determining the function of Arabidopsis genes and, by extrapolation, the function of homologous genes in other organisms, including economically important crop species.

Clone-based functional proteomics
The most practical way to handle a protein for a variety of research purposes is to use plasmid clones containing the corresponding ORF. From that basic intermediate material, the protein can be synthesized in any species and cell type provided that the appropriate expression cassettes and transformation procedures exist. The protein can also be expressed as a translational fusion with any chosen peptide moiety (e.g. a purification or marker tag) by positioning its ORF in frame with the desired coding sequence, and is therefore amenable to a wide range of analytical methods, including structural genomics techniques (such as crystallography and NMR), phenotypic (knock-in and knock-down) analyses, protein localization studies (using fluorescent-tag fusions and epitope tagging), molecular interaction mapping (using yeast-n-hybrid, TAP/MS and plasmon-resonance techniques) and biochemical genomics (involving tagged protein purification, protein chips and antibody libraries).

It is now possible to envision the construction of full ORF complements representing all the proteins expressed in a given organism (ORFeomes). With the development of novel recombinational cloning technologies, ORFeomes can be built in a versatile format allowing the shuttling of any ORF from a reference-sequence-validated plasmid clone to a destination vector [14] designed for a specific functional assay, via robust automated high-throughput protocols [15]. In the context of the ORFEUS FP5 coordination project, European laboratories have initiated the collaborative construction of a unified Arabidopsis ORFeome.
All ORFs generated within this framework are cloned in the same Gateway™ donor vector and share the same configuration: a short 5′ UTR with Shine–Dalgarno and Kozak sequences for expression in eucaryotic as well as procaryotic systems, and no STOP codon, allowing C-terminal fusions.

In parallel, the REGIA FP5 project, which focuses on the functional characterization of Arabidopsis transcription factors (TFs), has built a collection of >800 TF ORFs (which are also Gateway™-compatible), primarily for the systematic analysis of protein–protein interactions using the yeast two-hybrid system [16]. Complementary to this, another consortium has started the construction of a genome-scale promoter amplicon collection, SAP. This resource is designed for the printing of microarrays used for the analysis of TF target sites by chromatin immunoprecipitation experiments, and also for subsequent cloning and functional studies, for example via reporter fusions or transactivation assays. Together, the TF and promoter-sequence repertoires will provide excellent tools for investigating Arabidopsis transcriptional networks systematically.

Conclusions and perspectives
Versatile functional-genomics resources need to be available if we are to complete the functional annotation of the Arabidopsis genome by 2010. In addition, methods combining these resources with innovative strategies should be developed to broaden the potential of reverse genetic approaches. For example, altered-phenotype or enhancer/suppressor screens that so far have been conducted using random insertions (via knock-out techniques or activation tagging) or chemical mutagenesis could be replaced by systematic knock-in and knock-down studies.
Soon, we will be able to shift from chance- to sequence-driven genetic screens that will accelerate the identification of genetic interactions.

Functional genomics will unfold in three phases: the first involves the building of resources, the second the implementation of analytical methods integrating parallelism and increased throughput, and the third the standardization and integration of diverse datasets. Since integration is the key to the success of research projects in this area, we should already be investing in tools in preparation for the third, most crucial, phase. In particular, we must create and implement common data standards (for instance, MIAME) and exchange formats.

Most importantly, the existing resources should be made readily accessible to the scientific community via measures including the creation or strengthening of appropriate stock centers, the adoption of streamlined distribution procedures stripped of unnecessary usage restrictions, and the development and maintenance of databases disseminating resource information and, eventually, storing functional genomics datasets. Indeed, the necessity of maintaining resources and knowledge in the long term is often overlooked or underfunded in consortium projects, leaving responsibility to individual laboratories, which is a financially precarious strategy. Support from either local governments (the European Research Area) or the EU should be allocated for that purpose.

References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. The Multinational Arabidopsis Steering Committee (MASC): The Multinational Coordinated Arabidopsis thaliana functional genomics project: beyond the whole genome sequence. URL: http://www.nsf.gov/pubsys/ods/getpub.cfm?bio0202
2. Thareau V, Déhais P, Serizet C, Hilson P, Rouzé P, Aubourg S: Automatic design of gene-specific sequence tags for genome-wide functional studies. Bioinformatics 2003, in press.
3. Crowe ML, Serizet C, Thareau V, Aubourg S, Rouzé P, Beynon JL, Hilson P, Weisbeek P, Van Hummelen P, Reymond P et al.: CATMA: a complete Arabidopsis GST database. Nucleic Acids Res 2003, 31:156-158.
4. Boyes DC, Zayed AM, Ascenzi R, McCaskill AJ, Hoffman NE, Davis KR, Gorlach J: Growth-stage-based phenotypic analysis of Arabidopsis: a model for high-throughput functional genomics in plants. Plant Cell 2001, 13:1499-1510.
5. Dudley AM, Aach J, Steffen MA, Church GM: Measuring absolute expression with microarrays with a calibrated reference sample and an extended signal intensity range. Proc Natl Acad Sci USA 2002, 99:7554-7559.
6. Sterrenburg E, Turk R, Boer JM, van Ommen GB, den Dunnen JT: A common reference for cDNA microarray hybridizations. Nucleic Acids Res 2002, 30:116-118.
7. Szabados L, Kovacs I, Oberschall A, Abraham E, Kerekes I, Zsigmond L, Nagy R, Alvarado M, Krasovskaja I, Gal M et al.: Distribution of 1000 sequenced T-DNA tags in the Arabidopsis genome. Plant J 2002, 32:233-242.
8. Sessions A, Burke E, Presting G, Aux G, McElver J, Patton D, Dietrich B, Ho P, Bacwaden J, Ko C et al.: A high-throughput Arabidopsis reverse-genetics system. Plant Cell 2002, 14:2985-2994.
9. Krysan PJ, Young JC, Jester PJ, Monson S, Copenhaver G, Preuss D, Sussman MR: Characterization of T-DNA insertion sites in Arabidopsis thaliana and the implications for saturation mutagenesis. OMICS 2002, 6:163-174.
10. Galbiati M, Moreno MA, Nadzan G, Zourelidou M, Dellaporta SL: Large-scale T-DNA mutagenesis in Arabidopsis for functional genomic analysis. Funct Integr Genomics 2000, 1:25-34.
11. Bhatt AM, Page T, Lawson EJ, Lister C, Dean C: Use of Ac as an insertional mutagen in Arabidopsis. Plant J 1996, 9:935-945.
12. Marsch-Martinez N, Greco R, Van Arkel G, Herrera-Estrella L, Pereira A: Activation tagging using the En-I maize transposon system in Arabidopsis. Plant Physiol 2002, 129:1544-1556.
13. Waterhouse PM, Helliwell CA: Exploring plant genomes by RNA-induced gene silencing. Nat Rev Genet 2003, 4:29-38.
14. Karimi M, Inzé D, Depicker A: GATEWAY™ vectors for Agrobacterium-mediated plant transformation. Trends Plant Sci 2002, 7:193-195.
15. Reboul J, Vaglio P, Rual J-F, Lamesch P, Martinez M, Armstrong CM, Li S, Jacotot L, Bertin N, Janky R et al.: C. elegans ORFeome version 1.1: experimental verification of the genome annotation and resource for proteome-scale protein expression. Nat Genet 2003, 34:35-41.
16. Paz-Ares J and the REGIA Consortium: REGIA, an EU project on functional genomics of transcription factors from Arabidopsis thaliana. Comp Funct Genom 2002, 3:102-108.