


Current Biology 21, 328–333, February 22, 2011 ©2011 Elsevier Ltd All rights reserved DOI 10.1016/j.cub.2011.01.037

Report

Red and Green Algal Monophyly and Extensive Gene Sharing Found in a Rich Repertoire of Red Algal Genes

Cheong Xin Chan,1,5 Eun Chan Yang,2,5 Titas Banerjee,1 Hwan Su Yoon,2,* Patrick T. Martone,3 José M. Estevez,4 and Debashish Bhattacharya1,*
1 Department of Ecology, Evolution, and Natural Resources and Institute of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ 08901, USA
2 Bigelow Laboratory for Ocean Sciences, West Boothbay Harbor, ME 04575, USA
3 Department of Botany, University of British Columbia, 6270 University Boulevard, Vancouver, BC V6T 1Z4, Canada
4 Instituto de Fisiología, Biología Molecular y Neurociencias (IFIBYNE UBA-CONICET), Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, 1428 Buenos Aires, Argentina

Summary

The Plantae comprising red, green (including land plants), and glaucophyte algae are postulated to have a single common ancestor that is the founding lineage of photosynthetic eukaryotes [1, 2]. However, recent multiprotein phylogenies provide little [3, 4] or no [5, 6] support for this hypothesis. This may reflect limited complete genome data available for red algae, currently only the highly reduced genome of Cyanidioschyzon merolae [7], a reticulate gene ancestry [5], or variable gene divergence rates that mislead phylogenetic inference [8]. Here, using novel genome data from the mesophilic Porphyridium cruentum and Calliarthron tuberculosum, we analyze 60,000 novel red algal genes to test the monophyly of red + green (RG) algae and their extent of gene sharing with other lineages. Using a gene-by-gene approach, we find an emerging signal of RG monophyly (supported by ~50% of the examined protein phylogenies) that increases with the number of distinct phyla and terminal taxa in the analysis.
A total of 1,808 phylogenies show evidence of gene sharing between Plantae and other lineages. We demonstrate that a rich mesophilic red algal gene repertoire is crucial for testing controversial issues in eukaryote evolution and for understanding the complex patterns of gene inheritance in protists.

Results and Discussion

Assessing Red and Green Algal Monophyly Based on Exclusive Gene Sharing

Here, with 36,167 expressed sequence-tagged (EST) unigenes from Porphyridium cruentum and 23,961 predicted proteins from Calliarthron tuberculosum, we report analyses of >60,000 novel genes from mesophilic red algae. Of the 36,167 P. cruentum unigenes (6.7-fold greater than the gene number [5,331] from Cyanidioschyzon merolae [7]), 13,632 encode proteins with significant BLASTp hits (e value ≤ 10⁻¹⁰) to sequences in our local database, in which we included the 23,961 predicted proteins from C. tuberculosum (see Table S1 available online). Of these hits, 9,822 proteins (72.1%, including many P. cruentum paralogs) were present in C. tuberculosum and/or other red algae, 6,392 (46.9%) were shared with C. merolae, and 1,609 were found only in red algae. A total of 1,409 proteins had hits only to red algae and one other phylum. Using this repertoire, we adopted a simplified reciprocal BLAST best-hits approach to study the pattern of exclusive gene sharing between red algae and other phyla (see Experimental Procedures). We found that 644 proteins showed evidence of exclusive gene sharing with red algae. Of these, 145 (23%) were found only in red + green algae (hereafter, RG) and 139 (22%) only in red + Alveolata (Figure 1A). In comparison, we found only 34 (5%) proteins in red + Glaucophyta, likely as a result of the limited availability of glaucophyte data in the database.
As we restricted this search by requiring a larger number of hits per query (x) from both phyla, the proportion of RG proteins increased relative to other taxa. For instance, the number of red + Alveolata and red + Metazoa proteins was reduced from 139 → 1 → 0 and from 55 → 3 → 0 when x ≥ 2 (644 proteins), x ≥ 10 (96 proteins), and x ≥ 20 (22 proteins), respectively (Figures 1A–1C). This BLASTp analysis is based on the implicit assumption that significant similarity among a group of sequences indicates a putative homologous relationship (i.e., a shared common ancestry). This approach could potentially be misled by convergence at the amino acid level that results in high similarity among nonhomologous sequences (i.e., homoplasy [9, 10]). Alternatively, because RG are primarily photoautotrophs, exclusive gene sharing could be explained by these lineages having retained a common set of ancestral genes that were lost in other eukaryotes. With these potential issues in mind, we suggest that exclusive gene sharing (as defined by significant reciprocal BLASTp hits) provisionally favors the RG grouping.

Gene Sharing between RG and Other Lineages

Using a phylogenomic approach, we generated maximum-likelihood (ML) trees for each of the 13,632 P. cruentum proteins with significant hits to the local database. One of the major confounding issues in phylogenomic analysis is inadequate and/or biased taxon sampling. To reduce such biases in our inference of gene phylogeny, we restricted our analysis to trees that contain ≥3 phyla (per tree) and analyzed these phylogenies based on the minimum number of terminal taxa per tree (n), ranging from 4 to 40 (Figure 1D). The expectation was that the impact of inadequate taxon sampling on our interpretation of the data would be minimal in trees with large n.
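The taxon-sampling restriction described above (at least three phyla per tree, and a minimum number of terminal taxa n) can be illustrated with a minimal sketch. The representation of a tree as a list of (phylum, taxon) pairs, the function names, and the toy data are assumptions made for illustration only; they are not taken from the pipeline used in this study.

```python
def passes_sampling_filter(tips, min_phyla=3, min_taxa=4):
    """Return True if a tree meets both sampling restrictions.

    tips: list of (phylum, taxon) pairs, one per terminal taxon
    (a hypothetical representation chosen for this sketch).
    """
    phyla = {phylum for phylum, _ in tips}
    return len(phyla) >= min_phyla and len(tips) >= min_taxa


def count_retained(trees, thresholds=(4, 10, 20, 30, 40)):
    """Number of trees retained at each minimum tree size n."""
    return {n: sum(passes_sampling_filter(t, min_taxa=n) for t in trees)
            for n in thresholds}


# Toy data: the first tree spans four phyla, the second only two.
toy_trees = [
    [("Rhodophyta", "P. cruentum"), ("Viridiplantae", "A. thaliana"),
     ("Stramenopiles", "T. pseudonana"), ("Fungi", "S. pombe")],
    [("Rhodophyta", "P. cruentum"), ("Viridiplantae", "A. thaliana"),
     ("Viridiplantae", "Z. mays"), ("Viridiplantae", "O. sativa")],
]
retained = count_retained(toy_trees, thresholds=(4, 10))
print(retained)
```

Counting trees at each threshold separately, rather than binning them, mirrors the cumulative categories (n ≥ 4, n ≥ 10, and so on) used in Figure 1D.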
Applying these restrictions, n ≥ 4 returned 1,367 trees that contained red algae positioned within a strongly supported (bootstrap ≥ 90%) monophyletic clade (Figure 1D); the majority of these trees (1,129 of 1,367; 83%) had n ≥ 10. Among the 1,367 trees, 329 showed exclusive RG monophyly, of which 53 trees defined RG + glaucophytes (i.e., were putatively Plantae-specific). The number of trees that recovered the RG remained similar between cases of n ≥ 4 and n ≥ 10, with only 71 trees having n between 4 and 9. As n increased, the proportion of RG groupings remained similar across all categories, although the number of trees supporting this clade gradually decreased. These estimates reflect our current database and will change as more genome data become available. Figure 2A shows the phylogeny of a putative Plantae-specific gene (of unknown function) that appears to have undergone an ancient gene duplication in the Plantae ancestor followed by subsequent duplications (particularly among land plants).

*Correspondence: [email protected] (H.S.Y.), bhattacharya@aesop.rutgers.edu (D.B.)
5 These authors contributed equally to this work

Figure 1. Analysis of Predicted Proteins from the Red Alga Porphyridium cruentum
(A–C) The distribution of phyla with exclusive BLASTp hits to P. cruentum proteins where the number of hits per query (x) is as follows: (A) x ≥ 2, (B) x ≥ 10, and (C) x ≥ 20. The colors indicate the different phyla that share proteins exclusively with P. cruentum.
(D) The percentage of maximum-likelihood (ML) protein trees (raw numbers shown in the bars for the five most frequently found groupings) that support the monophyly of red algae with other eukaryote phyla (bootstrap ≥ 90%). The impact of increasing the number of terminal taxa in each tree (n) on these proportions is shown for the progression from 4 → 10 → 20 → 30 → 40. The total number of trees for each category is shown on top of each bar.
The category “Red-Green (RG) exclusive” refers to trees in which these two phyla form an exclusive clade, whereas “Red-Green (RG) shared” refers to trees in which red-green monophyly is well supported but other phyla are found within this clade (i.e., due to gene sharing). See also Figure S1.

[Figure 2, panels A and B: maximum-likelihood trees; tip labels and bootstrap values not reproduced here. Clade labels in panel B: Stramenopiles, Plantae, Fungi, Amoebozoa. Scale bars: 0.5 substitutions/site (A) and 0.2 substitutions/site (B).]

Figure 2.
Plantae Evolution and Gene Sharing
(A) Phylogeny of a gene of unknown function that is putatively specific to Plantae.
(B) Phylogeny of a gene encoding a putative phosphoglyceride transfer protein, SEC14, with a well-supported monophyly (bootstrap 95%) of plants, red algae, the glaucophytes, and diatoms and a monophyly (bootstrap 100%) between Porphyridium cruentum and green algae (including other plants). RAxML [30] bootstrap support values ≥ 60% based on 100 nonparametric replicates are shown at the nodes. Red algae are shown in boldface and glaucophytes in gray. The unit of branch length is the number of substitutions per site. See also Figure S2.

In these analyses, we also examined instances of RG monophyly in which other taxa interrupted this clade, e.g., Stramenopiles, presumably resulting from endosymbiotic/horizontal gene transfer (E/HGT). We referred to such instances as “RG shared” (Figure 1D), whereby there was a strongly supported monophyly (i.e., bootstrap ≥ 90%) of RG algae with other non-Plantae lineages. We applied the condition that RG shared clades include ≥ 75% of all terminal taxa in a tree, and within this clade, a majority (>50%) of the tips defined red and green algae. Using this definition, we found an additional 413 trees that support RG monophyly (Figure 1D). Therefore, at n ≥ 4, a total of 742 (54%) of 1,367 trees returned by our pipeline supported the RG union (bootstrap ≥ 90%). At a less stringent bootstrap threshold of ≥ 70%, 997 (46%) of 2,167 trees showed support for RG monophyly (Figure S1). An example of a phylogeny showing nonexclusive gene sharing with Plantae lineages is shown in Figure 2B for a putative phosphoglyceride transfer protein, SEC14. The phylogeny shows a well-supported monophyly (bootstrap 95%) of plants, red algae (Galdieria sulphuraria and C. merolae), the glaucophyte Cyanophora paradoxa, and diatoms. The diatom gene likely arose via secondary endosymbiotic gene transfer from a red algal donor [11]. In addition, a divergent red algal-derived gene copy is present in P. cruentum that groups with green algae and other gene copies found in plants (bootstrap 100%). Although complete genome data from glaucophytes and other red algae are required to delineate the extent of gene duplication and convergence between these two lineages, this phylogeny illustrates two key properties of protist gene and genome evolution that pose challenges to the inference of lineage relationships: ancient gene duplication (e.g., multiple copies in plants) and loss (i.e., putatively of a gene copy in green algae, e.g., Ostreococcus spp.), and nonlineal gene sharing involving algal lineages.

The next most frequently found positions of red algae in these trees were as sister to Stramenopiles (168, 12%), Alveolata (91, 7%), Excavata (68, 5%), and Cryptophyta (66, 5%). Increasing the minimum number of terminal taxa per tree (n) from 4 through 40 (while maintaining ≥ 3 phyla) did not affect the relative proportion of trees that support RG monophyly, but the number of cases with other well-supported phylogenetic affiliations (e.g., red + Metazoa, red + Fungi) fell sharply (Figure 1D). When we relaxed the bootstrap threshold to ≥ 70%, the patterns reported here generally remained unchanged (Figure S1) but allowed the identification of single-protein markers that may prove useful for delineating the eukaryote tree of life (e.g., V-type ATPase I 116 kDa subunit family; Figure S2; see also [12]).

We found 1,808 trees that showed strong support (at bootstrap ≥ 90%) for the monophyly of RG with other “foreign” taxa. Figure 3 shows the number of these trees that contain different foreign phyla within the well-supported RG clade. The sources of the foreign genes are depicted in a schematic representation of the putative tree of life.
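The “RG exclusive” and “RG shared” criteria given above can be written down as a small decision rule. The following is a sketch under stated assumptions: the clade is supplied as a list of phylum names (one per tip), the function name and return labels are hypothetical, and allowing glaucophytes inside an “RG exclusive” clade is one reading of the 53 RG + glaucophyte trees counted among the 329 exclusive trees. The closing lines recompute the reported totals from the counts in the text.

```python
RG_PHYLA = {"Rhodophyta", "Viridiplantae"}
PLANTAE = RG_PHYLA | {"Glaucophyta"}  # assumption: glaucophytes allowed in "exclusive"


def classify_rg_clade(clade_phyla, tree_size, bootstrap, min_bootstrap=90):
    """Classify the supported clade that contains the red algal sequence.

    clade_phyla: phylum name for every tip in the clade.
    tree_size:   number of terminal taxa in the whole tree.
    """
    if bootstrap < min_bootstrap:
        return "unsupported"
    present = set(clade_phyla)
    if not RG_PHYLA <= present:
        return "other"                      # red and green are not both in the clade
    if present <= PLANTAE:
        return "RG exclusive"
    rg_tips = sum(p in RG_PHYLA for p in clade_phyla)
    covers_tree = len(clade_phyla) >= 0.75 * tree_size   # clade holds >= 75% of tips
    rg_majority = rg_tips > 0.5 * len(clade_phyla)       # > 50% of clade tips are RG
    return "RG shared" if covers_tree and rg_majority else "other"


# Totals reported in the text, recomputed from the individual counts.
rg_exclusive, rg_shared, all_trees = 329, 413, 1367
rg_total = rg_exclusive + rg_shared
rg_percent = round(100 * rg_total / all_trees)
relaxed_percent = round(100 * 997 / 2167)
print(rg_total, rg_percent, relaxed_percent)
```

For example, a clade of three RG tips plus one stramenopile in a five-taxon tree satisfies both conditions and would be labeled “RG shared”, whereas the same clade in a ten-taxon tree would not.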
The most common partners of gene sharing with RG (i.e., barring significant phylogenetic artifacts in our approach) are Stramenopiles (e.g., the diatoms; 1,264 proteins), bacteria other than Cyanobacteria (1,108), Haptophyta (839), Cyanobacteria (827), Alveolata (622), and Metazoa (473). The majority of these proteins (1,322 of 1,808) are shared between RG and two or more other phyla, demonstrating the complex evolutionary history of the algal genes. We recognize that our results are biased by the unbalanced contribution of available genome data from microbial eukaryotes in our database (e.g., diatoms are gene rich, cryptophytes are gene poor). In addition, the detection of gene transfer using phylogenetic approaches is susceptible to a number of technical limitations such as modularity [13, 14] and amelioration [15, 16] of the transferred genes, which result in underestimation of the extent of HGT in gene-by-gene surveys. Nevertheless, our findings indicate that single-gene or multigene analysis of Plantae should take into account extensive gene sharing vis-à-vis other eukaryote lineages (e.g., nongreen affiliation in nearly one-half of P. cruentum proteins shown in Figure 1D).

Lastly, we examined whether the observed signal of RG monophyly was contributed primarily or solely by nuclear-encoded plastid-targeted proteins (i.e., whether they reflect the evolution of the organelle rather than the host cell). To do this, we analyzed all RG-exclusive and RG-shared proteins (Figure 1D) with n ranging from 4 through 40. Using an integrated pipeline that incorporates three independent target-prediction approaches, we found that circa 40% of the proteins that support RG monophyly at bootstrap ≥ 90% may be plastid targeted (253 of 742, 34.1% at n ≥ 4; 119 of 283, 42.1% at n ≥ 40; see Supplemental Experimental Procedures).
Although bioinformatic predictions of organelle targeting are clearly provisional, these results suggest that in addition to the expected significant contribution to plastid function by proteins that unite the RG (i.e., the vast majority of these taxa are photoautotrophs), over one-half of them may not be destined for the plastid.

Figure 3. Schematic Representation of the Putative Tree of Life Showing the Extent of RG Gene Sharing with Other Eukaryote and Prokaryote Phyla
The branch shown as a dashed line represents ambiguous relationships among the lineages to the right. The color key indicates the number of trees found for each “foreign” (non-RG) phylum.
[Figure 3: schematic tree with a color key scaled from 200 to 1,200 trees. Counts legible in the figure: Archaea 153, other Bacteria 1,108, Cyanobacteria 827, Stramenopiles 1,264, Alveolata 622, Haptophyta 839, Euglenozoa 295, other Excavates 275, Fungi 400, Metazoa 473.]

Enrichment of Red Algal Genes Enhances Our Understanding of Eukaryote Evolution

To investigate the impact of increasing the number of genes available from mesophilic red algae in comparison to use of the genes of C. merolae alone, we applied the reciprocal BLASTp best-hits approach using C. merolae proteins as the query against our database. In this case, however, we excluded P. cruentum and C. tuberculosum from the database. With this approach, we found 127 proteins that showed exclusive gene sharing with red algae, of which 39 (31%) provided evidence for the RG grouping. Therefore, inclusion of our novel red algal genome data results in a nearly 4-fold increase in the number of red algal genes (145 versus 39) that support exclusive gene sharing among RG taxa.

Our findings also show that red algal genes are distributed among diverse eukaryote lineages that in many instances (e.g., Stramenopiles, Cryptophyta, Haptophyta) are most certainly explained by endosymbiotic gene transfer because these taxa contain a red algal-derived plastid [5, 17, 18]. Of the 474 proteins that show strong support for RG monophyly (145 found only in RG [Figure 1A]; 329 with RG showing exclusive monophyly at bootstrap ≥ 90% [Figure 1D]), only 129 (27.2%) have homologs in C. merolae. Therefore, with respect to testing the RG or Plantae hypothesis, the red algal gene repertoire from P. cruentum and C. tuberculosum contributes an almost 4-fold increase in the number of red algal genes useful for phylogenomic analysis as compared with C. merolae alone. In addition, only 1,207 (67% of 1,808) genes with a history of gene sharing include homologs from C. merolae, suggesting that the extent of gene transfer in eukaryotes has been significantly underestimated in previous phylogenetic analyses that relied on a more limited sample of red algal genes.

In summary, we have uncovered clear evidence of RG monophyly in our analysis of reciprocal BLASTp hits and individual protein trees. No competing hypothesis rises to the level of support that we found for the RG clade. Testing the coherence of the Plantae hypothesis will require the addition of complete genome data from Cyanophora paradoxa and other glaucophytes.
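The fold increases and percentages quoted in this section follow directly from the reported counts. The short check below recomputes them; the variable names are ours, and only numbers stated in the text are used.

```python
# Counts reported in the text.
rg_full_database = 145      # RG-exclusive proteins, P. cruentum queries, full database
rg_cmerolae_only = 39       # RG-exclusive proteins, C. merolae queries, new data excluded
rg_exclusive_trees = 329    # trees with exclusive RG monophyly (bootstrap >= 90%)
with_cm_homolog = 129       # strongly supported RG proteins with a C. merolae homolog
shared_with_cm = 1207       # gene-sharing trees that include a C. merolae homolog
shared_total = 1808         # all trees showing gene sharing

strong_rg = rg_full_database + rg_exclusive_trees
blast_fold = rg_full_database / rg_cmerolae_only
tree_fold = strong_rg / with_cm_homolog
cm_fraction = 100 * with_cm_homolog / strong_rg
sharing_fraction = 100 * shared_with_cm / shared_total

print(f"{strong_rg} proteins support RG monophyly")
print(f"BLASTp-based increase: {blast_fold:.1f}-fold")
print(f"Overall increase relative to C. merolae homologs: {tree_fold:.1f}-fold")
print(f"With C. merolae homologs: {cm_fraction:.1f}%")
print(f"Gene-sharing trees with C. merolae homologs: {sharing_fraction:.0f}%")
```

Both ratios come out at about 3.7, which is what “nearly 4-fold” and “almost 4-fold” summarize.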
This is of great interest, because the Plantae lineages provide an important opportunity to advance our knowledge of the tree of life, the intricacies of genome evolution among protists, and the origin of photosynthesis in eukaryotes. For example, it has been known for some time that Plantae share key traits that are usually, but not exclusively, associated with photosynthesis and other plastid functions, which strongly supports their union [17–20]. However, reliance on plastid characters (e.g., trees inferred from organelle genes or nuclear-encoded plastid-targeted proteins) may mislead phylogenetic inference if there has been a complex gain-and-loss pattern of plastids (with associated intracellular gene transfers) among Plantae lineages [6, 8]. Therefore, finding evidence of RG and ultimately Plantae monophyly could greatly improve our understanding of plastid endosymbiosis by tying together the lineages that share a primary plastid, and therefore the innovations underlying organellogenesis [21]. In contrast, Plantae polyphyly will lead to more complex explanations of how primary plastids and their supporting nuclear genes have been distributed among algal lineages. In either case, what has become clear is that concatenated protein data sets often fail to provide resolution of “deep” nodes in the tree of life, including the Plantae (e.g., [3–6, 22]). We suggest that in light of our data, reliance on the standard vertical inheritance model of gene evolution to infer the eukaryote tree of life (e.g., [6, 22]) may need to be critically reassessed on a gene-by-gene basis using an expanded collection of protist genomes. For instance, although providing support for phylum-level relationships, the V-type ATPase I tree (Figure S2) reveals a complex history of gene duplications that makes it a poor marker for species relationships.
More problematic is the recent finding of hundreds of green algal-derived genes (that likely arose via ancient gene transfers) in diatoms and other chromalveolates [23] that play key roles in the cell [24]. These studies demonstrate how much still remains to be understood about the evolutionary history of protist genomes. Once a comprehensive knowledge of gene history is gained, then the rapidly accumulating genome data can be incorporated with more confidence into multigene tree-of-life analyses. In summary, our work demonstrates the importance of a rich mesophilic red algal gene repertoire in testing controversial aspects of eukaryote evolution and in enhancing our understanding of the complex patterns of gene inheritance among protists.

Experimental Procedures

Generation of Expressed Sequence Tags from Porphyridium cruentum

Total RNA from Porphyridium cruentum CCMP1328 (Provasoli-Guillard National Center for Culture of Marine Phytoplankton, Boothbay Harbor, ME) was extracted (TRIzol, Invitrogen) and purified (QIAquick PCR Purification Kit, QIAGEN) according to the manufacturer’s instructions. The cDNAs were generated (Mint cDNA Synthesis Kit, Evrogen) from 2 μg of total RNA and normalized (Trimmer cDNA Normalization Kit, Evrogen). The normalized cDNAs were sequenced (GS FLX Titanium, Roche/454 Life Sciences) at the University of Iowa (Iowa City, IA), resulting in 386,903 EST reads. We found no obvious evidence of contamination in the data set from other algal sources or from bacteria based on a sequence similarity search aimed at nontarget taxa (BLAST e value ≤ 10⁻⁵ [25]). We assembled the ESTs into a total of 56,490 sequences with CAP3 [26] using the default settings, yielding 16,651 contigs and 39,839 singlets. To ensure that the phylogenetic signal derived from these sequences was significant and biologically meaningful, we excluded contigs of length <150 bases and singlets of length <296 bases (median length for singlets) from subsequent analysis, resulting in 36,167 unigenes for phylogenomic analysis.
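The length filter applied to the CAP3 output can be sketched in a few lines. The dictionaries of sequence names to sequences, the function name, and the toy sequences are hypothetical stand-ins for parsed CAP3 contig and singlet files; only the two cutoffs (150 and 296 bases) come from the text.

```python
def select_unigenes(contigs, singlets, min_contig_len=150, min_singlet_len=296):
    """Keep contigs and singlets that are at least as long as their cutoff.

    contigs, singlets: dicts mapping sequence name -> nucleotide sequence.
    Sequences shorter than the cutoff are excluded, as described in the text.
    """
    kept = {}
    for name, seq in contigs.items():
        if len(seq) >= min_contig_len:
            kept[name] = seq
    for name, seq in singlets.items():
        if len(seq) >= min_singlet_len:
            kept[name] = seq
    return kept


# Toy sequences placed exactly at and just below each cutoff.
toy_contigs = {"Contig1": "A" * 150, "Contig2": "A" * 149}
toy_singlets = {"read1": "C" * 296, "read2": "C" * 295}
unigenes = select_unigenes(toy_contigs, toy_singlets)
print(sorted(unigenes))

# The assembly totals in the text are internally consistent.
assembled_total = 16651 + 39839
print(assembled_total)
```

Because the singlet cutoff is the median singlet length, roughly half of the 39,839 singlets are expected to be discarded, which is consistent with 36,167 unigenes remaining.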
The assembled ESTs are available at http://dblab.rutgers.edu/home/downloads/. We generated six-frame translations for each of these EST unigenes for the phylogenomic analysis.

Partial Genome Data from Calliarthron tuberculosum

Fresh thalli of the coralline red alga Calliarthron tuberculosum were collected from the low intertidal zone at Botanical Beach Provincial Park on Vancouver Island, British Columbia, Canada (48°31′43.468″N, 124°27′12.485″W). Genomic DNA from the algal cells was extracted (DNeasy Plant Mini Kit, QIAGEN) and sequenced (GS FLX Titanium, Roche/454 Life Sciences) at the McGill University and Génome Québec Innovation Centre (Montréal), resulting in circa 750 Mbp of data. These reads were assembled using gsAssembler (Newbler) version 2.3 (Roche/454 Life Sciences) with default parameters, resulting in 169,975 contigs totaling 51.1 Mbp. The assembled mitochondrial (25,515 bases) and plastid (178,624 bases) DNAs were removed prior to phylogenomic analysis. Proteins encoded by these genome contigs were predicted using a machine learning approach under a generalized hidden Markov model as implemented in AUGUSTUS [27], in which protein models of Arabidopsis thaliana were used as the training set. These predicted proteins were incorporated into our in-house sequence database for subsequent phylogenomic analysis. All genome contigs and predicted proteins of C. tuberculosum used in this work are available at http://dblab.rutgers.edu/home/downloads/.
Analysis of Exclusive Gene Sharing

For this and all following phylogenomic analyses, we used an in-house database consisting of all annotated protein sequences from RefSeq release 37 at GenBank (http://www.ncbi.nlm.nih.gov/RefSeq/), predicted protein models available from the Joint Genome Institute (ftp://ftp.jgi-psf.org/pub/JGI_data/), and six-frame translated proteins from EST data sets of all publicly available algal and unicellular eukaryote sources, i.e., dbEST at GenBank (http://www.ncbi.nlm.nih.gov/projects/dbEST/) and TBestDB (http://tbestdb.bcm.umontreal.ca/), as well as data from P. cruentum and C. tuberculosum (see above), totaling 10,469,787 sequences (Table S1). Using 36,167 unigenes of P. cruentum as a query platform against the database (BLASTp, e value ≤ 10⁻¹⁰), we found 1,409 genes to have hits only in red algae and one other phylum. For each of the top five BLASTp hits (or fewer, if there were fewer than five hits) for a P. cruentum protein (among 1,409), we generated a list of hits via BLASTp searches against our database. The sequence hits that were found in all of these lists (including the P. cruentum protein) were grouped into a set. A protein set consisting only of red algae and one other phylum represented a putative case of exclusive gene sharing between the two phyla.
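The set-building step described above can be illustrated with a minimal sketch that assumes the BLASTp searches have already been run and parsed. The dictionaries `hits_of` (sequence identifier to its list of hits) and `phylum_of` (sequence identifier to phylum), the function names, and the toy identifiers are hypothetical; adding the query to the intersected set is one reading of “including the P. cruentum protein”.

```python
def shared_hit_set(query, top_hits, hits_of, max_top=5):
    """Intersect the hit lists of the query's top hits and add the query itself.

    top_hits: the query's BLASTp hits, best first.
    hits_of:  dict mapping each top hit to the list of its own BLASTp hits.
    """
    hit_lists = [set(hits_of[h]) for h in top_hits[:max_top]]
    if not hit_lists:
        return set()
    common = set.intersection(*hit_lists)   # sequences found in every list
    common.add(query)
    return common


def exclusive_partner(protein_set, phylum_of, red="Rhodophyta"):
    """Return the single non-red phylum if the set spans red algae plus one other."""
    phyla = {phylum_of[p] for p in protein_set}
    if red in phyla and len(phyla) == 2:
        (partner,) = phyla - {red}
        return partner
    return None


# Toy example with invented identifiers.
phylum_of = {"q": "Rhodophyta", "r1": "Rhodophyta", "g1": "Viridiplantae",
             "g2": "Viridiplantae", "m1": "Metazoa"}
hits_of = {"r1": ["q", "r1", "g1", "g2"],
           "g1": ["q", "r1", "g1", "g2", "m1"]}
protein_set = shared_hit_set("q", ["r1", "g1"], hits_of)
partner = exclusive_partner(protein_set, phylum_of)
print(sorted(protein_set), partner)
```

Intersecting the lists, rather than taking their union, means that a sequence hit by only one of the top hits (here the metazoan `m1`) does not disqualify the set, which makes the procedure conservative about set membership.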