Plant Genome IV, Arabidopsis Workshop

Plant Genome IV was held in San Diego, January 14-18, 1996.

Plant Genome is a large meeting covering the status of plant genome research. A workshop devoted to Arabidopsis was held during the meeting. What follows is a short report of the workshop by the participating speakers.

The session was chaired by Renate Schmidt.

Physical Mapping of Arabidopsis Chromosomes 4 and 5: Renate Schmidt

Physical mapping of Arabidopsis Chromosome 2: Howard Goodman

Sequencing Chromosome 4 of Arabidopsis: Mike Bevan

Genomic sequencing: Strategy and Technology: Ron Davis

Genetic and physical mapping of Nucleolus Organizer Regions and their associated telomeres on Arabidopsis thaliana chromosomes 2 and 4: Craig Pikaard

Informatics of ESTs: Ernie Retzel

Site-Selected Mutagenesis of Actin Genes in Arabidopsis: Ken Feldmann

Physical mapping of Arabidopsis Chromosomes 4 and 5

Renate Schmidt
Max-Delbrueck-Laboratorium in der MPG, Carl-von-Linne-Weg 10, 50829 Koeln, Fed. Rep. Germany

Four YAC contigs made up of 563 clones, covering >90% of chromosome 4 and ca. 17 Mb, have been produced using a total of 263 probes. YAC clones were positioned relative to each other and to markers by taking into account marker and end-fragment hybridization data and the sizes of all YAC clones. This analysis made it possible to estimate physical distances between the majority of chromosome 4 markers. Because most genomic intervals are covered by more than one clone, the map is very reliable even though more than 18% of the clones forming it are chimaeric. The detailed arrangement of the YAC clones mapping to chromosome 4 is available on the World Wide Web (WWW) at the URLs http://nasc.nott.ac.uk/JIC-contigs/JIC-contigs.html and http://genome-www.stanford.edu/Arabidopsis/JIC-contigs.html.

For chromosome 5, 148 molecular markers have been used in colony hybridization experiments with four YAC libraries. This resulted in 660 YAC clones, forming 35 YAC contigs, being anchored on chromosome 5.
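
The anchoring of YAC clones by marker hybridization can be illustrated with a small sketch. It makes one simplifying assumption: two clones that hybridize to the same probe overlap, so they belong to the same contig. Clones are then grouped with a union-find structure. The probe and clone names are invented for illustration, and the sketch ignores clone sizes, end-fragment data and chimaeric clones, all of which the analysis reported here took into account.

```python
from collections import defaultdict


def build_contigs(hits):
    """Group clones into contigs.

    hits maps a probe name to the set of clone names it hybridizes to.
    Clones sharing at least one probe end up in the same contig.
    """
    parent = {}

    def find(clone):
        # Register unseen clones as their own root, then walk to the root.
        parent.setdefault(clone, clone)
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]  # path halving
            clone = parent[clone]
        return clone

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for clones in hits.values():
        ordered = sorted(clones)
        for clone in ordered:
            find(clone)
        # Tie every clone positive for this probe to the first one.
        for other in ordered[1:]:
            union(ordered[0], other)

    groups = defaultdict(set)
    for clone in parent:
        groups[find(clone)].add(clone)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical hybridization results: probe -> positive clones.
example_hits = {
    "m1": {"yacA", "yacB"},
    "m2": {"yacB", "yacC"},
    "m3": {"yacD"},
    "m4": {"yacE", "yacF"},
}

contigs = build_contigs(example_hits)
for number, members in enumerate(contigs, start=1):
    print(f"contig {number}: {', '.join(members)}")
```

In this toy data set, probes m1 and m2 share clone yacB, so their clones merge into one contig, while the clones of m3 and m4 remain separate. A single chimaeric clone could wrongly join two contigs under this rule, which is why duplicate coverage matters for the reliability of a real map.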

Physical mapping of Arabidopsis Chromosome 2

Howard Goodman
Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts 02114, USA

A yeast artificial chromosome (YAC) physical map of chromosome 2 of Arabidopsis thaliana has been constructed by hybridization of 69 DNA markers and 61 YAC end probes to gridded arrays of YAC clones. Thirty-four YACs in four contigs define the chromosome. Complete closure of the map was not attained because some regions of the chromosome were repetitive or were not represented in the YAC library. Based on the sizes of the YACs and their coverage of the chromosome, the length of chromosome 2 is estimated to be at least 18 Mb. Work is in progress to obtain clones bridging the gaps and to obtain a higher density map using BACs. A large-scale sequencing effort on chromosome 2 has also been initiated. These data provide the means for immediately identifying the YACs containing a genetic locus mapped on Arabidopsis chromosome 2.

Sequencing Chromosome 4 of Arabidopsis

Mike Bevan
Department of Molecular Genetics, JICPSR, Colney Lane, Norwich, NR4 7UH, UK

The first stage of Arabidopsis genome sequencing began nearly two years ago in the EC, when a consortium of 17 labs began systematic sequencing of chromosome 4 and of other regions. So far cosmids have been sequenced; most of these were derived from existing Goodman contigs of Lorist clones and from subclone libraries of YACs. BAC libraries with 100 kb inserts, made from either HindIII- or EcoRI-digested DNA, are now available. These are better suited to larger-scale shotgun sequencing, and are being assembled into contigs. The combined 20x coverage indicates that large regions may be contigged, with relatively few gaps to be filled by cosmids. Currently 3 BACs are being sequenced in the FCA region.

Sequence in the central region of the lower arm of chromosome 4 is very information-rich, with a putative or real gene every 5 kb on average, and intergenic distances ranging from 0 (one gene within another) to over 5 kb. The sequence of nearly 200 genes in a 1000 kb section of chromosome 4 is now known. Clustering of functionally related genes has been observed in the case of the heat shock transcription factor and Peronospora resistance gene (RPP5) cluster. About 60% of the putative genes have homologs in other species, and half of these homologs have not previously been observed in plants. It must be borne in mind, however, that some of these homologies are to genes with no known function in other organisms. Nearly 25% of the genes whose functions may be predicted from homology to known genes are involved in metabolism, which will be remarkable if it is found to extend to a larger set of genes. The 40% of putative genes with no known homologs may represent a plant-specific set of genes. It is likely that a significant proportion of these genes are bona fide, as some of them match sequenced ESTs.
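
The proportions quoted above can be turned into approximate gene counts with a few lines of arithmetic. This is only a sketch: the report gives rounded figures ("nearly 200 genes", "about 60%"), so the constants below are assumed round values and the results are estimates.

```python
# Rounded figures taken from the report; treat them as approximations.
REGION_KB = 1000          # sequenced section of chromosome 4
GENES = 200               # "nearly 200 genes"
PCT_WITH_HOMOLOGS = 60    # "about 60%" have homologs in other species

kb_per_gene = REGION_KB / GENES
with_homologs = GENES * PCT_WITH_HOMOLOGS // 100
new_to_plants = with_homologs // 2      # half of the homologs are new to plants
no_homologs = GENES - with_homologs     # the remaining 40%

print(f"average spacing: {kb_per_gene:.0f} kb per gene")
print(f"genes with homologs in other species: {with_homologs}")
print(f"  of which not previously seen in plants: {new_to_plants}")
print(f"genes with no known homologs: {no_homologs}")
```

On these assumed values the spacing works out to one gene per 5 kb, consistent with the average given in the text.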

Progress is now sufficient to carry the programme forward to complete 1800 kb of sequence from the FCA region, 500 kb from other regions and 450 kb from the AP2 region on chromosome 4 by the end of 1996. The high information content of the genome, the near absence of repetitive structures, and the complex structure of the genes themselves indicate that the most accurate sequencing approach is required if the sequence is to be interpreted to an extent commensurate with its high potential information content.

The scope of sequencing work has extended to the US, where plans to sequence 10 Mb are being considered. Similarly, in the EU, plans to sequence the remainder of chromosome 4 (14 Mb) are being considered for funding. Together, this work should complete sequencing of the 100 Mb coding region by about 2004, if not sooner.

Genomic sequencing: Strategy and Technology

Ron Davis
Stanford DNA Sequence and Technology Center, 855 California Avenue, Palo Alto, CA 94304, USA

A consortium has been established for the purpose of contributing to the sequencing of the Arabidopsis genome. Members of the consortium include Ron Davis' group at the Stanford Center for DNA Sequencing and Technology, Dr. Sakis Theologis' group at the Plant Gene Expression Center/USDA at Albany, CA, and Dr. Joe Ecker's group at the University of Pennsylvania. The consortium has initiated sequencing of a YAC clone from chromosome 1, and plans to compare the efficiency of sequencing YAC vs. BAC clones.

New technologies relating to sequencing have been developed and are currently in use at the Stanford Center. They include a simple shearing device for random breakage of high molecular weight DNA, an automated plaque/colony picker, a DNA template purification instrument, and a 96-well oligonucleotide synthesizer. These technologies and others still under development will be available for the Arabidopsis genome project.

Genetic and physical mapping of Nucleolus Organizer Regions and their associated telomeres on Arabidopsis thaliana chromosomes 2 and 4

Craig Pikaard
Biology Department, Washington University, Campus Box 1137, One Brookings Drive, St. Louis, MO 63130, USA

The work of Gregory Copenhaver, an exceptional Ph.D. student in my lab, was presented at the workshop. Greg has mapped two difficult regions of the Arabidopsis thaliana genome on the northern tips of chromosomes 2 and 4. These regions are the large ribosomal RNA gene arrays comprising genetic loci known as nucleolus organizer regions (NORs). These chromosomal loci are difficult to map by conventional RFLP analyses because of the concerted evolution of the rRNA genes within a species, which limits heterogeneity even between strains. The loci are also refractory to physical mapping due to their size, which we estimate to be 3.5-4.0 Mbp each. However, a careful search identified several restriction endonucleases that cleave a small percentage of the rRNA genes, allowing the NORs to be digested into large fragments that can be resolved by pulsed-field gel electrophoresis (specifically, CHEF). The NOR digestion profiles are strain-specific, allowing conventional RFLP mapping with the caveat that it must be done on CHEF gels. Using this technique, Greg mapped the NORs, NOR2 and NOR4, to the distal northern tips of chromosomes 2 and 4. From this starting point, Greg then used an rRNA gene-specific endonuclease, I-Ppo I, to map associated loci and showed that the telomeres on the northern ends of chromosomes 2 and 4 directly adjoin the rRNA genes. Finally, using a combination of CHEF and conventional gel electrophoresis, two-dimensional RFLP analyses were used to deduce the fine structure of the NORs and the locations of rRNA gene variants.

Four classes of gene variants in the strain Landsberg could be defined by the presence or absence of restriction endonuclease sites or by differences in the size of the spacer sequences that separate adjacent genes. Each class of variant was shown to be highly clustered and segregated from other classes, providing a clear snapshot of the two co-evolving loci. Greg's data suggest that rRNA gene homogenization during concerted evolution occurs by local spreading of new variants.

These insights set the stage for Arabidopsis thaliana, like Drosophila, to be a powerful model system for the study of rRNA gene molecular evolution. For additional details, readers can consult three of Greg's papers published in The Plant Journal, one in 1995 and two in the February 1996 issue.

Informatics of ESTs

Ernie Retzel
Medical School, University of Minnesota, Minneapolis, MN 55455, USA

In the course of the NSF-sponsored Arabidopsis cDNA sequencing project, we have developed a variety of tools and methods for highly automated processing, analysis and distribution of cDNA/EST sequence and analysis information. These developments include tools for the handling of raw sequence data with little or no individual intervention, the automated and high performance analysis of the raw data, the creation of Web-based tools for accessing WAIS-indexed similarity information, the interactive, Web-based visualization of BLAST results, the development of motif exploration tools for datasets from Arabidopsis thaliana, Brassica napus, corn [maize], loblolly pine and rice, and the ability to search on these datasets, individually or in concert, using both suffix tree and BLAST tools on our server.

As the Arabidopsis cDNA sequencing project progresses, a relatively large dataset has begun to accumulate, with more than 22,000 ESTs processed and analyzed. We have entered a phase beyond the archiving of sequences and their respective similarity results. Specifically, we have begun the extraction of derived information from these sequences, as well as analyses by comparative genome analysis, by cluster analysis, and by the development of complex queries on finely structured data loaded into an object-relational database management system [RDBMS]. The RDBMS includes information from the public databases, specifically GenBank, GenInfo and PIR. The information being developed includes those sequences which are uniquely represented in the dataset, and those which presently appear to be unique to a variety of species. An important caveat is that the comparative species datasets are not yet large enough to support biological conclusions; the results are nonetheless more than enticing.

Site-selected mutagenesis of Actin genes in Arabidopsis

Ken Feldmann
Dept. of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA

Worldwide, more than 20,000 independent T-DNA generated transformants of Arabidopsis, containing an average of 1.5 T-DNA insertions each, have been generated. These T-DNAs have been shown to insert randomly at the locus and chromosome level. With a genome of 120 Mbp, an average gene length of 5 kb and 30,000 random insertions, there is an ~75% probability of an insert in any average gene. To extend the utility of this population beyond forward genetics, our group together with Rich Meagher's group has developed a PCR-based reverse genetics approach to identify T-DNA insertions in actin genes. DNA isolated from pools of 100 transformants, representing a total of 5,300 individual transformants, served as the substrate in PCR reactions with T-DNA border-specific primers and a degenerate actin primer. With the degenerate actin primer, insertions into all 10 actin genes were screened simultaneously. PCR products were transferred to filters and probed for products homologous to actins. Insertion mutants were isolated for both ACT2 and ACT4. We are now employing this approach for other gene families and are having similar success. See McKinney et al. 1995. Sequence-based identification of T-DNA insertion mutations in Arabidopsis: actin mutants act2-1 and act4-1. Plant J. 8:613-622.
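
The probability estimate above can be checked with a short calculation. The sketch assumes that every insertion lands independently and uniformly across the genome, so the chance that a gene of a given length escapes all insertions is (1 - gene length / genome length) raised to the number of insertions. The function name and constants are illustrative, and the model is a simplification of real T-DNA integration.

```python
import math

GENOME_KB = 120_000        # 120 Mbp genome
GENE_KB = 5                # average gene length
TRANSFORMANTS = 20_000     # independent transformed lines
INSERTS_PER_LINE = 1.5     # average T-DNA insertions per line


def hit_probability(n_insertions, gene_kb=GENE_KB, genome_kb=GENOME_KB):
    """Probability that at least one of n random insertions falls in a gene."""
    p_single = gene_kb / genome_kb
    return 1.0 - (1.0 - p_single) ** n_insertions


n_insertions = int(TRANSFORMANTS * INSERTS_PER_LINE)   # 30,000 insertions
p_hit = hit_probability(n_insertions)

# Number of DNA pools needed to cover the screened subset of lines.
pools = math.ceil(5300 / 100)

print(f"insertions in the population: {n_insertions}")
print(f"probability of a hit in an average gene: {p_hit:.2f}")
print(f"pools of 100 lines covering 5,300 lines: {pools}")
```

Under these assumptions the calculation gives a probability of about 0.71, close to the approximate 75% figure quoted in the text, and 53 pools for the screened transformants.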