Plant Genome IV, Arabidopsis Workshop
Plant Genome IV was held in San Diego, January 14-18, 1996.
It is a large meeting covering the status of plant genome research.
During the meeting a workshop devoted to Arabidopsis was held. What
follows is a short report of the workshop by the participating speakers.
The session was chaired by Renate Schmidt.
Physical Mapping of Arabidopsis Chromosomes 4 and 5: Renate Schmidt
Physical mapping on Arabidopsis Chromosome 2: Howard Goodman
Sequencing Chromosome 4 of Arabidopsis: Mike Bevan
Genomic sequencing: Strategy and Technology: Ron Davis
Genetic and physical mapping of Nucleolus Organizer Regions and their associated telomeres on Arabidopsis thaliana chromosomes 2 and 4: Craig Pikaard
Informatics of ESTs: Ernie Retzel
Site Selected Mutagenesis of Actin Genes in Arabidopsis: Ken Feldmann
Renate Schmidt
Max-Delbrueck-Laboratorium in der MPG, Carl-von-Linne-Weg 10, 50829 Koeln,
Fed. Rep. Germany
Four YAC contigs made up of 563 clones, covering >90% of chromosome 4
and spanning ca. 17 Mb, have been produced using a total of 263 probes.
YAC clones were positioned relative to each other and to markers by taking
into account marker and end-fragment hybridization data and the sizes of
all YAC clones. This analysis made it possible to estimate physical
distances between the majority of chromosome 4 markers. Because most
genomic intervals are covered by more than one clone, the map is very
reliable even though more than 18% of the clones forming the map are
chimaeric. The detailed arrangement of the YAC clones mapping to chromosome
4 is available on the World Wide Web (WWW) at URLs:
http://nasc.nott.ac.uk/JIC-contigs/JIC-contigs.html and
http://genome-www.stanford.edu/Arabidopsis/JIC-contigs.html.
148 molecular markers mapping to chromosome 5 have been used in colony
hybridization experiments with four YAC libraries. As a result, 660 YAC
clones, forming 35 YAC contigs, have been anchored on chromosome 5.
Howard Goodman
Department of Molecular Biology, Massachusetts General Hospital, Boston,
Massachusetts 02114, USA
A yeast artificial chromosome (YAC) physical map of chromosome 2 of
Arabidopsis thaliana has been constructed by hybridization of 69 DNA
markers and 61 YAC end probes to gridded arrays of YAC clones. Thirty-four
YACs in four contigs define the chromosome. Complete closure of the map
was not attained because some regions of the chromosome were repetitive
or were not represented in the YAC library. Based on the sizes of the YACs
and their coverage of the chromosome, the length of chromosome 2 is
estimated to be at least 18 Mb. Work is in progress to obtain clones
bridging the gaps and to obtain a higher density map using BACs. A
large-scale sequencing effort on chromosome 2 has also been initiated.
These data provide the means for immediately identifying the YACs
containing a genetic locus mapped on Arabidopsis chromosome 2.
Mike Bevan
Department of Molecular Genetics, JICPSR, Colney Lane, Norwich, NR4 7UH,
UK
The first stage of Arabidopsis genome sequencing began nearly 2 years ago
in the EC, when a consortium of 17 labs began systematic sequencing of
chromosome 4 and of other regions. So far cosmids have been sequenced, and
most of these have been derived from existing Goodman contigs of Lorist
clones, and from subclone libraries of YACs. BAC libraries, made from
either HindIII- or EcoRI-digested DNA, which contain 100 kb inserts, are
now available. These are more suited for larger scale shotgun sequencing,
and are being assembled into contigs. The combined 20x coverage indicates
that large regions may be contigged, with relatively few gaps to be filled
by cosmids. Currently 3 BACs are being sequenced in the FCA region.
Sequence in the central region of the lower arm of chromosome 4 is very
information-rich, with a putative or real gene every 5 kb on average, and
intergenic distances ranging from 0 (one gene within another) to over 5 kb.
The sequence of nearly 200 genes in a 1000 kb section of chromosome 4 is
now known. Clustering of functionally related genes has been observed, as
in the case of the heat shock transcription factor and Peronospora
resistance gene (RPP5) cluster. About 60% of the putative genes have
homologs in other species, and half of these homologs have not previously
been observed in plants. It must be borne in mind, however, that some of
these homologies are to genes with no known function in other organisms.
Nearly 25% of the genes whose functions may be predicted based on homology
to known genes are involved in metabolism, which would be remarkable if it
is found to extend to a larger set of genes. The 40% of putative genes with
no known homologs may represent a plant-specific set of genes. It is likely
that a significant proportion of these genes are bona fide, as some of them
match sequenced ESTs.
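The proportions quoted above can be turned into approximate gene counts.
The following is a minimal sketch in Python, assuming that "nearly 200"
is rounded to 200 genes and that the percentages apply to that whole set;
the variable names are illustrative only and the counts are rough
estimates, not figures from the sequencing consortium.

```python
# Rough counts implied by the quoted proportions.
# Assumption: "nearly 200 genes" is taken as exactly 200.
SECTION_KB = 1000   # length of the sequenced section, in kb
GENES = 200         # approximate number of genes in that section

# Average spacing: one gene per this many kb.
kb_per_gene = SECTION_KB / GENES

# About 60% of putative genes have homologs in other species.
with_homologs = GENES * 60 // 100

# Half of those homologs had not previously been observed in plants.
new_to_plants = with_homologs // 2

# The remaining 40% have no known homologs.
no_homologs = GENES - with_homologs

print(f"kb per gene:           {kb_per_gene:.1f}")
print(f"genes with homologs:   {with_homologs}")
print(f"homologs new to plants:{new_to_plants:>4}")
print(f"genes without homologs:{no_homologs:>4}")
```

On these assumptions the 1000 kb section would hold roughly 120 genes with
homologs elsewhere, about 60 of them new to plants, and roughly 80 genes
with no known homolog.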
Progress is now sufficient to carry the programme forward to complete
1800 kb of sequence from the FCA region, 500 kb from other regions and
450 kb from the AP2 region on chr 4 by the end of 1996. The high
information content of the genome, the near absence of repetitive
structures, and the complex structure of the genes themselves indicate
that an approach that yields the most accurate sequence is required in
order to interpret the sequence to an extent commensurate with the high
potential information content.
The scope of sequence work has extended to the US, where plans to sequence
10 Mb are being considered. Similarly, in the EU plans to sequence the
remainder of chromosome 4 (14 Mb) are being considered for funding.
Together this work should complete sequencing of the 100 Mb coding region
by about 2004, if not sooner.
Ron Davis
Stanford DNA Sequence and Technology Center, 855 California Avenue, Palo Alto, CA 94304, USA
A consortium has been established for the purpose of contributing to the
sequencing of the Arabidopsis genome. Members of the consortium include
Ron Davis' group at the Stanford Center for DNA Sequencing and Technology,
Dr. Sakis Theologis' group at the Plant Gene Expression Center/USDA at
Albany, CA, and Dr. Joe Ecker's group at the University of Pennsylvania.
The consortium has initiated sequencing of a YAC clone from chromosome 1,
and plans to compare the efficiency of sequencing YAC vs. BAC clones.
New technologies relating to sequencing have been developed and are
currently in use at the Stanford Center. They include a simple shearing
device for random breakage of high molecular weight DNA, an automated
plaque/colony picker, a DNA template purification instrument, and a
96-well oligonucleotide synthesizer. These technologies and others still
under development will be available for the Arabidopsis genome project.
Craig Pikaard
Biology Department, Washington University, Campus Box 1137, One Brookings
Drive, St. Louis, MO 63130
The work of Gregory Copenhaver, an exceptional Ph.D. student in my lab,
was presented at the workshop. Greg has mapped two difficult regions of
the Arabidopsis thaliana genome on the northern tips of chromosomes 2
and 4. These regions are the large ribosomal RNA gene arrays comprising
genetic loci known as nucleolus organizer regions (NORs). These chromosomal
loci are difficult to map by conventional RFLP analyses because concerted
evolution of the rRNA genes within a species limits heterogeneity even
between strains. The loci are also refractory to physical mapping due to
their size, which we estimate to be 3.5-4.0 Mbp each. However, a careful
search identified several restriction endonucleases that cleave a small
percentage of the rRNA genes, allowing the NORs to be digested into large
fragments that can be resolved by pulsed-field gel electrophoresis
(specifically, CHEF). The NOR digestion profiles are strain-specific,
allowing conventional RFLP mapping with the caveat that it must be done on
CHEF gels. Using this technique Greg mapped the NORs, NOR2 and NOR4, to
the distal northern tips of chromosomes 2 and 4. From this starting point,
Greg then used an rRNA gene-specific endonuclease, I-Ppo I, to map
associated loci and showed that the telomeres on the northern ends of
chromosomes 2 and 4 directly adjoin the rRNA genes. Finally, using a
combination of CHEF and conventional gel electrophoresis, two-dimensional
RFLP analyses were used to deduce the fine structure of the NORs and the
locations of rRNA gene variants.
Four classes of gene variants in the strain Landsberg could be defined by
the presence or absence of restriction endonuclease sites or by differences
in the size of the spacer sequences that separate adjacent genes. Each
class of variant was shown to be highly clustered and segregated from other
classes, providing a clear snapshot of the two co-evolving loci. Greg's
data suggest that rRNA gene homogenization during concerted evolution
occurs by local spreading of new variants.
These insights set the stage for Arabidopsis thaliana, like Drosophila,
to be a powerful model system for the study of rRNA gene molecular
evolution. For additional details, readers can consult three of Greg's
papers published in The Plant Journal, one in 1995 and two in the
February 1996 issue.
Ernie Retzel
Medical School, University of Minnesota, Minneapolis, MN 55455, USA
In the course of the NSF-sponsored Arabidopsis cDNA sequencing project, we
have developed a variety of tools and methods for highly automated
processing, analysis and distribution of cDNA/EST sequence and analysis
information. These developments include tools for the handling of raw
sequence data with little or no individual intervention, the automated and
high performance analysis of the raw data, the creation of Web-based tools
for accessing WAIS-indexed similarity information, the interactive,
Web-based visualization of Blast results, the development of motif
exploration tools for datasets from Arabidopsis thaliana, Brassica napus,
corn [maize], loblolly pine and rice, and the ability to search these
datasets, individually or in concert, using both suffix tree and Blast
tools on our server.
As the Arabidopsis cDNA sequencing project progresses, a relatively large
dataset has begun to accumulate, with more than 22,000 ESTs processed and
analyzed. We have entered a phase beyond the archiving of sequences and
their respective similarity results. Specifically, we have begun the
extraction of derived information from these sequences, as well as analyses
by comparative genome analysis, by cluster analysis, and by the development
of complex queries on finely structured data loaded into an
object-relational database management system [RDBMS]. The RDBMS includes
information from the public databases, specifically GenBank, GenInfo and
PIR. The information being developed includes those sequences which are
uniquely represented in the dataset, and those which presently appear to
be unique to a variety of species. An important caveat is that the
comparative species datasets are not yet large enough to draw biological
conclusions; however, the results are more than enticing.
Ken Feldmann
Dept. of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
Worldwide, more than 20,000 independent T-DNA generated transformants of
Arabidopsis, containing an average of 1.5 T-DNA insertions each,
have been generated. These T-DNAs have been shown to insert randomly at
the locus and chromosome level. With a genome of 120 Mbp, an average gene
length of 5 kb and 30,000 random insertions, there is an ~75% probability
of an insertion in any average gene. To extend the utility of this
population beyond forward genetics, our group, together with Rich Meagher's
group, has developed a PCR-based reverse genetics approach to identify
T-DNA insertions in actin genes. DNA isolated from pools of 100
transformants, representing a total of 5,300 individual transformants,
served as the substrate in PCR reactions with T-DNA border-specific primers
and a degenerate actin primer. With the degenerate actin primer, insertions
into all 10 actin genes were screened simultaneously. PCR products were
transferred to filters and probed for products homologous to actins.
Insertion mutants were isolated for both ACT2 and ACT4. We are now
employing this approach for other gene families, with similar success.
See McKinney et al. 1995. Sequence-based identification of T-DNA
insertion mutations in Arabidopsis: actin mutants act2-1 and act4-1.
Plant J. 8:613-622.
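The ~75% figure quoted above can be checked with a back-of-the-envelope
calculation. The sketch below is only an illustration: it assumes that
insertions are independent and uniformly distributed over the genome,
treats an average gene as a single 5 kb target, and uses function names
that are hypothetical. Under these assumptions the result is about 71%,
in the same range as the quoted estimate.

```python
import math

GENOME_BP = 120_000_000   # genome size quoted in the text (120 Mbp)
GENE_BP = 5_000           # average gene length quoted in the text (5 kb)
INSERTIONS = 30_000       # 20,000 transformants x 1.5 insertions each


def hit_probability(genome_bp: int, gene_bp: int, insertions: int) -> float:
    """Chance that at least one of `insertions` independent, uniformly
    random insertions lands in one particular gene of length `gene_bp`."""
    per_insertion = gene_bp / genome_bp
    return 1.0 - (1.0 - per_insertion) ** insertions


def poisson_hit_probability(genome_bp: int, gene_bp: int,
                            insertions: int) -> float:
    """Poisson approximation of the same quantity: 1 - exp(-expected hits)."""
    expected_hits = insertions * gene_bp / genome_bp
    return 1.0 - math.exp(-expected_hits)


p_exact = hit_probability(GENOME_BP, GENE_BP, INSERTIONS)
p_poisson = poisson_hit_probability(GENOME_BP, GENE_BP, INSERTIONS)

print(f"expected insertions per gene: {INSERTIONS * GENE_BP / GENOME_BP:.2f}")
print(f"P(at least one insertion), binomial: {p_exact:.3f}")
print(f"P(at least one insertion), Poisson:  {p_poisson:.3f}")
```

The expected number of insertions per average gene is 1.25, so the chance
of a gene escaping insertion altogether is close to exp(-1.25), or about
29%. The same functions can be reused to estimate how many insertions
would be needed to reach a given level of saturation.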