Multinational Coordinated
Arabidopsis thaliana Genome
Research Project
Progress Report:
Year Seven and Eight
December, 1998

HTML Markup provided by AtDB (now TAIR)



The Multinational Science Steering Committee:

Committee Chair: Gerd Jürgens, University of Tübingen, Germany
Michael Bevan, John Innes Centre, Norwich, United Kingdom
Michel Caboche, Lab. Biol. Cellulaire, INRA, Versailles, France
Daphne Preuss, University of Chicago, Chicago, IL, USA
Joseph Ecker, University of Pennsylvania, Philadelphia, PA, USA
Fernando Migliaccio, CNR, Monterotondo, Italy
Kiyotaka Okada, Kyoto University, Kyoto, Japan
David Smyth, Monash University, Clayton, Australia
Marc Van Montagu, University of Ghent, Belgium


Table of Contents


Preface
Overview of Genome Analysis
Stock Center Resources and Data Bases
National and Transnational Projects
Appendix 1: NSF Arabidopsis Genome Meeting Report
Appendix 2: Summary of December 1998 AGI Meeting at CSHL
Appendix 3: Database Workshop (Madison, WI 1998) Report
Appendix 4: "Arabidopsis thaliana Information Resource Project" Announcement

Preface


The "Multinational Coordinated Arabidopsis thaliana Genome Research Project" was established in 1990 to promote international cooperation in basic and applied research with Arabidopsis, a model plant species amenable to experimental manipulation in the laboratory. The primary objective of this project has been to understand the molecular basis of plant growth and development and to address fundamental questions in plant genetics, physiology, biochemistry, cell biology, and pathology. Initial plans were outlined in a publication (NSF #90-80) drafted nine years ago by an ad hoc committee of nine scientists from the United States, Europe, Japan, and Australia. In recent years, this project has become a model for widespread participation and effective coordination of multinational research efforts in modern biology.

Arabidopsis thaliana, a small plant in the mustard family, was chosen for this large-scale research effort because it offers many advantages for detailed genetic and molecular studies. Among these features are its small size, short life cycle, small genome, ability to be transformed, availability of numerous mutations, and prolific seed production. By concentrating research efforts on a single model organism, detailed information on specific genes and cellular processes can be readily obtained and rapidly applied to a wide range of plants relevant to agriculture, health, energy, manufacturing, and the environment.

Each year since 1990, the scientific steering committee for the Arabidopsis Genome Project has prepared a progress report summarizing recent advances in Arabidopsis research. This is the seventh annual progress report published by the steering committee in conjunction with the U.S. National Science Foundation. Three years ago the report was a color brochure designed to explain the value and significance of Arabidopsis research to a wide audience. Two years ago the report presented a detailed overview of recent advances in research with Arabidopsis, along with technical information for use by members of the Arabidopsis community. The sixth report presented an updated vision statement for the future to stimulate further advances in the use of Arabidopsis as a model system for the analysis of complex organisms.

This report covers progress for the seventh and eighth years of the project. It is focused on the large-scale analysis of the Arabidopsis genome. Specifically, this report is designed to make the available information accessible to the scientific community in a hands-on format. At the current rate of progress, the genome sequencing project can be expected to be completed within two years. The 1998 genome issue of Science (Meinke et al. 1998) featured Arabidopsis prominently.

Multinational cooperation and communication continue to be an important feature of the Arabidopsis genome project. A brief overview of Arabidopsis research efforts in a number of participating countries is therefore included in this report. Additional information can be obtained through recent publications, electronic news groups and databases, and biological resource centers devoted to Arabidopsis research. As with any document that attempts to summarize the contributions of many individuals, this report may fail to include or misrepresent some significant achievements. The steering committee hopes that members of the Arabidopsis community will overlook such shortcomings and will communicate any concerns to committee members so that future reports will be as accurate as possible. We thank all members of the Arabidopsis community for their many contributions to the success of the initial phase of the Multinational Coordinated Arabidopsis thaliana Genome Research Project.

Overview of Genome Analysis

A historical perspective


1983: Publication of first genetic map
1988-89: Publication of RFLP maps
1990: Multinational Coordinated Arabidopsis thaliana Genome Research Project initiated
1991: Arabidopsis Stock Centers at Ohio State (USA) and Nottingham (UK), as well as the Arabidopsis Data Base (AtDB), established
1991: First YAC libraries and anchoring of YAC clones to RFLP map
1992: Publication of first chromosome walk (local contig)
1993: Recombinant inbred (RI) map
1994-98: Collections of cDNA (EST) clones sequenced, linking genetic and cytogenetic maps with physical maps
1995-96: CIC-YAC, TAMU-BAC, IGF-BAC, Mitsui-P1 and Kazusa-P1 libraries
1995-98: Physical maps of all 5 chromosomes delineated
Jan. 1998: Publication of 1.9 Mb of contiguous DNA sequence from chromosome 4
June 1998: 29 Mb of genomic DNA sequenced
Oct. 1998: Arabidopsis featured in genome issue of "Science"
Dec. 1998: >46 Mb of genomic DNA sequenced and annotated; 90 Mb of genomic DNA in edited BAC contigs; >41,000 (of 44,000) BAC ends sequenced; >11,000 non-redundant (of >37,000) EST clones
2000: Completion of genome sequencing (expected date)

Integration of Genetic and Physical Maps


Two genetic maps were independently developed: a classic map of mutations (Koornneef et al., 1983) and a recombinant inbred (RI) map of molecular markers (Lister and Dean, 1993). As an increasing number of genes originally identified by mutation are cloned, converted into molecular markers and mapped onto the RI map, the two maps are beginning to merge into a unified genetic map. Map distances differ between the two maps, presumably because of the different genetic backgrounds. In addition, map distances are calculated with the Mapmaker program, which can produce local inaccuracies, for example in the relative order of closely linked markers. These problems will eventually be resolved by physical mapping.
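Because the Lister and Dean RI lines were produced by repeated selfing, a pair of markers is recombinant in a larger fraction of lines (R) than the per-meiosis recombination fraction (r). A minimal sketch of turning an RI count into a map distance, assuming the Haldane-Waddington relation R = 2r/(1 + 2r) for selfed RI lines and the standard Kosambi and Haldane mapping functions. Mapmaker's multipoint likelihood analysis is more involved; the function names and example counts here are illustrative only.

```python
import math


def ri_to_meiotic_fraction(R: float) -> float:
    """Convert the fraction of selfed RI lines recombinant between two markers (R)
    into the per-meiosis recombination fraction r, by inverting
    R = 2r / (1 + 2r)  (Haldane-Waddington relation for selfing)."""
    if not 0.0 <= R < 0.5:
        raise ValueError("R must lie in [0, 0.5) for linked markers")
    return R / (2.0 * (1.0 - R))


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in centimorgans (allows for crossover interference)."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def haldane_cm(r: float) -> float:
    """Haldane map distance in centimorgans (assumes no crossover interference)."""
    return -50.0 * math.log(1.0 - 2.0 * r)


if __name__ == "__main__":
    # Hypothetical example: 20 of 100 RI lines are recombinant between two markers.
    r = ri_to_meiotic_fraction(20 / 100)
    print(f"r = {r:.3f}")                       # r = 0.125
    print(f"Kosambi: {kosambi_cm(r):.1f} cM")   # Kosambi: 12.8 cM
    print(f"Haldane: {haldane_cm(r):.1f} cM")   # Haldane: 14.4 cM
```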

The RI map is now commonly used as the standard reference, enabling new genes identified by mutation to be easily mapped by PCR markers (SSLP, CAPS). The current RI map (November 1998) contains ca. 800 markers which fall into 3 different categories: "framework" (fixed reference location), "unique" (defined location on the map) and "multiple" (several possible locations). RI markers were also used to map a collection of YAC, BAC and P1 clones from which physical maps of the 5 chromosomes were initiated, thus linking genetic and physical maps from the very beginning.

Several physical maps have been established for all 5 chromosomes. Initially, contigs of large YAC clones were assembled and anchored to RI markers (e.g. Schmidt et al., 1997; Camilleri et al., 1998). Corresponding BAC and P1 clones were identified by hybridisation with YAC clones. For chromosome 5, a nearly complete physical map was established from P1 and TAC clone contigs (Kazusa homepage; Kotani et al., 1997). BAC contigs have also been established on a genome-wide scale by fingerprinting and by hybridisation with BAC end probes. For example, the 9 Mb of the bottom arm of chromosome 3 have been covered by a single BAC contig (see http://www.genoscope.cns.fr/externe/English/Projets/projetsindex.html). In addition to whole-chromosome physical mapping with YAC, BAC and P1 clones, chromosome walks in several chromosome regions have yielded local contigs up to 2 Mb long (e.g. Hardtke & Berleth, 1996; Wang et al., 1997; Thorlby et al., 1997), and several hundred EST clones have been PCR-mapped onto YAC clones (Agyare et al., 1997).

Fingerprinting data of BAC clones were used to assemble contigs with the FPC software, followed by manual editing to join the initial contigs. At present, ca. 70 BAC contigs encompass ca. 90 Mb of the estimated 121 Mb total sequence (M. Marra & M. Sekhon, Washington University, St. Louis; Marra et al., 1997). High-throughput BAC end-probe hybridization was used as a complementary approach to assemble contigs (Mozo et al., 1998). Information gathered from 2995 hybridization data (including 272 mapped markers) was manually edited after application of the probeorder computer program and integrated with the fingerprint data to generate a complete BAC-based physical map, consisting of 27 contigs distributed over the 10 chromosome arms, that covers approximately 124 Mb (see: http://www.mpimp-golm.mpg.de/101/bac.html). As the genome sequencing project progresses, many RI markers are being mapped physically, resulting in an excellent alignment of genetic and physical maps (see AtDB; see also the integrated contig tables by Daphne Preuss and colleagues at the CSHL website). This integration will undoubtedly facilitate gene isolation by map-based cloning.
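FPC compares clones by the restriction fragments (bands) in their fingerprints and scores shared bands with a probability-based statistic (the Sulston score). A minimal sketch of only the first part of that idea, counting bands shared within a sizing tolerance; it is not FPC's algorithm, and the band sizes, tolerance and overlap threshold are assumptions.

```python
def shared_bands(a: list[float], b: list[float], tol: float) -> int:
    """Count bands in fingerprint a that pair with a distinct band in b within tol.

    Both lists are sorted and swept with two pointers, so each band is used at most once.
    """
    a, b = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # band in a is too small to match anything left in b
        else:
            j += 1  # band in b is too small to match anything left in a
    return matches


def likely_overlap(a: list[float], b: list[float], tol: float = 10.0,
                   min_fraction: float = 0.5) -> bool:
    """Flag two clones as candidate overlaps if at least min_fraction of the
    smaller fingerprint is shared (the threshold is an illustrative assumption)."""
    smaller = min(len(a), len(b))
    return smaller > 0 and shared_bands(a, b, tol) / smaller >= min_fraction


if __name__ == "__main__":
    # Hypothetical restriction-fragment sizes (bp) for three BAC clones.
    bac_a = [1200, 2500, 3100, 4800, 6000]
    bac_b = [2502, 3098, 4805, 7000, 8100]
    bac_c = [500, 900, 1500, 5200, 9000]
    print(shared_bands(bac_a, bac_b, tol=10), likely_overlap(bac_a, bac_b))  # 3 True
    print(shared_bands(bac_a, bac_c, tol=10), likely_overlap(bac_a, bac_c))  # 0 False
```

A real assembler must also judge whether a given number of shared bands could arise by chance, which is what the Sulston score addresses.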

In addition to the unique-sequence regions of the chromosome arms, both rDNA repeats (NORs on chromosomes 2 and 4) and centromeric regions have been mapped genetically and physically. The centromeric regions were mapped by tetrad analysis (Copenhaver et al., 1998) and localized by in situ hybridization (Brandes et al., 1997). Thus, an outline of the physical organisation of the nuclear genome has emerged.

Agyare FD, Lashkari DA, Lagos A, Namath AF, Lagos G, Davis RW, Lemieux B (1997) Mapping expressed sequence tag sites on yeast artificial chromosome clones of Arabidopsis thaliana DNA. Genome Res. 7: 1-9.

Brandes A, Thompson H, Dean C, Heslop-Harrison JS (1997) Multiple repetitive DNA sequences in the paracentromeric regions of Arabidopsis thaliana L. Chromosome Res. 5: 238-246.

Camilleri C, Lafleuriel J, Macadre C, Varoquaux F, Parmentier Y, Picard G, Caboche M, Bouchez D (1998) A YAC contig map of Arabidopsis thaliana chromosome 3. Plant J. 14: 633-642.

Copenhaver GP, Browne WE, Preuss D (1998) Assaying genome-wide recombination and centromere functions with Arabidopsis tetrads. Proc. Natl. Acad. Sci. USA 95: 247-252.

Hardtke CS, Berleth T (1996) Genetic and contig map of a 2200-kb region encompassing 5.5 cM on chromosome 1 of Arabidopsis thaliana. Genome 39: 1086-1092.

Kotani H, Sato S, Fukami M, Hosouchi T, Nakazaki N, Okumura S, Wada T, Liu YG, Shibata D, Tabata S (1997) A fine physical map of Arabidopsis thaliana chromosome 5: construction of a sequence-ready contig map. DNA Res. 4: 371-378.

Marra MA, Kucaba TA, Dietrich NL, Green ED, Brownstein B, Wilson RK, McDonald KM, Hillier LW, McPherson JD, Waterston RH (1997) High throughput fingerprint analysis of large-insert clones. Genome Res. 7: 1072-1084.

Meinke DW, Cherry JM, Dean C, Rounsley SD, Koornneef M (1998) Arabidopsis thaliana: A model plant for genome analysis. Science 282: 662-682.

Mozo T, Fischer S, Maier-Ewert S, Lehrach H, Altmann T (1998) Use of the IGF BAC library for physical mapping of the Arabidopsis thaliana genome. Plant J. 16: 377-384.

Round EK, Flowers SK, Richards EJ (1997) Arabidopsis thaliana centromere regions: genetic map positions and repetitive DNA structure. Genome Res. 7: 1045-1053.

Sato S, Kotani H, Hayashi R, Liu YG, Shibata D, Tabata S (1998) A physical map of Arabidopsis thaliana chromosome 3 represented by two contigs of CIC YAC, P1, TAC and BAC clones. DNA Res. 5: 163-168.

Schmidt R, Love K, West J, Lenehan Z, Dean C (1997) Description of 31 YAC contigs spanning the majority of Arabidopsis thaliana chromosome 5. Plant J. 11: 563-572.

Thorlby GJ, Shlumukov L, Vizir IY, Yang CY, Mulligan BJ, Wilson ZA (1997) Fine-scale molecular genetic (RFLP) and physical mapping of a 8.9 cM region on the top arm of Arabidopsis chromosome 5 encompassing the male sterility gene, ms1. Plant J. 12: 471-479.

Wang ML, Huang L, Bongard-Pierce DK, Belmonte S, Zachgo EA, Morris JW, Dolan M, Goodman HM (1997) Construction of an approximately 2 Mb contig in the region around 80 cM of Arabidopsis thaliana chromosome 2. Plant J. 12: 711-730.

Useful web sites

RI maps by chromosome (text or graphic), including access to marker mapping data: http://nasc.nott.ac.uk/new_ri_map.html
http://genome-www.stanford.edu/Arabidopsis/ww/Nov98RImaps/index.html
CAPS markers: http://genome-www.stanford.edu/Arabidopsis/aboutcaps.html
SSLP markers: http://genome.bio.upenn.edu/SSLP_info/SSLP.html
Classic map: http://mutant.lse.okstate.edu/
Display of genetic and physical maps (includes classic map): http://genome-www3.stanford.edu/Arabidopsis/chromosomes/
Physical map of chromosome 1: http://genome.bio.upenn.edu/physical-mapping/physmaps.html
Physical map of chromosome 2: http://weeds.mgh.harvard.edu/goodman.html
Physical map of chromosome 3: http://www.kazusa.or.jp/arabi/
http://genome-www3.stanford.edu/Arabidopsis/chromosomes/ (genetic & physical map)
Physical map of chromosome 3 bottom arm (a combination of YAC map, Wash U BAC contigs and location of BAC end sequences): http://www.genoscope.cns.fr/externe/English/Projets/projetsindex.html
Physical map of chromosome 4: http://nucleus.cshl.org/protarab/ (top arm)
http://websvr.mips.biochem.mpg.de/proj/thal/chrIV_pic.html (bottom arm)
Physical map of chromosome 5: http://www.kazusa.or.jp/arabi/
Physical map, genome overview: http://genome-www4.stanford.edu:8300/cgi-bin/Pchrom
BAC contigs by fingerprinting: http://nucleus.cshl.org/protarab/edited_bac_contigs.htm (overview)
http://genome.wustl.edu/gsc/cgi-bin/findgif.pl (physical map contigs display)
BAC contigs by end-probe hybridisation: http://www.mpimp-golm.mpg.de/101/bac.html
ESTs to YACs: http://genome-www.stanford.edu/Arabidopsis/EST2YAC.html
ESTs to CIC-YACs: http://genome-www.stanford.edu/Arabidopsis/EST2CIC.html
Chromosome 4 & 5 YAC contigs: http://nasc.nott.ac.uk/JIC-contigs/JIC-contigs.html
Integrated contig tables (by Daphne Preuss and colleagues): http://nucleus.cshl.org/arabmaps/get_started.htm
Arabidopsis thaliana links: http://www.nsf.gov/cgi-bin/getpub?nsf9950

Sequencing of ESTs and Genomic Regions


More than 37,000 partial cDNA (EST) sequences have been deposited in the public databases, whereas the total number of genes is most likely about 20,000. Building EST "contigs", i.e. longer cDNA sequences assembled from overlapping ESTs, removes this redundancy, so that each contig represents a different genomic sequence (Rounsley et al., 1996; Cooke et al., 1997). The current estimate of non-redundant ESTs is about 11,000, or approximately half the total number of genes.
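The redundancy reduction described above can be pictured as clustering: ESTs that share sequence are grouped, and each group is taken to represent one gene. A minimal sketch using a shared k-mer (on either strand) as the overlap criterion and union-find to merge groups; EST assembly programs generally rely on sequence alignment instead, so the k-mer rule, the value of k and the toy sequences are assumptions.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexically smaller of a k-mer and its reverse complement,
    so ESTs read from opposite strands can still match."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def cluster_ests(ests: dict[str, str], k: int = 20) -> list[set[str]]:
    """Group ESTs that share at least one k-mer, directly or transitively."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    first_seen: dict[str, str] = {}  # canonical k-mer -> first EST containing it
    for name, seq in ests.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            key = canonical(seq[i:i + k])
            if key in first_seen:
                root_a, root_b = find(first_seen[key]), find(name)
                if root_a != root_b:
                    parent[root_b] = root_a  # merge the two clusters
            else:
                first_seen[key] = name

    clusters: dict[str, set[str]] = defaultdict(set)
    for name in ests:
        clusters[find(name)].add(name)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical toy ESTs; real ESTs are several hundred bp and k would be larger.
    ests = {"est1": "ACGTACGTAA", "est2": "TACGTAAGGC", "est3": "GGGGCCCCTT"}
    for cluster in cluster_ests(ests, k=6):
        print(sorted(cluster))
```

The number of clusters is then the estimate of non-redundant sequences; a single shared k-mer is a permissive criterion and would merge members of gene families in practice.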

Large-scale high-throughput genomic sequencing makes use of the physical maps and the available BAC (TAMU, IGF), P1 and TAC (Mitsui, Kazusa) libraries (see AGI). BAC, TAC and P1 clones are mapped onto YAC clones, and their ends are sequenced to determine minimum tiling paths for sequencing large regions. More than 41,000 BAC ends (from a total of 22,000 BAC clones) have been sequenced, yielding stretches of ca. 400 bp every 4 kb on average (total sequence ca. 14 Mb). The largest contiguous region sequenced to date is nearly 1.9 Mb long (Bevan et al., 1998). This region around FCA on chromosome 4 contains 389 genes, 46% of which could not be assigned a putative function by sequence comparison with the databases. On average, one gene (ORF) was found every 4.8 kb, and similar values were observed for other genomic regions (Quigley et al., 1996; Sato et al., 1997; Kotani et al., 1997). For many ORFs, no corresponding EST was found in the databases. To identify expressed genes within contig regions, a novel cDNA selection method has been proposed (Seki et al., 1997).
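Choosing a minimum tiling path from clones whose ends have been placed on the map is an interval-cover problem. A minimal sketch, assuming each clone has already been reduced to left and right map coordinates in kb (in practice these come from end sequences, fingerprints or hybridisation and are approximate) and that neighbouring clones must overlap by a fixed minimum; the clone names, coordinates and the 5 kb default are hypothetical.

```python
from typing import NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # map position of the left end, kb
    end: int    # map position of the right end, kb


def minimum_tiling_path(clones: list[Clone], region_start: int, region_end: int,
                        min_overlap: int = 5) -> list[Clone]:
    """Greedy interval cover: repeatedly take the clone that overlaps the current
    path end by at least min_overlap kb and reaches furthest to the right.
    This greedy choice yields the fewest clones under the overlap rule.
    Raises ValueError at a gap that no clone bridges."""
    ordered = sorted(clones, key=lambda c: c.start)
    path: list[Clone] = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        # The first clone must reach region_start; later ones must overlap the path.
        limit = covered_to - min_overlap if path else covered_to
        best = None
        while i < len(ordered) and ordered[i].start <= limit:
            if best is None or ordered[i].end > best.end:
                best = ordered[i]
            i += 1  # skipped clones end before covered_to and can never help later
        if best is None or best.end <= covered_to:
            raise ValueError(f"no overlapping clone beyond {covered_to} kb")
        path.append(best)
        covered_to = best.end
    return path


if __name__ == "__main__":
    clones = [Clone("A", 0, 100), Clone("B", 20, 110), Clone("C", 90, 190),
              Clone("D", 150, 260), Clone("E", 180, 240)]
    print([c.name for c in minimum_tiling_path(clones, 0, 250)])  # ['A', 'C', 'D']
```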

Bevan M et al. (1998) Analysis of 1.9 Mb of contiguous sequence from chromosome 4 of Arabidopsis thaliana. Nature 391: 485-488.

Cooke R, Raynal M, Laudie M, Delseny M (1997) Identification of members of gene families in Arabidopsis thaliana by contig construction from partial cDNA sequences: 106 genes encoding 50 cytoplasmic ribosomal proteins. Plant J. 11: 1127-1140.

Kotani H, Nakamura Y, Sato S, Kaneko T, Asamizu E, Miyajima N, Tabata S (1997) Structural analysis of Arabidopsis thaliana chromosome 5. II. Sequence features of the regions of 1,044,062 bp covered by thirteen physically assigned P1 clones. DNA Res. 4: 291-300.

Quigley F, Dao P, Cottet A, Mache R (1996) Sequence analysis of an 81 kb contig from Arabidopsis thaliana chromosome III. Nucl. Acids Res. 24: 4313-4318.

Rounsley SD, Glodek A, Sutton G, Adams MD, Somerville CR, Venter JC (1996) The construction of Arabidopsis expressed sequence tag assemblies. Plant Physiol. 112: 1177-1183.

Sato S, Kotani H, Nakamura Y, Kaneko T, Asamizu E, Fukami M, Miyajima N, Tabata S (1997) Structural analysis of Arabidopsis thaliana chromosome 5. I. Sequence features of the 1.6 Mb regions covered by twenty physically assigned P1 clones. DNA Res. 4: 215-230.

Seki M, Hayashida N, Kato N, Yohda M, Shinozaki K (1997) Rapid construction of a transcription map for a cosmid contig of Arabidopsis thaliana genome using a novel cDNA selection method. Plant J. 12: 481-487.

Useful web sites

EST database: http://www.cbc.umn.edu/ResearchProjects/Arabidopsis/index.html
http://www.tigr.org/tdb/agi/index.html
BAC end sequences: http://www.tigr.org/tigr_home/tdb/at/atgenome/bac_end_search/bac_end_search.html
http://www.genoscope.cns.fr/externe/English/Projets/projetsindex.html
P1, TAC, CIC-YAC end sequences: http://www.kazusa.or.jp/arabi/endseq/

Arabidopsis Genome Initiative (AGI): Current State of High-Throughput Genome Sequencing


The AGI was established on August 20-21, 1996 when representatives of six research groups (3 from USA and one each from EU, Japan and France) committed to sequencing the Arabidopsis genome met in Arlington, VA to discuss strategies for facilitating international cooperation in completing the genome project. In order to avoid duplication of efforts, the six groups of the Arabidopsis Genome Initiative (AGI) agreed to focus on different regions of the genome (Bevan et al., 1997, Plant Cell 9:476-487). In July 1998, the members of the AGI met again in Arlington, VA to discuss progress to date, to anticipate barriers to timely completion, and to establish an oversight committee for the U.S.-based labs (see Appendix).

At present, the major sequencing domains of the AGI groups have been assigned as follows:
Chromosome 1 (30 Mb): SPP group (Stanford, PennU, PGEC)
Chromosome 2 (14 Mb): TIGR group
Chromosome 3 (top, 13.5 Mb): Kazusa group
Chromosome 3 (top, 5 Mb): TIGR group*
Chromosome 3 (bottom, 9 Mb): EU project chrom3 (coordinated by Genoscope)
Chromosome 4 (top, 4 Mb): CSHSC (CSH-WU-ABI group)
Chromosome 4 (bottom, 13 Mb): EU group (ESSA I, II, III)
Chromosome 5 (top, 9 Mb): EU group (ESSA III)
Chromosome 5 (top + middle, 4 Mb): CSHSC (CSH-WU-ABI group)
Chromosome 5 (top + bottom, 17 Mb): Kazusa group
* TIGR will begin sequencing this region in spring 1999.

Sequencing is being done on BAC and P1 clones, following two different strategies. Both the SPP group and the TIGR group have selected nucleating sites ("seed BACs") around which BAC contigs have been established by using BAC end sequences to select adjacent clones with minimum overlap. This sequential procedure involves 32 and 16 starting points on chromosomes 1 and 2, respectively. The other strategy, adopted by CSHSC, ESSA and Kazusa, involves building BAC or P1/TAC tiling paths with minimum overlap between adjacent clones ("sequence-ready maps"). This procedure requires more preparative work, but once such a map is established, large regions can be sequenced in parallel, e.g. by the several sequencing groups within the ESSA group.
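The seed-BAC walk can be sketched as repeatedly choosing, at each end of the growing contig, the candidate clone that overlaps it least while still overlapping by a safe minimum. A minimal one-dimensional sketch, assuming the BAC-end matches have already been turned into approximate (start, end) map coordinates in kb; clone names, coordinates and the 5 kb minimum are hypothetical, and a real walk must also reject chimaeric or mismapped clones.

```python
from typing import NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # kb
    end: int    # kb


def walk_from_seed(seed: Clone, clones: list[Clone], min_overlap: int = 5) -> list[Clone]:
    """Extend a contig outwards from a seed clone, one clone at a time.

    Rightwards: among clones that overlap the right end by at least min_overlap
    and extend past it, take the one with the smallest overlap (largest start).
    Leftwards: the mirror image. Each direction stops when nothing extends it.
    """
    contig = [seed]
    left, right = seed.start, seed.end
    while True:
        candidates = [c for c in clones if c.start <= right - min_overlap and c.end > right]
        if not candidates:
            break
        nxt = max(candidates, key=lambda c: (c.start, c.end))
        contig.append(nxt)
        right = nxt.end  # strictly increases, so the loop terminates
    while True:
        candidates = [c for c in clones if c.end >= left + min_overlap and c.start < left]
        if not candidates:
            break
        nxt = min(candidates, key=lambda c: (c.end, -c.start))
        contig.insert(0, nxt)
        left = nxt.start  # strictly decreases, so the loop terminates
    return contig


if __name__ == "__main__":
    seed = Clone("S", 100, 200)
    pool = [Clone("L1", 20, 110), Clone("L2", 50, 130), Clone("R1", 180, 290),
            Clone("R2", 150, 260), Clone("R3", 280, 380)]
    print([c.name for c in walk_from_seed(seed, pool)])  # ['L1', 'S', 'R1', 'R3']
```

Each seed produces its own growing contig; the contigs from neighbouring seeds are expected to meet eventually.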

Lists of clones selected for sequencing can be found on the web sites of the sequencing groups. Start dates for sequencing are indicated, and it has been agreed that finished sequences will be released within 4-6 months after the start of sequencing (for details, see Appendix). The current state of genome sequencing is as follows (for an overview by chromosome region, see AtDB / Arabidopsis Sequencing View and the homepages of the AGI groups):

AGI Progress at the end of 1998


Chr.   Est. size   Completed        In progress      In preparation   Total sequence
       (target)    Clones   Mb      Clones   Mb      Clones   Mb      Clones   Mb
1      30 Mb       52       5.52    16       2.02    16       1.7     84       9.25
2      17 Mb       107      10.29   45       4.32    27       2.64    179      17.25
3      22.2 Mb     13       1.09    29       2.5     27       1.2     69       5.8
4      18.5 Mb     153      11.71   51       5.2     28       2.6     231      19.4
5      29.2 Mb     208      14.62   15       1.2     38       2.7     261      18.54
Total  120 Mb      534      43.32   156      15.2    136      11.8    826      70.3

Note that the total sequence entered into AtDB and summarised above includes overlaps between adjacent clones (except for sequences submitted by ESSA and WashU, from which almost all overlaps have been removed). For this reason, the total number of clones sequenced is a better measure of progress. With 10% overlap, 120 Mb will require 1,390 BAC clones. On 31 December 1998, the following finished clones had been deposited in GenBank:

SPP 50 BAC clones
TIGR 105 BAC clones
CSHSC 60 BAC clones
ESSA 117 BAC and cosmid clones
Kazusa 202 P1, TAC and BAC clones
Total 534 clones (approx. 39% of the total genome)
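The figure of 1,390 clones follows from dividing the genome size by the unique (non-overlapping) part of each clone. A short sketch of that arithmetic; the average insert size of 96 kb is an assumption not stated in the report, chosen because it approximately reproduces the report's figure.

```python
import math


def clones_needed(genome_kb: float, insert_kb: float, overlap_fraction: float) -> int:
    """Clones required to tile a genome when each clone shares
    overlap_fraction of its insert with its neighbour."""
    unique_kb_per_clone = insert_kb * (1.0 - overlap_fraction)
    return math.ceil(genome_kb / unique_kb_per_clone)


if __name__ == "__main__":
    # Assumed average insert of 96 kb (not given in the report); 10% overlap as stated.
    print(clones_needed(genome_kb=120_000, insert_kb=96, overlap_fraction=0.10))
```

With these inputs the result is 1,389 clones, i.e. about 1,390; a smaller average insert would raise the number proportionally.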

As of 31 December 1998, the AtDB Sequencing View displays 46 Mb (39% of estimated 120 Mb genome size) of complete sequence. This figure is 17 Mb higher than that given at the end of June 1998, indicating that the current rate of sequencing is close to 3 Mb per month for the entire AGI project. Taking into account the sequences that have not been released, the actual amount of sequence information is close to 55 Mb (almost 50% of the unique sequences). It is thus a realistic goal to finish the sequence of the Arabidopsis genome (excluding telomeric and centromeric regions as well as NORs) by the end of the year 2000.
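The rate and the completion estimate above follow from simple arithmetic on the AtDB totals. A sketch using the report's figures (29 Mb at the end of June and 46 Mb at the end of December 1998; ca. 55 Mb including unreleased sequence). Taking the full 120 Mb as the target overstates the work left, because telomeric and centromeric regions and the NORs are excluded from completion, and a constant rate is itself an assumption.

```python
def sequencing_rate(mb_earlier: float, mb_later: float, months: float) -> float:
    """Average finished sequence added per month between two snapshots."""
    return (mb_later - mb_earlier) / months


def months_remaining(target_mb: float, done_mb: float, rate_mb_per_month: float) -> float:
    """Months needed to reach target_mb at a constant rate."""
    return max(target_mb - done_mb, 0.0) / rate_mb_per_month


if __name__ == "__main__":
    rate = sequencing_rate(29, 46, months=6)   # end of June -> end of December 1998
    left = months_remaining(120, 55, rate)     # from ca. 55 Mb incl. unreleased sequence
    print(f"{rate:.1f} Mb/month, about {left:.0f} months to go")
```

At roughly 2.8 Mb per month, the remaining ca. 65 Mb take about 23 months from the end of 1998, which is consistent with completion by the end of 2000.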

Completion of the sequence is defined as follows: each chromosome arm, between the subtelomeric and the centromeric repeats, consists of a single fully sequenced contig. This excludes the rDNA repeats (NORs on chromosomes 2 and 4, each of which accounts for ca. 3.5 Mb) and other internal tandem repeat regions. For these regions, it will be sufficient to sequence one repeat unit and to estimate the repeat number at each site. By these criteria, sequencing of chromosomes 2 (14 Mb) and 4 (17 Mb) can be expected to be complete before the end of 1999.

As sequencing approaches the closing phase, boundaries between sequencing domains have to be defined precisely to avoid duplication of effort by different sequencing groups. All the sequencing groups have already encountered this difficulty, resulting in duplicated sequences and mismapped clones (see table). For example, on chromosome 4 both CSHSC and ESSA sequenced two different but overlapping clones and had to reassign the remaining projects in a common region of ca. 900 kb. TIGR and SPP have abandoned or mismapped at least 4 BACs and a chimaeric YAC, while Kazusa has sequenced several duplicate clones on chromosome 5. Depending on the groups' different rates of progress, it may be advisable, in the interest of the Arabidopsis community, to reallocate genomic regions between the sequencing groups (see Appendix 1 and 2). The fingerprint map constructed at Washington University and the hybridisation-based map constructed by T. Altmann have the potential to delineate these regions before they are sequenced, and will probably be used for this purpose.
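Checking for such conflicts before clones are sequenced amounts to looking for clones from different groups that overlap on a common map. A minimal sketch, assuming all groups' clones have been placed on one coordinate system (for example from the fingerprint or hybridisation maps mentioned above); the group names follow the report, while the clone names and coordinates are hypothetical.

```python
from typing import NamedTuple


class Assignment(NamedTuple):
    group: str
    clone: str
    start: int  # map position, kb
    end: int    # map position, kb


def cross_group_overlaps(assignments: list[Assignment],
                         min_overlap: int = 1) -> list[tuple[str, str, int]]:
    """Return (clone, clone, overlap_kb) for clones of different groups that
    overlap on the map by at least min_overlap kb."""
    ordered = sorted(assignments, key=lambda a: a.start)
    hits = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # sorted by start, so no later clone can overlap a
            overlap = min(a.end, b.end) - b.start
            if a.group != b.group and overlap >= min_overlap:
                hits.append((a.clone, b.clone, overlap))
    return hits


if __name__ == "__main__":
    plan = [Assignment("CSHSC", "c1", 0, 100), Assignment("CSHSC", "c2", 90, 190),
            Assignment("ESSA", "e1", 170, 270), Assignment("ESSA", "e2", 260, 360)]
    print(cross_group_overlaps(plan))  # [('c2', 'e1', 20)]
```

Overlaps within one group are expected (they form the tiling path), so only cross-group overlaps are reported.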

Useful Web Sites

Information on sequencing (lists of BAC and P1 clones, maps)
Chromosome 1: http://sequence-www.stanford.edu/ara/by_locus.html
http://pgec-genome.pw.usda.gov/
http://cbil.humgen.upenn.edu/~atgc/ATGCUP.html
Chromosome 2: http://www.tigr.org/tigr_home/tdb/at/atgenome/atgenome.html
Chromosome 3: http://www.genoscope.cns.fr/
http://www.inra.fr/Versailles/BIOCEL/CHR3-INRA/chromosome3.html
Chromosomes 4 & 5: http://nucleus.cshl.org/protarab/
Chromosomes 3, 4 & 5: http://websvr.mips.biochem.mpg.de/proj/thal/
Chromosomes 3 & 5: http://www.kazusa.or.jp/arabi/
Entire genome: http://genome-www3.stanford.edu/cgi-bin/AtDB/Schromosomes


Stock Center Resources and Data Bases

The Arabidopsis Biological Resource Center (ABRC), Ohio State University, USA, and the Nottingham Arabidopsis Stock Centre (NASC), University of Nottingham, UK, have been providing stocks and information to the Arabidopsis community since 1991. They have endeavored to accumulate the broadest possible range of stocks to provide the best platform of genetic diversity and genetic tools for Arabidopsis researchers and the genome project. Combined, they have sent 35,000 seed samples, 12,000 individual DNA clones and thousands of additional clones as gridded large-insert libraries to researchers world-wide during the last year.

The seed stocks currently available from the two centers include mutant lines (600), T-DNA lines and pools (30,000+), mapping strains, the G. P. Rédei collection of mutants and research lines (300+), the A. R. Kranz collection of mutants and ecotypes (700+), transposon/transposase lines (100+), RI lines (3 populations), ecotypes (400+), transgene lines and related species. The genetic mapping resources of the centers and the T-DNA and transposon resources complement the AGI sequencing efforts and the current research focus on functional genomics.

DNA stocks of ABRC include cloned genes (200), RFLP mapping clones (300+), expressed sequence tagged (EST) clones (30,000+), cDNA libraries (7), a phage genomic library, YAC libraries (6), BAC and P1 libraries used in genome sequencing (3) and two-hybrid libraries (2). In addition, filters of BACs, P1s and YACs for hybridization and isolated DNA from T-DNA populations (12,000 lines) are available.

The EST collection has been organized so that a set of 11,000 clones, non-redundant on the basis of the sequences available to TIGR, is being used by the AGI. The 3' sequences of these clones are being analyzed by the MSU EST project to further eliminate redundancy. Copies of BAC and P1 clones for which sequences have been published are being sent to many research laboratories. In this connection, ABRC requests that all sequencing projects adhere, if at all possible, to the agreed clone-naming conventions when publishing sequences, so that researchers can identify the proper clones to obtain without confusion.

NASC and ABRC are working to enlarge the collections of characterized mutants and clones. In addition, large numbers of T-DNA lines are expected to be received, so that within the next year the available T-DNA lines should essentially saturate the genome. Together with the accumulating genomic and cDNA sequence information, these resources will prove invaluable to the research community. New transposon-tagged populations, recombinant inbred mapping populations, a tetrad mapping population and GFP lines are also being incorporated into the collections.

RI mapping


The Nottingham Arabidopsis Stock Centre (NASC) curates the Lister and Dean RI maps that were originally developed and maintained by Clare Lister and Caroline Dean (JIC, Norwich). NASC also offers a weekly community mapping service. Anyone can submit data to NASC for mapping using the specially designed data submission form. The positions of all markers mapped at NASC are made publicly available through the NASC WWW server, the Arabidopsis Genome Resource and AtDB. For private mapping, all the marker scores are available from NASC. However, the aim for the community is to have as many markers as possible placed on the canonical map and so the submission of mapping data for inclusion on the RI map is appreciated.
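In outline, placing a submitted marker on the RI map means comparing its genotype across the RI lines with those of markers already on the map. A minimal sketch that only finds the most tightly linked existing marker by counting discordant lines; it assumes genotypes coded 'C' or 'L' for the two parental alleles (Columbia and Landsberg erecta, the parents of the Lister and Dean lines) and '-' for missing data, and the marker names and scores are hypothetical. A full analysis would use linkage mapping software such as Mapmaker.

```python
def recombinant_fraction(a: str, b: str) -> float:
    """Fraction of RI lines, scored for both markers, carrying different parental alleles."""
    scored = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not scored:
        raise ValueError("no RI line is scored for both markers")
    return sum(x != y for x, y in scored) / len(scored)


def closest_marker(new: str, mapped: dict[str, str]) -> tuple[str, float]:
    """Return the mapped marker with the fewest discordant RI lines, and that fraction."""
    return min(((name, recombinant_fraction(new, g)) for name, g in mapped.items()),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical scores for 10 RI lines (one character per line).
    mapped = {"m1": "CCLLCCLLCC", "m2": "CCLLLLLLCC", "m3": "LLCCLLCCLL"}
    new_marker = "CCLLLL-LCC"
    print(closest_marker(new_marker, mapped))  # ('m2', 0.0)
```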

Linking maps and sequence for comparative analysis in the Arabidopsis Genome Resource


The Arabidopsis node of the BBSRC-funded UK Crop Plant Bioinformatics Network (UK-CropNet), based at NASC, has established the Arabidopsis Genome Resource (AGR). AGR is being developed as a repository of Arabidopsis data of value in the comparative analysis of plant genomes and as an essential tool to aid in the cloning of homeologous genes of agronomic importance.

Comparative analysis in plants relies upon genetic and physical mapping of common probes between species. To this end AGR has made available the YAC physical maps of chromosomes IV and V (from C. Dean, R. Schmidt, M. Stammers). AGR also includes the Recombinant Inbred Maps from NASC integrated with the AGI sequence template clones (locations provided through AtDB). Arabidopsis nucleotide sequences are also included within AGR.

Integrating these data sets is the next key step in the development of AGR. Sequence overlaps between completed AGI clones define contigs of BACs and P1s. These contigs will be fixed to the YAC physical maps using the results of BAC-YAC hybridisations. Contigs may be anchored on the RI maps through the nearest-marker information for individual clones. RI maps and YAC physical maps are already integrated to some extent, because some RI markers were used as probes in YAC physical mapping.
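The anchoring step can be sketched as follows, assuming each sequenced clone carries the RI-map position (cM) of its nearest marker: a contig is placed at the median of its clones' positions, which tolerates an occasional mismapped clone, and the contigs are then sorted along the chromosome. The median rule and the contig and clone names and positions are assumptions for illustration, not the AGR implementation.

```python
from statistics import median


def anchor_contigs(contigs: dict[str, list[str]],
                   nearest_marker_cm: dict[str, float]) -> list[tuple[str, float]]:
    """Place each contig at the median RI-map position (cM) of its clones'
    nearest markers and return the anchored contigs in map order.
    Contigs with no anchored clone are left out."""
    anchored = []
    for contig, clones in contigs.items():
        positions = [nearest_marker_cm[c] for c in clones if c in nearest_marker_cm]
        if positions:
            anchored.append((contig, median(positions)))
    return sorted(anchored, key=lambda item: item[1])


if __name__ == "__main__":
    contigs = {"ctgA": ["b1", "b2", "b3"], "ctgB": ["b4", "b5"], "ctgC": ["b6"]}
    # Hypothetical nearest-marker positions; b3 is an outlier (e.g. a mismapped
    # clone), which the median tolerates; b6 has no marker information.
    positions = {"b1": 40.0, "b2": 41.5, "b3": 90.0, "b4": 12.0, "b5": 13.0}
    print(anchor_contigs(contigs, positions))  # [('ctgB', 12.5), ('ctgA', 41.5)]
```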

In collaboration with Martin Trick (John Innes Centre), these data will be used to generate comparative map displays between Arabidopsis and the Brassicas.

Contact Persons

Randy Scholl, ABRC email: scholl.1@osu.edu
Mary Anderson, NASC email: arabidopsis@nottingham.ac.uk

Web sites

ABRC: http://aims.cps.msu.edu/aims/
NASC: http://nasc.nott.ac.uk/
AtDB: http://genome-www.stanford.edu/Arabidopsis/

Database Issues

The Arabidopsis Data Base (AtDB) is at present located at Stanford University (P.I.: Mike Cherry). The explosion of data, both genomic and biological, makes it clear that the database as it now exists is operating at a minimal, not an optimal, level. The recognition that the community had to express its needs in a more concrete way resulted in two workshops addressing the issues of database composition and management. The first was held in 1993 in Dallas, TX, and its report can be accessed at http://genome-www.stanford.edu/Arabidopsis/db/dallas.report.html.

A more recent workshop on the same topic was held at the international meeting in Madison, WI in 1998; its report is attached as an appendix. The needs are for a central database, with links to other useful databases, in which information is organized in a user-friendly fashion. Recognition of the needs of the Arabidopsis community, as well as of other interested communities, has resulted in an NSF call for proposals titled "Arabidopsis thaliana Information Resource Project (AtIR)". The deadline is March 22, 1999, and a copy of the announcement is attached to this report as an appendix.

Recommendation on information management

Large-scale genomic sequencing has reached a critical stage, with about half the genome in hand. Although the AGI sequencing groups provide information for specific regions of chromosomes, it is difficult and time-consuming for the Arabidopsis community to retrieve the relevant information. To take full advantage of all the progress that has been made in the analysis of the Arabidopsis genome, it will be necessary to establish a well-funded unified genome database that displays sequence and related features together with biological information in a user-friendly way.

National and Transnational Research Activities

Australia

Arabidopsis research in Australia is focused on building an understanding of fundamental aspects of plant biology. There is no direct commitment to large scale genome sequencing at this stage.

Among recent highlights, Liz Dennis, Jim Peacock and colleagues from the CSIRO Division of Plant Industry in Canberra have discovered a second nonsymbiotic leghemoglobin gene from Arabidopsis (Proc. Nat. Acad. Sci. US 94, 12230-12234, 1997). They propose that all plants have two classes of leghemoglobins, as exemplified by the two genes in Arabidopsis. In the evolution of symbiosis, the product of one or the other of these genes has been recruited on different occasions to play a new role in association with the symbiont. In most cases class 1 gene products have been involved, but the newly discovered class 2 proteins are also potentially symbiotic.

Another highlight has been the discovery of a gene encoding the catalytic subunit of cellulose synthase (Science 279, 717-720, 1998). Tony Arioli and colleagues in Richard Williamson's research group in the Research School of Biological Sciences at ANU in Canberra have walked to the locus of a temperature-sensitive mutant that causes root swelling (RSW1). The gene that complements the mutant phenotype is related to a cellulose synthase subunit gene from cotton. In the mutant there is widespread accumulation of beta-1,4-glucan, but it is not crystallised into microfibrils, suggesting that such assembly is a role of the RSW1 gene product.

Other active programs include studies of various aspects of flowering, from induction through floral organ morphogenesis to fertilisation and seed development. Topics as diverse as photosynthesis, the effects of abiotic stresses including heavy metals and UV, epigenetic effects of cytosine methylation, and the roles of the MYB gene family are also being actively investigated.

A major commitment is being made to host the 10th International Conference on Arabidopsis Research in Melbourne from 4-8 July 1999. A Regional Advisory Committee, with colleagues from Japan, South Korea, Singapore and New Zealand, has been set up to give the meeting a Western Pacific focus. This will be the first time the Arabidopsis community has met outside Europe and North America, and we look forward to welcoming scientists and students to Australia, where plant science continues to thrive.

Contact Person: David Smyth, Monash University, Melbourne

E-mail Address: David.Smyth@sci.monash.edu.au

Belgium

As Belgium is a federal country, there are both federal and Flemish initiatives to support research using Arabidopsis thaliana as the experimental organism.

A Flemish project is under way on the isolation and characterization of new ethylene mutants in Arabidopsis thaliana. This project aims to isolate a new series of mutants in the ethylene signal transduction pathway. A combined morphological, physiological and molecular genetic approach will identify a number of previously unknown elements and will provide better insight into the control of plant development by this hormone.

The Belgian government also stimulates interactions between the different universities. Within this framework, a project is running between the universities of Gent, Antwerp, Brussels and Liège on the growth and development of higher plants. Many external factors, such as light intensity, light quality, temperature, the availability of nutrients and the interaction with pathogenic organisms, strongly influence the growth and development of higher plants. Current knowledge of the molecular processes that control growth and development is still very limited. The national network aims to contribute to developmental biology by studying a limited number of aspects of plant development. Wherever possible, Arabidopsis thaliana will be used as a model plant. Key projects include the identification and cloning of key regulatory genes involved in leaf morphogenesis and the molecular analysis of the formation of syncytia (large feeding cells) in nematode-infected Arabidopsis roots. The Flemish community also supports these projects.

Contact Person: Nancy Terryn / Marc Van Montagu, University of Ghent

E-mail Address: nater@gengenp.rug.ac.be

China

Research using Arabidopsis as a model system became further established in China at national research institutes and universities in the past year. The research areas mainly include biosynthesis of amino acids, signal transduction and metabolism of plant hormones, cell wall formation, seed storage proteins, responses to environmental stresses, isolation of various mutants affecting growth and development, and characterization of transposable elements. Interest in reverse genetics and functional genomics has also increased greatly, with a focus on gene targeting, construction of a large transgenic population carrying mapped Ds elements randomly distributed at high density, development of an expression library for in planta transformation, and establishment of cDNA arrays to monitor gene expression and identify functional genes. Grants supporting these research projects come mainly from the National Natural Science Foundation of China, the Chinese Academy of Sciences and the Hong Kong Research Grant Council/UPGC Grant HKU.

Contact Person: Jiayang Li, Institute of Genetics, Chinese Academy of Sciences

E-mail Address: jyli@ss10.igtp.ac.cn


France

Genome sequencing

During the last year, three French laboratories (M. Delseny/Perpignan, M. Kreis/Orsay and R. Mache/Grenoble) have systematically sequenced three BACs (300 kb) as part of the EU-ESSA II Program. Delseny's group has also continued to sequence cDNA clones corresponding to the 60 kb Em1 locus on chromosome 3. A French sequencing center, Genoscope CNS, has been created, and part of its activity is devoted to sequencing the Arabidopsis genome. In collaboration with TIGR and UPenn, Genoscope is generating end sequences from all 23,000 BAC clones from the TAMU and IGF libraries to expedite the selection of clones with minimal overlap with those already sequenced. Genoscope is also coordinating a new EU project aimed at sequencing the lower arm of chromosome 3 (9 Mb). This project involves 16 sequencing groups. The goal for Genoscope and three academic French laboratories is about 2 Mb.

Synteny with other genomes

A program was developed between INRA Rennes and Versailles groups to identify consensus markers between rapeseed and Arabidopsis for a number of agronomically relevant genes. A collaboration between laboratories in Perpignan, Davis and Poznan has found synteny between five adjacent genes in the chromosome 3 Em locus of Arabidopsis and genes in B. oleracea, B. nigra and B. rapa. The EU program EuDicotMap has started to select highly conserved ESTs of rice and Arabidopsis and to map them in Arabidopsis as well as important European crops in order to identify synteny blocks between different families.

Generation of insertion lines and reverse genetic screens

INRA-Versailles has now generated more than 38,000 T-DNA mutant lines. Screening of the collection is being done through a coordinated effort between INRA, CNRS and various European laboratories. Of approximately one hundred target genes selected for the screen, insertions were identified in 50%. Systematic characterization of flanking sequence tags of insertions in over a thousand mutants has now begun. About 11,000 lines will be donated to NASC by the beginning of next year.

A summary of Arabidopsis genes under study

Research in many areas of plant genetics and biology is being actively pursued in French laboratories. Plant hormones and signal transduction, cell walls, secreted and membrane proteins, metabolism, development, and plant-pathogen interactions are being investigated in laboratories throughout France.

Contact Person: Michel Caboche, INRA Versailles

E-mail Address: caboche@tournesol.versailles.inra.fr

Germany

Arabidopsis research is still increasing in scope at universities and research institutions. The national research program on "Arabidopsis as a model for analysing plant development" is in its final two-year funding period. Because of its tremendous success, Arabidopsis researchers have launched an initiative to establish a new program focusing on plant cell biology. Another six-year national research program, on plant hormones, will start in 1999 and includes several groups working on Arabidopsis. Besides these programs, Arabidopsis research is funded within European projects and by DFG grants, either on an individual basis or as part of local research programs.

Several Arabidopsis projects are related to genome research. ZIGIA, a program operated at the Max-Planck-Institut in Cologne, aims at functional analysis through gene inactivation by transposon insertion. High-throughput end-probe hybridization of BAC clones from the IGF library was done at the Max-Planck-Institut in Golm. These data were integrated with information made available by other groups to assemble a complete BAC-based physical map of the Arabidopsis genome. Projects on transcript profiling have been initiated at the DKFZ in Heidelberg, the MPI in Golm and the IPK in Gatersleben. The Federal Ministry of Education and Science (BMBF) has issued a call for proposals within a newly established Plant Genome Analysis program (GABI). A joint Arabidopsis proposal involving 32 projects from 27 different institutions, aimed at a functional analysis of the genome, has been submitted.

An EMBO (European Molecular Biology Organisation) Course held at the Max-Planck-Institut in Cologne in May 1998 entitled "Molecular and Biochemical Analysis of Arabidopsis" was attended by 16 participants representing 13 European countries. The course covered the theoretical and practical aspects of forward and reverse genetics, genetic and physical mapping, transformation, transient gene expression, in situ hybridisation, cell biology, physiology, the yeast two-hybrid system, complementation of yeast mutants and bioinformatics over an eleven-day period. EMBO Course seminars from ten invited speakers were integrated with a two-day meeting of the national Arabidopsis research program.

Contact Person: Gerd Jürgens, Universität Tübingen

E-mail Address: gerd.juergens@uni-tuebingen.de

Italy

Research in Italy with Arabidopsis is growing. About twenty laboratories are currently conducting research on this model system. Investigations cover plant-pathogen relationships, expression of PG and PGIF genes, the role of rolB and rolD in plant differentiation, HD-ZIP transcription factors in plant morphogenesis, complementation of yeast by Arabidopsis genes, selection of Ca2+ and K+ transport mutants, genes involved in heat and cold resistance, myb transcription factors, genes of the polyamine pathway, induction of nodulin genes in plants by Rhizobium, use of antisense RNA to inhibit nitrogen transport, and the study of agravitropic mutants under normal gravity and microgravity conditions (ESA-ASI projects). Financial support comes from several sources, including the National Research Council, the Ministry of Agriculture, the EU Fourth Framework Programme, the ESA-ASI space programmes, and a few other national agencies. Research groups are located both in universities and in national institutes (National Research Council, ENEA, National Institute of Nutrition). The Italian association of researchers interested in Arabidopsis (ARABITALIA) met for the first time in September 1997 in Abbadia di Fiastra (Macerata, central Italy). On this occasion, the scientists attending reported on their Arabidopsis investigations and projects, and a booklet describing Arabidopsis research in Italy was distributed. Some young Italian researchers working abroad (in the USA and UK) also reported on their recent investigations. The 1998 annual meeting was held at the end of September in Viterbo (central Italy), in conjunction with the EUCARPIA Symposium on plant breeding. A document is being prepared on the state of Arabidopsis research in Italy and on the actions that could be taken to obtain the financial support needed to foster it.

Contact Person: Fernando Migliaccio, CNR (Monterotondo)

E-mail Address: miglia@nserv.icmat.mlib.cnr.it

Japan

Arabidopsis research is well established in Japan. The number of laboratories using the model plant for research and education continues to increase gradually in universities, national institutes, and private companies. Research areas range widely, from developmental biology, metabolic regulation, gene expression, environmental stress signaling, and DNA methylation to large-scale DNA sequencing. Results were reported at international meetings such as the "International Conference on Arabidopsis Research" in Madison, WI, and the "Joint Meeting of Japanese and American Societies of Plant Physiologists" in Vancouver, BC, and at national meetings, especially the annual "Workshop on Arabidopsis Studies". The 8th workshop was organized by Kazuo Shinozaki, Minami Matsui, Yuji Kamiya, and Richard E. Kendrick from October 11 to 13, 1997, at the RIKEN Institute in Wako City, Saitama, and was held jointly with the Frontier Research Forum, "Recent Progress of Plant Hormone Research in Arabidopsis". It had nearly 250 participants, 20 poster presentations, and 37 speakers, including 7 guest speakers from abroad. The 9th workshop, organized by Satoshi Tabata, was held at the Kazusa Academia Center on November 19-20, 1998, and had nearly 300 participants, 40 poster presentations and 11 oral presentations. Topics included systematic genome analyses, patents, and post-genome tactics, as well as mutant analyses, gene cloning, and newly developed techniques.

The Japanese Arabidopsis communication network, nazuna-net, which started in January 1995, now has 442 members (Sept. 1998) from 99 organizations, including 17 private companies (contact: Dr. Takayuki Kohchi: kouchi@bs.aist-nara.ac.jp). A large-scale genome sequencing project has made extensive progress at the Kazusa DNA Research Institute in coordination with the multinational Arabidopsis Genome Initiative (contact: Dr. Satoshi Tabata: tabata@kazusa.or.jp). Nearly 12.5 Mb, covering 174 P1 clones, has been sequenced and reported in the journal "DNA Research" (http://www.uap.co.jp) and on the project homepage (http://www.kazusa.or.jp/arabi/). The Sendai Arabidopsis Seed Stock Center (SASSC) has been operated by Dr. Nobuharu Goto (n-goto@ipc.miyakyo-u.ac.jp) since 1993.

Contact Person: Kiyotaka Okada, Kyoto University

E-mail Address: kiyo@ok-lab.bot.kyoto-u.ac.jp

The Netherlands

The Dutch Arabidopsis groups held their annual meeting in Utrecht on February 19, which was attended by approximately 80 participants. Arabidopsis groups are located at the Universities of Leiden, Utrecht and Wageningen and at CPRO-DLO in Wageningen. Important research topics are recombination, auxin action and apoptosis in Leiden (Hooykaas); sugar sensing (Smeekens), root development (Scheres) and acquired resistance (van Loon) in Utrecht; and embryogenesis (de Vries, van Lammeren), flowering and seed development (Koornneef), and transposons, genome sequencing, plant disease resistance genes and developmental biology (Stiekema, Pereira, Angenent, Groot, all at CPRO-DLO) in Wageningen. The groups collaborate through their involvement in graduate schools and EU programs.

Contact Person: Maarten Koornneef, Agricultural University Wageningen

E-mail Address: Maarten.Koornneef@BOTGEN.EL.WAU.NL

Spain

No special funding programme supports Arabidopsis research in Spain. However, more than 20 research groups are currently active in research with this organism, funded mainly by the National Biotechnology Programme, Basic Research Programmes, and the European Union BIOTECH Programme. Some of these groups are involved in large-scale genome sequencing and function searches, especially for the Myb family of transcription factors. Spanish groups interested in Arabidopsis development focus mainly on seed, leaf and flower development and on flowering induction. This area is seeing the arrival of new groups of Arabidopsis users, some of them also interested in cell differentiation. In plant physiology and metabolism, topics that have seen significant contributions during the year include secondary metabolism, the identification of new elements in the signal transduction pathways involved in different environmental stress responses, and the analysis of sulfur and phosphate assimilation. Arabidopsis has also been increasingly used in studies of plant-pathogen interactions to identify new elements in the response signal transduction pathways.

The Spanish Arabidopsis network, funded by the National Biotechnology Programme, has generated a collection of 10,000 T-DNA lines that is being actively used in mutant screens, at both the phenotypic and DNA levels, in many laboratories. This network, which includes all the Spanish laboratories working with Arabidopsis, is now discussing future joint activities. Many more Spanish scientists are currently involved in Arabidopsis research in laboratories elsewhere in the world. Their successful integration into the Spanish R&D system would strongly help to steer the field and increase the contribution of our country.

Contact Person: José Martinez Zapater, Centro Nacional de Biotecnología (Madrid)

E-mail Address: zapater@cnb.uam.es

United Kingdom

There are at present over 190 projects in the UK involving Arabidopsis. The European Commission continues to be a major source of funding, and the newly announced Framework V programme is due to begin calls for proposals. Although there are no longer any special initiatives aimed specifically at Arabidopsis research, the Biotechnology and Biological Sciences Research Council (BBSRC) funds projects through competitive grants and special initiatives, contributing approximately £6.8m to Arabidopsis research in the UK.

An Arabidopsis Gene Function Search Network is currently under development by Mike Bevan at the John Innes Centre. This is a network of consortia (groups of labs with a common goal) being brought together to carry out large-scale screening programmes that will reveal the functions of the very large numbers of genes being identified by the genome project.

The Genetical Society of Great Britain chose Arabidopsis as the subject of its annual autumn meeting in 1997. The Mendel Lecture was given by Elliot Meyerowitz, preceded during the day by talks from Mike Bevan, Rob Martienssen, Joe Ecker, Ben Scheres, Caroline Dean, Gerd Jürgens and Brian Staskawicz. "Arabidopsis thaliana: Big Ideas from a Small Plant" was such a success that the Society has decided to host a biennial conference on Arabidopsis.

An EMBO (European Molecular Biology Organisation) Course held at the John Innes Centre in May 1997 entitled "Arabidopsis as an Experimental Organism" was attended by 12 participants representing seven European countries. The course covered the theoretical and practical aspects of mutant screening, genetic and physical mapping, plant pathology, microscopy, biolistics, the yeast two-hybrid system, and sequence fragment and data analysis over a ten-day period, which also included seminars from ten invited speakers.

The Chelsea Flower Show judges awarded a prestigious Silver Medal to the John Innes Centre Science Communication and Education Department exhibit, entitled "Arabidopsis - a Wonderful Weed". The exhibit demonstrated how Arabidopsis is used to recognise genes of agronomic importance in agricultural crops. The public exposure and media coverage the display attracted in the UK and abroad has helped to increase awareness of the importance of plant molecular biology.

In the last year the Nottingham Arabidopsis Stock Centre (NASC), in collaboration with the Arabidopsis Biological Resource Center (ABRC), has continued to accumulate the broadest possible range of stocks to provide the best platform of genetic diversity and genetic tools for the investigation of this model system. NASC currently maintains and distributes over 20,000 accessions of Arabidopsis to the research community. New stocks generated within the UK and shortly to be made available include the first 10,000 of the Sainsbury Laboratory Arabidopsis transposants (SLAT) lines (Jonathan Jones, Sainsbury Lab, UK), 100 GFP lines (Jim Haseloff, Cambridge, UK) and a Recombinant Inbred population of Nd (Niederzenz) x Columbia generated by Eric Holub, Jim Beynon and Ian Crute (HRI Wellesbourne, UK).

Contact Person: Caroline Dean, John Innes Centre, Norwich

E-mail Address: caroline.dean@bbsrc.ac.uk

United States

Arabidopsis research continues to flourish in both academic and corporate laboratories in the United States. One of the clearest indicators of the value of the information that can be gleaned from Arabidopsis research has been the establishment of several genomics companies that are exploiting Arabidopsis genetics. Thanks to continued support from the National Science Foundation (NSF), the Department of Energy (DOE) and the U.S. Department of Agriculture (USDA), the Arabidopsis genome is on track to be completely sequenced by the end of 2000. A total of 46 Mb of finished sequence had been deposited in public databases as of January 1999, of which the US sequencing groups contributed more than 24 Mb. Importantly, the groups in the US Arabidopsis Genome Initiative (AGI) finished the first phase of their sequencing effort in less than the three years originally allotted, and could thus begin the second phase of sequencing ahead of schedule, during 1998. In addition to its value for database mining and other more traditional genomic approaches, the availability of large amounts of genome sequence, together with physical maps that cover almost the entire genome, has begun to eliminate positional cloning as a bottleneck in Arabidopsis genetics. Much of this information is conveniently accessed through the Arabidopsis thaliana database (AtDB) at Stanford University. The growing importance of Arabidopsis research is also evident in the increasing attendance at the Eighth and Ninth International Conferences on Arabidopsis Research, both held in Madison, WI, which drew 817 and 998 participants, respectively.

Apart from the genome sequencing efforts, important tools are being developed for reverse genetics and functional genomics. A significant advance in this area has been an $8.7M award from the NSF Plant Genome Research Program for a cooperative effort to provide high-throughput gene expression profiling as well as gene knockout services to the Arabidopsis community. The identification of gene knockouts has been made possible by the availability of large numbers of T-DNA insertion lines, of which 48,500 have already been deposited with the Arabidopsis Biological Resource Center (ABRC) at Ohio State University. This number can be expected to at least double in 1999. The ABRC continues to be an important resource for the Arabidopsis community. It shipped 29,500 seed and 13,000 DNA stocks in 1997, and 46,500 seed and 16,000 DNA stocks in 1998.

As a direct consequence of the improvements in scientific infrastructure, significant scientific advances have been made in every area of Arabidopsis research, including hormone and light signaling, the circadian clock, responses to biotic and abiotic stress, and developmental biology. Among the most noteworthy findings in 1998 were the discovery of master regulatory genes that protect Arabidopsis from cold damage and the identification of proteins that transport auxin.

Contact Person: Detlef Weigel

E-mail Address: detlef_weigel@gm.salk.edu

Contact Person: Jeff Dangl

E-mail Address: dangle@email.unc.edu

Appendix 1

NSF ARABIDOPSIS GENOME MEETING REPORT

INTRODUCTION

In 1990, a report entitled "A Long-range Plan for the Multinational Coordinated Arabidopsis thaliana Genome Research Project" was published by the National Science Foundation (NSF 90-80). The report detailed plans made by members of the Arabidopsis research community in the U.S. and abroad to collaborate in sequencing the genome of this model plant and to characterize the structure, function and regulation of all Arabidopsis genes. In 1998 it became possible to set a realistic goal of finishing the sequence by the end of the year 2000.

Since then, a multinational genome sequencing project involving laboratories in the United States, Europe and Japan has been working toward this goal. This report is the proceedings of a meeting held to discuss progress to date, to anticipate barriers to timely completion, and to establish an oversight committee for the U.S.-based labs. The meeting was held at the National Science Foundation in Arlington, Virginia on July 9 and 10, 1998.

Participants (group represented in parentheses)

Elliot Meyerowitz, California Institute of Technology (Chair)
Ian Bancroft, John Innes Centre (ESSA)
Michael Bevan, John Innes Centre (ESSA)
Ellson Chen, Perkin-Elmer Applied Biosystems (CSHSC)
Ronald Davis, Stanford University (SPP)
Nancy Federspiel, Stanford University (SPP)
Gerd Jürgens, University of Tübingen (MSC)
Richard McCombie, Cold Spring Harbor Laboratory (CSHSC)
Rob Martienssen, Cold Spring Harbor Laboratory (CSHSC)
David Meinke (Arabidopsis community)
Xiaoying Lin, TIGR (TIGR)
Curtis Palm, Stanford University (SPP)
Daphne Preuss, University of Chicago (Arabidopsis community)
Francis Quetier, Genoscope (Genoscope)
Steven Rounsley, TIGR (TIGR)
Marcel Salanoubat, Genoscope (Genoscope)
Satoshi Tabata, Kazusa (Kazusa)
Athanasios Theologis, USDA Plant Gene Expression Ctr. (SPP)
Richard Wilson, Washington University (CSHSC)

Observers (agency in parentheses)

Mary Clutter (NSF)
Machi Dilworth (NSF)
DeLill Nasser (NSF)
James Tavares (DOE)
Jane Peterson (NIH)
Adam Felsenfeld (NIH)
Peter Bretting (USDA)
Liang-Shiou Lin (USDA)

STRUCTURE AND PROGRESS

There are six different sequencing consortia participating in the sequencing phase of the Arabidopsis genome project: three from the United States, two from the European Community, and one from Japan. Each is sequencing a different region of the genome, and each has its own model for distributing the necessary work among consortium members. The progress of each consortium is summarized in turn.

TIGR (The Institute for Genomic Research, http://www.tigr.org/tdb/at/at.html)

TIGR has taken on two aspects of the sequencing project. The first is BAC end sequencing (along with SPP and Genoscope), which provides one-pass sequences of both ends of the 22,000 BAC clones, one of the clone types being used for sequencing in the genome project. The purpose is to allow sequencing to proceed from a single sequenced BAC to the adjacent genomic regions on either side, using the clones that overlap it minimally. TIGR has sequenced 16,392 BAC ends from a total of 9,572 BAC clones, providing 7.34 Mb of BAC end sequence. The total BAC end sequence from all three groups is 36,574 BAC ends from 18,746 clones, representing 13.64 Mb.
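Walking from a finished BAC by way of BAC end sequences amounts to choosing, among clones whose end sequence falls inside the finished BAC, the one that overlaps it least. The sketch below models only the rightward step, under simplifying assumptions: the position at which each candidate's end sequence matches the seed is already known, and the candidate is assumed to extend outward from there. `EndHit` and `pick_next_clone` are hypothetical names, and the orientation checks, repeat filtering and fingerprint confirmation a real pipeline would need are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EndHit:
    """Hypothetical record: a BAC whose end sequence matches inside the seed BAC."""
    clone: str
    offset: int  # bp position in the seed where the match starts; clone extends rightward


def pick_next_clone(seed_length: int, hits: List[EndHit], min_overlap: int = 1) -> Optional[EndHit]:
    """Return the candidate that overlaps the seed's right end least (at least min_overlap bp)."""

    def overlap(hit: EndHit) -> int:
        # Bases shared by the seed and the candidate.
        return seed_length - hit.offset

    candidates = [h for h in hits if h.offset >= 0 and overlap(h) >= min_overlap]
    if not candidates:
        return None  # nothing extends from this end: a possible gap
    return min(candidates, key=overlap)
```

With a 100,000 bp seed and end matches at offsets 60,000, 91,800 and 99,850, the default picks the clone overlapping by only 150 bp; raising `min_overlap` (for example to 1,000) trades a longer overlap for more confidence in the join and picks the 8.2 kb overlap instead.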

The second TIGR project is the sequencing of chromosome 2. They have chosen 16 well-spaced starting points (by use of the Goodman lab chromosome 2 contig map), and are sequencing BAC clones in parallel, starting with the original clone in each location, and proceeding by use of BAC end sequences to adjacent clones with minimal overlap. The average overlap between adjacent BAC clones has been 8.2 kb, with a range from 150 bp to 30 kb. At present 4.83 Mb is complete and annotated, 3.25 Mb has shotgun sequencing or annotation in progress, and 1.38 Mb of BAC clones are in preparation for sequencing, for a total of 9.46 Mb.

The only problem encountered so far is a gap in the large m336 contig that is not spanned by any clone in the present BAC collections. Fiber FISH done at the University of Wisconsin indicates a gap size of 500 kb, and the sequence on either side of the gap shows no special features. One BAC has also been difficult to close because of long tandem dinucleotide repeats, but there is no theoretical barrier to completing such clones.

The total estimated length of chromosome 2 is less than 14 Mb, not including an estimated 3.5 Mb of ribosomal DNA tandem repeats at one end of the chromosome. TIGR's current sequencing rate in this phase of the project is 8 Mb per year, and there is an existing proposal to increase it to 12 Mb per year. It is estimated that, barring unforeseen problems, chromosome 2, excluding highly repetitive centromeric regions and the rDNA repeats, will be completed by the end of 1999; if the full capacity is to be used, clones on other chromosomes will have to be started by the end of 1998.
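The completion estimate can be checked with back-of-the-envelope arithmetic. As a simplifying assumption, the quoted rate is applied uniformly to all sequence not yet complete, taking the 14 Mb upper bound and the 4.83 Mb already complete and annotated (the m336 gap and repeats are ignored):

```latex
t \le \frac{14\ \text{Mb} - 4.83\ \text{Mb}}{8\ \text{Mb/yr}}
  = \frac{9.17\ \text{Mb}}{8\ \text{Mb/yr}} \approx 1.15\ \text{yr},
\qquad
\frac{9.17\ \text{Mb}}{12\ \text{Mb/yr}} \approx 0.76\ \text{yr}.
```

Counted from the July 1998 meeting, roughly fourteen months at the current rate lands in late 1999, consistent with the estimate above.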

SPP (Stanford University, Plant Gene Expression Center, University of Pennsylvania; http://pgec-genome.pw.usda.gov; http://cbil.humgen.upenn.edu/~atgc/ATGCUP.html; http://sequence-www.stanford.edu/ara/ArabidopsisSeqStanford.html)

These three groups aim to complete the sequence of chromosome 1. They have divided some of the preparative tasks: Stanford provides automated template preparation, Penn maps chromosome 1 BACs and provides BAC end sequences to the project, and PGEC makes the sequencing libraries. All groups are involved in sequencing. The strategy is similar to that of TIGR: seed BAC clones chosen by the Penn laboratory are used as sequencing origins, and sequencing proceeds by using both BAC end sequences and BAC fingerprints to select adjacent clones with minimal overlap. Initially 20 starting points were used, and there are plans to add another 20 soon.

SPP has provided 8,936 BAC end sequences to the 36,574 BAC end total.

The chromosome 1 sequencing finished or in progress so far totals 5.64 Mb, comprising 55 BACs and 1 YAC clone. Excluding overlap between adjacent clones, the unique sequence finished or in progress is 5.36 Mb. Of the 5.64 Mb total, 4.02 Mb is complete, 0.65 Mb is in finishing and 0.97 Mb is in the shotgun phase. Overlap between adjacent clones has ranged from 2 to 38 kb, averaging less than 7 kb; so far, an adjacent clone has been found for every sequenced BAC.

The total estimated length of chromosome 1 is 30 Mb. Capacity exists to finish it by the end of 2000, given sufficient funding; completion will require sequencing approximately 300 BAC clones in the next 3 years, or 33 BACs per year per participating site.
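The difference between the 5.64 Mb gross figure and the 5.36 Mb of unique sequence is the overlap between adjacent clones. A minimal sketch of that bookkeeping, assuming each clone's position is known as a half-open (start, end) interval in bp on a common chromosome coordinate axis; the function names and example coordinates are hypothetical:

```python
from typing import Iterable, Tuple


def gross_length(intervals: Iterable[Tuple[int, int]]) -> int:
    """Sum of clone lengths, counting overlapping bases more than once."""
    return sum(end - start for start, end in intervals)


def unique_length(intervals: Iterable[Tuple[int, int]]) -> int:
    """Bases covered at least once, found by merging overlapping (start, end) intervals."""
    total = 0
    run_start = run_end = None
    for start, end in sorted(intervals):
        if run_end is None or start > run_end:
            # Disjoint from the current run: close it out and start a new one.
            if run_end is not None:
                total += run_end - run_start
            run_start, run_end = start, end
        else:
            # Overlaps (or abuts) the current run: extend it if needed.
            run_end = max(run_end, end)
    if run_end is not None:
        total += run_end - run_start
    return total
```

For three clones at (0, 100000), (93000, 190000) and (185000, 280000), the gross length is 292,000 bp and the unique length is 280,000 bp, the 12,000 bp difference being the two overlaps.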
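The per-site figure follows from dividing the clone count by both the time available and the number of SPP sites, assuming the work is split evenly among Stanford, PGEC and Penn (an even split is implied but not stated above):

```latex
\frac{300\ \text{BACs}}{3\ \text{yr}} = 100\ \text{BACs/yr},
\qquad
\frac{100\ \text{BACs/yr}}{3\ \text{sites}} \approx 33\ \text{BACs per site per year}.
```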

CSHSC (Cold Spring Harbor Sequencing Consortium; http://www.cshl.org/arabweb/; http://genome.wustl.edu/gsc/)

This consortium includes Cold Spring Harbor Laboratory, Washington University and Perkin-Elmer Applied Biosystems. They are taking a different approach to choosing the BAC clones to sequence: BAC clones are fingerprinted with HindIII and EcoRI, and clone overlaps inferred from shared fingerprint fragments are used to build deep contigs of overlapping clones. Each contig is then anchored to known chromosomal positions using the abundant public information on BAC clone map positions, or by cross-hybridization with the YAC contigs already established for chromosomes 4 and 5 at the John Innes Centre in the U.K. Once a genome-wide set of BAC contigs is available, a minimal tiling path can be calculated and many clones can be sequenced in parallel. This approach requires the same degree of preparative work as BAC end sequencing at a comparable cost, but it has two advantages: it provides a physical map to the Arabidopsis community before the genomic sequence is complete, and it allows clones to be sequenced in parallel rather than in the necessarily sequential order imposed by walking with BAC end sequences. In addition, gaps can be identified before the regions around them are sequenced, which may allow more time to close them before they become a critical obstacle to completing the sequence.

So far, an estimated 71 Mb of the roughly 120 Mb nuclear genome is contained in 66 BAC contigs comprising 10,840 BAC clones. The totals by chromosome are:
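Once clones have been placed on a contig, choosing a minimal tiling path is an instance of the classic interval-cover problem, which a greedy scan solves exactly. The sketch below shows that generic algorithm, not the consortium's actual software, which the report does not describe; clone names and coordinates are hypothetical, and positions are assumed to be known as half-open intervals on the contig.

```python
from typing import List, Tuple

# (clone name, start, end) on a contig; end is exclusive. Hypothetical units.
Clone = Tuple[str, int, int]


def minimal_tiling_path(clones: List[Clone]) -> List[str]:
    """Greedy interval cover: the fewest clones that together span the contig."""
    if not clones:
        return []
    ordered = sorted(clones, key=lambda c: c[1])  # sort by start coordinate
    covered = ordered[0][1]  # everything left of this point is already covered
    contig_end = max(end for _, _, end in ordered)
    path: List[str] = []
    i = 0
    while covered < contig_end:
        best = None
        # Among clones starting inside the covered region, keep the one reaching furthest.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"no clone spans position {covered}: gap in contig")
        path.append(best[0])
        covered = best[2]
    return path
```

For clones A (0-100), B (20-110), C (90-200), D (150-260) and E (195-300), the path is A, C, E. This sketch accepts clones that merely abut; a sequencing path would in practice also require a minimum overlap between neighbours, and a `ValueError` here corresponds to the kind of gap the fingerprint approach is meant to reveal early.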

Chromosome   Mb in contigs   Contigs
1            22.5            13
2            >4              7
3            17.0            11
4            15.3            8
5            13.4            8

The current rate of BAC clone fingerprinting and editing is 15 Mb per month. All 22,000 available BAC clones are expected to be added to this map by the end of 1998. Work is currently concentrated on chromosome 5, where the CSHSC is sequencing, and chromosome 3, where Genoscope plans to sequence using the CSHSC contigs.

The CSHSC is committed to sequencing the top of chromosome 4 and a region of approximately 4 Mb around the centromere and on the north arm of chromosome 5. Sequence data have been contributed by all three collaborating partners. Finished totals so far are 690 kb from ABI, 1.22 Mb from CSH and 1.64 Mb from Washington University, adding up to 3.54 Mb (with overlap subtracted). In addition, approximately 3 Mb of sequencing is in progress, making a total of more than 6.0 Mb in 61 BAC clones and 1 YAC. If this rate continues, the proposed chromosome 4 region could be completed by the end of 1998, and the chromosome 5 region in 1998 or early 1999.

ESSA (European Scientists Sequencing Arabidopsis; http://muntjac.mips.biochem.mpg.de/arabi/index.html)

The ESSA project is in three phases. Phase I, which is complete, was to sequence two contiguous regions on chromosome 4: one of 1.92 Mb surrounding the FCA genetic marker (Bevan et al. 1998, Nature 391:485) and one of 0.41 Mb around the genetic marker AP2, for a total completed ESSA I sequence of 2.33 Mb. ESSA II, scheduled for completion in October 1998, aims to complete a 5 Mb region on the long arm of chromosome 4. So far 3.16 Mb has been completed and annotated and an additional 1.73 Mb has been completed and is being annotated, for a total of 4.89 Mb sequenced. Another 0.24 Mb is nearly complete, giving an ESSA II total of 5.13 Mb of complete and nearly complete contiguous sequence. The combined ESSA I and ESSA II total of completed and nearly completed sequence is thus 7.46 Mb.
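The ESSA totals above can be cross-checked with a short tally. The values are copied from the paragraph; rounding to two decimals only suppresses floating-point noise.

```python
# Sizes in Mb, copied from the ESSA paragraph above.
essa_i = {"FCA region": 1.92, "AP2 region": 0.41}
essa_ii = {
    "completed and annotated": 3.16,
    "completed, being annotated": 1.73,
    "nearly complete": 0.24,
}

essa_i_total = round(sum(essa_i.values()), 2)
essa_ii_sequenced = round(essa_ii["completed and annotated"]
                          + essa_ii["completed, being annotated"], 2)
essa_ii_total = round(sum(essa_ii.values()), 2)
combined = round(essa_i_total + essa_ii_total, 2)

print(essa_i_total, essa_ii_sequenced, essa_ii_total, combined)
# 2.33 4.89 5.13 7.46
```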

The two-year ESSA III project begins in August, 1998. Its goal is to complete the sequence of the long arm of chromosome 4 (estimated to total 13 to 13.5 Mb) and to sequence two regions of the north arm of chromosome 5 (with others to be done by CSHSC and Kazusa), with a total goal of sequencing 9 Mb.

The ESSA procedure uses the existing YAC contig maps of chromosomes 4 and 5 to group BAC clones into bins according to their cross-hybridization with YACs. SalI digestion and pulsed-field gel electrophoresis, followed by blotting and iterative hybridization with BAC clones, are then used to establish both BAC contigs and an overall SalI restriction map of both chromosomes. A minimal BAC tiling path is then defined and called the "sequence-ready map"; the clones on this map are sent to one of nine collaborating laboratories for nucleotide sequencing. The data are collected and annotated at MIPS, the Munich Information Center for Protein Sequences.
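Once clones have been ordered along a contig, choosing a minimal tiling path is essentially an interval-cover problem. The sketch below assumes each clone has already been given start and end coordinates (in kb) on a single contig, for instance from a restriction map, and applies the standard greedy rule: repeatedly pick the clone that reaches farthest to the right while still overlapping the path by a chosen minimum. The clone names, coordinates and 10 kb minimum overlap are hypothetical; real projects also weigh clone quality and anchoring evidence when choosing among candidates.

```python
from typing import List, Tuple

# (clone name, start kb, end kb) on one contig's coordinate axis
Clone = Tuple[str, int, int]


def minimal_tiling_path(clones: List[Clone], min_overlap: int = 10) -> List[str]:
    """Greedy interval cover: the fewest clones spanning the contig such that each
    clone overlaps the previous one by at least `min_overlap` kb."""
    if not clones:
        return []
    ordered = sorted(clones, key=lambda c: c[1])
    contig_start = ordered[0][1]
    contig_end = max(end for _, _, end in ordered)
    path: List[str] = []
    covered = contig_start  # right edge of the region tiled so far
    i = 0
    while covered < contig_end:
        # The first clone must start at the contig start; later clones must start
        # at least `min_overlap` kb before the current right edge.
        limit = covered - min_overlap if path else contig_start
        best = None
        while i < len(ordered) and ordered[i][1] <= limit:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap or insufficient overlap near {covered} kb")
        path.append(best[0])
        covered = best[2]
    return path


contig = [("A", 0, 120), ("B", 10, 100), ("C", 90, 210),
          ("D", 100, 180), ("F", 170, 300), ("E", 200, 330)]
print(minimal_tiling_path(contig))  # ['A', 'C', 'E']
```

A `ValueError` marks a place where the available clones do not overlap sufficiently, which is exactly the kind of gap that a map built before sequencing allows to be found early.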

The only problems encountered so far have been two difficult clones, one with a large hairpin and the other with a large region of tandem repeats. Both have been nearly completed, with the tandem repeats solved by long PCR as a supplement to the shotgun sequencing.

Kazusa DNA Research Institute ( http://www.kazusa.or.jp/arabi/)

The Kazusa Institute is sequencing the long arm of chromosome 5 and, along with ESSA and CSHSC, portions of the short arm of this chromosome (17.2 Mb in total when complete), and it is beginning to sequence the long (13.2 Mb) arm of chromosome 3.

The clone libraries used are from the Mitsui Plant Biotechnology Research Institute and consist of P1 and TAC clones. Clones from these libraries are initially selected by cross-hybridization to mapped clone markers. The clones are then anchored on the YAC contig (for chromosome 5 clones) and fingerprinted as an integrity check, after which they are shotgun sequenced, assembled, and annotated. A collection of YAC, TAC and P1 clone end sequences has been made for tiling the chromosome 5 clones; it includes 1254 sequences from 690 CIC YAC clones and 706 sequences from 389 P1 or TAC clones on chromosome 5. Similar methods are starting for chromosome 3, using the YAC contig map of that chromosome produced by D. Bouchez and collaborators at INRA. At present, two large contigs exist for chromosome 3: one of 13.6 Mb for the long arm and one of 9.2 Mb for the bottom arm.

Progress to date has been the release of 8.89 Mb of completed, annotated sequence, with release of an additional 1.60 Mb scheduled by August 1. Thus by August 1, 1998, 10.49 Mb will have been completed and released; of this, 10.15 Mb is on chromosome 5 and 0.34 Mb on chromosome 3. An additional 2 Mb of chromosome 5 is being sequenced. At current rates of 700 to 800 kb per month, about 27 months are expected to be required to complete this part of the project, which is estimated to include (in addition to the 10.49 Mb to be completed by August 1) 7.05 Mb of chromosome 5 and 13.3 Mb of chromosome 3. Genoscope has proposed to sequence 5 Mb of the long arm of chromosome 3 (see below). If it is able to take this on (a matter now under consideration there, and dependent on the demand for its resources from human genome sequencing), the total sequence proposed by Kazusa will be reduced, and completion is expected within 2 years.
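The 27-month estimate follows from dividing the remaining sequence by the monthly rate. The sketch below assumes a constant rate and counts only the 7.05 Mb of chromosome 5 and 13.3 Mb of chromosome 3 that the paragraph lists as remaining; the midpoint rate of 750 kb per month reproduces the quoted figure.

```python
def months_to_finish(remaining_mb: float, rate_kb_per_month: float) -> float:
    """Months needed to sequence `remaining_mb` at a constant monthly rate."""
    return remaining_mb * 1000 / rate_kb_per_month


# Remaining Kazusa work quoted above: 7.05 Mb (chromosome 5) + 13.3 Mb (chromosome 3).
remaining_mb = 7.05 + 13.3

for rate in (700, 750, 800):
    print(f"{rate} kb/month -> {months_to_finish(remaining_mb, rate):.1f} months")
# 700 kb/month -> 29.1 months
# 750 kb/month -> 27.1 months
# 800 kb/month -> 25.4 months
```

Taking 5 Mb of chromosome 3 off this total, as the Genoscope proposal would, brings the midpoint estimate down to roughly 20 months, in line with the expectation of completion within 2 years.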

Genoscope (Centre National de Séquençage; http://www.genoscope.cns.fr/externe/arabidopsis/Arabidopsis.html)

Genoscope is involved in the second European project. It has already provided approximately 11,500 completed BAC end sequences, with plans to provide 2,000 more. Once this is complete, 91% of the 22,000 BAC clones used in the sequencing project (from the IGF and TAMU collections) will have end sequences available.

Their sequencing plan is to use the Bouchez chromosome 3 YAC contigs to make a minimal BAC tiling path from fingerprints done at Genoscope and at CSHSC, and then to sequence the bottom (9 Mb) arm of chromosome 3. Complete contigs for this region have been supplied by CSHSC. Sixteen European sequencing groups will receive BAC clones from Genoscope, and the data will be returned to MIPS for annotation and entry into a public database. Distribution of clones is to begin within weeks, and completion of the 9 Mb region is expected by the end of 2000.

Genoscope has also explored with Kazusa the possibility of sequencing an additional 5 Mb on the top arm of chromosome 3. Whether it can do so will depend on how much of its sequencing capacity is required for its part of human chromosome 14, and on its ability to generate extra capacity. A decision on whether Genoscope or Kazusa will sequence this 5 Mb is planned for September 1998.

Summary of Progress

Chromosome   Est. size (Mb)   Complete (Mb)   Group
1            ~30              4.02            SPP
2            14 (+rDNA)       4.83            TIGR
3            23               0.34            Kazusa & Genoscope
4            17 (+rDNA)       9.02            ESSA & CSHSC
5            ~30              10.15           Kazusa, CSHSC, ESSA
TOTAL        ~114 (+rDNA)     28.36

In addition, shotgun sequencing libraries are in preparation for an additional 2.80 Mb, and sequencing is in progress but not yet complete for an additional 2.98 Mb. Furthermore, 36,574 BAC ends from 18,746 clones, representing 13.64 Mb, provided by TIGR, SPP and Genoscope are completed, as are 1254 end sequences from 690 CIC YAC clones and 706 sequences from 389 P1 or TAC clones, provided by Kazusa.
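The summary figures can be checked with a short calculation. The sizes below are the table's estimates, excluding the rDNA arrays, and the BAC end figures are those quoted in the preceding paragraph; the derived averages are illustrative only.

```python
# (chromosome, estimated size in Mb excluding rDNA, completed Mb, group)
summary = [
    (1, 30, 4.02, "SPP"),
    (2, 14, 4.83, "TIGR"),
    (3, 23, 0.34, "Kazusa & Genoscope"),
    (4, 17, 9.02, "ESSA & CSHSC"),
    (5, 30, 10.15, "Kazusa, CSHSC, ESSA"),
]

total_size = sum(size for _, size, _, _ in summary)
total_done = round(sum(done for _, _, done, _ in summary), 2)
print(total_size, total_done, f"{total_done / total_size:.0%}")
# 114 28.36 25%

# BAC end statistics quoted in the paragraph above.
ends, clones, end_mb = 36_574, 18_746, 13.64
print(f"{end_mb * 1e6 / ends:.0f} bp per end read, {ends / clones:.2f} ends per clone")
# 373 bp per end read, 1.95 ends per clone
```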

COMPLETING THE SEQUENCE

Defining completion

In addition to the gene-rich and highly informative regions of the genome (with one gene every 4-5 kb), there are regions of repetitive DNA, and perhaps of lower gene density.

One instance is the ribosomal DNA repeats, which are arranged in two uninterrupted tandem arrays. Each repeat unit contains a gene for 18S, 5.8S and 25S structural ribosomal RNAs and is 10-10.5 kb in length. The large tandem arrays of repeat units are found at the top arms of chromosomes 2 (NOR2) and 4 (NOR4). Each is on the order of 3-3.5 Mb, or 300-350 repeat units.

Centromeric regions are only beginning to be defined at the molecular level in Arabidopsis, but cloning and chromosome in situ hybridization studies have shown that these regions contain multiple tandem repeats of short sequences, chiefly 180 bp repeats and related sequences. In one case (chromosome 1), the repeat array is estimated at 950 kb. For chromosome 4, the functional centromere probably lies on one side of a 180 bp repeat region and so far does not appear to be unclonable. There is some indication that BAC clones from centromeric regions may contain more tandemly repeated sequence than other BAC clones sequenced to date. One BAC clone from the chromosome 2 centromere region has only 3 genes, a much lower density than the typical 1 gene per 4-5 kb found elsewhere, whereas another BAC from the centromere region of chromosome 4 has a more typical density.

Telomeres and subtelomeric regions in Arabidopsis have been characterized and appear to be small (totaling perhaps 100 to 200 kb in the genome) and not difficult to sequence so far.

There are also small regions of simple tandem repeats, such as the one described above in the ESSA progress report: BAC F9F13 contained 10 tandem copies of a 3.5 kb repeat, as well as 2 additional copies of the same repeat.

Because the exact sequence and number of tandem repeats are not thought to be consequential for any functional analysis, and in fact are quite polymorphic between ecotypes, it was decided that a sufficient characterization of these repeats would be the sequence of one repeat unit, plus an estimate from blotting or long-range PCR of the number of tandem copies at each site.

Given this, the complete sequence of the nuclear genome will be considered to be in hand when each chromosome arm is fully sequenced as a single contig from the subtelomeric repeats to the "centromeric" tandem repeats, with internal tandem repeat regions (including the rDNA repeats) characterized only as far as demonstrating that they are pure tandem repeats, determining the sequence of one repeat unit, and estimating the repeat number at each site. This characterization already exists for the rDNA repeats (Copenhaver et al. (1995) Plant J. 7:273-286). The definition may have to change if unclonable regions are found, or if clones are found that are not tandemly organized but are nonetheless impossible to sequence with available technology. To date there is no indication of unclonable regions, or of clones that cannot be sequenced for reasons other than large numbers of small tandem repeats.
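The copy-number estimate called for here amounts to dividing the measured size of an array by the length of one repeat unit. The sketch below propagates low and high values for both quantities; with the rDNA figures given earlier (arrays of 3-3.5 Mb, units of 10-10.5 kb) it yields about 286 to 350 units per array, consistent with the 300-350 units quoted above. How the array size is measured, whether by blotting or long-range PCR, is outside the sketch.

```python
from typing import Tuple


def copy_number_range(array_kb: Tuple[float, float],
                      unit_kb: Tuple[float, float]) -> Tuple[float, float]:
    """Bound the number of tandem units from (low, high) ranges for the
    array size and the repeat-unit length, both in kb."""
    (array_lo, array_hi), (unit_lo, unit_hi) = array_kb, unit_kb
    # Fewest units: smallest array of the longest units; most: the reverse.
    return array_lo / unit_hi, array_hi / unit_lo


# rDNA figures from the text: arrays of 3-3.5 Mb, units of 10-10.5 kb.
low, high = copy_number_range((3000, 3500), (10.0, 10.5))
print(f"about {low:.0f} to {high:.0f} units per array")
# about 286 to 350 units per array
```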

Other sequence parameters

Accuracy

All participants have agreed, and continue to agree, that the standard for sequence accuracy should be one error in 10,000 nucleotides or better, and the projects so far appear to be meeting this goal. The U.S. groups agreed to a common pair of tests to monitor sequence accuracy. The first is to use programs such as the Phred base caller (Ewing et al. (1998) Genome Res. 8:175-185) or TIGR Assembler to assess sequence accuracy in each sequencing run. The second is to determine independently the sequence of every region of overlap between adjacent clones and to compare the overlapping sequences for mismatches only after finishing. This provides an independent measure of sequence accuracy, and because all mismatches are to be resolved by further analysis, the test will also indicate the degree of sequence change due to mutation in the clones being sequenced.

The European and Japanese groups have different methods to measure sequence accuracy, but have the same goal of less than one error in 10,000 bases.
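Both U.S. accuracy tests reduce to simple arithmetic. A Phred quality value Q corresponds to an error probability of 10^(-Q/10), so the agreed standard of one error in 10,000 bases is Q40. For the overlap test, the sketch below assumes the two independently finished copies of an overlap have already been aligned without gaps; if errors in the two copies are rare and independent, the mismatch rate is roughly twice the per-sequence error rate, and any residual difference after resolution points to clone mutation. The overlap sequences are hypothetical.

```python
import math

TARGET_ERROR_RATE = 1 / 10_000  # agreed standard: at most 1 error per 10,000 bases


def phred_to_error(q: float) -> float:
    """Phred quality Q corresponds to error probability P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Inverse of phred_to_error: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def overlap_mismatch_rate(seq_a: str, seq_b: str) -> float:
    """Fraction of positions that differ between two independently finished
    copies of the same overlap region, assumed already aligned without gaps."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("expected two non-empty, aligned sequences of equal length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


print(round(error_to_phred(TARGET_ERROR_RATE)))  # 40

# Hypothetical 20 kb overlap differing at a single position.
overlap_a = "ACGT" * 5000
overlap_b = overlap_a[:100] + "T" + overlap_a[101:]
rate = overlap_mismatch_rate(overlap_a, overlap_b)
print(rate, rate <= TARGET_ERROR_RATE)  # 5e-05 True
```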

Annotation

Proper annotation of sequences to indicate the position, structure and nature of each of the coded genes is a critical component, and in fact the primary product, of the genome project. It is clear, though, that initial annotation of sequences is not fully (or even very) accurate, as the software and algorithms used for gene recognition can miss exons and introns, and can also indicate the presence of exons or introns where there are none. This is as true in animal genome projects as in plant projects. Thus, annotation will have to be done in stages, with initial annotations that can be useful, but that must be acknowledged to be flawed.

Each of the sequence groups performs its own annotation, as this is not only an interesting part of the work, but also helps with continued sequencing. It was agreed that, to provide the highest quality initial annotation, each group would use multiple software programs for gene recognition, and would indicate in its output the product of each of the programs (something that GenBank cannot do; thus this requires output to be in a form other than that sent to GenBank or equivalent public databases). It should be emphasized that doing this does not remove the requirement for inclusion of the output in public databases like GenBank or DDBJ. In addition, experimental means of annotation are to be used by each group - that is, sequences must be compared with the EST sequences that are available and that indicate actual RNA sequences, and must be compared with the genes of known structure that have been individually studied. Furthermore, feedback from the community of Arabidopsis researchers should be invited by each group, to allow correction or improvement of each group's annotations.
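One way to present the output of several gene-recognition programs side by side with experimental evidence is to list, for every candidate exon, the sources that report it. The sketch below is illustrative only: the program names `finder_a` and `finder_b` and all coordinates are hypothetical, and exons are matched on identical coordinates, whereas a real pipeline would have to tolerate small boundary differences and align ESTs to the genomic sequence first.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]  # (start, end) in clone coordinates


def exon_support(predictions: Dict[str, List[Exon]],
                 est_exons: List[Exon]) -> Dict[Exon, List[str]]:
    """For every candidate exon, list the sources (gene finders and ESTs) that report it."""
    support: Dict[Exon, Set[str]] = defaultdict(set)
    for program, exons in predictions.items():
        for exon in exons:
            support[exon].add(program)
    for exon in est_exons:
        support[exon].add("EST")
    return {exon: sorted(sources) for exon, sources in sorted(support.items())}


predictions = {
    "finder_a": [(100, 250), (400, 520), (700, 810)],
    "finder_b": [(100, 250), (400, 530)],
}
est_exons = [(100, 250), (400, 520)]

table = exon_support(predictions, est_exons)
for exon, sources in table.items():
    print(exon, sources)
# (100, 250) ['EST', 'finder_a', 'finder_b']
# (400, 520) ['EST', 'finder_a']
# (400, 530) ['finder_b']
# (700, 810) ['finder_a']
```

Exons reported by a single program and unsupported by ESTs, such as (700, 810) here, are natural candidates for the community feedback and experimental checks described in this section.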

As the genome project proceeds, additional experimental methods for gene recognition should be considered, and their application treated as an important goal of the project. One such method is sequencing of related genomes (such as those of Arabis lyrata or Cardaminopsis petraea; see http://www.arabis.net/wild.htm). Because exonic sequences change more slowly than intronic or intergenic sequences, comparison with a related genome could serve as a very useful indicator of gene location and exon boundaries. Additional experimental means of improving annotations include RNA blots and RT-PCR, to determine whether predicted genic sequences in fact correspond to RNAs, and full-length sequencing of large numbers of cDNA clones for comparison with genomic sequences.

Maintenance of summary lists of identified genes according to the type of protein coded (see Bevan et al. 1998, Nature 391:485) is also an important aspect of annotation.

Because annotation methods, and the experimental information on which they are based, are subject to continual improvement, frequent reannotation is worthwhile. Both the Kazusa and TIGR groups have plans for systematic reannotation of sequences from all groups. To facilitate this and, especially, to facilitate community access to annotations, it was agreed that all groups would work toward a standardized format for data presentation, and that groups doing large-scale reannotation would make their data freely available for mirroring on the web sites of all groups that wish to display them.

Data release

Each of the U.S. groups releases sequence unannotated and in small fragments as soon as a clone's assembly reaches either contigs of approximately 2 kb or 7x average coverage. Two of the three groups send sequences at this stage to the high-throughput genome sequence (HTGS) division of GenBank; the third group has agreed to start doing so as well. The sequences are currently posted on each group's own web page, each of which supports BLAST searches, and are also sent at short intervals to AtDB, the public Arabidopsis database, where they are also BLAST-searchable (http://genome-www2.stanford.edu/cgi-bin/AtDB/nph-blast2atdb).
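The U.S. release trigger can be written as a simple test on a clone's shotgun assembly. The sketch below reads the criterion as "any contig of at least about 2 kb, or average coverage of at least 7x", with coverage taken as total read bases divided by the clone insert length; that reading, the function names and the example reads are assumptions for illustration rather than any group's actual pipeline.

```python
from typing import List


def average_coverage(read_lengths_bp: List[int], clone_length_bp: int) -> float:
    """Average depth: total sequenced bases divided by clone insert length."""
    return sum(read_lengths_bp) / clone_length_bp


def ready_for_release(contig_lengths_bp: List[int], read_lengths_bp: List[int],
                      clone_length_bp: int, min_contig_bp: int = 2_000,
                      min_coverage: float = 7.0) -> bool:
    """Release once any contig reaches ~2 kb or average coverage reaches 7x."""
    longest = max(contig_lengths_bp, default=0)
    coverage = average_coverage(read_lengths_bp, clone_length_bp)
    return longest >= min_contig_bp or coverage >= min_coverage


# Hypothetical 100 kb BAC early in shotgun sequencing (500 bp reads).
print(ready_for_release([800, 1500], [500] * 200, 100_000))  # False (1.0x, longest contig 1.5 kb)
print(ready_for_release([2500, 900], [500] * 300, 100_000))  # True (a 2.5 kb contig)
```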

The structure of the European projects, where sequence-ready clones are allocated to many groups, and each group has some discretion (and rules from their own national government) in how to sequence and when to submit completed sequence, does not lend itself to identical release methods or policies. Nonetheless, the groups agree to collect and distribute through MIPS and AtDB all sequences as soon as practicable, at latest after completion and before annotation.

The Japanese group also has its own policies and level of funding for informatics, which so far have dictated that sequence be released only after both completion and annotation, and then posted to DDBJ (DNA Database of Japan) and GenBank. This entails a delay in public access relative to other groups, as the time from completion to annotation is about a month, and the time from acquisition of the earliest data to completion is also appreciable. The Japanese group will consider mechanisms for earlier release, within the constraints of policy and of funding for this aspect of the project.

Clone registration (intention to sequence)

One critical aspect of the project is coordination between groups on the clones to be sequenced, as without tight coordination, duplication of effort will occur, especially in the closing phases of the project. In addition, as different groups complete their assigned regions, reallocation of regions may become necessary so that groups ahead of their predicted rate can help by sequencing clones originally assigned to other groups. At present this coordination has been supplied by direct communication between the groups, and by the function of an international coordinating committee of the Arabidopsis Genome Initiative (AGI: see http://genome-www3.stanford.edu/cgi-bin/Webdriver?MIval=atdb_registry_info.html). This committee will remain the arbitrator of international sequencing efforts, but will be supplemented with a new committee that will allow for closer coordination of the U.S. groups. This new committee has been mandated by the U.S. funding agencies, as a replacement for the three separate advisory groups that now exist, one for each group.

One of the tasks of the U.S. committee will be clone reallocation, and in addition frequent communication with the members of the international AGI committee, as a way of stimulating continued discussion among all groups. As representatives of all groups will be invited to the meetings of the U.S. committee, these meetings may also be able to serve as a forum for discussion and decisions of the AGI committee. This may help the AGI by increasing the frequency of its considerations.
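Clone registration and reallocation, as described above, amount to keeping a single owner for each clone. The sketch below is a hypothetical illustration of that rule, not the AGI registry's actual implementation; the clone and group names are placeholders.

```python
from typing import Dict, Optional


class CloneRegistry:
    """Hypothetical intention-to-sequence registry: each clone has at most one owning group."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}

    def register(self, clone: str, group: str) -> None:
        """Claim a clone; refuse if another group has already claimed it."""
        current = self._owner.get(clone)
        if current is not None and current != group:
            raise ValueError(f"{clone} is already registered by {current}")
        self._owner[clone] = group

    def reallocate(self, clone: str, new_group: str) -> None:
        """Move a registered clone to another group, e.g. one ahead of schedule."""
        if clone not in self._owner:
            raise KeyError(clone)
        self._owner[clone] = new_group

    def owner(self, clone: str) -> Optional[str]:
        return self._owner.get(clone)


registry = CloneRegistry()
registry.register("BAC_0001", "group_A")
try:
    registry.register("BAC_0001", "group_B")  # duplicate claim is rejected
except ValueError as err:
    print(err)  # BAC_0001 is already registered by group_A
registry.reallocate("BAC_0001", "group_B")
print(registry.owner("BAC_0001"))  # group_B
```

Keeping reallocation as an explicit operation, separate from registration, mirrors the distinction in the text between groups claiming clones and the committee reassigning them.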

NEW U.S. STEERING COMMITTEE

Given the important new role of the mandated U.S. Steering Committee as arbitrator and communication facilitator between the U.S. groups, and as an aid to the AGI committee on the international front, the role and responsibilities of the committee were discussed and agreed upon.

The U.S. Steering Committee will have the following responsibilities:

1) Setting boundaries between the U.S. sequencing groups (ideally, to be defined by sequenced clones) to avoid duplication of effort in chromosomes where more than one group is working

2) Reallocation of clones or chromosome regions from one group to another to fit sequencing capabilities to the remaining work.

3) Monitoring and enforcement of the common agreements described earlier in this report, namely the agreement to work toward a common annotation format, to provide quality control information both from base calling programs and from clone overlap regions, and to monitor sequence release compliance.

4) Providing annual progress reports to the Arabidopsis community and to the U.S. funding agencies, separate from the progress reports of each of the individual sequencing groups. These reports will include a careful consideration not only of amount of sequence provided by each group, but of progress in all respects, balanced so that groups taking on difficult clones to sequence, or who are in closing phase and thus must devote time to closing gaps, are given full credit for such efforts. In addition, these reports are to detail progress in the informatics aspects of the project, including a summary of the progress and needs of the Arabidopsis database - as an interface between the database and its advisory committee, the sequencing groups, and the Arabidopsis community.

5) Provide an interface between the U.S. groups and the international AGI committee, and act to facilitate the setting of boundaries and clone reallocation at an international level.

6) The committee should endeavor to meet in person at least once a year, and have regularly scheduled meetings by electronic mail or conference call.

The composition of the committee is as follows:

Members:

  • 3 members of the U.S. Arabidopsis community, initially appointed, with rules for succession and for input from the North American Arabidopsis Steering Committee. Chairmanship will rotate among these members annually.

  • 1 non-U.S. member of the international Arabidopsis research community, to be appointed by the chair of the Multinational Steering Committee

  • 2 genome sequencing experts from projects other than the Arabidopsis project

  • 1 expert in genome databases

  • 1 principal investigator of the Arabidopsis genome database AtDB

    Ex officio:

  • representatives of each of the six genome sequencing groups

  • representatives of U.S. and international funding agencies

    The actual members of the committee who have so far agreed to serve:

    Elliot Meyerowitz, chair (U.S. Arabidopsis community)

    Daphne Preuss (U.S. Arabidopsis community)

    Gerd Jürgens (international Arabidopsis community)

    Ex officio:

    Joe Ecker, SPP

    Dick McCombie, CSHSC

    Steve Rounsley, TIGR

    Ian Bancroft, ESSA III

    Francis Quetier, Genoscope

    Satoshi Tabata, Kazusa

    Recommendations for the other members were:

    Joanne Chory, Pam Green or Detlef Weigel (U.S. Arabidopsis community)

    Mark Johnson, Richard Gibbs, John Sulston, Maynard Olson (sequencing experts)

    Mark Boguski (database expert)

    Mike Cherry (AtDB representative)

    FINAL PROSPECT

    Given sufficient funding, which seems very likely, there is no technical obstacle to the completion of the Arabidopsis nuclear genome sequence by December 31, 2000. Although the efforts of the project members must be focused tightly on finishing the sequencing, it is not too early to begin considering the next steps, among them experimental methods for annotation, and functional analyses of genes and gene families.

    submitted by:

    Elliot M. Meyerowitz July 15, 1998

    Appendix 2

    Summary of December 1998 AGI Meeting at CSHL

    1. Daphne Preuss summarized her work on centromeric regions and presented detailed information on approximate map locations of BAC contigs and sequenced BACs based on hybridization (Altmann) and fingerprint (WashU) data. She agreed to make this information available to the community. Rob Martienssen stressed that individual clones would need to be compared closely with fingerprint contigs constructed at WashU because some hybridization data were unreliable.

    2. Each group discussed their estimated sequencing capacity and assigned chromosomal regions for the coming year. Kazusa expects to finish their assigned regions on III and V by the end of 1999. ESSA and CSHL/WashU may also complete their assignments on IV and V at about the same time. SPP is continuing with chromosome I and was encouraged to avoid starting many additional nucleation points in order to focus on the same closure issues being addressed by the other groups. Genoscope has begun sequencing the bottom arm of III and will continue with this region through 2000. TIGR expects to finish chromosome II by summer 1999 and will therefore be the first funded group to run out of an assigned region to sequence.

    3. AGI members discussed the importance of finishing difficult areas within assigned regions of the genome while also continuing to make rapid progress on other regions to maximize release of information to the community.

    4. Both TIGR and Kazusa proposed to begin sequencing the "unassigned" top 5-6 Mb of chromosome III during 1999. After considerable discussion, both at the AGI meeting and later in the conference when Satoshi Tabata arrived, a consensus was reached to have TIGR begin sequencing this region of chromosome III during the spring of 1999 with the aim of finishing this region by January 2000.

    5. Starting in January 2000, TIGR, Kazusa, CSHL, and ESSA will likely have residual sequencing capacity ready to shift to centromeric regions and portions of chromosome 1 that have not yet been completed. By this time a minimal tiling path based on fingerprint data should be available to facilitate assignment of remaining BACs to AGI members. SPP has funding to complete most or all of chromosome I but recognizes that the entire genome may be completed more rapidly if other groups contribute in the year 2000 to sequencing portions of this chromosome (or possibly part of the bottom of chromosome III depending on progress made by Genoscope) after their own assigned regions have been essentially completed.

    6. Marcel Salanoubat and Francis Quetier led a discussion of the Genoscope policy for sequence release. While it was clear that the informatics capabilities of the individual laboratories in their program varied significantly, there was a general agreement that the group should strive for immediate release of sequences (at least for the bigger laboratories within their program).

    7. Rob Martienssen and David Meinke discussed the status of the CSHL/WashU consortium plans to continue sequencing and fingerprinting efforts. NSF has now received all of the necessary paperwork for continued funding of this consortium and expects to make an award at a level sufficient to enable sequencing another 2.4 Mb per year starting early in 1999. In addition, NSF has recommended funding an informatics person at WashU to finish editing of fingerprinted contigs and establishment of an interactive version of the BAC physical map that can be accessed via the Internet. This person will work closely with AtDB to avoid duplication of effort.

    8. The CSHL/WashU group has agreed to release to other sequencing groups all of their edited contig information and fingerprint database through their ftp site no later than the end of January, 1999. The SPP and TIGR groups are particularly eager to make use of this information in order to avoid repeating the contig-building steps that have already been completed elsewhere. Rob Martienssen agreed to provide as soon as possible a minimal BAC tiling path for regions of the genome that may require coordination during the final year of the project.

    9. Joe Ecker and David Meinke discussed a proposal by Hiroaki Shizuya at Caltech to fingerprint and end-sequence a new BAC library with large inserts (180 kb average). The general consensus was that although this library might be very useful in regions of the genome with minimal coverage and could reduce the overall cost of sequencing other regions by reducing overlaps, it was unlikely that many AGI participants would immediately move away from using TAMU and IGF clones for the bulk of their sequencing efforts. NSF is willing to discuss further the potential value of this library with interested AGI members.

    10. Rob Martienssen agreed to serve as the next AGI chairperson. There was general agreement that AGI members should meet again in summer 1999, perhaps at the next Arabidopsis meeting in Australia, to assess progress and make specific plans for the future.

    Joe Ecker, AGI chairperson

    Appendix 3

    DATABASE NEEDS OF THE ARABIDOPSIS COMMUNITY

    I. VENUE AND PARTICIPANTS

    To assess the current and future database needs of the Arabidopsis community, an NSF-supported workshop on this topic was convened in Madison, Wisconsin on June 28, 1998. The workshop participants included the following individuals:

    Rick Amasino, University of Wisconsin
    Mary Anderson, Nottingham University
    Mike Cherry, Stanford University
    Joanne Chory, Salk Institute
    Maarten Chrispeels, University of California San Diego
    Jeff Dangl, University of North Carolina
    Keith Davis, Ohio State University
    Allan Dickerman, National Center for Genome Research
    David Flanders, Stanford University
    Pam Green, Michigan State University
    Bertrand Lemieux, University of Delaware
    David Meinke, Oklahoma State University
    Larry Parnell, Cold Spring Harbor Laboratory
    Daphne Preuss, University of Chicago
    Ralph Quatrano, Washington University
    Ernie Retzel, University of Minnesota
    Steve Rounsley, The Institute for Genomic Research
    Randy Scholl, Ohio State University
    Chris Somerville, Carnegie Institution of Washington and Stanford University (chair)
    Desh Pal Verma, Ohio State University

    The following individuals provided valuable written comments prior to the meeting (Appendix I):

    Jean Greenberg, University of Chicago
    Katie Krolikowski, Harvard University
    Russell Malmberg, University of Georgia
    Jose Martinez-Zapater, Biology Molecular y Virologia Vegetal, CIT-INIA
    Natasha Raikhel, Michigan State University
    Pierre Rouze, Flanders Institute of Biotechnology
    Chris Town, Case Western Reserve University
    Desh Pal S Verma, The Ohio State University

    In addition, the workshop was attended by the following observers:

    Peter Bretting, USDA/ARS National Program Staff
    Greg Dilworth, Department of Energy
    Machi Dilworth, National Science Foundation
    Margarita Garcia, Stanford University
    Paul Gilna, National Science Foundation
    Xiaoying Lin, The Institute for Genomic Research
    Bob MacDonald, US Department of Agriculture
    DeLill Nasser, National Science Foundation

    II. GOALS

    The general goals of the workshop were to examine the present and future database needs of the Arabidopsis community and to outline in general terms the main issues which should be addressed in any future proposals concerning the development of new or expanded Arabidopsis databases. The discussions were intentionally focused on biological and community issues and there was no attempt to define or specify issues which are related to specific computer hardware or specific database programs. In particular, no assumptions were made concerning continued government funding of any current Arabidopsis database activities.

    A previous workshop with these goals was held on June 5th and 6th, 1993. A copy of the published summary of that workshop was provided to all participants and served as a reference to earlier views and objectives of the Arabidopsis community. [1993 Dallas Workshop Report] In addition, participants were provided with a draft summary of a BBSRC-USDA bilateral plant bioinformatics and coordination meeting held at Llangollen, Wales, March 22-24, 1998. A copy of a memorandum dated February 26, 1998, from the North American Arabidopsis Steering Committee to the curators of AtDB, concerning the current Arabidopsis community database needs, was also provided. [NAASC Memorandum] Finally, written comments solicited from the community on the Arabidopsis electronic newsgroup were provided to the participants before the meeting. A copy of the solicitation and the written comments are appended as Appendix I.

    III. RATIONALE FOR AN ARABIDOPSIS DATABASE

    The genomes of higher plants, such as Arabidopsis, contain approximately 25,000 genes. During the next several years, the sequence of the Arabidopsis genome will be completed and extensive sequence information will become available for many other species, including many plants. Most or all of the Arabidopsis genes will be used to develop gene chips or microarrays that permit simultaneous measurement of the expression (mRNA levels) of all of the genes. These will be used to generate information about the expression of all the genes in the organism in response to a wide variety of treatments and genetic backgrounds. Each experiment could have as many as 25,000 data points for each time point or treatment of each genotype! Comprehensive libraries of insertional mutations will permit the isolation, by reverse genetics, of null mutations in any Arabidopsis gene. Extensive collections of enhancer-trap or promoter-trap lines are being developed that permit sensitive analyses of the spatial patterns of gene expression down to the single-cell level. Thousands of new classes of mutants will be isolated by selecting for suppressors or enhancers of existing mutations. The corresponding genes will be cloned by very high resolution mapping of the mutations, so that the limited number of candidate genes in the delimited region of genomic sequence can be tested directly for complementation. This will depend on the development of very high resolution maps. It seems likely that high-resolution proteomics methods will become important for identifying the substrates of the thousands of kinase genes that form many of the regulatory networks in Arabidopsis and other plants. Additionally, extensive genomics-based work in other plant species will produce a flood of sequence information, much of which will be greatly enhanced in value by comparison with the aggregate information available for Arabidopsis.
Thus, we are entering an era of explosive growth of knowledge about Arabidopsis in particular, and plants in general. Most of the data generated by the projects described above will never appear in printed journals and will only be available to the community through electronic databases.

    Because Arabidopsis is one of the most intensively studied organisms, and is a direct model for 250,000 closely related species, we believe that it is appropriate to undertake a major investment in developing new information retrieval tools (IRTs) for Arabidopsis in particular and plants in general. Since Arabidopsis will eventually be so thoroughly characterized, it is a suitable object on which to focus the building of a comprehensive database or set of linked databases. However, because the value of Arabidopsis derives from its utility in understanding other plants, it would be desirable to build a structure that permits facile, high-resolution linking of specific information about Arabidopsis to all other plants.

    Looking into the future more generally, it is apparent that scientific publishing is undergoing a much-needed revolution. All of the major journals will be electronic within a few years, and once that transition is complete, scientists will develop new tools for interacting with data. The complexity of biological knowledge in many fields is such that new mechanisms for integrating data are required. The development of computer programs that calculate genetic maps "on the fly" from currently available data is an early example of what will become a more general mechanism for integrating data. Integrated graphical representations of patterns of gene expression in individual cells of three-dimensional models of organisms at various developmental stages are another example under development. With such a model it will be possible to find relationships between objects (e.g., genes) and processes that would be difficult or impossible to find with current information retrieval technologies.

    Because of the changes taking place in publishing, there may be an opportunity to develop databases that will eventually be self-supporting in the same way that journals are self-supporting. As the distinction between the two formats blurs, the concept of paying for a database subscription will become commonplace. However, there are many complex issues associated with imposing charges for database use, and the question is largely academic at present.

    There are many challenges in developing a new-generation database. Perhaps the foremost is the difficulty of collecting information from the thousands of scientists who produce primary information for conventional publication in journals.

    IV. CURRENT PUBLICLY SUPPORTED DATABASE ACTIVITIES

    The principal publicly supported Arabidopsis database activities are the AtDB database at Stanford University and the stock center databases maintained by the Arabidopsis resource centers at Ohio State University and the University of Nottingham. In addition, the University of Minnesota supports an EST database for all plants, and each of the Arabidopsis genome sequencing groups provides database access to genomic sequences, including BAC end sequences.

    The AtDB goal is to provide the plant-biology research community with convenient and correlated access to the publicly available results of Arabidopsis research. This includes published and otherwise freely available information about the genome, the genes it contains, the gene products, their positions on genetic and physical maps, as well as DNA sequences. The users of the database are very diverse, ranging from Arabidopsis molecular biologists to biologists focusing on any other organism. The staff of the AtDB project are currently shared with the Saccharomyces Genome Database, and the database administrator is shared with the Expression Microarray database and Genetic Footprinting database projects, all located at the Department of Genetics at Stanford University. In an effort to minimize wasteful duplication of effort, the AtDB project uses much of the same software and staffing structure as the Saccharomyces Genome Database (SGD). The combined SGD and AtDB groups thus benefit from an economy of scale by sharing computing and human resources.

    At a meeting of the Arabidopsis genome community in 1992 at the Cold Spring Harbor Banbury Center, a consensus was reached that AtDB should take responsibility for providing centralized access to Arabidopsis databases, a recommendation that has been repeatedly endorsed by the North American Arabidopsis Steering Committee. Since that time AtDB has been supported by a grant from the National Science Foundation. However, the annual level of support for AtDB has been only a small fraction of the support provided for database activities for similarly advanced models such as Drosophila, yeast and mouse.

    V. SUMMARY OF CONCLUSIONS AND RECOMMENDATIONS

  • The main conclusions from the workshop were consistent with the conclusions of the 1993 workshop.

  • The Arabidopsis community has a large number of unmet needs for database services that are required to make efficient use of existing information.

    The highest priorities for database content are:

  • Integration of the physical map and genetic map

  • Detailed and consistent annotation of the genomic sequence

  • Gene chip and DNA microarray data

  • Information about forward and reverse genetics

  • Spatial and temporal information about gene expression and tools for visualizing such information.

  • Protein localization and proteomics

  • Phenotypic information about mutants

  • Federal government support for increased Arabidopsis database capabilities is of crucial importance to continuing progress in understanding all aspects of plant biology. This knowledge is a vital component of the mechanisms that support continuing agricultural productivity, environmental stewardship and development of a robust agricultural biotechnology industry. Support for such databases should be long-term and contingent upon guarantees that all information in the databases will be freely available to the international scientific community.

  • The main focus of Arabidopsis database activities should be service components. However, in those cases where publicly available computer programs do not meet the needs of the Arabidopsis community, adequate resources should be made available to support the development of components that are required to fully implement the service functions of Arabidopsis databases.

  • Arabidopsis databases should provide an intellectual focus for the interpretation, synthesis and integration of biological data. The value of such a resource will be proportional to the ability of the databases to acquire all relevant data. Since past experience has indicated that it is not feasible to rely on the community to submit all useful information to databases, the Arabidopsis databases must be professionally curated by paid curators. In addition, mechanisms must be explored for obtaining data directly from authors in conjunction with publication in journals. The development of user-friendly, internet-accessible data entry forms that would allow direct deposit of information by members of the community into public databases must also be pursued.

  • US Arabidopsis database activities must be linked to the community through an oversight committee that includes representation from, and is approved by, the North American Steering Committee.

  • Arabidopsis databases must be accessible internationally via the internet using commonly available internet browsers.

  • Although all Arabidopsis information need not be archived in a single database, it is essential that all Arabidopsis databases be able to seamlessly communicate with each other and with other major databases, such as the nucleic acid databases.

  • Because the databases are a world resource, an effort should be made to coordinate the development of Arabidopsis database activities internationally with a view to sharing the costs of curation and development of new tools.

  • It is not desirable or appropriate to attempt to implement partial or complete cost recovery of Arabidopsis database services in the foreseeable future.

  • An encyclopedic database which interrelates all aspects of Arabidopsis biology remains an attractive long-term goal.

    VI. WHAT SHOULD BE IN THE DATABASES?

    The long-term goal is to provide interconnected access to all information about Arabidopsis. However, certain classes of information should have a higher priority for immediate inclusion and also require a high degree of curation in order to be most useful to the community.

    A. Map-Based Information

    At present, many laboratories are engaged in cloning genes by map-based cloning methods. The use of map-based cloning is expected to continue indefinitely and to become the most widely used method of cloning genes in the future. The ease with which this can be accomplished is directly proportional to the availability of information about genetic and physical maps, polymorphisms, and large clones. Thus, the greatest current need is a unified genetic and physical map that incorporates all available information about polymorphic markers (e.g., CAPS, SSLPs, RFLPs), mutations, BAC and YAC clones, mapped clones and insertions or other modifications of the genome.

    Because of the pending completion of the genomic sequence, the state of the genetic map is expected to change dramatically during the next several years as sequence-based markers become anchored on the genomic sequence. The availability of the sequence information will enhance the value of the integrated map because it will stimulate map-based cloning efforts which will remain dependent on a high density of polymorphic markers. The integration of the genetic and physical maps should be undertaken by a group with appropriate expertise in both genetic and physical maps and database management and curation.

    Ready access to primary mapping data should be given highest priority in database development. Map information should be collected and presented in a manner that allows the user to determine what is known, plus what remains questionable or unresolved with respect to map locations of genetic and molecular markers in combination with a complete physical map anchored to the complete nucleotide sequence. In constructing the database, it should be remembered that recombination data generally provide only rough estimates of map location, and that mapping data may differ widely in quality and reliability. Therefore, some database users may prefer direct access to primary mapping data in order to compare their results with those obtained in other laboratories. A database that provides options for visualizing several different maps constructed with different mapping functions or subsets of markers and primary mapping data would be particularly valuable to the Arabidopsis community.
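    Different mapping functions can produce different maps from the same data. Two widely used functions illustrate this. Haldane's function assumes no crossover interference, and Kosambi's allows for it. Each converts a recombination fraction r into a map distance d, expressed in Morgans (multiply by 100 for centiMorgans). These two are given only as examples. The workshop did not specify which functions a database should offer.

```latex
d_{\mathrm{Haldane}} = -\tfrac{1}{2}\,\ln(1 - 2r), \qquad
d_{\mathrm{Kosambi}} = \tfrac{1}{4}\,\ln\!\frac{1 + 2r}{1 - 2r}, \qquad
0 \le r < \tfrac{1}{2}
```

    For example, an observed recombination fraction of r = 0.20 corresponds to about 25.5 cM under the Haldane function and about 21.2 cM under the Kosambi function. A map drawn with one function can therefore place markers noticeably farther apart than a map drawn with the other.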

    Any proposal for database development should also discuss in some detail how the integrity of these maps would be verified and maintained. Some mutations and cloned genes are likely to be known by several different names. It will therefore be important to establish a database that will accommodate multiple changes in nomenclature. Other plant databases are moving toward the use of standard gene names as described in the Mendel database. The Arabidopsis databases should also adopt this policy to ensure compatibility with other databases.
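    One simple way to handle multiple and changing names is to store every published name as an alias that resolves to a single current standard name. After a renaming, the retired standard name is kept as an alias. The sketch below is a minimal illustration of that assumption. The class, its methods and the gene names are all hypothetical, and they do not describe AtDB or the Mendel database.

```python
from __future__ import annotations


class Nomenclature:
    """Hypothetical registry mapping any known alias to a current standard gene name."""

    def __init__(self) -> None:
        self._alias_to_standard: dict[str, str] = {}  # lower-cased alias -> standard name
        self._aliases: dict[str, set[str]] = {}       # standard name -> all its names

    def add_gene(self, standard: str, *aliases: str) -> None:
        # The standard name is also registered as an alias of itself,
        # so every lookup goes through the same table.
        self._aliases.setdefault(standard, set())
        for name in (standard, *aliases):
            self._alias_to_standard[name.lower()] = standard
            self._aliases[standard].add(name)

    def rename(self, old_standard: str, new_standard: str) -> None:
        # A nomenclature change: the old standard name survives as an alias.
        names = self._aliases.pop(old_standard)
        names.add(new_standard)
        self._aliases[new_standard] = names
        for name in names:
            self._alias_to_standard[name.lower()] = new_standard

    def resolve(self, name: str) -> str | None:
        """Return the current standard name for any alias (case-insensitive), or None."""
        return self._alias_to_standard.get(name.lower())


db = Nomenclature()
db.add_gene("GENE1", "mutA", "abc1")  # placeholder names
db.rename("GENE1", "GENE1A")
current = db.resolve("MUTA")          # old names still resolve after the rename
```

    Lookups are case-insensitive because published gene symbols often differ only in capitalization between the gene name and the mutant name.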

    Provisions should also be made to add new types of information to genetic and physical maps as they become available (break points of chromosomal aberrations; regions of extensive heterochromatin; regions with a high/low degree of sequence homology to related plants; etc.).

    B. Sequence Information

    The value of the genomic sequence will depend on the quality of the annotation. The annotation should be of a quality similar or identical to that for other higher organisms. It should be possible to arrive at an integrated map of a gene by various routes. A user should be able to begin a query with a sequence, a gene name, a keyword or a genetic map location. A user should be able to highlight a region of the genome on a graphical display and move to increasingly higher levels of resolution with the click of a mouse. For example, one might start with a whole chromosome, then move to a ~10 cM region which shows the contigs of BACs and YACs, the mapped mutations, the sites of insertional mutations or launching pads for transposons. Next the user should be able to visualize a ~1 cM region showing all of the above features plus the locations of open reading frames (theoretical and verified), ESTs, polymorphic markers, and potentially polymorphic markers (i.e., SSLPs). Finally, at the next level of resolution the user should be able to visualize the DNA sequence, the various putative open reading frames indicated by gene finding programs, experimentally verified genes, ESTs, BAC and YAC end sequences, polymorphisms, mutations and other known aberrations. The open reading frames should be linked to information about gene expression, experimentally verified information about gene function, mutant phenotypes associated with classical mutations or with over- or underexpression, theoretical information about gene function based on inference from other organisms, subcellular localization of the gene product, and known or predicted modifications of the gene product. If there are other genes of similar structure in the genome, the presence of these genes should be indicated. Similarity to genes from other plants should be indicated with a link to the appropriate databases. 
The control regions of the genes should be annotated with known or predicted motifs and with information about the identity of other genes with similar motifs.
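    A zoomable display of this kind depends on one basic operation. It must retrieve every annotated feature that overlaps a chosen window of a chromosome, optionally limited to the feature types suited to the current level of resolution. The sketch below is a minimal version of that operation. It assumes features are stored with base-pair start and end coordinates and kept sorted by start position. The feature names, types and coordinates are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str   # hypothetical identifier
    kind: str   # e.g. "BAC", "YAC", "ORF", "EST", "insertion"
    start: int  # first base pair covered (inclusive)
    end: int    # last base pair covered (inclusive)


class FeatureTrack:
    """Annotated features on one chromosome, queried by coordinate window."""

    def __init__(self, features: list[Feature]) -> None:
        self._features = sorted(features, key=lambda f: f.start)
        self._starts = [f.start for f in self._features]
        # The longest feature bounds how far left of a window an overlapping feature can start.
        self._max_len = max((f.end - f.start + 1 for f in self._features), default=0)

    def in_window(self, lo: int, hi: int, kinds: set[str] | None = None) -> list[Feature]:
        """Return features overlapping [lo, hi], optionally filtered by kind."""
        i = bisect_left(self._starts, lo - self._max_len + 1)
        j = bisect_right(self._starts, hi)
        return [
            f for f in self._features[i:j]
            if f.end >= lo and (kinds is None or f.kind in kinds)
        ]


track = FeatureTrack([
    Feature("BAC-1", "BAC", 1, 100_000),
    Feature("ORF-1", "ORF", 20_000, 21_500),
    Feature("EST-1", "EST", 20_100, 20_600),
    Feature("ORF-2", "ORF", 150_000, 152_000),
])
coarse = track.in_window(1, 200_000, kinds={"BAC", "YAC"})  # clone contigs only
fine = track.in_window(19_000, 22_000)                      # all features in ~3 kb
```

    At coarse resolution, the display would request only clone contigs across a large window. At fine resolution, it would request every feature within a few kilobases. Bounding the search by the longest feature keeps the sketch short. A system holding very large numbers of features would more likely use a dedicated interval index.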

    The sequence information should not simply be a link to raw sequence in GenBank because the level of annotation and tools to manipulate that sequence do not directly support the kinds of queries made by most biologists. Thus, the sequence should be directly available from a specialized database which provides useful tools for manipulating the sequence. It should be possible to retrieve from the database sequence information based on map position, type of sequence, or other specific requirements. All information should be linked to publications describing the data when possible.

    Because the sequencing groups are not expected to have the resources to provide continued annotation, there will be a need for a group to take responsibility for continued upgrading of the annotation of the genomic sequence as information about the sequence becomes available from direct experimentation and from computational analyses based on experimental results obtained with other organisms.

    C. Expression Information

    The use of microarrays and gene chips is expected to provide a massive amount of new information. Most or all of the Arabidopsis genes will be used to develop gene chips or microarrays that permit simultaneous measurement of the expression (mRNA levels) of all of the genes. These will be used to generate information about the expression of all the genes in the organism in response to a wide variety of treatments and genetic backgrounds. Each time point or treatment could have as many as 25,000 data points. Because the experiments are technically straightforward, it seems likely that a common type of experiment will be to prepare mRNA from a mutant and a wild type and to compare the consequences of the mutation on the expression of all the genes in the organism. In addition to simply archiving the raw data, it should be possible to query the data in various ways. For instance, as data from different treatments accumulate, it will become possible to search for genes that are coregulated with a given gene. This kind of query may provide insights into the identity of otherwise anonymous genes or reveal the existence of regulatory networks. It should also be possible to identify all the factors that cause altered expression of a gene, to identify all genes that specifically respond to certain treatments, and to identify mutations that cause similar effects on gene expression. For these kinds of queries it will be necessary to have software that can identify the most similar data sets from among hundreds or thousands of different data sets produced by different treatments.
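    A search for genes coregulated with a query gene can be sketched in a simple way. Each gene's measurements across treatments form a profile, and every other gene is ranked by how closely its profile matches the query. This sketch assumes Pearson correlation as the similarity measure, although the report does not prescribe one. The gene names and values are hypothetical.

```python
from __future__ import annotations

from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no coregulation signal
    return cov / (sx * sy)


def coregulated(query: str, profiles: dict[str, list[float]],
                top: int = 5) -> list[tuple[str, float]]:
    """Rank all other genes by correlation with the query gene, highest first."""
    q = profiles[query]
    scores = [(g, pearson(q, p)) for g, p in profiles.items() if g != query]
    scores.sort(key=lambda gs: gs[1], reverse=True)
    return scores[:top]


# Hypothetical profiles: one value per treatment, same treatment order for every gene.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
    "geneD": [1.0, 1.0, 1.0, 1.0],  # unresponsive
}
ranked = coregulated("geneA", profiles)
```

    The same function could also be applied to whole data sets, comparing one experiment's values for all genes against another's. That would give a first approximation of the "most similar data sets" query described above.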

    There is also a great need for a repository for information about spatial aspects of gene expression. There are now many transgenic lines which exhibit specific spatial patterns of reporter gene expression, and cloned genes which confer such patterns. In the short term a database with a controlled vocabulary for the various cell and tissue types and linked images of the patterns of gene expression would meet immediate needs. In the longer term, it would be useful to have graphical tools that would integrate the patterns of gene expression into an organismic model.

    D. Phenotypic Information

    Because of the diversity of processes that are being analyzed by a mutational approach in Arabidopsis, there is a need for facile access to information about gene function as it relates to the organism. One aspect of the problem involves determining the genetic basis for a phenotype. In this case it should be possible to enter a description of a phenotype and obtain a ranked list of probable genetic alterations that could give rise to the phenotype. Conversely, it would be very helpful to be able to enter a gene name and obtain a description of the corresponding mutant. This capability will greatly enhance the efficiency with which new mutations will be studied as the number of known mutations begins to plateau. It is expected that we will soon have saturating collections of transposon mutants, so having ways of describing these phenotypes, and making them accessible, will be important. No capability of this kind currently exists.
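    Suppose mutant phenotypes were recorded with a controlled vocabulary. The phenotype-to-gene query could then be approximated by scoring each annotated locus on how many of its phenotype terms match the terms a user enters. The sketch below assumes Jaccard similarity as the score, that is, the number of shared terms divided by the number of distinct terms in both sets. The locus names and phenotype terms are hypothetical placeholders.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(query_terms: set[str],
                    annotations: dict[str, set[str]],
                    min_score: float = 0.0) -> list[tuple[str, float]]:
    """Rank loci by overlap between their phenotype terms and the query terms."""
    scored = [(locus, jaccard(query_terms, terms))
              for locus, terms in annotations.items()]
    scored = [s for s in scored if s[1] > min_score]  # drop loci with no overlap
    # Highest score first; ties broken alphabetically so output is reproducible.
    scored.sort(key=lambda s: (-s[1], s[0]))
    return scored


# Hypothetical controlled-vocabulary annotations for three mutant loci.
annotations = {
    "locus1": {"dwarf", "dark green leaves", "late flowering"},
    "locus2": {"dwarf", "narrow leaves"},
    "locus3": {"early flowering"},
}
candidates = rank_candidates({"dwarf", "dark green leaves"}, annotations)
```

    Going the other way, from a gene name to a description of the corresponding mutant, is a direct lookup in the same `annotations` table. A controlled vocabulary is what makes either direction workable, because free-text descriptions of the same phenotype rarely share exact wording.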

    One strategy may be to use organizational schemes as entry points (phenotypic indexes, so to speak). One such index is the genetic map position. Knowledge of this provides an entry point to other mutants and papers. Another possible organizing scheme could be based on the EcoCyc database format of metabolic pathways, so that biochemical phenotypes could be correlated, or the knowledge of existing pathways could be queried. The user would click on a pathway and learn what is known about it. Another way of indexing and accessing data on development might be to have a standardized Arabidopsis growth animation. At appropriate times during the growth animation, a user could click on a graphic representation of an organ or other feature, which would lead to additional information. Clicking on a rosette leaf might lead to various types of leaf cells or indexed leaf morphologies.

    E. Stock-Based Information

    The databases maintained by the two Arabidopsis resource centers at Ohio State University and the University of Nottingham provide excellent access to information on the availability of biological and chemical materials related to Arabidopsis research. These databases have implemented many of the recommendations of the 1993 workshop report and should continue to assume responsibility for descriptive information concerning seed stocks, clones, vectors, libraries, cDNAs, oligonucleotides, and any other materials that may require distribution to the Arabidopsis community. Emphasis should be placed on careful documentation of biological materials, controlled vocabularies, and maximal utilization of sophisticated graphics to display plant phenotypes, molecular hybridization patterns, and other data where appropriate.

    With respect to seed stocks, it should be possible to search the database by general phenotype, not just by gene symbol, in order to obtain a broad listing of ecotypes and mutant lines with similar features. Information on phenotypes, screening methods, growth conditions, and differences between alleles should be included for all mutants available through the stock centers. It should also be possible to obtain information on additional mutants or alleles that have been isolated in specific laboratories but are not available from the stock centers.

    Individuals should be able to search for specialized libraries, vectors, transgenic lines, and molecular reagents (antibodies, purified proteins, unusual compounds, and biochemical standards) required for Arabidopsis research.

    The stock center databases should be directly linked to a central Arabidopsis database so that queries about the properties of a gene or mutant can lead directly to a query about the availability of the resources used to study these or related aspects of the biology.

    F. Community-Based Information

    During the past several years there has been a proliferation of electronic resources that provide easy access to information on a wide range of community issues. For instance, it is now relatively easy to retrieve contact information for colleagues or previous postings on the Arabidopsis newsgroup, the abstracts for meetings are available online, and there is an electronic Arabidopsis journal, Weeds World, which provides a forum for discussion of methods and problems and for publication of short papers. Many laboratories have mounted web pages that provide detailed information about specialized methods, specialized databases or collections of genetic materials. The curators of AtDB have provided convenient access to these diverse resources through a single web page that links to them.

    While it is desirable to continue having one group take responsibility for maintaining a centralized launcher or "data warehouse" for Arabidopsis-related web sites, this should be a relatively inexpensive activity and should not require significant public financial support. The distinction betwee