The largest contiguous region sequenced so far is 16 kb surrounding the GAP-A gene on chromosome 3. In addition to the GAP-A gene, 4 novel ORFs and a retrotransposon were identified, as well as a peculiar AT-rich tract. The density of genes in this area, together with the means of identifying ORFs, bodes well for the future of the programme; we can expect a gene density of 1 every 4-5 kb, close to that predicted. Based on a genome size of 100 Mbp, this indicates a total of 20,000-25,000 genes in Arabidopsis. Data are still coming in on the large regions of chromosome 4, as the participating labs have had to learn new methods of large-scale sequencing. Regions of overlap between the 7 participants in this area show a high degree of accuracy in the independently sequenced areas, a necessary precondition for integrated genome activities. In the next 3-4 months the first year's quota of 350 kb of chromosome 4 should be completed. A detailed analysis of this region will provide important new information on gene density, possible clustering of gene families, intergenic DNA composition, and the sequence of novel classes of plant genes.
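The gene-count estimate quoted above is simple arithmetic: genome size divided by the average spacing between genes. A minimal sketch follows, assuming the density seen in the sequenced regions holds uniformly across the whole genome (a simplification); the function and variable names are illustrative only and are not part of the programme's software.

```python
# Rough gene-count estimate from genome size and average gene spacing.
# Assumption: gene density is uniform across the genome.

def estimate_gene_count(genome_size_bp, kb_per_gene):
    """Return the expected number of genes for a given spacing in kb."""
    return genome_size_bp / (kb_per_gene * 1000)

GENOME_SIZE_BP = 100_000_000  # 100 Mbp, the genome size used in the text

low = estimate_gene_count(GENOME_SIZE_BP, 5)   # one gene every 5 kb
high = estimate_gene_count(GENOME_SIZE_BP, 4)  # one gene every 4 kb

print(f"{low:,.0f} to {high:,.0f} genes")
```

With the figures given in the text (100 Mbp, one gene every 4-5 kb), this reproduces the range of 20,000 to 25,000 genes.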
A sequencing project is most interesting and useful if it aims to complete the genome in a reasonable time. It is clear that the EC has neither the resources nor the will to sequence the entire genome on its own; it is most properly a joint effort, both in terms of the relatively large sums required and in terms of distributing the effort fairly. Because of this, and in view of the unprecedented volume of valuable information to be obtained, US colleagues have agreed to join the EU effort from 1995 onwards. The EU network, once expanded (mainly by increasing the capacity of a few of its labs to sequence on a megabase scale), will join with a US network consisting of 3-5 major labs, and sequence 10 Mb each between 1996 and 1999. It is hoped that the remaining unknown areas of the genome, comprising about 60-70 Mb, will then be sequenced in the following five years, using the latest methods adopted from the human genome programme. The major rate-limiting step in this plan is the provision of sequence-ready libraries; present cosmid coverage amounts to only 80% of the regions to be sequenced in the next 2 years. The increased effort being put into YAC coverage, particularly on chromosomes 4 and 5, means that YACs will have to be the main source of sequence substrates. Because of this, new methods for deriving random libraries from YACs are being investigated.
Further information can be obtained from the ESSA Coordinator, Mike Bevan, bevan@bbsrc.ac.uk