Enhanced stock searching facilities and direct
linking to sequence databases in AIMS via the World Wide Web
Randy Scholl and Jin Kim
ABRC, Ohio State University; and Dept. of Computer
Science, Michigan State University, respectively.
INTRODUCTION
A large volume of information is available on the
Internet relating to genetics and molecular biology of
Arabidopsis. However, this information resides in different
locations, and the most effective terms to enter when
searching these sources are sometimes uncertain. Hence,
two new features have been added to the
Web version of AIMS. These are automatic links to the
sequence databases (including the EST Sequence Analysis
Database at the University of Minnesota and the EST
Sequence and GenBank databases of NCBI) and a structured
phenotypic searching tool for seed stocks which is based
on a fixed, systematic set of categories. These system
enhancements are described below, and examples of their
use are included. This is part of the continued adaptation
of AIMS to the Web, in which we have previously
implemented a number of improvements including stock
ordering and searching (see AIMS menu).
STRUCTURED PHENOTYPE-BASED SEARCHING
A. Description
Locating items of interest in a large database can
sometimes be difficult and becomes a greater challenge
when the user is unfamiliar with the terminology of the
subject. Also, use of terminology can differ between
individual scientists, and the database user is thus faced
with the challenge of ascertaining the terms that
individuals entering the data may have been employing. If
there is any inconsistency of terminology in data entry,
these difficulties are magnified. Hence, we have
developed a series of categories for AIMS in which all
mutants (seed stocks) can be classified and the
classification system then employed as a stock-locating
tool. This new addition is referred to as "structured
phenotypic searching" and has just been added to the AIMS
Web server as a link from the seed stock searching page
(the address of the structured phenotype search page is
http://genesys.cps.msu.edu:3333/phenotype.html).
The logical base of this tool is a hierarchical
classification scheme developed as a collaboration between
members of the Chris Somerville lab and AIMS personnel.
In this scheme, mutant stocks are assigned at least one
category in the structure. The four main headings of the
classification are I. Development, II. Cytogenetics, III.
Biochemistry and IV. Environmental Interaction. To see the
subcategories of any of these groups, you may click on the
appropriate category name. While the categories are
arranged in an outline format, each line is also assigned
a sequence number, which is shown at its left.
These numbers exist simply for the convenience of the user,
and are designed to assist in keeping track of one's
location in the outline at any point in time (or to flag
the location of a category for future use).
The logic of the category structure is that main
headings represent very widely based searches, subheadings
are more specific and sub-subheadings still more specific.
Searching for the item "rosette leaf" will find any stock
affecting size, shape, color or pubescence of the leaf
(which are subheadings of "rosette leaf"). A stock which
has been classified as possessing a characteristic that is
general (e.g., the "Fertility" subsection (#61) of
Development) will not be found in searches specifying more
specific subsections (e.g., "Male sterility" (#62)).
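The matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AIMS implementation: the category numbers #61 and #62 come from the text, while the parent table, the stock names and the function names are hypothetical.

```python
# Hypothetical fragment of the outline: child category number -> parent number.
# Only #61 ("Fertility") and #62 ("Male sterility") are taken from the text.
PARENT = {62: 61}

# Hypothetical stock assignments: stock name -> set of assigned category numbers.
ASSIGNED = {
    "stock-A": {61},  # classified only at the general "Fertility" level
    "stock-B": {62},  # classified at the specific "Male sterility" level
}


def covers(selected, assigned):
    """True if `assigned` is `selected` itself or lies anywhere beneath it."""
    node = assigned
    while node is not None:
        if node == selected:
            return True
        node = PARENT.get(node)  # climb one level; None once the top is passed
    return False


def search(selected):
    """Return the sorted names of stocks found by selecting one category."""
    return sorted(
        name
        for name, categories in ASSIGNED.items()
        if any(covers(selected, c) for c in categories)
    )


print(search(61))  # general heading: finds both stocks
print(search(62))  # specific subheading: finds only stock-B
```

In this sketch a stock assigned only to the general category is reached by the general search but not by the more specific one, which is the behaviour the paragraph describes.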
For stocks which have complex phenotypes and/or
multiple effects, more than one category may be
appropriate, and all applicable categories are assigned.
Hence, different search choices may find the same stock. In this sense, the
category system also becomes a tool for identifying
different stocks affecting single aspects of development
or metabolism.
A search for stocks affecting root development can
be used as an example of a simple category search. The
actions required to execute the search are: a) click on
"roots" (line #5), and b) click on "Search." The
resulting page with the list of stocks, including choices
for additional stock details, stock ordering, viewing and
comparing of images operates exactly the same way as the
similar page of the standard stock search. (Click here to
connect to the search page and work the example.) Note
that, as previously, single or multiple stocks can be
highlighted for simultaneous viewing of details or order
placement.
More complex or more precise structured phenotype searches
can be formulated using the structured phenotypic search
screen by executing joint searches among different
categories. For example, highlighting only "rosette leaf
color" (item #14) finds a very large number of stocks. If,
however, one suspects that a mutant of interest affects
both leaf color and shape, these two categories can be
simultaneously highlighted to achieve a joint search.
(Highlighting of multiple categories is achieved on the
Macintosh by clicking sequentially on the desired
categories while depressing the "Apple" key; in other
systems the action is similar with keys such as "control"
being used instead of "Apple.") Note that this search
will find multiple mutants which have genes cumulatively
affecting all of the selected categories as well as stocks
with single loci having multiple effects.
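A joint search of this kind can be pictured as a set intersection. The sketch below assumes, as the text implies, that a joint search returns only stocks assigned to every highlighted category. The category names follow the example above, but the stock names and the `joint_search` function are hypothetical.

```python
# Hypothetical data: category name -> set of stock names assigned to it.
STOCKS_BY_CATEGORY = {
    "rosette leaf color": {"stock-1", "stock-2", "stock-3"},
    "rosette leaf shape": {"stock-2", "stock-3", "stock-4"},
}


def joint_search(*categories):
    """Return the stocks assigned to every one of the highlighted categories."""
    sets = [STOCKS_BY_CATEGORY.get(c, set()) for c in categories]
    if not sets:
        return []
    # A stock must appear in all of the selected categories to be reported.
    return sorted(set.intersection(*sets))


print(joint_search("rosette leaf color"))
print(joint_search("rosette leaf color", "rosette leaf shape"))
```

The intersection does not distinguish a multiple mutant from a single locus with several effects; both are reported as long as the stock carries all of the selected categories.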
B. Comments
Some categories of this structure presently have
large numbers of assigned mutants and others are more
sparsely populated. Categories representing easily
identifiable phenotypes, such as rosette leaf color, have
large numbers. Searches utilizing such categories work
correctly, although they may be slow, requiring up to
several minutes to produce the found list of stocks. To
minimize long searching times, we have classified the
Feldmann T-DNA lines and recombinant inbred lines as Type
= "lab strains" as opposed to the "mutant" classification
utilized for individual mutant lines. Hence, choosing
"mutant" using the button labelled "Type:" at the bottom
of this screen eliminates these relatively large groups
from the search pool and speeds the process.
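The effect of the "Type:" button can be illustrated with a small sketch in which the pool is narrowed by type before any category matching takes place. The two type values come from the text; the `Stock` record, the stock names and the `search` function are hypothetical, and the real search runs inside the AIMS database rather than in code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stock:
    name: str
    stock_type: str        # "mutant" or "lab strains", the two values named in the text
    categories: frozenset  # names of the assigned categories


# Hypothetical search pool; the stock names are invented for illustration.
POOL = [
    Stock("stock-1", "mutant", frozenset({"rosette leaf color"})),
    Stock("tdna-line-1", "lab strains", frozenset({"rosette leaf color"})),
    Stock("ri-line-1", "lab strains", frozenset({"rosette leaf color"})),
]


def search(category, stock_type=None):
    """Search one category, optionally narrowing the pool by Type first."""
    if stock_type is None:
        pool = POOL
    else:
        pool = [s for s in POOL if s.stock_type == stock_type]
    return [s.name for s in pool if category in s.categories]


print(search("rosette leaf color"))            # all three stocks
print(search("rosette leaf color", "mutant"))  # only the individual mutant line
```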
It is hoped that this scheme will assist researchers
in locating desired stocks among the rapidly increasing
array of mutants and other stocks of Arabidopsis. The
present categories were designed to be comprehensive but
not overly complicated. Undoubtedly there are neglected
areas, and aspects which could have been constructed more
logically. While alteration of the AIMS Sybase data
structure to accommodate major changes would require
substantial effort, it should be possible to make some
changes in the categories, as warranted. A few such
changes have been accomplished during the data entry by
ABRC, as deficiencies of the structure were identified.
Comments and suggestions on the structure are welcome as
are suggestions regarding category assignments of specific
stocks. We will do everything possible to make the
category structure as useful as possible.
If you have any questions, comments or suggestions
regarding this database feature, feel free to contact ABRC
or the AIMS manager.
AUTOMATIC LINKS FROM AIMS TO SEQUENCE DATABASES
Several important resources relating to Arabidopsis
DNA sequences are now available via the World Wide Web.
Links among all of these databases have been in existence
for some time so that transfer among them is simple.
However, AIMS and the sequence databases at NCBI and the
Arabidopsis cDNA Sequencing Project deal with different
data relating to the same entities - namely sequenced DNA
clones, especially ESTs. AIMS maintains clone data, stock
center availability of clones and histories of orders
placed for clones. The sequence databases maintain
sequence information and a very useful variety of
similarity analysis results. Hence an opportunity exists
to create direct links between corresponding data items.
Linkages at both individual clone levels and general
search pages have been created in AIMS and are described
below. These links should complement the similar links
that exist between the Minnesota cDNA sequence database
and GenBank and between dbEST and GenBank.
A. Description of AIMS clone-to-remote sequence linkages
One potentially useful link of this type is a link
from a clone in AIMS (with associated stock data) to the
up-to-date sequence information available in GenBank,
dbEST and the Arabidopsis cDNA Sequence Analysis Project.
This has now been implemented in AIMS. Clone searches in
AIMS are executed from the DNA search page, and in the
resulting report page details of clones are accessed,
orders initiated and the corresponding sequence similarity
information from any of these three databases can be
automatically accessed. Here is an example which can be
worked through starting with a click on the hypertext
"Start" below: 1) After the hypertext link is executed,
the AIMS DNA search page should appear; enter
a clone number (e.g., 124f20t7) in the clone name field.
2) Click on "Submit Query." 3) Examine the table at the
bottom of the resulting search results page, and select
the appropriate hypertext items to initiate automatic
searches in any of the three databases. Click Start if you
wish to initiate this search. If ***** is found in a cell
of the table, the corresponding database cannot be accessed.
These links should work for all AIMS EST entries to both
dbEST and the Minnesota analysis database. In the latter
case, an entry will only be seen if similarity results
exist for the sequence in question. For all clones of AIMS
having sequence entries in GenBank, the automatic link
will be functional.
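The table of links on the results page can be pictured with the following sketch. It assumes the keys stated in the Comments below (clone name for dbEST and the Minnesota database, accession number for GenBank) and uses ***** for a database that cannot be reached. The endpoint URLs, the query parameter name `q` and the accession value in the usage lines are placeholders, since the actual search addresses are not given in the text.

```python
from urllib.parse import urlencode

# Placeholder endpoints; the real search URLs of the three databases
# are not given in the text.
ENDPOINTS = {
    "dbEST": "http://example.org/dbest/search",
    "Minnesota": "http://example.org/minnesota/search",
    "GenBank": "http://example.org/genbank/search",
}

UNAVAILABLE = "*****"  # marker shown when a database cannot be accessed


def link_table(clone_name, genbank_accession=None):
    """Map each database name to a search link, or to ***** if no key exists."""
    keys = {
        "dbEST": clone_name,           # keyed on clone name
        "Minnesota": clone_name,       # keyed on clone name
        "GenBank": genbank_accession,  # keyed on accession number
    }
    table = {}
    for database, key in keys.items():
        if key:
            table[database] = ENDPOINTS[database] + "?" + urlencode({"q": key})
        else:
            table[database] = UNAVAILABLE
    return table


# The clone name is the example from the text; "ACC0001" is a made-up accession.
print(link_table("124f20t7"))
print(link_table("124f20t7", genbank_accession="ACC0001"))
```

A clone without a GenBank entry simply yields ***** in the GenBank cell, while the two clone-name links are still produced.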
B. Description of AIMS main search-page links
Linkages from the main search page of AIMS to
GenBank, dbEST, the Arabidopsis cDNA Analysis Project and
ATGC at the University of Pennsylvania also have been
created. This is similar to existing links among a number
of the databases possessing Arabidopsis data. In the case
of these links, the transfer is to the main search pages,
where possible. Hence, searches within each of these
databases can be launched directly from this location.
C. Comments
The principle for database linking and the level at
which clone-sequence links are created pose several
choices. Links to the Minnesota database and dbEST are
keyed on clone name, whereas links to GenBank are keyed on
GenBank accession number. These are items that are reliably
associated with the correct sequence in the respective
database and have produced consistent results in our
initial testing. The level of linking is aimed at
producing the same list of items that would be initially
found by a search for clone name or identification number
in the respective database. A clear alternative to this
would be to target the specific sequence associated with
the clone/stock. However, the former was chosen since the
correct clone is usually the first item on the report
list, and the additional choices represent potentially
very useful information (e.g., as alternate or additional
choices for study). It is thus hoped that the most useful
possible link has been generated. Any suggestions for
improvement of these new linkages are welcome.
The contacts for questions or comments regarding
these or any other features of AIMS are: aims-manager@aims.cps.msu or
arabidopsis+@osu.edu. All
comments are welcome.