
NASC

The European Arabidopsis Stock Centre

Ontologies at NASC

NASC ontology home page

NASC are currently in the process of introducing plant ontology terms into the annotation of our germplasm lines. By introducing standards such as ontologies into our germplasm, transcriptomics and genomics databases, we hope to improve the annotation and provide more powerful tools for searching and browsing our data. You can search or browse the plant ontology here; the ontology terms are associated with germplasm lines that have a mutant phenotype.

Below is a quick introduction to the plant ontology and to how NASC plan to incorporate it into our databases. Our work with the ontologies is at a preliminary stage, but we are currently annotating stocks and developing tools that will utilise the ontologies and provide richer information for our users.

The problem

Biological databases are growing! Much of this data is written in free text, where the vocabulary used can vary from author to author. Users can already search this data and get results based on the text they enter (e.g. with Google). Unfortunately, text-based searches often miss the data you were interested in, because the source data was either misspelt or stored in the database under another name or synonym.

These problems are becoming more apparent in biology as the amount of biological data grows and as new technologies such as web services allow cross-database querying. There is a need for new tools that help users find relevant data across all these different sources. To achieve this, standards must be enforced for describing and representing the data.

Take the term 'bud' for example. This term has various meanings depending on the context in which it is used.

  1. (botany) A small protuberance on a stem or branch, sometimes enclosed in protective scales and containing an undeveloped shoot, leaf, or flower.
  2. (biology) An asexual reproductive structure, as in yeast or a hydra, that consists of an outgrowth capable of developing into a new individual.
  3. (medical) The primordial structures from which a tooth is formed.

If you were to search for 'bud' through a large collection of published papers, your results would include some false positive hits. However, if the word 'bud' were given a standardised ID based on its context, it would be much easier for a computer to know whether you were searching for a botanical bud or a biological bud. There needs to be some method of classifying biological terms so that computers can reason about the context in which they are used. One solution is to develop what is known as an ontology.
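To make the idea concrete, here is a minimal Python sketch comparing a plain text search with a search by context-specific ID. The IDs (`EX:0001` and so on), the three documents and the functions `text_search` and `id_search` are all hypothetical, invented for illustration; they are not real ontology accessions or a NASC tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    term_id: str  # hypothetical context-specific identifier
    label: str    # the ambiguous word itself
    context: str  # the field in which this sense applies


# Hypothetical IDs for the three senses of "bud" listed above.
BOTANY_BUD = Sense("EX:0001", "bud", "botany")
BIOLOGY_BUD = Sense("EX:0002", "bud", "biology")
MEDICAL_BUD = Sense("EX:0003", "bud", "medical")

# Invented documents: free text plus the sense IDs a curator attached to each.
documents = {
    "doc1": ("Axillary bud outgrowth in a branching mutant", {BOTANY_BUD.term_id}),
    "doc2": ("Bud formation during asexual reproduction in yeast", {BIOLOGY_BUD.term_id}),
    "doc3": ("Development of the tooth bud", {MEDICAL_BUD.term_id}),
}


def text_search(word: str) -> list[str]:
    """Naive free-text search: matches the word whatever its meaning."""
    return sorted(
        doc for doc, (text, _ids) in documents.items()
        if word.lower() in text.lower().split()
    )


def id_search(term_id: str) -> list[str]:
    """Search by identifier: only documents annotated with that sense match."""
    return sorted(doc for doc, (_text, ids) in documents.items() if term_id in ids)


print(text_search("bud"))             # ['doc1', 'doc2', 'doc3']
print(id_search(BOTANY_BUD.term_id))  # ['doc1']
```

The text search returns every document containing the word, including the yeast and tooth papers, while the ID search returns only the document annotated with the botanical sense.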

What is an ontology?

An ontology can be defined as a classification methodology for formalising a subject's knowledge in a structured and controlled vocabulary. It is similar to a dictionary of terms specific to an area of knowledge, but each term within that area is also connected to other terms by a set of assertions or relationships. Several consortia are developing ontologies (e.g. the Gene Ontology) that cover various aspects of biology; most of them can be found at the OBO (Open Biological Ontologies) web site. One of the ontologies we are interested in at NASC is the plant ontology. The Plant Ontology Consortium (POC) focuses on developing ontologies for describing plant structures and growth/development stages.

The example below demonstrates how the term 'embryo' is represented in the ontology. The term embryo is given an ID (known as a PO accession) and annotated with a formal description and any associated synonyms. It is then connected to other terms through relationships defined by the ontology. All terms in the current plant ontology are connected using three types of relationship: 'is a', 'part of' and 'develops from'.

The ontology immediately tells us that the term embryo is part of the seed, which is a sporophyte, which is a plant structure, and so on. NASC are currently annotating mutant records in the database whose phenotype is associated with a term in the plant ontology. In the example above, there are 27 germplasm mutants in our database with a phenotype associated with the term embryo.
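The upward walk from embryo can be sketched as a small graph traversal. This Python sketch uses only the relationships stated in the paragraph above; the stock IDs are hypothetical, and it assumes (as a common convention, not a documented NASC rule) that a stock annotated to a term should also be returned when searching for terms above it via 'is a' or 'part of', but not via 'develops from'.

```python
from collections import defaultdict

# Edges taken from the description of 'embryo' above: (child, relationship, parent).
EDGES = [
    ("embryo", "part_of", "seed"),
    ("seed", "is_a", "sporophyte"),
    ("sporophyte", "is_a", "plant structure"),
]

# Relationships followed when propagating annotations upwards.
# Assumption: 'develops_from' is deliberately not followed.
PROPAGATING = {"is_a", "part_of"}

# Hypothetical stock annotations; these IDs are invented, not real NASC records.
ANNOTATIONS = {
    "STOCK_A": {"embryo"},
    "STOCK_B": {"seed"},
    "STOCK_C": {"sporophyte"},
}

# Index each term's parents for quick upward lookup.
parents = defaultdict(set)
for child, rel, parent in EDGES:
    parents[child].add((rel, parent))


def ancestors(term: str) -> set[str]:
    """All terms reachable upwards from `term` via propagating relationships."""
    found: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for rel, parent in parents.get(current, ()):
            if rel in PROPAGATING and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def stocks_for(term: str) -> list[str]:
    """Stocks annotated to `term` itself or to any term below it."""
    return sorted(
        stock
        for stock, terms in ANNOTATIONS.items()
        if any(t == term or term in ancestors(t) for t in terms)
    )


print(sorted(ancestors("embryo")))  # ['plant structure', 'seed', 'sporophyte']
print(stocks_for("seed"))           # ['STOCK_A', 'STOCK_B']
```

Under this convention, a search for 'seed' also finds the stock annotated only to 'embryo', because the ontology records that an embryo is part of a seed.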

What does this mean for NASC users?

NASC plan to implement ontologies, including the Gene Ontology, in our germplasm, Affymetrix and genomics databases. Our first task is to annotate ~5,000 Arabidopsis germplasm lines that have textual descriptions of a phenotype observed by the original donor. We hope that the new ontology browser will give users better access to these lines. Have a go at using and querying the plant ontology here. At present only a limited set of stocks is associated with the terms; however, this number will increase as we work through the annotation. If you have any questions regarding the ontologies, you can e-mail NASC at curators@arabidopsis.info.

Under development (PATO)

NASC are also working on an ontology that can be used to describe the phenotypes of mutant plants. We will use the plant ontology in combination with another ontology known as the Phenotype, Attribute and Trait Ontology (PATO). PATO is a generic ontology for describing traits or phenotypes that can be measured either quantitatively or qualitatively. It is designed to be species-independent and can therefore be used in combination with a wide range of other ontologies. PATO consists of two main term types, 'attributes' and 'values', which are related to each other and stored in a similar format to the plant ontology. You can download the development version of PATO from the Open Biological Ontologies site, where you can also join the phenotypes mailing list.
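One way to picture the attribute/value relationship is as a lookup from each value term to the attribute it belongs to, which lets a tool reject mismatched pairs such as 'yellow' given as a size. The sketch below uses the PATO accessions and names quoted in the N319 example on this page; those come from the development version of PATO and may differ in later releases. The function names `value_fits_attribute` and `label` are hypothetical.

```python
# Attribute terms, as quoted in the N319 example on this page.
ATTRIBUTES = {
    "PATO:0000060": "relative_size",
    "PATO:0000086": "relative_width",
    "PATO:0000096": "pilosity",
    "PATO:0000034": "colour_hue",
}

# Value terms, each paired with the attribute it is a value of (same source).
VALUES = {
    "PATO:0000062": ("small", "PATO:0000060"),
    "PATO:0000087": ("wide", "PATO:0000086"),
    "PATO:0000099": ("glabrous", "PATO:0000096"),
    "PATO:0000040": ("yellow", "PATO:0000034"),
}


def value_fits_attribute(value_id: str, attribute_id: str) -> bool:
    """True only if the value term is recorded as a value of that attribute."""
    entry = VALUES.get(value_id)
    return entry is not None and entry[1] == attribute_id


def label(value_id: str) -> str:
    """Human-readable 'attribute = value' string for a known value term."""
    name, attribute_id = VALUES[value_id]
    return f"{ATTRIBUTES[attribute_id]} = {name}"


print(value_fits_attribute("PATO:0000062", "PATO:0000060"))  # True  (small is a relative_size)
print(value_fits_attribute("PATO:0000040", "PATO:0000060"))  # False (yellow is not a size)
print(label("PATO:0000040"))                                 # colour_hue = yellow
```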

The overall model for describing phenotypes will comprise three ontology terms linked together to form an EAV (Entity, Attribute, Value) description. The entity, which in most cases is the noun you are describing, will use terms from the plant ontology. The attribute and value terms come from PATO and make up the verb part of the description.
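The EAV model maps naturally onto a small record type. Here is a minimal Python sketch, assuming hypothetical `Term` and `EAV` classes, in which the constructor checks that the entity is a plant ontology (`PO:`) term and that the attribute and value are PATO terms. The example row is the seed colour annotation for N319 given on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    accession: str  # e.g. "PO:0009010"
    name: str       # e.g. "seed"


@dataclass(frozen=True)
class EAV:
    entity: Term     # plant ontology term: the thing being described
    attribute: Term  # PATO attribute term
    value: Term      # PATO value term

    def __post_init__(self) -> None:
        # Enforce the model: entity from the plant ontology, attribute/value from PATO.
        if not self.entity.accession.startswith("PO:"):
            raise ValueError(f"entity must be a PO term: {self.entity.accession}")
        for part in (self.attribute, self.value):
            if not part.accession.startswith("PATO:"):
                raise ValueError(f"expected a PATO term: {part.accession}")

    def describe(self) -> str:
        """Readable form of the triple."""
        return f"{self.entity.name}: {self.attribute.name} = {self.value.name}"


seed_colour = EAV(
    Term("PO:0009010", "seed"),
    Term("PATO:0000034", "colour_hue"),
    Term("PATO:0000040", "yellow"),
)
print(seed_colour.describe())  # seed: colour_hue = yellow
```

Checking only the accession prefix is a deliberately shallow rule for the sketch; a real tool would presumably also confirm that each accession exists in the ontology and that the value belongs to the attribute.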

EAV example using one of our germplasm lines, N319

Free text description
Phenotype description in free text
Green dwarf. Broader leaves, glabra. Yellow seed.

The phenotype description above is the textual description supplied by the donor in his/her own words. By supplementing this description with an ontology-based version, we can perform more powerful searches for accessions with similar phenotypes in our database.

EAV description
Entity | Attribute | Value
PO:0000003 (whole plant) | PATO:0000060 (relative_size) | PATO:0000062 (small)
PO:0009025 (leaf) | PATO:0000086 (relative_width) | PATO:0000087 (wide)
PO:0009025 (leaf) | PATO:0000096 (pilosity) | PATO:0000099 (glabrous)
PO:0009010 (seed) | PATO:0000034 (colour_hue) | PATO:0000040 (yellow)
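Once phenotypes are stored as EAV triples, finding lines with similar phenotypes becomes a matter of comparing identifiers rather than free text. The sketch below encodes N319's table rows as (entity, attribute, value) accession triples and adds one invented line (`HYPOTHETICAL_LINE`) for comparison. The wildcard query and the Jaccard similarity score are illustrative choices, not NASC's actual search method, and the sketch matches exact terms only, without expanding them through the ontology hierarchy.

```python
# N319's EAV annotations, copied from the table above (accessions only).
N319 = {
    ("PO:0000003", "PATO:0000060", "PATO:0000062"),  # whole plant, relative_size, small
    ("PO:0009025", "PATO:0000086", "PATO:0000087"),  # leaf, relative_width, wide
    ("PO:0009025", "PATO:0000096", "PATO:0000099"),  # leaf, pilosity, glabrous
    ("PO:0009010", "PATO:0000034", "PATO:0000040"),  # seed, colour_hue, yellow
}

# A hypothetical second line, invented for illustration only.
HYPOTHETICAL_LINE = {
    ("PO:0009025", "PATO:0000096", "PATO:0000099"),  # leaf, pilosity, glabrous
    ("PO:0009010", "PATO:0000034", "PATO:0000040"),  # seed, colour_hue, yellow
}

STOCKS = {"N319": N319, "HYPOTHETICAL_LINE": HYPOTHETICAL_LINE}


def matches(triple, entity=None, attribute=None, value=None) -> bool:
    """None acts as a wildcard for that position of the triple."""
    pattern = (entity, attribute, value)
    return all(p is None or p == t for p, t in zip(pattern, triple))


def find(entity=None, attribute=None, value=None) -> list[str]:
    """Stocks with at least one triple matching the (partial) pattern."""
    return sorted(
        stock for stock, triples in STOCKS.items()
        if any(matches(t, entity, attribute, value) for t in triples)
    )


def similarity(a: set, b: set) -> float:
    """Jaccard similarity of two sets of EAV triples (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


print(find(entity="PO:0009025", value="PATO:0000099"))  # ['HYPOTHETICAL_LINE', 'N319']
print(find(value="PATO:0000087"))                       # ['N319']
print(similarity(N319, HYPOTHETICAL_LINE))              # 0.5
```

The first query asks for any line with glabrous leaves and returns both lines; the two lines share two of their four distinct triples, giving a similarity of 0.5.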

So...

The EAV annotations are still in their infancy, and we are currently working on tools that will allow users to search using a combination of the plant ontology and PATO. We are also developing online forms that users and donors can use to submit phenotype descriptions following the EAV standard. If you need guidance using the ontologies, please contact us. If you feel that any of the ontologies lack terms or contain errors, please submit requests to the appropriate mailing lists. We encourage the plant community to adopt and use the emerging standard terminologies whenever describing genes or germplasm or writing research papers.