Protocol for Assembly of Arabidopsis ESTs and Transcripts

Preparation of EST data

Preparation of non-redundant transcript database

Assembly

Summary of Assembly Results for Release 1.0
23921 EST sequences input following quality control.
3764 assemblies produced, containing 15218 ESTs (63.6%).
8703 'singletons' remained (36.4%).
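
The percentages above can be re-derived from the raw counts. The following is a minimal sketch for checking that arithmetic; the variable names and the `percent` helper are illustrative assumptions and are not part of the assembly pipeline.

```python
# Counts taken from the Release 1.0 summary above.
total_ests = 23921      # ESTs input after quality control
assembled_ests = 15218  # ESTs incorporated into assemblies
singletons = 8703       # ESTs left unassembled


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Every input EST is either in an assembly or a singleton.
assert assembled_ests + singletons == total_ests

print(percent(assembled_ests, total_ests))  # share of ESTs in assemblies
print(percent(singletons, total_ests))      # share of ESTs left as singletons
```

The two shares are computed against the same denominator (all ESTs that passed quality control), so they sum to 100%.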

Name Assignment

Names are assigned on the basis of the results of sequence similarity searches against protein and nucleotide databases. Only 'significant' matches are assigned names, and each match is categorized into one of four classes.

Name assignment for singleton ESTs is automatic. The results of blastx searches stored in dbEST are parsed, and the name of the top-scoring protein match is stored if its score is above a threshold of 90. Because these matches are not manually inspected, they are not assigned a class category.
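
The automatic rule for singletons can be sketched as a simple threshold filter. This is a hedged illustration only: the `Hit` record and the `assign_name` function are hypothetical, the parsing of dbEST records is not shown, and treating "above a threshold of 90" as a strict comparison is an assumption.

```python
from typing import NamedTuple, Optional, Sequence


class Hit(NamedTuple):
    """Hypothetical record for one blastx protein match of a singleton EST."""
    name: str     # description of the matched protein
    score: float  # blastx score as reported


SCORE_THRESHOLD = 90  # threshold stated in the text


def assign_name(hits: Sequence[Hit],
                threshold: float = SCORE_THRESHOLD) -> Optional[str]:
    """Return the name of the top-scoring hit if its score exceeds the threshold.

    Returns None when there are no hits or the best score is not above
    the threshold (assumed here to be a strict comparison).
    """
    if not hits:
        return None
    top = max(hits, key=lambda h: h.score)
    return top.name if top.score > threshold else None
```

With made-up inputs, `assign_name([Hit("protein A", 50), Hit("protein B", 120)])` would return `"protein B"`, while a singleton whose best score is 90 or lower would receive no name.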

Name Assignment Summary
TCs: 1826 assemblies (48.5%) have been assigned putative IDs.

Class       Assemblies      ESTs
1           641 (17%)       4275 (28%)
2           279 (7.4%)      1286 (8.5%)
3           888 (23.6%)     3808 (25%)
4           18 (0.5%)       71 (0.5%)
No Match    1938 (51.5%)    5778 (38%)

Singletons (IDs parsed automatically from dbEST):

Significant hit         2446 (32%)
No significant hits     5175 (68%)
No data available       1082
