Nottingham

Bioinformatics Practical 2008

Do not close this window during the session or your answers will be lost

Sequence Comparison

A basic introduction to BLAST as a tool for sequence comparison, including the various concepts of homology and types of database available.


The demonstrator will briefly describe the concepts behind sequence similarity comparisons and the meaning of homology. This will include definitions of orthologues and paralogues, and the reason why the description '% homology' is meaningless.
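As a note on why '% homology' is meaningless: homology is a yes/no statement about shared ancestry, whereas what an alignment actually measures is percent identity (or similarity). The following is a minimal sketch of that calculation, not the method used by any particular BLAST server. The function name and the two example fragments are invented for illustration, and other conventions for the denominator (for example, counting gap columns) also exist.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of compared (non-gap) alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # columns containing a gap are skipped in this convention
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical, already-aligned fragments (invented for illustration only).
print(percent_identity("GATTACA", "GACTATA"))
```

Two sequences can be described as 'x % identical'; whether they are homologous is an inference drawn from that evidence.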

IMPORTANT: READ ALL OF THE INSTRUCTIONS

Virtual Experiment

You have experimentally sequenced a random clone from a cDNA library of Arabidopsis. Please find potential candidate genes for this clone using blastn.


Protocol for gene 1:
  • Go to the BLAST server at NASC
  • DO NOT change any of the choices.
  • Read all of the text
  • Note the differences between the blast programs

Now put the following DNA sequence into the appropriate box and perform a BLAST search:

GAAGCATACTGTGACATGTTGGTTAAATATCGTGAGGAGCTAA
CAAGGCCCATTCAGGAAGCAATGGAGTTTATACGTCGTATTGA
ATCTCAGCTTAGCATGTTGTGTCAGAGTCCCATTCACATCCTCA

Wait for a match to appear

What is the chromosome position of the top match ?
Type it here:

What is its 'E-value'? :

What is the chromosome position of the next highest match ?
Type it here:

What is its 'E-value'? :

'Mouse over' the best match on the chromosome view.
Use the dropdown for the best match to see the ContigView (context).

What is the title of the gene ? :


So what is an E-value and what does BLAST do ?
Please find some answers here
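As a rough guide, the E-value is the number of alignments scoring at least this well that you would expect to see by chance in a search of this size. A minimal sketch, assuming the standard Karlin-Altschul relationship E = m × n × 2^(-S'), where S' is the bit score, m the query length and n the total database length (real BLAST programs use corrected "effective" lengths). The function name, the bit score and the database sizes below are all hypothetical values chosen for illustration.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical numbers for illustration only (not taken from any real search).
query_len = 130      # assumed query length in bases
bit_score = 60.0     # assumed bit score of one alignment

for label, db_len in [
    ("single-genome database (assumed 1.2e8 bases)", 120_000_000),
    ("large multi-species database (assumed 1e11 bases)", 100_000_000_000),
]:
    print(f"{label}: E = {expected_hits(bit_score, query_len, db_len):.2e}")
```

Under this formula the same alignment, with the same bit score, receives a larger E-value when the database searched is larger.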

Protocol for part two:
  • Please go to the EBI
    Note that the default program setting there is different from the one you used above.
    Change the program to match the one that you used for DNA (blastn)
  • Run blastn on the sequence of gene 1 (from above) again.
EMBL accession number using EBI

E-value using EBI

Are the answers the same when comparing EBI and AtEnsembl (above) ? Would you expect this ? Why ?


Please answer the following questions (briefly) using the help files at any of the links above (or Google):

What is blastp, and how does it differ from blastn ?

What is blastx ?

What is tblastx ?
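As a hint for the translated searches: blastx and tblastx both work by translating nucleotide sequence in all six reading frames (three on each strand) before comparing at the protein level. The sketch below illustrates six-frame translation with the standard genetic code. It is an illustration of the idea only, not the code that BLAST itself runs, and all function names are made up for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate from the first base; unknown codons become 'X', stops '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict:
    """Translate both strands at offsets 0, 1 and 2."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# First line of the gene 1 query from this practical.
for name, protein in six_frames("GAAGCATACTGTGACATGTTGGTTAAATATCGTGAGGAGCTAA").items():
    print(name, protein)
```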


Using the following protein sequence, please find a match using blastp at EBI:

  • QHMLFPHMSSLLPQTTENCF

If the search fails, reconsider the database that you are searching and how blastp differs from blastn, then retry the search.
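One quick sanity check before choosing a program and database is to ask whether the query looks like nucleotide or protein sequence. The following is a minimal sketch of such a check. The function name and the 90% threshold are assumptions made for illustration, and real tools use their own heuristics.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")


def looks_like_nucleotide(seq: str, threshold: float = 0.9) -> bool:
    """Guess whether a sequence is DNA/RNA from the fraction of A, C, G, T, U, N."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return fraction >= threshold


for query in ("QHMLFPHMSSLLPQTTENCF", "GAAGCATACTGTGACATGTTGG"):
    kind = "nucleotide" if looks_like_nucleotide(query) else "protein"
    print(query, "looks like", kind)
```

A protein query needs a protein database (or a translated search), so the type of the query should agree with both the program and the database selected.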

What are the name and proposed gene function that are conserved between the best matches ?


Part II - please let the demonstrators know when you have reached this point


Microarrays
The demonstrator will give you a brief description of the concepts behind microarray experiments and how they allow for multiple parallel 'northern-like' experiments. This will include the concept that there is now 'too much data to handle without bioinformatics'. There is a summary presentation of microarray use here.

Experiment

A friend/collaborator has found evidence that the papaya homologue of an Arabidopsis gene has an interesting putative function in influencing crop yields and fruit size (note: this is fictional). You have decided to use the public databases to get an idea of what this gene may be doing and where it is expressed in Arabidopsis, so that you can write a joint grant proposal with them.

All you have is a bit of sequence from an Arabidopsis clone library, picked out from a small molecular pilot project. You have decided to use Affymetrix microarray and gene annotation data to look into this.


First go to AtEnsembl (The Arabidopsis Ensembl Resource)

Paste the following sequence into the search box and press the RUN> button.

TCTAGCTCCAACAAGCTTCATGAGTATATCAGCCCTAACACCACAACGAAGGAGATCGTA
GATCTGTACCAAACTATTTCTGATGTCGATGTTTGGGCCACTCAATATGAGCGAATGCAA
GAAACCAAGAGGAAACTGTTGGAGACAAATAGAAATCTCCGGACTCAGATCAAGCAGAGG
CTAGGTGAGTGTTTGGACGAGCTTGACATTCAGGAGCTGCGTCGTCTTGAGGATGAAATG

This will give you the matching sequences in the Arabidopsis genome (BLAST works on genomes a little like Google works on the web).

Mouse-over the best match and select the "Show in Contig View" link - this shows you the gene in the context of the genome.

Look for the "Affy ATH1" lane (dark green), mouse-over and choose "Spot history".

What is shown on the x-axis ?

What is shown on the y-axis ?

Click on the small number of experiments in which expression of the gene has gone up.

Explore the graph that you obtain and choose experiments of interest to explore further by reading the instructions.


Find 4 experiments that appear to influence the expression of this gene
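If you export or note down the signal values, one simple way to pick out the experiments with the largest changes is to rank them by log2 ratio against a baseline. The sketch below assumes the median signal as the baseline, which is only one possible choice. The function name, experiment names and values are entirely made up for illustration and are not data from the array site.

```python
import math
from statistics import median


def top_changed(signals: dict, n: int = 4) -> list:
    """Rank experiments by absolute log2 ratio of signal to the median signal."""
    baseline = median(signals.values())
    ratios = {name: math.log2(value / baseline) for name, value in signals.items()}
    return sorted(ratios.items(), key=lambda item: abs(item[1]), reverse=True)[:n]


# Hypothetical signal values (invented; not real microarray data).
example_signals = {
    "experiment_A": 120.0,
    "experiment_B": 95.0,
    "experiment_C": 480.0,
    "experiment_D": 110.0,
    "experiment_E": 30.0,
    "experiment_F": 230.0,
}

for name, log2_ratio in top_changed(example_signals):
    print(f"{name}: log2 ratio = {log2_ratio:+.2f}")
```

A positive log2 ratio means the signal is above the baseline and a negative one means it is below; a value of +1 corresponds to a doubling.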



This should give you some clues about the tissue localisation of this gene - go back to the genome view in AtEnsembl and find out more about the gene - does it confirm your 'experimental' results ?

Try some more genes and look around the array site for practice


To save your answers you can print this page to a printer or a file in the normal manner