What Are Expressed Sequence Tags?
Expressed Sequence Tags (ESTs) are short fragments of DNA sequence, typically 200 to 500 nucleotides long. They are generated by sequencing one or both ends of an expressed gene. The idea is to sequence bits of DNA that represent genes expressed in specific cells, tissues, or organs of various organisms, and then use these “tags” to fish a gene out of a stretch of chromosomal DNA by matching base pairs.
How difficult it is to identify genes in a genomic sequence varies among organisms. It depends not only on the size of the genome but also on the presence or absence of introns, which are DNA sequences that interrupt the protein-coding portion of a gene.
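The "fishing by matching base pairs" idea can be sketched in a few lines of Python. This is a toy illustration only: the function names and sequences are hypothetical, the search is an exact string match on both strands, and real EST mapping relies on alignment tools that tolerate sequencing errors and introns.

```python
from typing import Optional, Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def locate_est(genomic: str, est: str) -> Optional[Tuple[int, str]]:
    """Look for an exact copy of the EST on either strand of the genomic DNA.

    Returns (0-based start on the forward strand, strand) or None if absent.
    """
    genomic, est = genomic.upper(), est.upper()
    pos = genomic.find(est)
    if pos != -1:
        return pos, "+"
    # The tag may have been read from the opposite strand.
    pos = genomic.find(reverse_complement(est))
    if pos != -1:
        return pos, "-"
    return None


# Hypothetical toy sequences, far shorter than a real EST (200 to 500 nt).
genomic_dna = "TTGACCATGGCTAGCTAGGATCCGATTACA"
est_tag = "GCTAGCTAGG"
print(locate_est(genomic_dna, est_tag))
```

An exact match like this fails as soon as the tag spans an intron, because the intron is present in the genomic DNA but absent from the tag. That is one way to see why introns make gene identification harder.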
Figure 1: An overview of the process of protein synthesis.
How Are Expressed Sequence Tags Made?
mRNA to Generate cDNA
Most of the human genome is non-coding DNA, including introns, interspersed with relatively few protein-coding sequences. This makes gene identification extremely challenging in humans.
Expressing a gene as a protein is a complicated process with two primary steps. Before a protein can be synthesised, the DNA of the gene must be “transcribed” into messenger RNA (mRNA), a type of RNA that acts as the template for protein production.
The mRNA is then translated to synthesise the protein. mRNAs are useful to researchers because mature mRNAs contain neither the sequences that lie between genes nor the introns, the non-coding parts that interrupt many genes. Isolating mRNA is therefore key to finding expressed genes in the huge human genome.
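To make the point about introns concrete, here is a minimal Python sketch of transcription followed by splicing. It makes several simplifying assumptions: the gene is given as its coding strand, the exon coordinates are already known, and details such as the 5′ cap and poly-A tail are ignored. The function name and the toy gene are hypothetical.

```python
from typing import List, Tuple


def transcribe_and_splice(gene: str, exons: List[Tuple[int, int]]) -> str:
    """Join the exon intervals and write the result as RNA.

    `gene` is the coding strand; `exons` holds 0-based, end-exclusive
    (start, end) intervals in transcript order. Everything outside the
    intervals (the introns) is dropped, as in a mature mRNA.
    """
    spliced = "".join(gene[start:end] for start, end in exons)
    # mRNA carries the coding-strand sequence with U in place of T.
    return spliced.upper().replace("T", "U")


# Hypothetical toy gene: exon 1, one intron, exon 2.
exon1 = "ATGGCC"
intron = "GTAAGTTTTCAG"
exon2 = "TTTTAA"
gene = exon1 + intron + exon2

mrna = transcribe_and_splice(gene, [(0, 6), (18, 24)])
print(mrna)
```

The resulting mRNA is shorter than the gene and contains no intron sequence, which is the property that makes it a good starting point for finding expressed genes.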
cDNAs to Generate ESTs
Figure 2: An overview of how Expressed Sequence Tags are generated.
Because mRNA is extremely unstable outside of a cell, researchers use an enzyme, reverse transcriptase, to convert it into complementary DNA (cDNA). cDNA is far more stable than mRNA. It also contains only expressed DNA sequence, because it is copied from mRNA from which the introns have already been removed.
After isolating cDNA that represents an expressed gene, scientists can sequence a few hundred nucleotides from each end of the molecule to create two types of ESTs.
A 5′ EST is produced by sequencing only the first portion of the cDNA. 5′ ESTs tend to come from the part of a transcript that codes for a protein. These regions are often conserved from one species to another and change little within a gene family. Sequencing the last portion of the cDNA molecule instead yields a 3′ EST.
Because 3′ ESTs are derived from the 3′ end of a transcript, they are more likely to fall within non-coding, or untranslated, regions (UTRs) and therefore tend to show lower cross-species conservation than coding sequences.
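The two kinds of EST can be illustrated with a short Python sketch that takes a fixed-length read from each end of a cDNA. The function names, the default read length, and the toy transcript are assumptions made for illustration. For simplicity both tags are reported in the orientation of the mRNA, and sequencing errors are ignored.

```python
from typing import Tuple


def mrna_to_cdna(mrna: str) -> str:
    """Return the cDNA sequence in the same orientation as the mRNA (U -> T)."""
    return mrna.upper().replace("U", "T")


def make_ests(cdna: str, read_length: int = 300) -> Tuple[str, str]:
    """Return (5' EST, 3' EST): the first and last `read_length` bases.

    If the cDNA is shorter than the read length, both tags cover all of it.
    """
    if read_length < 1:
        raise ValueError("read_length must be at least 1")
    n = min(read_length, len(cdna))
    return cdna[:n], cdna[-n:]


# Hypothetical toy transcript: 5' UTR + coding sequence + 3' UTR.
five_utr = "GGAUCC"
coding = "AUGGCCUUUUAA"
three_utr = "CUGAAUAAAGC"

cdna = mrna_to_cdna(five_utr + coding + three_utr)
est_5, est_3 = make_ests(cdna, read_length=8)
print(est_5, est_3)
```

In this toy transcript the 3′ tag lies entirely within the untranslated region, while the 5′ tag already reaches the start of the coding sequence.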