Many coding genes are well annotated with their biological functions, whereas non-coding regions typically lack such annotation. GREAT, the Genomic Regions Enrichment of Annotations Tool, assigns biological meaning to a set of non-coding genomic regions by analyzing the annotations of the nearby genes. This makes it useful for studying the cis functions of non-coding genomic regions.
- Overview - When is GREAT useful, and for which uses should I prefer it to other annotation tools?
- Statistics - How does GREAT calculate enrichments, and how should I interpret results?
When is GREAT useful?
Many experimental and computational screens produce sets of genomic regions that are suitable input for GREAT. One natural application is analyzing data from a chromatin immunoprecipitation (ChIP) experiment with a transcription factor of interest. To hypothesize which processes involve a given transcription factor:
- Identify transcription factor binding sites via ChIP.
- Use GREAT to find annotations enriched among the genes near the binding sites.
- Hypothesize that the transcription factor helps regulate the processes whose annotations are highly enriched.
Other annotation enrichment tools [1, 2, 3] are gene based: the input is a list of genes, and the tool reports annotations that are more common in the input list than in a background list of genes. This approach does not accurately model input sets of genomic regions, because gene-based tests do not account for biases in how genomic regions are assigned to genes. Genes in gene deserts have larger domains of attraction. In other words, a random genomic region is more likely to be assigned to a gene in a gene desert simply because the desert provides a large stretch of sequence in which that gene is the nearest one. GREAT models this situation more accurately and therefore calculates more accurate enrichments for a set of genomic regions.
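The domain-of-attraction bias can be illustrated with a minimal Python sketch. It is not GREAT's implementation or its association rule: it assumes a toy chromosome, made-up gene names and positions, and a plain nearest-gene assignment, and it only computes what fraction of the chromosome is closest to each gene.

```python
def nearest_gene_domains(gene_positions, chrom_length):
    """Return, for each gene, the fraction of the chromosome on which
    that gene is the nearest one (its "domain of attraction").

    gene_positions: dict mapping gene name -> position (hypothetical values)
    chrom_length:   length of the toy chromosome in base pairs
    """
    genes = sorted(gene_positions.items(), key=lambda item: item[1])
    fractions = {}
    for i, (name, pos) in enumerate(genes):
        # A domain runs from the midpoint with the previous gene
        # to the midpoint with the next gene (or to the chromosome ends).
        left = 0 if i == 0 else (genes[i - 1][1] + pos) / 2
        right = chrom_length if i == len(genes) - 1 else (pos + genes[i + 1][1]) / 2
        fractions[name] = (right - left) / chrom_length
    return fractions


# Hypothetical toy data: three tightly clustered genes and one gene in a desert.
toy_genes = {
    "geneA": 100_000,
    "geneB": 120_000,
    "geneC": 140_000,
    "desertGene": 600_000,
}
toy_chrom_length = 1_000_000

domains = nearest_gene_domains(toy_genes, toy_chrom_length)

# A gene-based test implicitly treats every gene as equally likely to be hit.
gene_based_expectation = 1 / len(toy_genes)

for name, fraction in domains.items():
    print(f"{name}: region-based {fraction:.2f} vs gene-based {gene_based_expectation:.2f}")
```

In this toy setup, a uniformly placed region lands nearest to `desertGene` far more often than the 1-in-4 chance a gene-based background would assume, while `geneB`, squeezed between two neighbors, is hit far less often. Any annotation carried by the desert gene would therefore look enriched under a gene-based test even for random input regions.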
GREAT also includes numerous ontologies providing a range of annotations. Many other tools use only the Gene Ontology, but it is useful to consider other types of annotation, such as protein domains and pathways.
|1||Huang, D. W., Sherman, B. T., and Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources, Nature Protoc. 2009; 4(1):44-57.|
|2||Dennis, G. Jr. et al., DAVID: Database for Annotation, Visualization, and Integrated Discovery, Genome Biol. 2003; 4(5):P3.|
|3||Boyle E. I. et al., GO::TermFinder – open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes, Bioinformatics 2004; 20(18): 3710-3715.|