GREAT calculates enrichment statistics by associating genomic regions with nearby genes and applying the gene annotations to the regions. Association is a two-step process. First, each gene is assigned a regulatory domain (i.e., a genomic range within which regulatory elements are likely to target that gene). Then, each genomic region is associated with all genes whose regulatory domains it overlaps.
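The second step is an interval-overlap test. The sketch below assumes the regulatory domains have already been computed and uses 0-based half-open coordinates; the `Domain` type, the `associate` function and the gene names are hypothetical. It scans every domain for clarity, whereas a real implementation would more likely use an interval tree or a sorted sweep.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    gene: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def associate(chrom: str, start: int, end: int, domains: list[Domain]) -> list[str]:
    """Return the sorted names of all genes whose domain overlaps [start, end) on chrom."""
    # Two half-open intervals overlap when each one starts before the other ends.
    return sorted(
        d.gene
        for d in domains
        if d.chrom == chrom and start < d.end and d.start < end
    )


domains = [
    Domain("GENE_A", "chr1", 0, 15_000),
    Domain("GENE_B", "chr1", 10_000, 40_000),
    Domain("GENE_C", "chr2", 0, 50_000),
]
print(associate("chr1", 12_000, 12_500, domains))  # ['GENE_A', 'GENE_B']
```

Because domains of neighbouring genes may overlap, a single region can be associated with more than one gene.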
How does GREAT define gene regulatory domains?
There is as yet no clear method for identifying the regulatory domain of a gene. Regulatory elements are known to lie more than one million bases away from their target genes and even to skip over intervening genes [1]. GREAT lets you choose among three approaches to defining gene regulatory domains.
Note that GREAT measures all distances from the transcription start site (TSS) of a gene's canonical isoform.
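Since distances are measured from the TSS, the strand of the canonical isoform determines which end of the transcript is the TSS and which side counts as upstream. A minimal sketch, assuming 1-based inclusive coordinates and that the canonical isoform's span and strand are already known; the function names are hypothetical.

```python
def tss(tx_start: int, tx_end: int, strand: str) -> int:
    """TSS of a transcript spanning tx_start..tx_end (1-based, inclusive, tx_start <= tx_end)."""
    # Transcription begins at the lower coordinate on the + strand
    # and at the higher coordinate on the - strand.
    if strand == "+":
        return tx_start
    if strand == "-":
        return tx_end
    raise ValueError(f"unknown strand: {strand!r}")


def signed_distance(position: int, tss_pos: int, strand: str) -> int:
    """Distance from the TSS to position, relative to the gene's orientation ("+" or "-").

    Negative values are upstream of the TSS, positive values downstream.
    """
    offset = position - tss_pos
    return offset if strand == "+" else -offset


# A hypothetical minus-strand transcript spanning 1,000..5,000 has its TSS at 5,000,
# so position 5,200 lies 200 bp upstream of it.
print(tss(1_000, 5_000, "-"))              # 5000
print(signed_distance(5_200, 5_000, "-"))  # -200
```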
Approach 1: Basal plus extension
Each gene is assigned a basal regulatory domain extending a minimum distance upstream and downstream of its TSS, regardless of other nearby genes. The domain is then extended in each direction up to the basal domain of the nearest gene in that direction, but no further than a maximum extension distance.
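The rule can be sketched for genes on a single chromosome as follows. This is an illustration, not GREAT's code: the default values (5 kb upstream, 1 kb downstream, 1,000 kb maximum extension) follow GREAT's commonly cited defaults but should be treated as assumptions, coordinates are 0-based half-open, and "nearest gene's basal domain" is read as the furthest-reaching basal domain among genes whose TSS lies on that side. `Gene`, `basal` and `basal_plus_extension` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    tss: int     # 0-based TSS of the canonical isoform
    strand: str  # "+" or "-"


def basal(g: Gene, upstream: int, downstream: int) -> tuple[int, int]:
    """Half-open basal domain; 'upstream' is 5' of the TSS on the gene's own strand."""
    if g.strand == "+":
        return g.tss - upstream, g.tss + downstream
    return g.tss - downstream, g.tss + upstream


def basal_plus_extension(
    genes: list[Gene],
    chrom_size: int,
    upstream: int = 5_000,
    downstream: int = 1_000,
    max_ext: int = 1_000_000,
) -> dict[str, tuple[int, int]]:
    """Regulatory domains for genes on one chromosome (quadratic scan, for clarity)."""
    bas = {g.name: basal(g, upstream, downstream) for g in genes}
    domains = {}
    for g in genes:
        b_start, b_end = bas[g.name]
        # Leftward: stop at the furthest-reaching basal domain of any gene whose TSS
        # lies to the left, but never more than max_ext from this gene's TSS.
        left_limit = max([g.tss - max_ext] + [bas[o.name][1] for o in genes if o.tss < g.tss])
        # Rightward: the mirror image.
        right_limit = min([g.tss + max_ext] + [bas[o.name][0] for o in genes if o.tss > g.tss])
        # The basal domain is always kept, even if a neighbour's basal domain overlaps it;
        # the result is clipped to the chromosome.
        start = max(0, min(b_start, left_limit))
        end = min(chrom_size, max(b_end, right_limit))
        domains[g.name] = (start, end)
    return domains


genes = [Gene("A", 100_000, "+"), Gene("B", 200_000, "-")]
domains = basal_plus_extension(genes, chrom_size=10_000_000)
print(domains)  # {'A': (0, 199000), 'B': (101000, 1200000)}
```

Gene B is on the minus strand, so its 5 kb upstream basal flank lies to the right of its TSS; gene A's domain therefore stops at 199,000, where B's basal domain begins.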
Approach 2: Two nearest genes
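Each gene is assigned a regulatory domain that extends in each direction to the TSS of the nearest gene in that direction, but no further than a maximum extension distance. A region is therefore typically associated with the nearest gene on each side of it.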
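A minimal sketch of this rule for genes on one chromosome, assuming 0-based half-open coordinates and a 1,000 kb maximum extension (an assumed default); only TSS positions are needed. The names `Gene` and `two_nearest_genes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    tss: int  # 0-based TSS of the canonical isoform


def two_nearest_genes(
    genes: list[Gene], chrom_size: int, max_ext: int = 1_000_000
) -> dict[str, tuple[int, int]]:
    """Half-open regulatory domains for genes on one chromosome."""
    domains = {}
    for g in genes:
        left_tss = [o.tss for o in genes if o.tss < g.tss]
        right_tss = [o.tss for o in genes if o.tss > g.tss]
        # Extend to the nearest TSS on each side, capped at max_ext and at the chromosome ends.
        start = max([0, g.tss - max_ext] + left_tss)
        end = min([chrom_size, g.tss + max_ext] + right_tss)
        domains[g.name] = (start, end)
    return domains


genes = [Gene("A", 100_000), Gene("B", 200_000), Gene("C", 1_500_000)]
domains = two_nearest_genes(genes, chrom_size=2_000_000)
print(domains)  # {'A': (0, 200000), 'B': (100000, 1200000), 'C': (500000, 2000000)}
# A position between the TSSs of A and B falls in both of their domains.
print([name for name, (s, e) in domains.items() if s <= 150_000 < e])  # ['A', 'B']
```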
Approach 3: Single nearest gene
Each gene is assigned a regulatory domain that extends in each direction to the midpoint between the gene's TSS and the TSS of the nearest gene in that direction, but no further than a maximum extension distance. This has the effect of assigning a region to the single nearest gene.
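The midpoint rule can be sketched the same way, again assuming one chromosome, 0-based half-open coordinates, and a 1,000 kb maximum extension (an assumed default); `Gene` and `single_nearest_gene` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    tss: int  # 0-based TSS of the canonical isoform


def single_nearest_gene(
    genes: list[Gene], chrom_size: int, max_ext: int = 1_000_000
) -> dict[str, tuple[int, int]]:
    """Half-open regulatory domains for genes on one chromosome."""
    domains = {}
    for g in genes:
        left_tss = [o.tss for o in genes if o.tss < g.tss]
        right_tss = [o.tss for o in genes if o.tss > g.tss]
        start = max(0, g.tss - max_ext)
        end = min(chrom_size, g.tss + max_ext)
        # Stop at the midpoint to the nearest TSS on each side. Adjacent genes compute
        # the same midpoint, so their half-open domains meet without overlapping.
        if left_tss:
            start = max(start, (max(left_tss) + g.tss) // 2)
        if right_tss:
            end = min(end, (g.tss + min(right_tss)) // 2)
        domains[g.name] = (start, end)
    return domains


genes = [Gene("A", 100_000), Gene("B", 200_000), Gene("C", 1_500_000)]
domains = single_nearest_gene(genes, chrom_size=2_000_000)
print(domains)  # {'A': (0, 150000), 'B': (150000, 850000), 'C': (850000, 2000000)}
print([name for name, (s, e) in domains.items() if s <= 120_000 < e])  # ['A']
```

Compared with the two-nearest-genes sketch on the same genes, each position now falls in at most one domain; positions more than the maximum extension from every TSS fall in none.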
Curated Regulatory Domains
In addition to the above association rules, GREAT uses a set of literature-curated regulatory domains. Where experimental evidence shows that a gene is directly regulated by an element lying outside its putative regulatory domain (as defined by the default Basal plus extension rule), GREAT adds a curated regulatory domain that extends the gene's regulatory domain to include the known regulatory element.
Currently, the curated regulatory domains are:
- SHH: chr7:155130964-156277330
- HOXD global control region:
  - HOXD10: chr2:176423101-176655936
  - HOXD11: chr2:176423101-176655936
  - HOXD12: chr2:176423101-176655936
  - HOXD13: chr2:176423101-176655936
  - EVX2: chr2:176423101-176655936
  - LNP: chr2:176423101-176655936
- Beta-globin locus control region (PMID 11895428):
  - HBB: chr11:5183507-5270700
  - HBG1: chr11:5209878-5270700
  - HBE1: chr11:5232664-5270700
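Applying a curated domain can be sketched as taking the smallest interval that covers both the computed domain and the curated range. Reading "extends the regulatory domain ... to include" as this covering interval is an assumption, and `parse_range` and `apply_curated` are hypothetical helpers; the computed HBB domain in the example is a made-up placeholder, not a GREAT output.

```python
import re


def parse_range(text: str) -> tuple[str, int, int]:
    """Parse a 'chrom:start-end' string such as 'chr7:155130964-156277330'."""
    m = re.fullmatch(r"(\w+):(\d+)-(\d+)", text)
    if m is None:
        raise ValueError(f"not a chrom:start-end range: {text!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def apply_curated(
    domain: tuple[str, int, int], curated: tuple[str, int, int]
) -> tuple[str, int, int]:
    """Smallest interval covering both the computed domain and the curated range."""
    chrom, start, end = domain
    c_chrom, c_start, c_end = curated
    if chrom != c_chrom:
        raise ValueError("curated range is on a different chromosome")
    return chrom, min(start, c_start), max(end, c_end)


# Hypothetical computed domain for HBB; the curated range is taken from the list above.
computed = ("chr11", 5_100_000, 5_230_000)
print(apply_curated(computed, parse_range("chr11:5183507-5270700")))
# ('chr11', 5100000, 5270700)
```

Taking the covering interval keeps any part of the computed domain that already reaches beyond the curated range, so a curated domain can only enlarge a gene's domain, never shrink it.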
[1] Lettice L.A., Heaney S.J., Purdie L.A., Li L., de Beer P., Oostra B.A., Goode D., Elgar G., Hill R.E., de Graaff E. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12(14):1725-1735 (2003).