
Which set of genes does GREAT use?

To limit the gene sets to only extremely high-confidence gene predictions, GREAT uses only the subset of the UCSC Known Genes [1] that are protein-coding (cdsStart != cdsEnd), are on non-random and non-haplotype chromosomes, and possess at least one meaningful Gene Ontology (GO) annotation [2].

References

1. Hsu, F. et al. The UCSC Known Genes. Bioinformatics. 22(9):1036-1046 (2006).
2. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 25(1):25-29 (2000).

GO includes information on the biological processes, cellular components, and molecular functions of genes. GREAT therefore assumes that if a gene has been annotated for function at all, then it is annotated in GO. A gene annotated only with uninformative GO terms does not qualify for the gene set. The uninformative terms are 'Gene Ontology', 'biological process', 'cellular component', 'molecular function', 'obsolete biological process', 'obsolete cellular component', and 'obsolete molecular function'.
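The three criteria above can be expressed as a simple filter. The following is a minimal sketch, not GREAT's actual code: the `Gene` record and all function names are hypothetical, and the chromosome test assumes the UCSC naming convention in which random and haplotype sequences carry `_random` or `_hap` in their names.

```python
from dataclasses import dataclass, field

# GO term names that the text lists as uninformative.
UNINFORMATIVE_GO_TERMS = frozenset({
    "Gene Ontology",
    "biological process",
    "cellular component",
    "molecular function",
    "obsolete biological process",
    "obsolete cellular component",
    "obsolete molecular function",
})


@dataclass
class Gene:
    """Hypothetical record holding the fields the filter needs."""
    name: str
    chrom: str
    cds_start: int
    cds_end: int
    go_terms: set = field(default_factory=set)


def is_protein_coding(gene):
    # A gene is treated as protein-coding when cdsStart != cdsEnd.
    return gene.cds_start != gene.cds_end


def is_on_standard_chromosome(gene):
    # Assumption: random and haplotype sequences are recognizable by name,
    # e.g. "chr1_random" or "chr6_cox_hap2".
    chrom = gene.chrom.lower()
    return "_random" not in chrom and "_hap" not in chrom


def has_meaningful_go_annotation(gene):
    # At least one GO term must fall outside the uninformative set.
    return any(term not in UNINFORMATIVE_GO_TERMS for term in gene.go_terms)


def passes_gene_filter(gene):
    return (
        is_protein_coding(gene)
        and is_on_standard_chromosome(gene)
        and has_meaningful_go_annotation(gene)
    )
```

For example, a gene on `chr1` with `cds_start=100`, `cds_end=500`, and the terms `{"molecular function", "DNA binding"}` would pass, while the same gene annotated only with `{"molecular function"}` would not.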

How does GREAT determine a single transcription start site for each gene?

Many genes have multiple splice variants, but GREAT requires a single transcription start site for each gene to calculate regulatory domains. GREAT therefore uses the transcription start site of the canonical isoform of each gene. The definition of the canonical isoform is taken from the knownCanonical table of the UCSC Known Genes track [1].
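As an illustration of the lookup described above, the sketch below picks one transcription start site per gene from a mapping of genes to canonical transcripts. It is not GREAT's implementation: the function names and the input layout are hypothetical, and it assumes the common convention that the start site is the transcript start (txStart) on the + strand and the transcript end (txEnd) on the - strand.

```python
def transcription_start_site(strand, tx_start, tx_end):
    # Assumption: transcription begins at txStart on the forward strand
    # and at txEnd on the reverse strand.
    if strand == "+":
        return tx_start
    if strand == "-":
        return tx_end
    raise ValueError(f"unexpected strand: {strand!r}")


def canonical_tss(canonical_transcripts, transcripts):
    """Return {gene: (chrom, tss)} using each gene's canonical transcript.

    canonical_transcripts: {gene: transcript_id}, e.g. derived from the
        knownCanonical table.
    transcripts: {transcript_id: (chrom, strand, tx_start, tx_end)}, e.g.
        derived from the Known Genes table.
    """
    result = {}
    for gene, transcript_id in canonical_transcripts.items():
        chrom, strand, tx_start, tx_end = transcripts[transcript_id]
        result[gene] = (chrom, transcription_start_site(strand, tx_start, tx_end))
    return result


# Made-up identifiers and coordinates, for illustration only.
example_transcripts = {
    "tx_a1": ("chr1", "+", 1000, 5000),
    "tx_a2": ("chr1", "+", 1200, 5000),  # alternative isoform, ignored
    "tx_b1": ("chr2", "-", 7000, 9000),
}
example_canonical = {"geneA": "tx_a1", "geneB": "tx_b1"}
example_tss = canonical_tss(example_canonical, example_transcripts)
```

Non-canonical isoforms such as `tx_a2` never contribute a start site, so each gene ends up with exactly one.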
