
What do the output columns mean?

Clicking on a column name displays that column and sorts all tables by it. Rank and p-value columns sort in ascending order; all other columns sort in descending order.
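The sort-direction rule can be sketched as follows. This is only an illustration under stated assumptions: rows are plain dictionaries keyed by the column headings used on this page, and the `sort_rows` helper is hypothetical, not part of GREAT. A column is treated as "Rank or p-value" when its heading contains "rank" or "p-value".

```python
from typing import Any


def sort_rows(rows: list[dict[str, Any]], column: str) -> list[dict[str, Any]]:
    """Sort rows by `column` (hypothetical helper mirroring the rule above).

    Rank and p-value columns sort ascending (best first); every other
    column sorts descending (largest first).
    """
    name = column.lower()
    ascending = "rank" in name or "p-value" in name
    return sorted(rows, key=lambda row: row[column], reverse=not ascending)


rows = [
    {"ID": "GO:0001", "Raw p-value": 1e-3, "Fold Enrichment (Obs/Exp)": 2.5},
    {"ID": "GO:0002", "Raw p-value": 1e-8, "Fold Enrichment (Obs/Exp)": 1.7},
]
print([r["ID"] for r in sort_rows(rows, "Raw p-value")])                # ['GO:0002', 'GO:0001']
print([r["ID"] for r in sort_rows(rows, "Fold Enrichment (Obs/Exp)")])  # ['GO:0001', 'GO:0002']
```

The FDR q-value columns are deliberately left out of the example, since the table cannot be sorted by them.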

General

  • ID: Term identifier from the ontology

Binomial

  • Rank: ordinal rank of the p-value compared to the p-values of other annotations
  • Raw p-value: uncorrected p-value from the binomial test
  • Bonferroni p-value: Bonferroni corrected p-value
  • FDR q-value: False discovery rate q-value. Note that you cannot sort the table using this column (which is why its heading is in italics).
  • Fold Enrichment (Obs/Exp): ratio of the observed to the expected number of genomic regions in the test set with the annotation (k / (n*p))
    • where n is the number of genomic regions in the test set
  • Expected (n*p): expected number of genomic regions in the test set with the annotation
  • Observed Region Hits (k): actual number of genomic regions in the test set with the annotation
  • Genome Fraction (p): fraction of non-gap base pairs in the genome that lie in the regulatory domain of a gene with the annotation
  • Region Set Coverage (k/n): the fraction of all genomic regions in the test set that lie in the regulatory domain of a gene with the annotation
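The binomial columns above can be reproduced from the three quantities n, k and p. The sketch below is an illustration, not GREAT's code: the function name and dictionary keys are hypothetical, and it assumes the raw p-value is the upper-tail probability of observing k or more regions, P(X ≥ k) for X ~ Binomial(n, p).

```python
from math import comb


def binomial_stats(n: int, k: int, p: float) -> dict[str, float]:
    """Region-based binomial statistics for one annotation term (hypothetical helper).

    n: number of genomic regions in the test set
    k: number of those regions in the regulatory domain of a gene with the term
    p: fraction of non-gap genome base pairs in those regulatory domains
    """
    expected = n * p
    # Assumed upper tail P(X >= k), X ~ Binomial(n, p); exact sum, fine for small n.
    raw_p = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
    return {
        "expected": expected,             # Expected (n*p)
        "fold_enrichment": k / expected,  # Fold Enrichment (k / (n*p))
        "set_coverage": k / n,            # Region Set Coverage (k/n)
        "raw_p_value": min(raw_p, 1.0),   # Raw p-value
    }


stats = binomial_stats(n=10, k=3, p=0.1)
print(stats["expected"], stats["fold_enrichment"], stats["set_coverage"])  # 1.0 3.0 0.3
```

For realistic test sets with thousands of regions, `scipy.stats.binom.sf(k - 1, n, p)` gives the same upper tail more robustly than a term-by-term sum.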

Hypergeometric

  • Rank: ordinal rank of the p-value compared to the p-values of other annotations
  • Raw p-value: uncorrected p-value from the hypergeometric test
  • Bonferroni p-value: Bonferroni corrected p-value
  • FDR q-value: False discovery rate q-value. Note that you cannot sort the table using this column (which is why its heading is in italics).
  • Fold Enrichment (Obs/Exp): ratio of the observed to the expected number of genes in the test set with the annotation (k*N / (n*K))
    • where N is the number of genes in the genome
    • and n is the number of genes in the test set
  • Expected (n*K/N): expected number of genes in the test set with the annotation
  • Observed Gene Hits (k): actual number of genes in the test set with the annotation
  • Total Genes (K): number of genes in the genome with the annotation
  • Set Coverage (k/n): the fraction of all genes in the test set with the annotation
  • Term Coverage (k/K): fraction of all genes with the annotation that are tagged by the test set
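Likewise, the hypergeometric columns follow from N, K, n and k. This is a hedged sketch with a hypothetical function name; it assumes the raw p-value is the upper tail P(X ≥ k) of drawing n genes without replacement from N genes, K of which carry the annotation.

```python
from math import comb


def hypergeometric_stats(N: int, K: int, n: int, k: int) -> dict[str, float]:
    """Gene-based hypergeometric statistics for one term (hypothetical helper).

    N: genes in the genome;   K: genes in the genome with the annotation
    n: genes in the test set; k: test-set genes with the annotation
    """
    # Assumed upper tail P(X >= k); exact integer arithmetic, then one division.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return {
        "expected": n * K / N,                # Expected (n*K/N)
        "fold_enrichment": k * N / (n * K),   # Fold Enrichment (k*N / (n*K))
        "set_coverage": k / n,                # Set Coverage (k/n)
        "term_coverage": k / K,               # Term Coverage (k/K)
        "raw_p_value": tail / comb(N, n),     # Raw p-value
    }


s = hypergeometric_stats(N=20, K=5, n=4, k=2)
print(s["expected"], s["fold_enrichment"], s["set_coverage"], s["term_coverage"])  # 1.0 2.0 0.5 0.4
```

With SciPy installed, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` computes the same tail for large gene counts.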

Statistical Significance

Output data p-values are displayed in bold when they satisfy the statistical significance criteria. In the whole genome test, the term name is also shown in bold if both the binomial test over genomic regions and the hypergeometric test over genes produce significant p-values. This setting does not change which terms are shown, only how they appear; hiding terms from view is handled by the View filter (discussed below).
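The Rank, Bonferroni and FDR columns of both tests are derived from the raw p-values of all tested terms. The sketch below shows textbook versions under loudly stated assumptions: m is the number of terms tested, Bonferroni multiplies by m and caps at 1, and the q-value uses the Benjamini-Hochberg step-up procedure (this page does not name the FDR method). The significance criteria are not spelled out here either, so `ALPHA` and both function names are hypothetical.

```python
ALPHA = 0.05  # hypothetical significance cutoff, for illustration only


def correct_p_values(p_values: list[float]) -> list[tuple[int, float, float]]:
    """Return (rank, Bonferroni p-value, BH q-value) for each raw p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rank = [0] * m
    for r, i in enumerate(order, start=1):  # ordinal rank; ties keep input order
        rank[i] = r
    bonferroni = [min(1.0, p * m) for p in p_values]
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values never decrease with rank.
    for r in range(m, 0, -1):
        i = order[r - 1]
        running_min = min(running_min, p_values[i] * m / r)
        q[i] = running_min
    return [(rank[i], bonferroni[i], q[i]) for i in range(m)]


def term_name_is_bold(binom_corrected: float, hyper_corrected: float, alpha: float = ALPHA) -> bool:
    """Whole genome test: bold the term name only if both tests pass the cutoff."""
    return binom_corrected <= alpha and hyper_corrected <= alpha
```

For raw p-values [0.01, 0.04, 0.03] this yields ranks 1, 3, 2 and q-values of about 0.03, 0.04, 0.04.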

View

The View filter determines which terms are displayed in the output tables, based on their p-values and the statistical significance threshold applied. GREAT's initial Significant by Both view shows only terms that are statistically significant by both the binomial and hypergeometric tests and that satisfy all other filter criteria (by default, a binomial fold enrichment filter of 2 is set). Switching the View control to Significant by Region-based Binomial shows rows that are significant by the binomial test but not by the hypergeometric test, while All Terms shows all test results that satisfy the criteria other than statistical significance (i.e., it also displays terms that are not statistically significant by one or both tests).
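The three View settings can be expressed as a filter over per-term significance flags. This sketch assumes significance has already been decided for each test and that the other filters (such as the default binomial fold enrichment of 2) have been evaluated into one boolean; the `TermRow` type and `view` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TermRow:
    term: str
    binom_significant: bool
    hyper_significant: bool
    passes_other_filters: bool  # e.g. binomial fold enrichment >= 2 by default


def view(rows: list[TermRow], mode: str) -> list[TermRow]:
    """Return the rows shown under a View setting named as on this page."""
    eligible = [r for r in rows if r.passes_other_filters]
    if mode == "Significant by Both":
        return [r for r in eligible if r.binom_significant and r.hyper_significant]
    if mode == "Significant by Region-based Binomial":
        return [r for r in eligible if r.binom_significant and not r.hyper_significant]
    if mode == "All Terms":
        return eligible
    raise ValueError(f"unknown view mode: {mode!r}")


rows = [
    TermRow("A", True, True, True),
    TermRow("B", True, False, True),
    TermRow("C", False, False, True),
    TermRow("D", True, True, False),  # fails the fold enrichment filter
]
print([r.term for r in view(rows, "Significant by Both")])  # ['A']
```

Note that term D never appears: failing a non-significance filter removes a term from every view.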

What is a UCSC Genome Browser Custom Track?

A custom track in the UCSC Genome Browser is a way of displaying your own annotation data in the browser. GREAT can automatically open your test regions as a custom track in the UCSC Genome Browser. It can also create annotation-term-specific custom tracks containing the regions in your test set that are associated with a given term (available on the Term Details page, which opens when you click a term description). Custom tracks are viewable only on the machine from which they were uploaded and are discarded 48 hours after their last access. More information is available at UCSC Genome Bioinformatics.
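For readers unfamiliar with custom tracks, the sketch below builds a minimal one in UCSC's BED format: a `track` line followed by one line per region (chromosome, 0-based start, end, name). It only illustrates the general format; it is not the exact track GREAT generates, and the helper name and default track name are hypothetical.

```python
def custom_track(regions, name="GREAT_test_regions", description="GREAT test regions"):
    """Return BED custom-track text for (chrom, start, end, label) tuples.

    BED coordinates are 0-based and half-open: start is inclusive, end exclusive.
    """
    lines = [f'track name="{name}" description="{description}"']
    for chrom, start, end, label in regions:
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


print(custom_track([("chr1", 1000, 1500, "region1"), ("chr2", 200, 900, "region2")]), end="")
```

Text in this shape can be pasted into the Genome Browser's custom track upload page.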

Output Filters

GREAT offers a number of filters that affect both the display and processing of the output data:

  • Minimum region-based fold enrichment - Only display terms with a region-based fold enrichment (observed regions hit / expected regions hit) greater than or equal to this value. This filter is useful for avoiding general terms, which can achieve strong p-values with moderate fold enrichments. Fold enrichment is a measure of effect size.
  • Observed gene hits - Only display terms that hit at least this many different genes. This filter is useful for avoiding enrichments driven by many different regions repeatedly hitting the same few genes.
  • Annotation count - Only display terms whose number of genes annotated with the term falls within this range.
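Taken together, the three output filters act as a conjunction of simple per-term checks. In this hedged sketch the `TermStats` fields and `passes_output_filters` are hypothetical names; the default minimum fold enrichment of 2 matches the default mentioned under View, while the other defaults and the inclusive annotation-count range are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TermStats:
    term: str
    region_fold_enrichment: float  # observed regions hit / expected regions hit
    observed_gene_hits: int        # distinct genes hit by the test set
    annotation_count: int          # genes annotated with the term (K)


def passes_output_filters(
    t: TermStats,
    min_fold: float = 2.0,
    min_gene_hits: int = 1,
    count_range: tuple[float, float] = (1, float("inf")),
) -> bool:
    """True if the term survives all three output filters (bounds inclusive)."""
    lo, hi = count_range
    return (
        t.region_fold_enrichment >= min_fold
        and t.observed_gene_hits >= min_gene_hits
        and lo <= t.annotation_count <= hi
    )


t = TermStats("GO:0001", region_fold_enrichment=2.5, observed_gene_hits=3, annotation_count=40)
print(passes_output_filters(t), passes_output_filters(t, min_gene_hits=5))  # True False
```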

Ontology Table Controls

Each ontology table has a set of controls which operate exclusively on that table's content.

  • The Export control allows you to export that table's data.
    • Shown data as HTML - This brings up a new browser tab or window, with the exact data as shown in the current table, but in an HTML format that is suitable for inclusion in publications. The new page also contains the information in the footer of the table. It can be used as-is, or edited with any HTML editor.
    • Shown data as .tsv - This allows you to download a tab-separated-values file containing the exact data as shown in the current table, with columns ordered as in the display. The file does not contain the information in the footer of the table. The file can be opened with most spreadsheet programs for further manipulation.
    • All data as .tsv - This allows you to download a tab-delimited file containing all the data used to create the current table (i.e., all columns and rows, whether displayed or not). The file does not contain the information in the footer of the table. The file can be opened with most spreadsheet programs for further manipulation. The order of the data columns output in a whole genome test is:
  1. term name
  2. term description
  3. genome fraction (p)
  4. expected region hits (n*p)
  5. observed region hits (k)
  6. region fold enrichment (k/(n*p))
  7. region set coverage (k/n)
  8. binomial uncorrected p-value
  9. total genes (K)
  10. expected gene hits (n*K/N)
  11. observed gene hits (k)
  12. gene fold enrichment (k*N/(n*K))
  13. gene set coverage (k/n)
  14. term gene coverage (k/K)
  15. hypergeometric p-value
  16. names of region hits
  17. names of gene hits

In GREAT version 1.2, the whole genome test output columns were, in order:

  1. term name
  2. term description
  3. total genes (K)
  4. expected gene hits (n*K/N)
  5. gene hits (k)
  6. term gene coverage (k/K)
  7. hypergeometric p-value
  8. binomial p-value
  9. observed region hits (k)
  10. expected region hits (n*p)
  11. genome fraction (p)
  12. gene hits
  13. region hits

The order of data columns output in a foreground/background test is:

  1. term name
  2. term description
  3. genome fraction (p)
  4. expected region hits (n*p)
  5. observed region hits (k)
  6. region fold enrichment (k/(n*p))
  7. region set coverage (k/n)
  8. binomial uncorrected p-value
  9. total genes (K)
  10. expected gene hits (n*K/N)
  11. observed gene hits (k)
  12. gene fold enrichment (k*N/(n*K))
  13. gene set coverage (k/n)
  14. term gene coverage (k/K)
  15. hypergeometric p-value
  16. names of region hits
  17. names of gene hits
  • The Shown top rows in this table control allows you to choose how many rows appear in the table. Fewer rows than you choose may appear if the table does not contain that many data rows, or if the global table control Display is set to Summary and fewer summary rows are available than you specify.
  • The Test min. annotation count control allows you to change the minimum annotation count used for that table.
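An "All data as .tsv" export can be loaded programmatically using the 17-column order listed above for the current version (GREAT 1.2 files use the shorter 13-column layout and would need a different column list). This is a sketch under assumptions: the short keys are hypothetical names chosen here, comment or header lines are assumed to start with "#", and the hit-name columns are kept as raw strings because their internal separator is not documented on this page.

```python
import csv
import io

# Column order of an "All data as .tsv" export, as listed above; keys are hypothetical.
COLUMNS = [
    "term_name", "term_description", "genome_fraction", "expected_region_hits",
    "observed_region_hits", "region_fold_enrichment", "region_set_coverage",
    "binom_raw_p", "total_genes", "expected_gene_hits", "observed_gene_hits",
    "gene_fold_enrichment", "gene_set_coverage", "term_gene_coverage",
    "hyper_p", "region_hit_names", "gene_hit_names",
]
TEXT_COLUMNS = {"term_name", "term_description", "region_hit_names", "gene_hit_names"}
COUNT_COLUMNS = {"observed_region_hits", "total_genes", "observed_gene_hits"}


def read_great_tsv(text: str) -> list[dict]:
    """Parse export text into one dict per term, converting numeric fields."""
    rows = []
    # QUOTE_NONE: treat quote characters in descriptions literally.
    for fields in csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields or fields[0].startswith("#"):  # assumed comment/header lines
            continue
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        row = {}
        for key, value in zip(COLUMNS, fields):
            if key in TEXT_COLUMNS:
                row[key] = value
            elif key in COUNT_COLUMNS:
                row[key] = int(float(value))  # tolerate counts written as "3.0"
            else:
                row[key] = float(value)
        rows.append(row)
    return rows
```

Because the foreground/background column order listed above is identical, the same column list would apply to those exports as well.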