File size limits
GREAT currently supports input files of up to 200,000 elements and up to 20 MB in size. A file must satisfy both limits, so whichever limit is reached first applies.
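A file can be checked against both limits before it is uploaded. The following is a minimal sketch, not part of GREAT itself. It assumes the input is a BED-style text file with one element per line, that blank lines, `#` comments, and `track`/`browser` header lines do not count as elements, and that 20 MB means 20 × 10^6 bytes. The function names are hypothetical.

```python
import os

MAX_ELEMENTS = 200_000        # element limit stated above
MAX_BYTES = 20 * 1000 * 1000  # assumption: 20 MB read as 20 * 10^6 bytes


def count_regions(path):
    """Count data lines, skipping blanks, comments, and track/browser headers."""
    count = 0
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # blank line
            if fields[0].startswith("#") or fields[0] in ("track", "browser"):
                continue  # comment or header line (assumed not to count)
            count += 1
    return count


def within_limits(path):
    """Return True only if the file satisfies both the element and size limits."""
    return (count_regions(path) <= MAX_ELEMENTS
            and os.path.getsize(path) <= MAX_BYTES)
```

Calling `within_limits("peaks.bed")` (a hypothetical file name) returns `False` as soon as either limit is exceeded.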
Handling large data sets
By default, GREAT displays data in a Summary View that shows only terms significant by both the binomial test over genomic regions and the hypergeometric test over genes. With a large data set, the regulatory domain association rules can select a large fraction of all genes. This often saturates the hypergeometric test over genes, so that no hypergeometric test results are significant. The binomial test over genomic regions, however, is robust to large data sets.
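The saturation effect can be illustrated with a generic upper-tail hypergeometric test. This is a sketch under stated assumptions rather than GREAT's own implementation: the genome size, term size, and hit counts below are made-up numbers chosen only to show the trend.

```python
from math import comb


def hypergeom_upper_tail(total_genes, term_genes, selected_genes, hits):
    """P(X >= hits) when selected_genes are drawn from total_genes without
    replacement and term_genes of them carry the annotation."""
    denominator = comb(total_genes, selected_genes)
    numerator = 0
    for i in range(hits, min(term_genes, selected_genes) + 1):
        numerator += comb(term_genes, i) * comb(total_genes - term_genes,
                                                selected_genes - i)
    return numerator / denominator


# Hypothetical numbers: 18,000 genes in total, 20 of them annotated with a term.
# Small input set: 500 genes selected, 5 of them carry the term.
p_small = hypergeom_upper_tail(18_000, 20, 500, 5)

# Large input set: 16,000 genes selected, and all 20 term genes are hit.
p_large = hypergeom_upper_tail(18_000, 20, 16_000, 20)

print(f"small set: p = {p_small:.2e}")
print(f"large set: p = {p_large:.2e}")
```

In the second case every annotated gene is hit, yet the p-value stays large because almost any term would be fully covered when most genes are selected. No stronger result is possible for that term, which is what saturation means here.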
There are two ways to circumvent the saturation of the hypergeometric test:
- Restricting the input set to a few thousand regions (e.g. by picking the most robust peaks generated by a peak-calling tool) eliminates saturation of the hypergeometric test.
- Alternatively, results for large data sets can be viewed in the Full View, and terms that are enriched only because many regions cluster around one or a few genes can be filtered out using the observed gene hits display filter.
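The first option, restricting the input to the most robust peaks, can be sketched as follows. This is an illustration only: it assumes tab-separated BED-style lines with a numeric score in the fifth column, where a higher score means a more robust peak. Peak callers differ in which column holds the relevant score, so `score_col` is a hypothetical parameter that may need adjusting.

```python
def top_regions(bed_lines, n, score_col=4):
    """Return the n highest-scoring lines, highest score first.

    score_col is the zero-based index of the score column (assumed: column 5).
    Lines without a numeric score in that column are skipped.
    """
    scored = []
    for line in bed_lines:
        text = line.rstrip("\n")
        fields = text.split("\t")
        if len(fields) <= score_col:
            continue  # header line or line without a score column
        try:
            score = float(fields[score_col])
        except ValueError:
            continue  # non-numeric score, e.g. "."
        scored.append((score, text))
    # Stable sort: lines with equal scores keep their original order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in scored[:n]]
```

For example, `top_regions(open("peaks.bed"), 5000)` (hypothetical file name) would keep the 5,000 highest-scoring regions, which can then be written to a new file and submitted.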