File size limits
GREAT currently supports test files with up to 200,000 regions and background files with up to 1,000,000 regions. Each file must also be smaller than 20 MB.
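A minimal Python sketch for checking a BED file against these limits before uploading it. It assumes one region per line, ignoring blank, comment, and UCSC `track`/`browser` lines. The function names are hypothetical helpers, not part of GREAT, and reading "20 MB" as 20,000,000 bytes is a conservative assumption.

```python
import os

# Limits from the text above. "20 MB" is read conservatively as 20,000,000 bytes
# (an assumption; the exact byte count GREAT uses is not stated here).
MAX_TEST_REGIONS = 200_000
MAX_BACKGROUND_REGIONS = 1_000_000
MAX_FILE_BYTES = 20_000_000

def count_regions(lines):
    """Count region lines, skipping blank lines, comments, and UCSC track/browser lines."""
    n = 0
    for line in lines:
        s = line.strip()
        if s and not s.startswith(("#", "track", "browser")):
            n += 1
    return n

def check_limits(n_regions, size_bytes, background=False):
    """Return a list of problems; an empty list means both limits are met."""
    limit = MAX_BACKGROUND_REGIONS if background else MAX_TEST_REGIONS
    problems = []
    if n_regions > limit:
        problems.append(f"{n_regions} regions exceeds the limit of {limit}")
    if size_bytes >= MAX_FILE_BYTES:
        problems.append(f"{size_bytes} bytes is not under {MAX_FILE_BYTES}")
    return problems

def check_bed_file(path, background=False):
    """Check a local BED file (hypothetical path) against the limits before uploading it."""
    with open(path) as fh:
        n = count_regions(fh)
    return check_limits(n, os.path.getsize(path), background)
```

For example, `check_bed_file("peaks.bed")` returns an empty list when the file fits the test-set limits; pass `background=True` for a background file.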
Handling large data sets
By default, GREAT displays results in a Summary View that shows only terms significant by both the binomial test over genomic regions and the hypergeometric test over genes. With large data sets, the regulatory domain association rules can assign regions to a large fraction of all genes. This often saturates the hypergeometric test over genes: when nearly every gene is selected, even a term whose genes are all selected is barely more enriched than expected, so no hypergeometric test results are significant and the Summary View may show few or no terms. The binomial test over genomic regions, however, counts regions rather than genes and remains robust for large data sets.
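The saturation effect can be illustrated with a toy calculation in Python. The genome size, term size, and region counts below are hypothetical numbers chosen only for illustration, and the sketch ignores multiple-testing correction and GREAT's actual implementation details. The hypergeometric test asks how many selected genes carry a term; the binomial test asks whether more regions than expected fall in the regulatory domains of the term's genes, given the fraction of the genome those domains cover.

```python
import math

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: n genes selected out of N, K of which carry the term."""
    total = math.comb(N, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / total  # exact big-integer ratio converted to a float

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_terms = [math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * math.log(p) + (n - i) * math.log1p(-p)
                 for i in range(k, n + 1)]
    m = max(log_terms)
    return math.exp(m) * sum(math.exp(t - m) for t in log_terms)

N_GENES, TERM_GENES = 20_000, 50  # hypothetical genome and term sizes

# Small input: 1,000 genes selected, 10 of them carry the term (2.5 expected).
p_small = hypergeom_sf(10, N_GENES, TERM_GENES, 1_000)

# Large input: 19,000 genes selected. Even if all 50 term genes are hit,
# the best achievable hypergeometric p-value is only roughly 0.08.
p_large_best = hypergeom_sf(TERM_GENES, N_GENES, TERM_GENES, 19_000)

# Binomial over regions: 100,000 regions, the term's regulatory domains cover
# 1% of the genome (1,000 regions expected), and 1,300 regions are observed.
p_binom = binom_sf(1_300, 100_000, 0.01)
```

In the large case the hypergeometric test cannot reach significance no matter how the term's genes are hit, while the binomial test still resolves a 30% excess of regions, because adding regions increases its power instead of exhausting the pool of genes.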
There are two ways to work around saturation of the hypergeometric test:
- Restrict the input set to a few thousand regions (e.g. by keeping only the most robust peaks reported by a peak-calling tool). With fewer genes selected, the hypergeometric test no longer saturates.
- View the results for the full data set in the Full View instead. There, terms that appear enriched only because many regions cluster around one or a few genes can be filtered out with the observed gene hits display filter.
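One way to restrict the input to the most robust peaks is to keep the highest-scoring lines of the peak caller's BED output. This Python sketch is hypothetical: the cutoff of 5,000 regions is only an example of "a few thousand", and which column best reflects peak robustness depends on the peak caller. Column 4 (0-based) is the standard BED score field; in narrowPeak files, signalValue (6) or the -log10 p-value (7) may be more informative.

```python
import heapq

def top_regions(lines, n=5000, score_col=4):
    """Return the n highest-scoring BED lines, best first.

    score_col is 0-based; the default 4 is the BED 'score' field.
    Blank, comment, and UCSC track/browser lines are skipped.
    """
    records = []
    for line in lines:
        s = line.strip()
        if not s or s.startswith(("#", "track", "browser")):
            continue
        fields = s.split("\t")
        records.append((float(fields[score_col]), s))
    best = heapq.nlargest(n, records, key=lambda r: r[0])
    return [s for _, s in best]
```

The result is ordered by score rather than by genomic position; write it back to a file (e.g. with `"\n".join(...)`) before uploading it as the test set.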
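The idea behind the observed gene hits filter can be sketched offline on exported results. The `TermResult` record, its field names, and both thresholds below are hypothetical and do not mirror GREAT's export format. A term with many region hits but few gene hits is usually driven by a cluster of regions around one or a few genes, so requiring a minimum number of genes removes it.

```python
from dataclasses import dataclass

@dataclass
class TermResult:
    # Hypothetical record for one row of Full View results.
    name: str
    binom_p: float    # binomial p-value (raw or corrected, as exported)
    region_hits: int  # regions in the term's regulatory domains
    gene_hits: int    # distinct genes those regions are associated with

def filter_by_gene_hits(results, min_gene_hits=5, alpha=0.05):
    """Keep binomially significant terms supported by at least min_gene_hits genes."""
    return [r for r in results
            if r.binom_p <= alpha and r.gene_hits >= min_gene_hits]
```

For instance, a term with 300 region hits spread over only 2 genes is dropped, while one with 120 region hits over 25 genes is kept.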