GREAT currently supports test files with up to 200,000 regions, and background files with up to 1,000,000 regions. Each must be less than 20 MB in size. Compressed data must decompress to plain text files at most 50 MB in size.
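As a rough illustration, the limits above can be checked locally before uploading. This is a minimal sketch, not part of GREAT: the function name `check_input` is hypothetical, "MB" is assumed to mean 1024 × 1024 bytes, and a region is assumed to be any non-blank line that does not start with `#`, `track`, or `browser` (as in a BED-style file).

```python
import gzip
import os

# Limits taken from the text above; the byte interpretation of "MB" is an assumption.
MAX_TEST_REGIONS = 200_000
MAX_BACKGROUND_REGIONS = 1_000_000
MAX_UPLOAD_BYTES = 20 * 1024 * 1024
MAX_DECOMPRESSED_BYTES = 50 * 1024 * 1024


def check_input(path, background=False):
    """Return a list of limit violations for a region file (empty if none)."""
    problems = []
    if os.path.getsize(path) >= MAX_UPLOAD_BYTES:
        problems.append("file is not smaller than 20 MB")

    compressed = path.endswith(".gz")
    opener = gzip.open if compressed else open
    regions = 0
    text_bytes = 0
    with opener(path, "rb") as handle:
        for line in handle:
            text_bytes += len(line)
            stripped = line.strip()
            # Assumption: header and comment lines do not count as regions.
            if stripped and not stripped.startswith((b"#", b"track", b"browser")):
                regions += 1

    if compressed and text_bytes > MAX_DECOMPRESSED_BYTES:
        problems.append("decompressed text is larger than 50 MB")

    limit = MAX_BACKGROUND_REGIONS if background else MAX_TEST_REGIONS
    if regions > limit:
        problems.append(f"{regions} regions exceed the limit of {limit}")
    return problems
```

Calling `check_input("peaks.bed")` checks a test file, and `check_input("background.bed.gz", background=True)` applies the larger background limit.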
Handling large data sets
By default, GREAT displays results in the Summary "Significant by Both" View, which shows only terms that are significant by both the binomial test over genomic regions and the hypergeometric test over genes. With a large data set, the regulatory domain association rules can select a large fraction of all genes. The hypergeometric test over genes then often saturates: when most genes are selected, no term's genes stand out against the rest, so no hypergeometric result is significant. The binomial test over genomic regions, however, remains robust to large data sets.
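The saturation effect can be illustrated with the textbook upper-tail hypergeometric p-value. This is a sketch under stated assumptions, not GREAT's own implementation: the function name and the gene counts are made up for illustration.

```python
from math import comb


def hypergeom_pvalue(total_genes, term_genes, selected_genes, hits):
    """P(X >= hits) when selected_genes are drawn without replacement.

    total_genes:    number of genes in the genome
    term_genes:     genes annotated with the term
    selected_genes: genes picked by the input regions
    hits:           selected genes that carry the term
    """
    upper = min(term_genes, selected_genes)
    numerator = sum(
        comb(term_genes, i) * comb(total_genes - term_genes, selected_genes - i)
        for i in range(hits, upper + 1)
    )
    return numerator / comb(total_genes, selected_genes)


# Hypothetical numbers: 1,000 genes, 50 of them annotated with the term,
# and every annotated gene is hit in each case.
for selected in (100, 500, 900, 1000):
    print(selected, hypergeom_pvalue(1000, 50, selected, 50))
```

Even though all 50 annotated genes are hit every time, the p-value grows as the selected set grows, and it is exactly 1 once every gene is selected: hitting all term genes is no longer surprising.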
- Restricting the input set to a few thousand regions (e.g. by picking the most robust peaks generated by a peak-calling tool) eliminates saturation of the hypergeometric test.
- Alternatively, results for large data sets can be viewed in the "Significant by Region-based Binomial" or "Full" View. Terms that are enriched only because many regions cluster around one or a few genes can then be filtered out using the observed gene hits display filter.
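The first option above, restricting the input to the most robust peaks, can be sketched as follows. The helper name `top_regions` is hypothetical, and the sketch assumes a tab-separated, BED-style file whose fifth column holds a numeric score where higher means more robust. Peak callers differ in which column carries the score, so `score_column` may need adjusting.

```python
import heapq


def top_regions(lines, n=5000, score_column=4):
    """Return the n highest-scoring region lines, best first.

    lines:        iterable of tab-separated region lines
    score_column: 0-based index of the score field (an assumption; it
                  depends on the peak caller's output format)
    """
    records = []
    for line in lines:
        if line.startswith(("#", "track", "browser")):
            continue  # skip header and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= score_column:
            continue  # no score field on this line
        try:
            score = float(fields[score_column])
        except ValueError:
            continue  # non-numeric score
        records.append((score, line))
    best = heapq.nlargest(n, records, key=lambda record: record[0])
    return [line for _, line in best]
```

For example, passing an open file handle for a peak file as `lines` and writing the returned lines to a new file yields a smaller region set to submit.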