Supported file formats
GREAT requires its input files to be in BED format. BED is a standard file format used by the UCSC genome browser (and others) for defining genomic regions.
What is BED format?
Browser Extensible Data (BED) format is a file format used by the UCSC genome browser for defining genomic regions. It defines one genomic region (a "BED record") per line. GREAT requires each line to contain four mandatory fields (chromosome, start position, end position, and a name for the region) separated by whitespace (space or tab). Additional optional fields provide further information. Full documentation of the BED format is available from UCSC.
BED start coordinates are 0-based, meaning the first base on a chromosome is numbered 0. A BED interval is also half-open: it includes the start position but excludes the end position. As a result, the coordinates in a BED record differ slightly from those used to find a region in the genome browser. The genome browser region "chr1:1-1000" would be described in a BED record as "chr1 0 1000": the start coordinate is one smaller and the end coordinate is the same, describing the half-open interval [0,1000) of length 1000 bp starting at base 0. UCSC discusses this discrepancy here.
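The coordinate shift described above can be sketched in a few lines of Python. The function names browser_to_bed and bed_to_browser are hypothetical helpers written for this illustration; they are not part of GREAT or of any UCSC tool.

```python
def browser_to_bed(region):
    """Convert a 1-based, fully closed browser region such as 'chr1:1-1000'
    into a 0-based, half-open BED triple (chrom, start, end)."""
    # Split on the last colon so the chromosome name is kept whole.
    chrom, span = region.rsplit(":", 1)
    # Browser positions are sometimes written with thousands separators.
    start_text, end_text = span.replace(",", "").split("-")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid browser region: {region!r}")
    # Only the start moves: subtract one. The end stays the same.
    return chrom, start - 1, end


def bed_to_browser(chrom, start, end):
    """Convert a 0-based, half-open BED interval back to browser notation."""
    return f"{chrom}:{start + 1}-{end}"


chrom, start, end = browser_to_bed("chr1:1-1000")
print(chrom, start, end)   # chr1 0 1000
print(end - start)         # 1000 (interval length in bp)
```

Because the interval is half-open, the length of a BED region is simply end minus start, with no "plus one" correction.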
Can I use a different format?
GREAT only supports BED format, which is a popular standard used by the UCSC genome browser and others. Converting to this format is often very straightforward. If you do convert, make sure that all of your BED records have unique names.
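As one possible approach to the naming requirement, the sketch below takes lines that already carry chromosome, start, and end columns and writes a generated name into the fourth column. The function add_unique_names and the "region_N" naming scheme are assumptions made for this example, not something GREAT prescribes. Note that any name already present is overwritten.

```python
def add_unique_names(lines, prefix="region"):
    """Return four-column BED lines with generated, unique names.

    Hypothetical helper: keeps the first three fields of each line and
    replaces or adds a fourth field of the form '<prefix>_<counter>'.
    """
    named = []
    count = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # assumption: skip blank or incomplete lines
        count += 1
        named.append("\t".join(fields[:3] + [f"{prefix}_{count}"]))
    return named


for record in add_unique_names(["chr1 0 1000", "chr2 5 50 peak"]):
    print(record)
```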
What should my test regions file contain?
The test regions file should contain one BED record per input region. You must assign each region a unique name.
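Before uploading, it can help to check a file against the requirements stated above: four whitespace-separated fields per record and a unique name for each region. The following is a minimal sketch of such a check. The function check_bed_lines is hypothetical, and the extra rules it applies (blank lines are ignored, coordinates are non-negative integers, start does not exceed end) are assumptions rather than documented GREAT behavior.

```python
def check_bed_lines(lines):
    """Return a list of human-readable problems found in BED lines."""
    problems = []
    first_seen = {}  # region name -> line number where it first appeared
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split()  # splits on spaces and tabs alike
        if len(fields) < 4:
            problems.append(
                f"line {line_number}: expected at least 4 fields, found {len(fields)}"
            )
            continue
        chrom, start_text, end_text, name = fields[:4]
        if not (start_text.isdecimal() and end_text.isdecimal()):
            problems.append(f"line {line_number}: start and end must be non-negative integers")
            continue
        if int(start_text) > int(end_text):
            problems.append(f"line {line_number}: start is greater than end")
        if name in first_seen:
            problems.append(
                f"line {line_number}: name {name!r} already used on line {first_seen[name]}"
            )
        else:
            first_seen[name] = line_number
    return problems


example = ["chr1 0 1000 peak1", "chr1\t2000\t3000\tpeak2", "chr2 10 20 peak1", "chr3 5 9"]
for problem in check_bed_lines(example):
    print(problem)
```

To check a file on disk, pass an open file handle, for example check_bed_lines(open("test_regions.bed")), where the file name is a placeholder.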
What should my background regions file contain?
The background regions file, like the test regions file, must be in BED format. Again, you must assign each region a unique name.
Importantly, the background must be a superset of the test set (that is, every record in the test regions file must also be in the background regions file).
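The superset requirement can be checked ahead of time with a sketch like the one below. It treats two records as the same region when their chromosome, start, and end match. That matching rule is an assumption made for this example, and missing_from_background is a hypothetical helper, not part of GREAT.

```python
def missing_from_background(test_lines, background_lines):
    """Return test regions, as (chrom, start, end), absent from the background."""

    def regions(lines):
        found = set()
        for line in lines:
            fields = line.split()
            if len(fields) >= 3:  # assumption: skip blank or incomplete lines
                found.add((fields[0], int(fields[1]), int(fields[2])))
        return found

    # Set difference: everything in the test set that the background lacks.
    return sorted(regions(test_lines) - regions(background_lines))


test_regions = ["chr1 0 1000 a", "chr2 5 50 b"]
background_regions = ["chr1 0 1000 x", "chr1 2000 3000 y"]
print(missing_from_background(test_regions, background_regions))
```

An empty result means that, under this matching rule, the background contains every test region.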