GREAT requires its input files to be in BED format. BED is a standard file format used by the UCSC genome browser for defining genomic regions.
What is BED format?
Browser Extensible Data (BED) is a file format used by the UCSC genome browser for defining genomic regions. It defines one genomic region (a "BED record") per line. GREAT requires each line to contain four mandatory fields (chromosome, start position, end position, and a name for the region) separated by whitespace (spaces or tabs). Additional optional fields provide further information. Full documentation of the BED format is available from UCSC.
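As an illustration of the four mandatory fields, the following is a minimal Python sketch that splits one BED line on whitespace. The names BedRecord and parse_bed_line are hypothetical and are not part of GREAT or any UCSC tool. The sketch does not handle optional fields or the header lines (such as "track" or "browser" lines) that some BED files carry.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    """The four fields GREAT requires on every line (hypothetical helper type)."""
    chrom: str
    start: int
    end: int
    name: str


def parse_bed_line(line: str) -> BedRecord:
    """Parse one BED line into its four mandatory fields.

    Fields may be separated by spaces or tabs; any extra optional
    fields after the name are ignored in this sketch.
    """
    fields = line.split()  # splits on any run of whitespace
    if len(fields) < 4:
        raise ValueError(
            f"expected at least 4 fields, got {len(fields)}: {line!r}"
        )
    chrom, start, end, name = fields[:4]
    return BedRecord(chrom, int(start), int(end), name)
```

For example, parse_bed_line("chr1 999 2000 peak_1") gives a record with chrom "chr1", start 999, end 2000 and name "peak_1", while a line with only three fields raises ValueError.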
The start coordinate in a BED record is 0-based, meaning the first base on a chromosome is numbered 0, and the end coordinate is exclusive (the base at that position is not part of the region). As a result, the coordinates in a BED record differ slightly from those used to find a region in the genome browser, which are 1-based and include both endpoints. The genome browser region "chr1:1000-2000" would be described in a BED record as "chr1 999 2000" (the start coordinate is one smaller and the end coordinate is unchanged).
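The conversion between the two coordinate conventions can be sketched in Python as follows. The function names browser_to_bed and bed_to_browser are hypothetical. Stripping thousands separators is a convenience that assumes the region string may have been copied from the browser with commas in it.

```python
from typing import Tuple


def browser_to_bed(region: str) -> Tuple[str, int, int]:
    """Convert a genome browser region such as "chr1:1000-2000"
    into BED (chrom, start, end) coordinates."""
    chrom, span = region.split(":")
    # Assumption: tolerate thousands separators, e.g. "chr1:1,000-2,000".
    start, end = span.replace(",", "").split("-")
    # The start shifts down by one; the end stays the same.
    return chrom, int(start) - 1, int(end)


def bed_to_browser(chrom: str, start: int, end: int) -> str:
    """Convert BED coordinates back into a genome browser region string."""
    return f"{chrom}:{start + 1}-{end}"
```

With these helpers, browser_to_bed("chr1:1000-2000") returns ("chr1", 999, 2000), and bed_to_browser("chr1", 999, 2000) returns "chr1:1000-2000". In BED coordinates the length of a region is simply end minus start, here 1001 bases.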
Can I use a different format?
No. GREAT supports only BED format, which is a popular standard used by the UCSC genome browser.
What should my test regions file contain?
The test regions file should contain one BED record per input region. You must assign each region a unique name.
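One way to check the unique-name requirement before uploading is sketched below in Python. The function duplicate_names is hypothetical, and the sketch assumes every non-blank line already has at least four whitespace-separated fields.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_names(bed_lines: Iterable[str]) -> List[str]:
    """Return the region names that occur more than once, sorted."""
    # The name is the fourth whitespace-separated field; blank lines are skipped.
    names = [line.split()[3] for line in bed_lines if line.strip()]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

An empty result means every region has a unique name. To check a file, pass the open file object, for example duplicate_names(open("test_regions.bed")), where the file name is only a placeholder.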
What should my background regions file contain?
The background regions file, like the test regions file, must be in BED format. Again, you must assign each region a unique name. Importantly, the background must be a superset of the test set (that is, every record in the test regions file must also appear in the background regions file).
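The superset requirement can be checked with a short Python sketch like the one below. The function missing_from_background is hypothetical. It assumes that two records count as the same when their chromosome, start and end agree; whether GREAT also compares region names is not stated here, so treat that as an assumption.

```python
from typing import Iterable, List, Tuple


def _coords(line: str) -> Tuple[str, int, int]:
    """Extract (chrom, start, end) from one BED line."""
    chrom, start, end = line.split()[:3]
    return chrom, int(start), int(end)


def missing_from_background(
    test_lines: Iterable[str], background_lines: Iterable[str]
) -> List[str]:
    """Return the test lines whose coordinates are absent from the background.

    Assumption: records are matched on chromosome, start and end only.
    """
    background = {_coords(line) for line in background_lines if line.strip()}
    return [
        line
        for line in test_lines
        if line.strip() and _coords(line) not in background
    ]
```

An empty result means the background contains every test region under this matching rule; any lines returned are test regions that would need to be added to the background file.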