# README file for our S-CAP manuscript data

# The combined S-CAP score file contains data from all 8 S-CAP models

# description of the INFO field
# region: the splicing region in which the single nucleotide variant lies (3intronic, 3core, exonic, 5core, 5extended, 5intronic)
# rawscore: the S-CAP score output by the gradient boosting tree model. This score is not included for 3core and 5core variants.
# sensscore: a transformed S-CAP score quantifying the sensitivity at the raw score on the pathogenic test set. This score is not included for 3core and 5core variants.
# rawscore_dom: the S-CAP score for the 3core DOM and 5core DOM variants. This score is not included for variants from all other regions.
# sensscore_dom: a transformed S-CAP score quantifying the sensitivity of the rawscore_dom on the pathogenic test set.
# rawscore_rec: the S-CAP score for the 3core REC and 5core REC variants. This score is not included for variants from all other regions.
# senscore_rec: a transformed S-CAP score quantifying the sensitivity of the rawscore_rec on the pathogenic test set.

## Thresholding with the rawscore
Based on the region field (and, for 3core and 5core variants, the DOM or REC model), pick the corresponding threshold (specified at the bottom) to perform pathogenicity classification.

## Thresholding with the scap sensscore
Variants with a sensscore <= 95 are considered pathogenic for all splicing regions.

## Raw score v. Sensitivity score
The advantage of the sensitivity score is that the same threshold (<= 95) can be used regardless of which region the single nucleotide variant is from. However, due to the limited size of the test set, many single nucleotide variants are grouped together with the same sensitivity score.
The advantage of the raw score is that it is more granular. However, a different threshold must be used for each region, and the scores are not as interpretable.
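
The sensitivity-score rule can be sketched in Python as follows. This is only an illustration: the function name and the use of `None` for a missing score are assumptions, since this README does not specify how a missing score is encoded in the INFO field.

```python
from typing import Optional

# Cutoff stated in this README: sensscore <= 95 is considered pathogenic.
SENS_THRESHOLD = 95


def is_pathogenic_by_sensscore(sensscore: Optional[float]) -> Optional[bool]:
    """Apply the region-independent sensitivity-score cutoff.

    Returns None when no score is available (a hypothetical convention,
    not something defined by the S-CAP files).
    """
    if sensscore is None:
        return None
    return sensscore <= SENS_THRESHOLD
```

Because the cutoff is the same for every splicing region, the function does not need the region as an argument.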
## How to use the 6 scores in the INFO column
If the splicing region for a single nucleotide variant is 3intronic, exonic, 5extended or 5intronic, you should use the rawscore or sensscore.
If the single nucleotide variant is 3core or 5core and heterozygous, you should use the rawscore_dom or sensscore_dom.
If the single nucleotide variant is 3core or 5core and recessive or compound heterozygous, you should use the rawscore_rec or senscore_rec.

## S-CAP combined score file for all regions
filename: scap_COMBINED_v1.0.vcf.gz

# Additionally, the data includes 8 files, all gzipped, with all S-CAP scores for human GRCh37
# This is available at: https://stanfordmedicine.app.box.com/s/kcay2vzdz59744g5tjz91tccr71zbcut

# description of the fields/columns in the files
# column 1: grch37_chrom - chromosome number from hg19 assembly
# column 2: pos - 1-based position in the chromosome specified
# column 3: ref - reference allele
# column 4: alt - alternate allele
# column 5: scap_3cdv1.0 - S-CAP score used to classify the variant as possibly pathogenic or likely benign

# The data is split according to the model that produced it (see manuscript for details):
scap3i - 3' Intronic
scap3cd - 3' Core Dominant
scap3cr - 3' Core Recessive
scape - Exonic
scap5cd - 5' Core Dominant
scap5cr - 5' Core Recessive
scap5e - 5' Extended
scap5i - 5' Intronic

# Thresholds (>= XX implies possibly pathogenic and < XX implies likely benign)
3' Intronic - 0.006
3' Core Dominant - 0.033
3' Core Recessive - 0.264
Exonic - 0.009
5' Core Dominant - 0.034
5' Core Recessive - 0.367
5' Extended - 0.005
5' Intronic - 0.006
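
The score-selection rules and the raw-score thresholds above can be combined in a short Python sketch. It is an illustration under stated assumptions, not part of the S-CAP release: the function names and the `mode` argument (`"dom"` for a heterozygous variant, `"rec"` for a recessive or compound heterozygous variant) are hypothetical, and the region strings are the values listed for the region field.

```python
from typing import Optional

# Raw-score thresholds copied from the list above.
NON_CORE_THRESHOLDS = {
    "3intronic": 0.006,
    "exonic": 0.009,
    "5extended": 0.005,
    "5intronic": 0.006,
}
# Core regions have separate dominant and recessive models.
CORE_THRESHOLDS = {
    ("3core", "dom"): 0.033,
    ("3core", "rec"): 0.264,
    ("5core", "dom"): 0.034,
    ("5core", "rec"): 0.367,
}


def raw_score_field(region: str, mode: Optional[str] = None) -> str:
    """Name of the raw-score INFO entry to read for this variant."""
    if region in NON_CORE_THRESHOLDS:
        return "rawscore"
    if (region, mode) in CORE_THRESHOLDS:
        return f"rawscore_{mode}"
    raise ValueError(f"unsupported region/mode: {region!r}, {mode!r}")


def classify_raw(region: str, raw: float, mode: Optional[str] = None) -> str:
    """Classify a raw score: >= threshold is possibly pathogenic."""
    if region in NON_CORE_THRESHOLDS:
        threshold = NON_CORE_THRESHOLDS[region]
    elif (region, mode) in CORE_THRESHOLDS:
        threshold = CORE_THRESHOLDS[(region, mode)]
    else:
        raise ValueError(f"unsupported region/mode: {region!r}, {mode!r}")
    return "possibly pathogenic" if raw >= threshold else "likely benign"
```

For example, a 3core variant with a raw score of 0.1 falls above the dominant threshold (0.033) but below the recessive one (0.264), so the classification depends on which model is applicable.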