The text file should be a tab-delimited .txt file with 4 columns:
- Chromosome Number/Identifier
- Reference Allele
- Alternate Allele
Note: Please make sure your chromosome identifiers are in [1, 2, 3, 4] format rather than [chr1, chr2, chr3] format.
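If your variants file uses the chr-prefixed style, the conversion can be sketched as below. This is a minimal illustration, not part of the tool: the function names are hypothetical, and it assumes the chromosome identifier is the first tab-separated field of each line, leaving every other field untouched.

```python
def normalize_chromosome(identifier: str) -> str:
    """Strip a leading 'chr' prefix (case-insensitive), e.g. 'chr1' -> '1'."""
    ident = identifier.strip()
    if ident.lower().startswith("chr"):
        ident = ident[3:]
    return ident


def normalize_variant_line(line: str) -> str:
    """Rewrite only the first tab-separated field of one variant line."""
    fields = line.rstrip("\r\n").split("\t")
    fields[0] = normalize_chromosome(fields[0])
    return "\t".join(fields)
```

Applying normalize_variant_line to each line of the file and writing the results to a new .txt file would give the expected [1, 2, 3, 4] style.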
specifics on formatted variants text file...
The PED file is tab delimited with 6 mandatory columns:
- Family ID
- Individual ID
- Paternal ID (0=unknown)
- Maternal ID (0=unknown)
- Sex (1=male; 2=female; other=unknown)
- Phenotype (1=unaffected; 2=affected; other=unknown)
(NO header is expected)
IDs are alphanumeric: individuals in the same family should have the same family ID; the individual ID should
- uniquely identify a person regardless of family ID, and
- match one of the sample IDs in VCF file(s) referring to the same person.
(!) Any IDs found in the VCF file but NOT in the pedigree will be treated as controls.
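The six mandatory columns and the control rule can be illustrated with a short parser. This is a sketch under stated assumptions rather than the tool's own code: PedRecord, parse_ped_line, and split_samples are hypothetical names, and any columns beyond the sixth are ignored here.

```python
from typing import List, NamedTuple, Optional, Tuple

SEX = {"1": "male", "2": "female"}                 # anything else -> unknown
PHENOTYPE = {"1": "unaffected", "2": "affected"}   # anything else -> unknown


class PedRecord(NamedTuple):
    family_id: str
    individual_id: str
    paternal_id: Optional[str]   # None when the file says 0 (unknown)
    maternal_id: Optional[str]   # None when the file says 0 (unknown)
    sex: str
    phenotype: str


def parse_ped_line(line: str) -> PedRecord:
    """Parse one header-less, tab-delimited PED line."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    fam, ind, pat, mat, sex, phen = fields[:6]
    return PedRecord(
        fam,
        ind,
        None if pat == "0" else pat,
        None if mat == "0" else mat,
        SEX.get(sex, "unknown"),
        PHENOTYPE.get(phen, "unknown"),
    )


def split_samples(
    vcf_sample_ids: List[str], records: List[PedRecord]
) -> Tuple[List[str], List[str]]:
    """Return (samples in the pedigree, samples treated as controls)."""
    known = {r.individual_id for r in records}
    in_pedigree = [s for s in vcf_sample_ids if s in known]
    controls = [s for s in vcf_sample_ids if s not in known]
    return in_pedigree, controls
```

A check like split_samples before running the analysis makes it easy to spot a mistyped individual ID, since that sample would otherwise silently end up among the controls.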
An optional seventh column can specify the ethnicity of the person. This column determines which population-specific allele frequency is used when filtering for variants that are rare in that person's ethnic group. An ethnicity identifier consists of a frequency database name and a population code joined by "_". For example, exac_AMR refers to the American population in the ExAC database, and exac_ALL refers to the overall frequency in the ExAC database.
The ethnicity identifiers available in our analysis are:
- exac_AFR: African/African American
- exac_EAS: East Asian
- exac_NFE: Non-Finnish European
- exac_SAS: South Asian
- esp_AA: African American
- esp_EA: European American
- kg_AMR: Admixed American
- kg_EAS: East Asian
- kg_SAS: South Asian
If none of the above identifiers is found in the seventh column, exac_ALL is used as the default frequency database and ethnicity group for allele frequency filtering.
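The fallback rule can be sketched as follows. This is only an illustration: resolve_ethnicity is a hypothetical helper, and the set of accepted identifiers is assumed to be the ones named in this section (exac_AMR and exac_ALL appear in the text but not in the list, so including them is an assumption).

```python
from typing import Optional, Tuple

# Identifiers taken from this section; the full set accepted by the
# analysis may differ (assumption).
KNOWN_IDENTIFIERS = {
    "exac_AFR", "exac_EAS", "exac_NFE", "exac_SAS",
    "esp_AA", "esp_EA",
    "kg_AMR", "kg_EAS", "kg_SAS",
    "exac_AMR", "exac_ALL",  # named in the text only (assumption)
}
DEFAULT_IDENTIFIER = "exac_ALL"


def resolve_ethnicity(seventh_column: Optional[str]) -> Tuple[str, str]:
    """Return (database, population), falling back to exac_ALL."""
    value = (seventh_column or "").strip()
    identifier = value if value in KNOWN_IDENTIFIERS else DEFAULT_IDENTIFIER
    # Split on the first "_" only: database name, then population code.
    database, population = identifier.split("_", 1)
    return database, population
```

For example, resolve_ethnicity("kg_EAS") yields ("kg", "EAS"), while a missing or unrecognized value yields ("exac", "ALL").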
specifics on .ped format customization...