GeMSTONE



File Upload



* Download Sample VCF
A .vcf or .vcf.gz file is required (maximum size: 500 MB).
Make sure the file follows Variant Call Format (VCF) v4.0, v4.1, or v4.2.


Download Sample Control VCF File
Optional VCF containing variants to be excluded from analysis.
Must be a .vcf file, a .vcf.gz file, or a formatted text file listing the variants to ignore.

The text file should be a tab-delimited .txt file with 4 columns:
  • Chromosome Number/Identifier
  • Position
  • Reference Allele
  • Alternate Allele

Note: Please make sure your chromosome identifiers are in [1, 2, 3, 4] format rather than [chr1, chr2, chr3] format.
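The four-column layout and the chromosome naming rule above can be sketched in Python. This is a minimal illustration, not part of GeMSTONE: the function names `normalize_chrom` and `write_exclusion_list` and the example variants are hypothetical, and the sketch assumes no header row, which the page does not state either way.

```python
import csv
import io


def normalize_chrom(chrom: str) -> str:
    """Strip a leading 'chr' prefix so that 'chr1' becomes '1'."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def write_exclusion_list(variants, handle) -> None:
    """Write (chrom, pos, ref, alt) tuples as four tab-delimited columns."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for chrom, pos, ref, alt in variants:
        writer.writerow([normalize_chrom(str(chrom)), int(pos), ref, alt])


# Hypothetical variants, for illustration only.
example = [("chr1", 12345, "A", "G"), ("X", 67890, "CT", "C")]
buffer = io.StringIO()
write_exclusion_list(example, buffer)
exclusion_text = buffer.getvalue()
print(exclusion_text, end="")
```

To produce a file for upload, pass an open text file handle instead of the in-memory buffer.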

specifics on formatted variants text file...


Download Sample PED

The PED file is tab delimited with 6 mandatory columns:

  • Family ID
  • Individual ID
  • Paternal ID (0=unknown)
  • Maternal ID (0=unknown)
  • Sex (1=male; 2=female; other=unknown)
  • Phenotype (1=unaffected; 2=affected; other=unknown)
(NO header is expected)

IDs are alphanumeric: individuals in the same family should have the same family ID; the individual ID should
  1. uniquely identify a person regardless of family ID, and
  2. match one of the sample IDs in the VCF file(s) referring to the same person.
(!) Any IDs found in the VCF file but NOT in the pedigree will be treated as controls.
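Before uploading, the PED rules above can be checked locally. The following Python sketch is an illustration under stated assumptions: the helper names `parse_ped` and `check_ped` and the sample IDs are hypothetical, and the checks cover only what this page states (at least six tab-delimited columns, no header, unique individual IDs that match VCF sample IDs).

```python
def parse_ped(text):
    """Parse headerless, tab-delimited PED text into a list of dicts."""
    fields = ["family_id", "individual_id", "paternal_id",
              "maternal_id", "sex", "phenotype"]
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        cols = line.split("\t")
        if len(cols) < 6:
            raise ValueError(
                f"line {line_no}: expected at least 6 columns, got {len(cols)}")
        record = dict(zip(fields, cols[:6]))
        # Optional seventh column: ethnicity identifier.
        record["ethnicity"] = cols[6] if len(cols) > 6 else None
        records.append(record)
    return records


def check_ped(records, vcf_sample_ids):
    """Return (problems, controls) for a pedigree and a set of VCF sample IDs."""
    problems = []
    seen = set()
    for rec in records:
        iid = rec["individual_id"]
        if iid in seen:
            problems.append(f"duplicate individual ID: {iid}")
        seen.add(iid)
        if iid not in vcf_sample_ids:
            problems.append(f"{iid} not found among VCF sample IDs")
    # VCF samples absent from the pedigree are treated as controls.
    controls = sorted(set(vcf_sample_ids) - seen)
    return problems, controls


# Hypothetical trio plus one extra VCF sample, for illustration only.
ped_text = (
    "FAM1\tP1\t0\t0\t1\t1\n"
    "FAM1\tP2\t0\t0\t2\t1\n"
    "FAM1\tC1\tP1\tP2\t1\t2\texac_NFE\n"
)
records = parse_ped(ped_text)
problems, controls = check_ped(records, {"P1", "P2", "C1", "S9"})
print(problems, controls)
```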

An optional seventh column can specify the ethnicity of the person. This column determines which population-specific allele frequency is used when filtering for variants that are rare in that person's ethnicity group. An ethnicity identifier consists of a frequency database name and a population code joined by "_". For example, exac_AMR refers to the Latino (AMR) population in the ExAC database, and exac_ALL refers to the overall frequency in the ExAC database.

Ethnicity identifiers available in our analysis:
ExAC
  • exac_ALL: Overall
  • exac_AFR: African/African American
  • exac_AMR: Latino
  • exac_EAS: East Asian
  • exac_FIN: Finnish
  • exac_NFE: Non-Finnish European
  • exac_SAS: South Asian
  • exac_OTH: Other
1000 Genomes
  • kg_ALL: Overall
  • kg_AFR: African
  • kg_AMR: Admixed American
  • kg_EAS: East Asian
  • kg_EUR: European
  • kg_SAS: South Asian
ESP6500
  • esp_ALL: Overall
  • esp_AA: African American
  • esp_EA: European American
TAGC
  • tagc_AJ: Ashkenazi
If none of the above identifiers is found in the seventh column, exac_ALL will be used as the default frequency database and ethnicity group for allele frequency filtering.
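The fallback rule can be written as a small lookup. This Python sketch is illustrative only: `resolve_ethnicity` and `split_identifier` are hypothetical names, and exact, case-sensitive matching is an assumption, since the page does not say how the seventh column is compared.

```python
# Identifiers copied from the list above.
KNOWN_IDENTIFIERS = {
    "exac_ALL", "exac_AFR", "exac_AMR", "exac_EAS",
    "exac_FIN", "exac_NFE", "exac_SAS", "exac_OTH",
    "kg_ALL", "kg_AFR", "kg_AMR", "kg_EAS", "kg_EUR", "kg_SAS",
    "esp_ALL", "esp_AA", "esp_EA",
    "tagc_AJ",
}
DEFAULT_IDENTIFIER = "exac_ALL"


def resolve_ethnicity(seventh_column):
    """Return the identifier to use, falling back to exac_ALL."""
    value = (seventh_column or "").strip()
    return value if value in KNOWN_IDENTIFIERS else DEFAULT_IDENTIFIER


def split_identifier(identifier):
    """Split 'database_POP' into (database, population code)."""
    database, _, population = identifier.partition("_")
    return database, population


print(resolve_ethnicity("kg_EUR"))      # recognised identifier is kept
print(resolve_ethnicity("exac_XYZ"))    # unrecognised value falls back
print(split_identifier("tagc_AJ"))
```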

specifics on .ped format customization...

Human Genome Build
GRCh38
GRCh37


*** Data submitted to GeMSTONE expires after 2 months, after which it is deleted from our servers. Data is not shared with, or visible to, any third parties. The Yu Lab does not keep any submitted files past the expiration date.

Site-basis


Phred-Scaled Quality Score Lowerbound

*QUAL score in VCF file.


Allele Frequency Upperbound
%
*defaults to the allele frequency (AF) in ExAC unless a population is specified in the PED file.


Ignore Variants Without PASS Flag

Genotype-basis


Genotype Quality Lowerbound

*requires "GQ" FORMAT tag specified for all sites.


Individual Read Depth Lowerbound

*requires "DP" FORMAT tag specified for all sites.
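To make the site-level and genotype-level thresholds concrete, here is a Python sketch that applies them to a single VCF data line. It is not GeMSTONE's implementation: the function names are hypothetical, and treating a missing value (".") as failing the threshold is an assumption the page does not confirm.

```python
def site_passes(vcf_line, min_qual, require_pass):
    """Site-level check using the QUAL and FILTER columns of a VCF line."""
    cols = vcf_line.rstrip("\n").split("\t")
    qual, filter_flag = cols[5], cols[6]
    # Assumption: a missing QUAL value fails the lower bound.
    if qual == "." or float(qual) < min_qual:
        return False
    if require_pass and filter_flag != "PASS":
        return False
    return True


def genotype_passes(format_field, sample_field, min_gq, min_dp):
    """Genotype-level check using the GQ and DP FORMAT tags of one sample."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    for tag, lower_bound in (("GQ", min_gq), ("DP", min_dp)):
        raw = values.get(tag, ".")
        # Assumption: a missing tag or value fails the lower bound.
        if raw == "." or float(raw) < lower_bound:
            return False
    return True


# Hypothetical VCF data line with two samples, for illustration only.
line = "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:GQ:DP\t0/1:35:12\t0/0:10:20"
cols = line.split("\t")
site_ok = site_passes(line, min_qual=30, require_pass=True)
sample_ok = [genotype_passes(cols[8], s, min_gq=20, min_dp=10) for s in cols[9:]]
print(site_ok, sample_ok)
```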




Inheritance Model


Dominant
Recessive Homozygous
Recessive Compound Heterozygous
X-linked Dominant
X-linked Recessive
Y-linked
No Inheritance


Remove Non Pseudo-Autosomal Regions in Sex Chromosomes

Recurrence


Multiple Occurrences Across All Samples
LowerBound:
UpperBound:



Variant Must Occur In At Least 1 Non-Sporadic Family

Multiple Occurrences Across Families
(Each Sporadic Sample Counts as a Family)
LowerBound:
UpperBound:
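One possible reading of the recurrence options is sketched below in Python. Everything here is an assumption made for illustration: the function names are hypothetical, a sporadic sample is taken to be an individual who is the only member of their family in the PED file, and carriers absent from the pedigree are ignored. The page does not define these details.

```python
from collections import Counter


def recurrence_summary(carriers, family_of):
    """Count carrier samples and carrier families for one variant.

    carriers  : iterable of sample IDs carrying the variant
    family_of : dict mapping each pedigree sample ID to its family ID
    """
    family_sizes = Counter(family_of.values())
    in_pedigree = {s for s in carriers if s in family_of}
    families = {family_of[s] for s in in_pedigree}
    # Assumption: a family with a single member is a sporadic sample,
    # so it still counts as one family but not as a non-sporadic one.
    non_sporadic = {f for f in families if family_sizes[f] > 1}
    return {
        "samples": len(in_pedigree),
        "families": len(families),
        "non_sporadic_families": len(non_sporadic),
    }


def passes_recurrence(summary, sample_bounds, family_bounds, require_non_sporadic):
    """Apply inclusive lower/upper bounds and the non-sporadic requirement."""
    low_s, high_s = sample_bounds
    low_f, high_f = family_bounds
    if not low_s <= summary["samples"] <= high_s:
        return False
    if not low_f <= summary["families"] <= high_f:
        return False
    if require_non_sporadic and summary["non_sporadic_families"] < 1:
        return False
    return True


# Hypothetical pedigree: family F1 (two members), F2 (one), S1 (one).
family_of = {"A1": "F1", "A2": "F1", "B1": "F2", "S1": "S1"}
summary = recurrence_summary(["A1", "A2", "S1"], family_of)
print(summary, passes_recurrence(summary, (2, 5), (2, 5), True))
```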

Allele Frequency Databases

These options are only for annotation, not for filtering variants

ExAC
  • exac_ALL: Overall
  • exac_AFR: African/African American
  • exac_AMR: Latino
  • exac_EAS: East Asian
  • exac_FIN: Finnish
  • exac_NFE: Non-Finnish European
  • exac_SAS: South Asian
  • exac_OTH: Other
1000 Genomes
  • kg_ALL: Overall
  • kg_AFR: African
  • kg_AMR: Admixed American
  • kg_EAS: East Asian
  • kg_EUR: European
  • kg_SAS: South Asian
ESP6500
  • esp_ALL: Overall
  • esp_AA: African American
  • esp_EA: European American
TAGC
  • tagc_AJ: Ashkenazi

Variant Consequence

Coding Transcript Variant
  • Frameshift
  • Inframe Indel
  • Nonsynonymous (Missense, Start Loss, Stop Gained, Stop Lost)
  • Synonymous
Splicing Variant
  • Exon Loss
  • Intron Gain
  • Splice Site
  • Splice Region
Intergenic Variant
  • Up/Downstream
  • Other Intergenic Region
Non-Coding Variant
  • Intron
  • UTR
  • Other Non-Coding Region
Others
  • Regulatory Region Variant






Transcript Biotype

Protein Coding
  • Protein Coding (contains an Open Reading Frame [ORF])
  • Immunoglobulin (Ig) Variable Chain and T-cell Receptor (TcR) Gene
  • Nonsense-mediated Decay
  • Non-translating CDS
  • Non-stop Decay
  • Polymorphic Pseudogene
Pseudogene
  • Pseudogene
  • Inactivated Immunoglobulin Gene
  • Disrupted Domain
Short Noncoding
  • ncRNA
  • ncRNA Pseudogene
Long Noncoding
  • Non-coding
  • Antisense
  • Sense Intronic
  • Sense Overlapping
  • Retained Intron
  • lincRNA
  • 3' overlapping ncRNA
Others (no ORFs / ambiguous # of ORFs)


Custom Transcript File
Annotate using a personalized list of transcripts.
You can upload a .txt file with one transcript (Ensembl ID) per line.
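A transcript list in this layout can be generated with a few lines of Python. This is a sketch under assumptions: `build_transcript_file` is a hypothetical helper, the check accepts only unversioned human Ensembl transcript IDs (ENST followed by 11 digits), and whether GeMSTONE accepts versioned IDs is not stated on this page.

```python
import re

# Unversioned human Ensembl transcript ID: 'ENST' followed by 11 digits.
ENST_PATTERN = re.compile(r"^ENST\d{11}$")


def build_transcript_file(transcript_ids):
    """Return file content with one validated transcript ID per line."""
    lines = []
    for transcript_id in transcript_ids:
        transcript_id = transcript_id.strip()
        if not ENST_PATTERN.match(transcript_id):
            raise ValueError(f"not an Ensembl transcript ID: {transcript_id!r}")
        lines.append(transcript_id)
    return "\n".join(lines) + "\n"


# Example IDs chosen for illustration only.
content = build_transcript_file(["ENST00000269305", " ENST00000357654 "])
print(content, end="")
```

Writing `content` to a .txt file gives a file in the one-ID-per-line layout described above.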


Download Sample Transcript File

Functional Predictions

SIFT
PROVEAN
PolyPhen-2_HDIV
PolyPhen-2_HVAR
LRT
MutationTaster
MutationAssessor
FATHMM
FATHMM-MKL
VEST3
CADD Phred
DANN
MetaSVM
MetaLR
fitCons

Conservation Scores

GERP++
phyloP Vertebrate
phyloP Mammalian
phastCons Vertebrate
phastCons Mammalian
SiPhy


Protein Stability Prediction

Rosetta ddG

Deleteriousness Filter

Keep only variants that at least [N] scores predict to be deleterious out of the 2 scores you've selected (only applicable to nsSNVs).
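The voting rule behind this filter can be sketched in Python. This is an illustration only: `keep_variant` is a hypothetical name, and the per-score deleterious/benign calls are taken as given booleans because this page does not state the cutoffs GeMSTONE applies to each score.

```python
def keep_variant(predictions, selected_scores, min_deleterious):
    """Keep a variant if enough of the selected scores call it deleterious.

    predictions     : dict mapping score name to True (deleterious),
                      False (not deleterious), or None (no prediction)
    selected_scores : names of the scores chosen on the form
    min_deleterious : minimum number of deleterious calls required
    """
    votes = sum(1 for name in selected_scores if predictions.get(name) is True)
    return votes >= min_deleterious


# Hypothetical calls for one nsSNV, for illustration only.
calls = {"SIFT": True, "PolyPhen-2_HDIV": False, "CADD Phred": True}
selected = ["SIFT", "PolyPhen-2_HDIV"]
print(keep_variant(calls, selected, min_deleterious=1))
print(keep_variant(calls, selected, min_deleterious=2))
```

Scores that were not selected (here, CADD Phred) do not contribute votes, and a missing prediction is counted as not deleterious in this sketch.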

Gene Ontology (GO)

Annotate genes using the following gene ontology databases.


Select GO biological process(es)


Select GO cellular component(s)


Select GO molecular function(s)


Add GO Annotation For Interaction Partners
Filter Out Genes With No GO Annotation


Genotype-phenotype Databases

Annotate genes using the following phenotype and disease databases.

Select disease(s) from HGMD database

Select disease(s) from ClinVar database

Select disease(s) from OMIM database

What do the numbers (1)(2)(3)(4), brackets [ ], and braces { } in the OMIM database mean?


Annotate with knockout phenotypes from Mouse Genome Informatics (MGI)



DISEASE GENE FILE
Annotate genes using a personalized list of genes of interest.
You can upload a tab-delimited .txt file with two columns: gene identifier and notes.
Ensembl, Gene Name, HGNC Symbol and Entrez Gene ID are all valid identifiers.
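The two-column gene file can be assembled as follows. This Python sketch is illustrative: `build_gene_file` is a hypothetical helper, the example entries are made up for demonstration, and the absence of a header row is an assumption.

```python
def build_gene_file(entries):
    """Return tab-delimited content: gene identifier, then free-text notes."""
    lines = []
    for gene_id, notes in entries:
        # Tabs and newlines inside notes would break the two-column layout.
        clean_notes = notes.replace("\t", " ").replace("\n", " ")
        lines.append(f"{gene_id.strip()}\t{clean_notes}")
    return "\n".join(lines) + "\n"


# Identifier types may be mixed; these entries are for illustration only.
gene_file = build_gene_file([
    ("BRCA1", "HGNC symbol, candidate from panel"),
    ("ENSG00000141510", "Ensembl gene ID for TP53"),
])
print(gene_file, end="")
```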


Download Sample Disease Gene File

Protein Domain Annotation


Annotate Using All Pfam Protein Domain Functions
Don't Annotate Protein Domain Functions
Select Specific Protein Domain Functions To Annotate


Select Protein Domain Functions to Annotate


Protein Domain Filter


Filter Out Variants Without Annotations in the Pfam Domain Database


Protein-protein Interactions

Annotate genes using the following protein interaction databases.


IntAct
BioGRID
ConsensusPathDB
High-quality Interactomes (HINT)


Pathway Databases

Annotate genes using the following protein pathway databases.

Select pathway(s) from KEGG database

Select pathway(s) from BioCarta database

Select pathway(s) from Reactome database


Add Pathway Annotation For Interaction Partners
Filter Out Genes With No Pathway Annotation



Pathway Enrichment Analysis

Calculate and report enriched pathways with q-value ≤


From the following databases:
KEGG

BioCarta

Reactome

GO Biological Processes

GO Cellular Components

GO Molecular Function



GTEx (The Genotype-Tissue Expression Project)
Select All


Adrenal gland
Anterior cingulate cortex
Aorta
Atrial appendage
Blood
Breast
Caudate (basal ganglia)
Cerebellum
Colon
Coronary
Cortex
Fibroblasts
Hippocampus
Hypothalamus
LCL
Left ventricle
Liver
Lung
Mucosa
Muscularis
Nucleus accumbens (basal ganglia)
Ovary
Pancreas
Pituitary
Prostate
Putamen (basal ganglia)
Skeletal muscle
Skin (suprapubic)
Skin (lower leg)
Stomach
Subcutaneous
Testis
Thyroid gland
Tibial
Tibial nerve
Uterus
Vagina
Visceral (Omentum)

HPA (The Human Protein Atlas)
Select All


Adipose tissue
Adrenal gland
Appendix
Bone marrow
Cerebral cortex
Colon
Duodenum
Endometrium
Esophagus
Fallopian tube
Gallbladder
Heart muscle
Kidney
Liver
Lung
Lymph node
Ovary
Pancreas
Placenta
Prostate
Rectum
Salivary gland
Skeletal muscle
Skin
Small intestine
Smooth muscle
Spleen
Stomach
Testis
Thyroid gland
Tonsil
Urinary bladder

GDI (Gene Damage Index) Disease Type



All diseases
Mendelian (general model)
Mendelian (autosomal dominant)
Mendelian (autosomal recessive)
Cancer (general model)
Cancer (autosomal dominant)
Cancer (autosomal recessive)
Primary immunodeficiency (general model)
Primary immunodeficiency (autosomal dominant)
Primary immunodeficiency (autosomal recessive)

Residual Variation Intolerance Score

Annotate RVIS Gene Score

Gene Burden Test

You can choose to upload an additional control to perform Gene Burden Tests.
Download Sample Burden Test Control VCF

Burden
C-alpha
VT
SKAT




Cannot submit without a project name and VCF file. Please check any orange tabs above for anything you might've missed.