Substitution
Substitution: a sequence change where, compared to a reference sequence, one nucleotide is replaced by one other nucleotide.
Syntax
Simple sequence substitution | |
---|---|
Syntax | sequence_identifier ":" coordinate_type "." position reference_sequence ">" alternate_sequence |
Examples | NC_000023.10:g.33038255C>A |

Genome reference with coordinates from aligned transcript | |
---|---|
Syntax | sequence_identifier "(" transcript_identifier "):c." transcript_position reference_sequence ">" alternate_sequence |
Examples | NG_012232.1(NM_004006.2):c.93+1G>T |
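As a rough illustration of the two syntaxes above, the following Python sketch splits a description into the named fields of the tables. The regular expression, the function name parse_substitution and the character sets it accepts (DNA coordinate types g, c, m, n and o; IUPAC codes as the alternate nucleotide) are assumptions made for this example, not an official grammar; mosaic, chimeric and r. descriptions are not covered.

```python
import re

# Hypothetical pattern for the two substitution syntaxes in the tables.
# Only DNA coordinate types are accepted; the alternate nucleotide may be an
# IUPAC code (e.g. "H"). Mosaic ("=/"), chimeric ("=//") and r. forms are out of scope.
SUBSTITUTION = re.compile(
    r"^(?P<seq>[A-Za-z0-9_.]+)"                      # sequence_identifier
    r"(?:\((?P<tx>[A-Za-z0-9_.]+)\))?"               # optional (transcript_identifier)
    r":(?P<type>[gcmno])\."                          # coordinate_type "."
    r"(?P<pos>[-*]?\d+(?:[+-]\d+)?)"                 # position, e.g. 76, -401, *32, 93+1
    r"(?P<ref>[ACGT])>(?P<alt>[ACGTRYSWKMBDHVN])$"   # reference ">" alternate
)


def parse_substitution(description: str) -> dict:
    """Split a single-nucleotide substitution description into named fields."""
    match = SUBSTITUTION.match(description)
    if match is None:
        # e.g. c.79GC>TT (two nucleotides) or c.76A/G (legacy slash format)
        raise ValueError(f"not a simple substitution: {description!r}")
    fields = match.groupdict()
    # The aligned-transcript form in the table is only defined for c. positions.
    if fields["tx"] is not None and fields["type"] != "c":
        raise ValueError("transcript form requires a c. coordinate")
    # By definition a substitution replaces a nucleotide by a different one.
    if fields["ref"] == fields["alt"]:
        raise ValueError("reference and alternate nucleotide must differ")
    return fields
```

For example, parse_substitution("NG_012232.1(NM_004006.2):c.93+1G>T") yields the transcript NM_004006.2 and the intronic position 93+1, while "LRG_199t1:c.79GC>TT" is rejected because it is not a single-nucleotide change.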
Notes
- substitutions involving two or more consecutive nucleotides are described as deletion/insertion (delins) variants (see Deletion/insertion (delins)).
- two variants separated by one or more nucleotides should be described individually and not as a "delins" of the sequence affected.
- Exception: two variants separated by one nucleotide, together affecting one amino acid, should be described as a "delins".
NOTE: This rule prevents tools that predict the consequences of a variant from making conflicting and incorrect predictions (e.g., c.235_237delinsTAT (p.Lys79Tyr) versus c.[235A>T;237G>T] (p.[Lys79*;Lys79Asn])).
NOTE: the SVD-WG has prepared a proposal to modify this recommendation (see SVD-WG010).
- nucleotides that have been tested and found not changed are described as c.123= or g.4567_4569= (see SVD-WG001 (no change)).
- it is not correct to describe "polymorphisms" as c.76A/G (see Discussion).
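The rules in these notes can be sketched in Python as a small function that groups changed nucleotides before writing a description. This is a hypothetical helper (describe_changes and codon_number are illustrative names), assuming positive coding DNA positions without intronic offsets, reference and observed sequences of equal length, and that all changes lie on the same allele (hence the c.[...;...] form). It implements the current exception for two changes within one codon, which the SVD-WG010 proposal may modify.

```python
def codon_number(c_pos: int) -> int:
    """Codon containing a positive coding DNA position (c.1-c.3 -> codon 1)."""
    if c_pos < 1:
        raise ValueError("only positive c. positions inside the CDS are handled")
    return (c_pos - 1) // 3 + 1


def describe_changes(start: int, ref: str, alt: str) -> str:
    """Describe alt versus ref (equal length, ref starting at c.start).

    - consecutive changed nucleotides are merged into one delins;
    - two changes separated by one nucleotide in the same codon are merged too;
    - any other changes are listed individually as substitutions on one allele.
    """
    if not ref or len(ref) != len(alt):
        raise ValueError("sketch only handles non-empty, equal-length sequences")
    changed = [start + i for i, (r, a) in enumerate(zip(ref, alt)) if r != a]
    if not changed:
        # Tested and found not changed.
        end = start + len(ref) - 1
        return f"c.{start}=" if start == end else f"c.{start}_{end}="

    groups = [[changed[0]]]
    for pos in changed[1:]:
        prev = groups[-1][-1]
        consecutive = pos - prev == 1
        same_codon_gap = pos - prev == 2 and codon_number(pos) == codon_number(prev)
        if consecutive or same_codon_gap:
            groups[-1].append(pos)
        else:
            groups.append([pos])

    parts = []
    for group in groups:
        first, last = group[0], group[-1]
        old = ref[first - start:last - start + 1]
        new = alt[first - start:last - start + 1]
        if first == last:
            parts.append(f"{first}{old}>{new}")         # one nucleotide: substitution
        else:
            parts.append(f"{first}_{last}delins{new}")  # merged span: delins
    return "c." + parts[0] if len(parts) == 1 else "c.[" + ";".join(parts) + "]"
```

With this sketch, describe_changes(235, "AAG", "TAT") gives c.235_237delinsTAT rather than c.[235A>T;237G>T], whereas two changes one nucleotide apart but in different codons, such as describe_changes(147, "CGA", "TGG"), stay separate as c.[147C>T;149A>G].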
Examples
- NC_000023.10:g.33038255C>A
  a substitution of the C nucleotide at g.33038255 by an A.
- NG_012232.1(NM_004006.2):c.93+1G>T
  a substitution of the G nucleotide at c.93+1 (coding DNA reference sequence) by a T.
- LRG_199t1:c.79_80delinsTT
  nucleotides c.79 and c.80 are replaced by TT.
  NOTE: changes involving two or more consecutive nucleotides are described as deletion-insertion (delins), so the description c.[79G>T;80C>T] is not correct.
  NOTE: based on the definition of a substitution, i.e. one nucleotide replaced by one other nucleotide, this change cannot be described as a substitution like c.79_80GC>TT or c.79GC>TT.
- NM_004006.2:c.145_147delinsTGG
  two substitutions that together change codon CGC (positions c.145 to c.147) into TGG.
  NOTE: two variants separated by one nucleotide, together affecting one amino acid, should be described as a "delins", so the description c.[145C>T;147C>G] is not correct (see Deletion/insertion).
- LRG_199t1:c.54G>H
  a substitution of the G nucleotide at c.54 (coding DNA reference sequence) by an A, C, or T (IUPAC code "H", see Standards).
- NM_004006.2:c.123=
  a screen was performed showing that nucleotide c.123 was a C, as in the coding DNA reference sequence (the nucleotide was not changed).
  NOTE: the description NM_004006.2:c.= cannot be used here; c.= indicates that the entire NM_004006.2 coding DNA reference sequence was analysed and no change was identified.
  NOTE: the description LRG_199t1:c.94-23_188+33= indicates that no variants were found in the region indicated (exon 3 of the DMD gene).
- LRG_199t1:c.85=/T>C
  a mosaic case where, at position c.85, besides the normal sequence (a T, described as "="), chromosomes are also found containing a C (c.85T>C).
  NOTE: irrespective of the frequency at which each nucleotide was found, the reference is always described first.
- NM_004006.2:c.85=//T>C
  a chimeric case, i.e. the sample is a mix of cells containing c.85= and c.85T>C.
  NOTE: irrespective of the frequency at which each nucleotide was found, the reference is always described first.
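For descriptions such as LRG_199t1:c.54G>H, it can help to expand an IUPAC ambiguity code into the concrete substitutions it stands for. The sketch below hard-codes the standard IUPAC nucleotide codes; expand_ambiguous is a hypothetical helper, and skipping a base identical to the reference is an assumption based on the definition of a substitution (one nucleotide replaced by one other nucleotide).

```python
# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_ambiguous(position: str, ref: str, code: str) -> list[str]:
    """List the concrete c. substitutions covered by an ambiguous alternate."""
    bases = IUPAC[code.upper()]
    # A substitution must change the nucleotide, so the reference base is skipped.
    return [f"c.{position}{ref}>{base}" for base in bases if base != ref]
```

For the example above, expand_ambiguous("54", "G", "H") lists c.54G>A, c.54G>C and c.54G>T.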
Discussion
When I only sequenced RNA (cDNA) and not genomic DNA, should I then give the description of a variant on DNA level in parentheses?
Yes. While the variant on RNA level can be described as r.76a>g, on DNA level (based on, e.g., a coding DNA reference sequence) it should be described as c.(76A>G).
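A minimal Python sketch of this convention converts an RNA-level substitution into the inferred DNA-level description: lowercase RNA letters become uppercase DNA letters (u becomes T) and the change is wrapped in parentheses because it was not observed in DNA. It assumes the r. and c. descriptions are numbered against the same coding DNA reference and handles only plain positions; rna_to_inferred_dna is an illustrative name.

```python
import re

# Hypothetical pattern: plain r. positions only (no offsets, no UTR markers).
RNA_SUB = re.compile(r"^r\.(?P<pos>-?\d+)(?P<ref>[acgu])>(?P<alt>[acgu])$")

# RNA letters (lowercase, with u) to DNA letters (uppercase, with T).
TO_DNA = str.maketrans("acgu", "ACGT")


def rna_to_inferred_dna(rna_description: str) -> str:
    """Turn e.g. r.76a>g into the predicted DNA-level description c.(76A>G)."""
    match = RNA_SUB.match(rna_description)
    if match is None:
        raise ValueError(f"not a simple r. substitution: {rna_description!r}")
    ref = match["ref"].translate(TO_DNA)
    alt = match["alt"].translate(TO_DNA)
    # Parentheses mark the DNA change as inferred, not sequenced.
    return f"c.({match['pos']}{ref}>{alt})"
```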
How can I shorten the descriptions of SNPs in a manuscript?
Publications reporting linkage or association studies often use a range of different markers/SNPs. Such publications should contain, at least once, an unequivocal description of every marker used, linking it to a reference sequence, preferably a genomic reference sequence. Once this has been done, simplified descriptions can be used, like:
- NM_004006.1 3G>T, using a GenBank coding DNA reference sequence
- GJB2 76A>C, using an HGNC-approved gene symbol as reference
- rs2306220 T>C, using a dbSNP-identifier as a reference
- DXS1219 CA[18];[21] (or AFM297yd1 CA[18];[21]), using a marker DXS1219 (AFM297yd1) as reference
How should I describe a variant in the promoter region of a gene?
It is recommended to describe variants in the promoter region of a gene based on a genomic reference sequence, e.g., NC_000023.10:g.33357783G>A (chrX, hg19).
Describing the variant in relation to a coding DNA reference sequence is only possible when the nucleotide is included in this transcript reference sequence, or when a genomic sequence context is added.
E.g., NM_004006.1:c.-128354C>T or NM_000109.3:c.-401C>T are invalid, because the positions c.-128354 and c.-401 are not included in the transcript reference sequences used.
However, when a genomic reference sequence is added as context, this variant can be described as NC_000023.10(NM_004006.1):c.-128354C>T or NC_000023.10(NM_000109.3):c.-401C>T.
The variant can also be described using a genomic reference sequence containing the promoter region (for this variant, e.g., L01538.1:g.1407C>T).
Although NC_000023.10:g.33357783G>A seems complex, it can be used directly in genome browsers to quickly zoom in on the region of interest.
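The relation between NC_000023.10:g.33357783G>A and NC_000023.10(NM_004006.1):c.-128354C>T can be checked with a small calculation. DMD lies on the minus strand of chrX, so the c. nucleotides are the complements of the g. nucleotides (G>A becomes C>T) and c. positions count down as g. positions go up. The sketch below assumes no intron lies between the variant and c.1, so that the c. offset equals the genomic distance. The g. coordinate of c.1 used in the example (33229429) is simply back-calculated from the two descriptions in the text (33357783 - 128354) and is illustrative, not an annotated coordinate; g_sub_to_c and upstream_c_position are hypothetical helpers.

```python
# Complement table for translating plus-strand bases to the minus strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def upstream_c_position(g_pos: int, g_of_c1: int, minus_strand: bool) -> int:
    """Negative c. position of a genomic nucleotide lying 5' of c.1.

    Assumes no intron between g_pos and c.1 (simple genomic distance).
    """
    # On the minus strand the transcript runs towards lower g. coordinates,
    # so "upstream of c.1" means a higher g. coordinate.
    distance = g_pos - g_of_c1 if minus_strand else g_of_c1 - g_pos
    if distance <= 0:
        raise ValueError("position is not upstream of c.1")
    return -distance


def g_sub_to_c(g_pos: int, g_ref: str, g_alt: str, g_of_c1: int, minus_strand: bool) -> str:
    """Re-express an upstream g. substitution in c. coordinates."""
    c_pos = upstream_c_position(g_pos, g_of_c1, minus_strand)
    if minus_strand:
        # The transcript reads the opposite strand: complement both bases.
        g_ref, g_alt = g_ref.translate(COMPLEMENT), g_alt.translate(COMPLEMENT)
    return f"c.{c_pos}{g_ref}>{g_alt}"
```

Under these assumptions, g_sub_to_c(33357783, "G", "A", g_of_c1=33229429, minus_strand=True) reproduces c.-128354C>T. The result is only meaningful together with the genomic context, e.g. NC_000023.10(NM_004006.1):c.-128354C>T.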
Are polymorphisms described like NM_004006.1:c.76A/G?
No, all substitutions are described as NM_004006.1:c.76A>G.
In the past, the format c.76A/G was used to describe "polymorphic" sequence variants.
Note that a description should be neutral: it simply describes the change and does not include any other information, such as predicted or known functional consequences.
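Older descriptions in the slash format can be rewritten mechanically. The following sketch, with the hypothetical function modernize_slash_notation, only converts the simple pattern shown above (prefix, position, X/Y) into the X>Y form; it does not validate the reference sequence identifier or check the reference nucleotide.

```python
import re

# Hypothetical pattern for the legacy "<prefix><type>.<position><ref>/<alt>" form.
LEGACY_SLASH = re.compile(
    r"^(?P<prefix>.*[gcmno]\.)"          # e.g. "NM_004006.1:c." or just "c."
    r"(?P<pos>[-*]?\d+(?:[+-]\d+)?)"     # position
    r"(?P<ref>[ACGT])/(?P<alt>[ACGT])$"  # ref "/" alt
)


def modernize_slash_notation(description: str) -> str:
    """Rewrite e.g. NM_004006.1:c.76A/G as NM_004006.1:c.76A>G."""
    match = LEGACY_SLASH.match(description)
    if match is None:
        raise ValueError(f"not a slash-format substitution: {description!r}")
    return f"{match['prefix']}{match['pos']}{match['ref']}>{match['alt']}"
```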
Can I describe a GC to TG variant as a di-nucleotide substitution (NG_012232.1:g.12GC>TG)?
No, this is not allowed.
By definition, a substitution changes one nucleotide into one other nucleotide.
The change of ..GAAGCCAG.. to ..GAATGCAG.. (GC replaced by TG) should be described as NG_012232.1:g.12_13delinsTG, i.e. a deletion/insertion (delins) (see Deletion-Insertion and Description - Note).
When phase information is not available, the variant should be described as NG_012232.1:g.12G>T(;)13C>G (see Alleles).
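The two permitted descriptions of this GC to TG change differ only in whether phase is known. A sketch in Python, where describe_adjacent_change is a hypothetical helper that expects two or more nucleotides replaced by the same number of nucleotides: with phase known it writes one delins over the whole span, otherwise it lists the individual substitutions joined by "(;)".

```python
def describe_adjacent_change(reference_id: str, coord_type: str, start: int,
                             ref: str, alt: str, phased: bool) -> str:
    """Describe consecutive changed nucleotides, with or without known phase."""
    if len(ref) < 2 or len(ref) != len(alt):
        raise ValueError("expects two or more nucleotides replaced by as many")
    prefix = f"{reference_id}:{coord_type}."
    if phased:
        # Known to be on one allele: a single delins over the full span.
        end = start + len(ref) - 1
        return f"{prefix}{start}_{end}delins{alt}"
    # Phase unknown: each changed nucleotide separately, joined by "(;)".
    subs = [f"{start + i}{r}>{a}" for i, (r, a) in enumerate(zip(ref, alt)) if r != a]
    return prefix + "(;)".join(subs)
```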
From position c.2074 to c.2080, the BRCA1 coding DNA reference sequence NM_007294.3 reads ..CATGACA.., while a variant frequently found in the population reads ..CATAACA.. (NM_007294.3:c.2077G>A). In a patient I found the sequence ..CATATAACA.. at this location. Can I describe this variant as NM_007294.3:c.[2077G>A;2077_2078insTA]?
The correct description of this variant is NM_007294.3:c.2077delinsATA.
NOTE: the answer was modified, i.e. the addition "However, since the variant is likely a combination of two other variants, it is acceptable to describe it as NM_007294.3:c.[2077G>A;2077_2078insTA]." was removed.
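One way to arrive at c.2077delinsATA mechanically is to compare a reference window with the observed sequence, trim the bases they share at the start and at the end, and describe what remains. The sketch below (minimal_change is a hypothetical name) does only that: it assumes the changed part is a single contiguous block, does not apply the 3' rule, and does not recognise insertions, deletions or duplications, which need their own descriptions. Separate changes further apart would be merged into one delins by this trimming, so the rules in the Notes still apply.

```python
def minimal_change(start: int, ref_window: str, observed: str) -> str:
    """Describe observed vs. ref_window (ref_window begins at c.start).

    Trims the shared prefix and suffix, then reports a substitution if one
    nucleotide is replaced by one other, otherwise a delins. "c." is fixed.
    """
    if ref_window == observed:
        raise ValueError("no change")
    # Trim the bases shared at the start.
    i = 0
    while i < min(len(ref_window), len(observed)) and ref_window[i] == observed[i]:
        i += 1
    ref, obs = ref_window[i:], observed[i:]
    # Trim the bases shared at the end.
    while ref and obs and ref[-1] == obs[-1]:
        ref, obs = ref[:-1], obs[:-1]
    if not ref or not obs:
        raise ValueError("pure insertion or deletion: not covered by this sketch")
    first = start + i
    last = first + len(ref) - 1
    if len(ref) == 1 and len(obs) == 1:
        return f"c.{first}{ref}>{obs}"
    span = str(first) if first == last else f"{first}_{last}"
    return f"c.{span}delins{obs}"
```

Applied to the window c.2074 to c.2080, minimal_change(2074, "CATGACA", "CATATAACA") gives c.2077delinsATA, and the population variant minimal_change(2074, "CATGACA", "CATAACA") gives c.2077G>A.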