Duplication
Duplication: a sequence change between the translation initiation (start) and termination (stop) codon where, compared to a reference sequence, a copy of one or more amino acids is inserted directly C-terminal of the original copy of that sequence.
Syntax
single position | |
---|---|
Syntax | sequence_identifier ":p." aa_position "dup" |
Examples | |
position range | |
Syntax | sequence_identifier ":p." aa_position "_" aa_position "dup" |
Examples | |
Explanation of Symbols | |
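The two syntax forms above can be checked mechanically. The following is a minimal sketch, not an official validator: the function name `parse_protein_dup`, the pattern for `sequence_identifier`, and the restriction to the twenty standard three-letter codes are all assumptions made for illustration.

```python
import re

# Assumption: only the twenty standard three-letter codes are accepted here.
AMINO_ACIDS = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)

# sequence_identifier ":p." aa_position ["_" aa_position] "dup",
# optionally wrapped in parentheses (predicted consequence).
DUP_PATTERN = re.compile(
    rf"^(?P<seq_id>[A-Za-z0-9_.]+):p\."
    rf"(?P<open>\()?"
    rf"(?P<aa1>{AMINO_ACIDS})(?P<pos1>[1-9]\d*)"
    rf"(?:_(?P<aa2>{AMINO_ACIDS})(?P<pos2>[1-9]\d*))?"
    rf"dup"
    rf"(?P<close>\))?$"
)


def parse_protein_dup(description):
    """Return the parts of a protein duplication description, or None."""
    m = DUP_PATTERN.match(description)
    if m is None:
        return None
    # Parentheses must be either both present or both absent.
    if bool(m.group("open")) != bool(m.group("close")):
        return None
    start = (m.group("aa1"), int(m.group("pos1")))
    end = (m.group("aa2"), int(m.group("pos2"))) if m.group("aa2") else start
    return {
        "seq_id": m.group("seq_id"),
        "start": start,
        "end": end,
        "predicted": bool(m.group("open")),
    }


print(parse_protein_dup("NP_003997.2:p.Lys23_Val25dup"))
```

The sketch checks the shape of the description only; it does not verify that the named amino acids actually occur at those positions in the reference sequence.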
Notes
- all variants should be described on the DNA level; descriptions on the RNA and/or protein level may be given in addition.
- predicted consequences, i.e. without experimental evidence (no RNA or protein sequence analysed), should be given in parentheses, e.g., p.(Arg727_Ser783dup).
- the "positions_duplicated" should contain two different positions, e.g. Cys76_Glu79, not Cys76_Cys76.
- the "positions_duplicated" should be listed from N-terminal to C-terminal (5' to 3'), e.g. Cys76_Glu79, not Glu79_Cys76.
- by definition, duplication may only be used when the additional copy is directly C-terminal of the original copy (a "tandem duplication").
- for all descriptions, the most C-terminal position possible of the reference sequence is arbitrarily assigned to have been changed (3' rule).
- duplications on the DNA or RNA level, starting N-terminal of and including the translation termination (stop) codon usually have no (predicted) effect on the protein level.
- duplications on the DNA or RNA level that
  - introduce an immediate translation termination (stop) codon on the protein level are described as a nonsense variant.
  - encode a translation stop codon in the duplicated sequence are, on the protein level, described as an insertion of this sequence, not as a deletion-insertion removing the entire C-terminal amino acid sequence.
  - encode an open reading frame which shifts to another reading frame after the duplicated sequence are described as a frameshift.
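Several of the notes above are simple formatting rules: parentheses for predicted consequences, two different positions in a range, and positions ordered from N-terminal to C-terminal. A minimal sketch of a formatter that enforces them follows; the function name `format_protein_dup` and its argument layout are hypothetical, chosen only for this illustration.

```python
def format_protein_dup(seq_id, start, end=None, predicted=False):
    """Build a protein-level duplication description.

    start and end are (three_letter_code, position) tuples; end is omitted
    for a single-position duplication.
    """
    aa1, pos1 = start
    body = f"{aa1}{pos1}"
    if end is not None:
        aa2, pos2 = end
        if pos2 == pos1:
            # e.g. Cys76_Cys76 is not allowed
            raise ValueError("a range needs two different positions")
        if pos2 < pos1:
            # e.g. Glu79_Cys76 is not allowed
            raise ValueError("positions must run from N-terminal to C-terminal")
        body += f"_{aa2}{pos2}"
    body += "dup"
    if predicted:
        # No RNA or protein sequence analysed: wrap in parentheses.
        body = f"({body})"
    return f"{seq_id}:p.{body}"


print(format_protein_dup("NP_003997.2", ("Arg", 727), ("Ser", 783), predicted=True))
# NP_003997.2:p.(Arg727_Ser783dup)
```

The sketch does not apply the 3' rule and does not decide whether a change should instead be reported as a nonsense, insertion, or frameshift variant; those decisions need the actual sequences.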
Examples
- one amino acid
  - NP_003997.2:p.Val7dup
    a duplication of amino acid Val7 in the reference sequence NP_003997.2.
  - NP_003997.2:p.(Val7dup)
    the predicted consequence on the protein level is a duplication of amino acid Val7 in the reference sequence NP_003997.2.
  - NP_003997.2:p.Trp4dup
    a duplication of amino acid Trp4 in the sequence MetLeuTrpTrpGlu to MetLeuTrpTrpTrpGlu.
    NOTE: for duplications in single amino acid stretches or tandem repeats, the most C-terminal residue is arbitrarily assigned to have been duplicated.
- several amino acids
  - NP_003997.2:p.Lys23_Val25dup
    a duplication of amino acids Lys23 to Val25 in reference sequence NP_003997.2.
  - NP_003997.2:p.(Pro458_Gly460dup)
    a duplication of amino acids Pro458, Pro459, and Gly460 in reference sequence NP_003997.2.
    NOTE: the underlying DNA variant (LRG_232t1:c.1365_1373dup) affects amino acids Pro455, Pro456, and Gly457, but the 3' rule needs to be applied.
- one or more exons
  - NP_003997.2:p.(His321_Glu383dup)
    the predicted consequences of a duplication of exon 10 of the DMD gene, duplicating amino acids His321 to Glu383 in reference sequence NP_003997.2.
  - NP_003997.2:p.(Asp90_Val120dup)
    the predicted consequences of a duplication of exon 5 of the DMD gene, duplicating amino acids Asp90 to Val120 in reference sequence NP_003997.2.
    NOTE: since the 3' rule needs to be applied, the description p.(Val89_Gln119dup) is not correct.
  - NP_003997.2:p.(Asn444Lysfs*15)
    the predicted consequences of a duplication of exons 10 to 11 of the DMD gene, creating a frameshift starting at amino acid Asn444, replacing it with Lys and terminating after fifteen codons.
  - NP_003997.2:p.?
    the predicted consequences of a duplication of exons 1 to 2 of the DMD gene.
    NOTE: since the duplication adds a second promoter/exon 1 to a normal copy of the gene, a reliable prediction of the consequences cannot be made. The duplication may have no consequences, it may give a duplication of exon 2 in the transcript, it might produce both transcripts, etc.
  - NP_003997.2:p.?
    the predicted consequences of a duplication of exons 74 to 79 of the DMD gene.
    NOTE: since the duplication adds a second last exon (exon 79) to a normal copy of the gene, a reliable prediction of the consequences cannot be made. The duplication may have no consequences, it may give a duplication of exons 74 to 78 in the transcript, it might produce both transcripts, etc.
- other
  - NP_003997.1:p.Val7=/dup
    a mosaic case where at amino acid position 7, besides the normal amino acid (a Val, described as Val7=), protein containing a duplication (p.Val7dup) is also found.
    NOTE: for the predicted consequences of a variant, the description is NP_003997.1:p.(Val7=/dup).
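Several of the examples above depend on the 3' rule: within a stretch of identical residues or a tandem repeat, the duplicated range is moved as far towards the C-terminus as possible. A minimal sketch of that shift is given below. The function names and the toy sequences are assumptions for illustration; the second toy sequence only mimics the shape of the Pro455/Pro458 example and is not the real reference sequence.

```python
def shift_dup_c_terminal(residues, start, end):
    """Shift a duplicated range to its most C-terminal equivalent position.

    residues: reference protein as a list of three-letter codes.
    start, end: 1-based, inclusive positions of the duplicated range.
    """
    # Duplicating start..end gives the same variant protein as duplicating
    # start+1..end+1 exactly when the residue following the range equals
    # the first residue of the range.
    while end < len(residues) and residues[end] == residues[start - 1]:
        start += 1
        end += 1
    return start, end


def describe_dup(residues, start, end):
    """Format the shifted range as a p. duplication (without sequence id)."""
    start, end = shift_dup_c_terminal(residues, start, end)
    if start == end:
        return f"p.{residues[start - 1]}{start}dup"
    return f"p.{residues[start - 1]}{start}_{residues[end - 1]}{end}dup"


# MetLeuTrpTrpGlu: a duplication reported at Trp3 is described as Trp4.
print(describe_dup(["Met", "Leu", "Trp", "Trp", "Glu"], 3, 3))  # p.Trp4dup

# Toy tandem repeat: ProProGly at 2..4 shifts to 5..7.
toy = ["Ala", "Pro", "Pro", "Gly", "Pro", "Pro", "Gly", "Ser"]
print(describe_dup(toy, 2, 4))  # p.Pro5_Gly7dup
```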
Discussion
Why do we not describe a duplication as an insertion?
Although duplications are basically a special type of insertion, there are several reasons why the recommendation is to describe duplications separately.
- the description is simpler and shorter;
- it is clear and prevents confusion regarding the position when an insertion is incorrectly reported, like p.Ala22insGly;
- insertion more or less means "coming from elsewhere". Mechanistically, a duplication is most likely caused by a local event, DNA polymerase slippage, duplicating a local sequence.
How should I describe the change MetArgThrGlySerSerHisGlnTrpPhe to MetArgThrGlySerSerHisGlySerSerGlnTrpPhe? The fact that the inserted sequence (GlySerSer) is present in the original sequence suggests it derives from a duplicative event.
The variant should be described as an insertion: p.His7_Gln8insGly4_Ser6.
A description using "dup" is not correct since, by definition, a duplication should be directly C-terminal of (3'-flanking) the original copy (in tandem).
Note that the description given still makes it clear that the sequence inserted between p.His7 and p.Gln8 is probably derived from nearby, i.e. position p.Gly4 to p.Ser6, and thus likely derived from a duplicative event.
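The distinction made in this answer can be expressed as a small check: an inserted copy is a "dup" only if it sits directly after the residues it copies; otherwise it is an insertion, which may still name the nearby positions it appears to derive from. The sketch below assumes the inserted residues and the position they follow are already known; the function name `describe_inserted_copy` is hypothetical, and the 3' rule is deliberately not applied here.

```python
def describe_inserted_copy(residues, after, inserted):
    """Describe residues inserted after 1-based position `after`.

    residues: reference protein as a list of three-letter codes.
    inserted: the inserted residues, also as three-letter codes.
    Assumes 1 <= after < len(residues).
    """
    n = len(inserted)
    # Tandem: the inserted residues repeat the n residues ending at `after`.
    if after >= n and residues[after - n:after] == inserted:
        first = after - n + 1
        if n == 1:
            return f"p.{inserted[0]}{first}dup"
        return f"p.{inserted[0]}{first}_{inserted[-1]}{after}dup"
    # Not in tandem: an insertion between the two flanking residues.
    flank = f"p.{residues[after - 1]}{after}_{residues[after]}{after + 1}ins"
    if n > 1:
        # If the same stretch occurs in the reference, refer to its positions.
        for i in range(len(residues) - n + 1):
            if residues[i:i + n] == inserted:
                return f"{flank}{inserted[0]}{i + 1}_{inserted[-1]}{i + n}"
    return flank + "".join(inserted)


ref = ["Met", "Arg", "Thr", "Gly", "Ser", "Ser", "His", "Gln", "Trp", "Phe"]
print(describe_inserted_copy(ref, 7, ["Gly", "Ser", "Ser"]))
# p.His7_Gln8insGly4_Ser6
print(describe_inserted_copy(ref, 6, ["Gly", "Ser", "Ser"]))
# p.Gly4_Ser6dup
```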
What do you mean by "variants should be described on the protein level and not incorporate knowledge regarding the change on the DNA-level"?
It means that protein variant descriptions should be derived from comparing the variant protein sequence with the reference protein sequence.
Knowledge on the underlying change on the DNA level should not be used.
E.g., when MetTrpSerSerSerHisAsp.. changes to MetTrpSerSerSerSerHisAsp.., this is described as p.Ser5dup.
The information that on the DNA level the change is ..ATGTGGTCCAGTTCCCACGAT.. to ..ATGTGGTCCAGTAGTTCCCACGAT.., so the codon for Ser4 is duplicated, is not used; the description p.Ser4dup is not correct.
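Deriving the description from the protein sequences alone can be sketched as follows. The comparison trims the longest common N-terminal prefix of the reference and variant proteins, which places the extra residues at the most C-terminal position possible, and then checks whether they repeat the residues directly before them. The function names are hypothetical, and the sketch handles only a single tandem duplication; any other kind of change returns None.

```python
def split_codes(protein):
    """Split a string such as 'MetTrpSer' into three-letter codes."""
    return [protein[i:i + 3] for i in range(0, len(protein), 3)]


def describe_dup_from_proteins(ref, var):
    """Compare reference and variant proteins (lists of three-letter codes).

    Returns a p. duplication description, or None if the change is not a
    single tandem duplication.
    """
    n = len(var) - len(ref)
    if n <= 0:
        return None
    # Longest common prefix: the most C-terminal point where extra
    # residues can have been added.
    p = 0
    while p < len(ref) and ref[p] == var[p]:
        p += 1
    inserted = var[p:p + n]
    # The rest of the variant must match the rest of the reference.
    if var[p + n:] != ref[p:]:
        return None
    # Tandem check: the extra residues repeat the n residues before them.
    if p < n or ref[p - n:p] != inserted:
        return None
    first = p - n + 1
    if n == 1:
        return f"p.{ref[p - 1]}{p}dup"
    return f"p.{ref[first - 1]}{first}_{ref[p - 1]}{p}dup"


ref = split_codes("MetTrpSerSerSerHisAsp")
var = split_codes("MetTrpSerSerSerSerHisAsp")
print(describe_dup_from_proteins(ref, var))  # p.Ser5dup
```

No DNA-level information enters the comparison, so the result is p.Ser5dup regardless of which Ser codon was duplicated.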