Duplication#
Duplication: a sequence change where, compared to a reference sequence, a copy of one or more nucleotides is inserted directly 3' of the original copy of that sequence.
Syntax#
Syntax | sequence_identifier ":" coordinate_type "." position_or_range "dup" |
---|---|
Examples | |
Explanation of Symbols | |
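The syntax above can be illustrated with a minimal parsing sketch. The regular expression, the function name `parse_dup`, and the set of accepted coordinate-type letters are assumptions made for illustration only. The sketch splits a description into its three parts and does not interpret the positions themselves.

```python
import re

# Hypothetical pattern for simple "dup" descriptions only.
# Assumption: the coordinate type is one of the letters g, o, m, c, n, r.
DUP_PATTERN = re.compile(
    r"^(?P<sequence_identifier>[^:]+):"
    r"(?P<coordinate_type>[gomcnr])\."
    r"(?P<position_or_range>[0-9_+\-*?()]+)"
    r"dup$"
)


def parse_dup(description):
    """Split a duplication description into its syntax parts.

    Returns a dict, or None when the text does not fit the pattern
    (for example when the duplicated sequence is appended, as in "dupT").
    """
    match = DUP_PATTERN.match(description)
    return match.groupdict() if match else None


print(parse_dup("NM_004006.2:c.20dup"))
print(parse_dup("NC_000023.11(NM_004006.2):c.260_264+48dup"))
print(parse_dup("NM_004006.2:c.20dupT"))
```

The last call returns `None` because the pattern requires the description to end in `dup`, in line with the recommendation not to append the duplicated sequence.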
Notes#
- positions_duplicated should contain two different positions, e.g., 123_126, not 123_123.
- the positions_duplicated should be listed from 5' to 3', e.g., 123_126, not 126_123.
- by definition, duplication may only be used when the additional copy is directly 3'-flanking of the original copy (a "tandem duplication").
- when a variant can be described as a duplication, it must be described as a duplication and not as, e.g., an insertion (see Prioritization).
- when there is no evidence that the extra copy of a sequence detected is in tandem (directly 3'-flanking the original copy), the change cannot be described as a duplication; it should be described as an insertion (see Insertion and proposal SVD-WG003).
- inverted duplications are described as an insertion (g.234_235ins123_234inv), not as a duplication (see Inversion).
- when more than one additional copy is inserted directly 3' of the original copy, the change is indicated using the format for Repeated sequences, like [3] (triplication), [4] (quadruplication), etc.
- two variants separated by one or more nucleotides should be described individually and not as a "delins".
  - exception: two variants separated by one nucleotide, together affecting one amino acid, should be described as a "delins".
    NOTE: the SVD-WG has prepared a proposal to modify this recommendation (see SVD-WG010). The new proposal is: two variants that are separated by two or fewer intervening nucleotides (that is, not including the variants themselves) should be described as a single "delins" variant.
- for all descriptions, the most 3' position possible of the reference sequence is arbitrarily assigned to have been changed (3'rule).
  - exception: duplications around exon/exon junctions when identical nucleotides flank the junction (see Numbering); when ..GATgta..//..cagTCA.. changes to ..GATTgta..//..cagTCA.., based on a coding DNA reference sequence, the variant is described as LRG_199t1:c.3921dup (NC_000023.10:g.32459297dup) and not as c.3922dup (which would translate to g.32456507dup).
- † = see Uncertain; when the position and/or the sequence of a duplication has not been defined.
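The position rules and the 3'rule in the notes above can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, positions are simple 1-based integers local to the given string, and the exon/exon junction exception is not handled.

```python
def shift_dup_3prime(reference, start, end):
    """Shift a tandem duplication to its most 3' position (3'rule).

    reference: plain nucleotide string.
    start, end: 1-based inclusive positions of the duplicated segment.
    Assumption: no exon/exon junction exception applies.
    """
    if not 1 <= start <= end <= len(reference):
        raise ValueError("positions must satisfy 1 <= start <= end <= length")
    # The duplication can move one step 3' whenever the nucleotide directly
    # after the segment equals the first nucleotide of the segment.
    while end < len(reference) and reference[end] == reference[start - 1]:
        start += 1
        end += 1
    return start, end


def format_dup(start, end):
    """Write the position part: one position, or a 5'-to-3' range."""
    if start > end:
        raise ValueError("positions must be listed from 5' to 3'")
    return f"{start}dup" if start == end else f"{start}_{end}dup"


# A-stretch example from this page, numbered locally from 1.
reference = "ATTGAAAAAAAATTAG"
print(format_dup(*shift_dup_3prime(reference, 5, 5)))  # 12dup
```

Duplicating any A of the stretch (local positions 5 to 12) yields the same sequence, so the sketch always reports the last A, mirroring c.5697dup in the examples.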
Examples#
- one nucleotide
  - NM_004006.2:c.20dup (NC_000023.10:g.33229410dup)
    the duplication of a T at position c.20 in the sequence AGAAGTAGAGG to AGAAGTTAGAGG.
    NOTE: it is not allowed to describe the variant as c.19_20insT (see prioritisation).
    NOTE: the recommendation is not to describe the variant as NM_004006.2:c.20dupT, i.e. including the duplicated nucleotide sequence. This description is longer, it contains redundant information, and the chance of making an error increases (e.g., NM_004006.2:c.20dupG).
  - NM_004006.2:c.5697dup (3'rule)
    a duplication of the A at position c.5697 in the sequence ATTGAAAAAAAATTAG to ATTGAAAAAAAAATTAG, i.e. the last A of the 8 nucleotide A-stretch running from position c.5690 to c.5697.
    NOTE: the 3'rule has been applied here stating that "for all descriptions, the most 3' position possible is arbitrarily assigned to have been changed" (see General_Recommendations).
  - NC_000023.11:g.32343183dup (3'rule)
    a duplication of the T at position g.32343183 in the sequence CTAATTTTTTTTCAAT to CTAATTTTTTTTTCAAT, i.e. the last T of the 8 nucleotide T-stretch running from position g.32343176 to g.32343183.
    NOTE: the T nucleotide in NC_000023.11:g.32343183 corresponds to the A nucleotide in NM_004006.2:c.5690, a transcript annotated on the minus strand of the X-chromosome. However, applying the 3'rule, the duplication of this nucleotide based on a coding DNA reference sequence (transcript level) should be described as NM_004006.2:c.5697dup (not as NM_004006.2:c.5690dup).
- several nucleotides
  - NM_004006.2:c.20_23dup (NC_000023.11:g.33211290_33211293dup)
    a duplication from position c.20 to c.23 in the sequence AGAAGTAGAGG to AGAAGTAGATAGAGG.
    NOTE: the recommendation is not to describe the variant as c.20_23dupTAGA, i.e. including the duplicated nucleotide sequence. This description is longer, it contains redundant information, and the chance of making an error increases (e.g., c.20_23dupTGGA).
  - NC_000023.11(NM_004006.2):c.260_264+48dup (NC_000023.11:g.32844735_32844787dup)
    a duplication of nucleotides c.260 to c.264+48 (coding DNA reference sequence), crossing an exon/intron border.
- exon/intron/exon
  - exon/exon
    NC_000023.11(NM_004006.2):c.3921dup
    the duplication of the T nucleotide at the exon/exon border (..GATgta..//..cagTCA.. changing to ..GATTgta..//..cagTCA..).
    NOTE: according to an exception to the 3'rule, the variant (NC_000023.11:g.32441180dup) is not described as c.3922dup since this would shift the position of the variant to the next exon (c.3922 linking to g.32441180) (see exception in Numbering and see Q&A).
  - exon/intron
    NC_000023.11(NM_004006.2):c.1704+1dup
    the duplication of the G nucleotide at the exon/intron border in the sequence GAACAGgt..//..agTGCCTT changing to GAACAGggt..//..agTGCCTT (not c.1704dup).
    NOTE: this description does not depend on the effect observed on RNA level, giving either altered splicing or r.1704dup.
  - intron/exon
    NC_000023.11(NM_004006.2):c.1813dup
    the duplication of the G nucleotide at the intron/exon border in the sequence CTGGCCgt..//..agGTTTTA changing to CTGGCCgt..//..agGGTTTTA (not c.1813-1dup).
    NOTE: this description does not depend on the effect observed on RNA level, giving either altered splicing or r.1813dup.
- exons
  - NC_000023.11(NM_004006.2):c.4072-1234_5155-246dup
    a duplication of nucleotides c.4072-1234 to c.5155-246 duplicating exon 30 (starting at position c.4072) to exon 36 (ending at position c.5154) of the DMD gene.
    NOTE: the format c.4072-1234_5155-246dupXXXXX, with XXXXX indicating the size of the duplication, should not be used.
    NOTE: the description NM_004006.2:c.4072-1234_5155-246dup is not correct: the reference sequence NM_004006.2 is a coding DNA reference sequence, which does not include the intron sequences involved.
  - NC_000023.11(NM_004006.2):c.720_991dup
    a duplication of nucleotides c.720 to c.991 starting in exon 8 (position c.720) and ending in exon 10 (position c.991) of the DMD gene.
  - NC_000023.11(NM_004006.2):c.(4071+1_4072-1)_(5154+1_5155-1)dup
    a duplication of exon 30 (starting at position c.4072) to exon 36 (ending at position c.5154) of the human DMD gene. The duplication break point has not been sequenced. Exons 29 (ending at c.4071) and 37 (starting at nucleotide c.5155) have been tested and shown to be not duplicated. The duplication therefore starts in intron 29 (positions c.4071+1 to c.4072-1) and ends in intron 36 (positions c.5154+1 to c.5155-1).
    NOTE: this description is part of proposal SVD-WG003 (undecided).
    NOTE: previously, the suggestion was made to describe such duplications using the format c.4072-?_5154+?dup. However, since c.4072-? indicates "to an unknown position 5' of c.4072" and c.5154+? "to an unknown position 3' of c.5154", this description is not correct when it is known that exons 29 and 37 are not involved.
  - NC_000001.11(NM_206933.2):c.[675-542_1211-703dup;1211-703_1211-704insGTAAA]
    a duplication of the sequence from nucleotide position c.675-542 to c.1211-703, followed by the insertion of the sequence GTAAA.
    NOTE: the variant is not described using dupins, a format not used in HGVS nomenclature.
  - NC_000023.11:g.(32381076_32382698)_(32430031_32456357)[3] (NC_000023.11(NM_004006.2):c.(4071+1_4072-1)_(5154+1_5155-1)[3])
    three copies of the sequence from exon 30 (starting at position c.4072) to exon 36 (ending at position c.5154) of the DMD gene were detected (break points not sequenced).
  - duplications extending beyond the transcribed region
    following current recommendations (see Numbering), it is not allowed to describe variants in nucleotides beyond the boundaries of a reference sequence. Consequently, duplications extending 5' of a transcript cannot be described like NC_000023.11(NM_004006.2):c.(?_-244)_(31+1_32-1)dup (c.-244 is the first nucleotide of NM_004006.2). Duplications extending 3' of a transcript cannot be described like NC_000023.11(NM_004006.2):c.(10086+1_10087-1)_(*2691_?)dup (c.*2691 is the last nucleotide of NM_004006.2). Such duplications can only be described using genomic coordinates. The HGVS nomenclature committee (SVD-WG) is discussing whether a c. based format should be proposed.
- gene
  - NC_000023.11:g.(31060227_31100351)_(33274278_33417151)dup
    a duplication of the entire DMD gene based on a SNP-array analysis where the maximum size of the duplication lies between SNPs rs396303 and rs7887548 (nucleotides 31060227 and 33417151) and the minimum size between SNPs rs808178 and rs7887103 (nucleotides 31100351 and 33274278). Describing the duplication based on a coding DNA reference sequence using NC_000023.11(NM_004006.2):c.(-205839_-62966)_(*21568_*61692)dup makes no sense.
    NOTE: the array analysis detects an extra copy of the sequences, and it has to be determined whether it is a duplication. When it is not certain that the variant is a duplication, the variant should be described as an insertion: g.?_?ins[NC_000023.11:g.(31060227_31100351)_(33274278_33417151)].
  - NC_000023.11:g.(?_31120496)_(33339477_?)dup
    a duplication of the entire DMD gene based on an MLPA assay where nucleotides g.31120496 and g.33339477 are the center of the probes for, respectively, the last and first (brain promoter) exons.
    NOTE: the MLPA analysis detects an extra copy of the sequences, and it has to be determined whether it is a duplication. When it is not certain that the variant is a duplication, the variant should be described as an insertion: g.?_?ins[NC_000023.11:g.(?_31120496)_(33339477_?)].
- chromosome
  NC_000023.11:g.pter_qtersup
  a duplication of the entire X-chromosome ("sup" = supernumerary chromosome).
  NOTE: when, e.g., based on next-generation sequencing, only "an additional copy of all X-chromosome sequences" is detected, the variant should be described as NC_000023.11:g.pter_qter[2].
- other
  - NC_000023.11:g.33344590_33344592=/dup
    a mosaic case where, besides the normal sequence from position g.33344590 to g.33344592, chromosomes are also found containing a duplication of this sequence.
  - NC_000023.11:g.33344590_33344592=//dup
    a chimeric case, i.e. the sample is a mix of cells containing g.33344590_33344592= and g.33344590_33344592dup.
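The minus-strand example in the "one nucleotide" group (NC_000023.11:g.32343183dup versus NM_004006.2:c.5697dup) shows that the 3'rule selects a different nucleotide depending on the strand of the reference sequence. The sketch below reproduces this with the two 16-nucleotide sequences quoted in that example, numbered locally from 1; the helper names are hypothetical and the local numbering is an assumption made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the sequence as read on the opposite strand."""
    return sequence.translate(COMPLEMENT)[::-1]


def most_3prime(sequence, position):
    """Shift a one-nucleotide duplication 3' through a run of identical bases.

    position is 1-based and local to the given string.
    """
    while position < len(sequence) and sequence[position] == sequence[position - 1]:
        position += 1
    return position


genomic = "CTAATTTTTTTTCAAT"               # plus strand, local positions 1..16
transcript = reverse_complement(genomic)   # minus-strand transcript

g_pos = most_3prime(genomic, 5)      # last T of the T-stretch on the genome
c_pos = most_3prime(transcript, 5)   # last A of the A-stretch on the transcript
# Convert the transcript position back to a position on the genomic strand.
c_pos_on_genome = len(genomic) + 1 - c_pos

print(transcript)              # ATTGAAAAAAAATTAG
print(g_pos, c_pos_on_genome)  # 12 5
```

The genomic description points at the last T of the stretch (local position 12), while the transcript description points at the nucleotide that sits at the opposite end of the same stretch on the genome (local position 5). Both describe the same change.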
Discussion#
Why do we not describe a duplication as an insertion?
Although duplications are basically a special type of insertion, there are several reasons why the recommendation is to describe duplications separately.
- the description is simpler and shorter;
- it is clear and prevents confusion regarding the position when an insertion is incorrectly reported, like c.22insG;
- it prevents hypothetical discussions regarding the site of the insertion; in the case of a duplication including an intron/exon border (e.g., c.123-8_137dup), is the "insertion" in the intron or in the exon?
- insertion more or less means "coming from elsewhere". Mechanistically, a duplication is most likely caused by a local event, DNA polymerase slippage, duplicating a local sequence.
Can I use g.123dup6 to describe a 6 nucleotide duplication?
No, a duplication of more than one nucleotide should give the position of the first and last nucleotide duplicated, separated using the range symbol ("_", underscore), e.g., g.123_128dup.
Note also that from the description g.123dup6 it is not clear whether the duplication starts at position g.123 (so g.123_128dup) or after position 123 (so g.124_129dup).
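The arithmetic behind the two readings of g.123dup6 can be made explicit with a small sketch. The function name `dup_range` is hypothetical, and the sketch assumes plain integer positions such as genomic coordinates.

```python
def dup_range(coordinate_type, first, length):
    """Describe a duplication of `length` nucleotides starting at `first`."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if length == 1:
        return f"{coordinate_type}.{first}dup"
    # The last duplicated position is first + length - 1 (inclusive range).
    last = first + length - 1
    return f"{coordinate_type}.{first}_{last}dup"


print(dup_range("g", 123, 6))  # duplication starts at position 123
print(dup_range("g", 124, 6))  # duplication starts after position 123
```

The two calls produce g.123_128dup and g.124_129dup, the two candidate descriptions that g.123dup6 fails to distinguish.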
In the example above, c.3921dup, should the description based on a coding DNA reference sequence not be c.3922dup?
Strictly speaking, you are right.
However, an exception was made for cases like this: translating c.3922dup back to a genomic position would end up at the wrong nucleotide, in the wrong exon (NC_000023.10:g.32456507dup instead of NC_000023.10:g.32459297dup).
How should I describe the change ATCGATCGATCGATCGAGGGTCCC to ATCGATCGATCGATCGAATCGATCGATCGGGGTCCC? The fact that the inserted sequence (ATCGATCGATCG) is present in the original sequence suggests it derives from a duplicative event.
The variant should be described as an insertion: g.17_18ins5_16.
A description using "dup" is not correct since, by definition, a duplication should be directly 3'-flanking of the original copy (in tandem).
Note that the description given still makes it clear that the sequence inserted between g.17 and g.18 is probably derived from nearby, i.e. positions g.5 to g.16, and thus likely derived from a duplicative event.
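The choice between "dup" and "ins" in the question above can be sketched as a small decision procedure. This is an illustration under stated assumptions, not the behaviour of any existing tool: the function name is hypothetical, positions are 1-based and local to the string, the "g." prefix is fixed, only the nearest upstream copy is searched for, and variants touching the ends of the reference are not handled.

```python
def describe_insertion(reference, after, inserted):
    """Describe `inserted` placed between positions `after` and `after + 1`."""
    if not inserted or not 1 <= after < len(reference):
        raise ValueError("need a non-empty insert inside the reference")
    n = len(inserted)
    # 3'rule: roll the insertion 3' while the next reference nucleotide
    # equals the first inserted nucleotide (the result stays the same).
    while after < len(reference) and reference[after] == inserted[0]:
        inserted = inserted[1:] + inserted[0]
        after += 1
    # Tandem copy: the insert equals the n nucleotides directly 5' of it.
    if after >= n and reference[after - n:after] == inserted:
        start = after - n + 1
        return f"g.{start}dup" if n == 1 else f"g.{start}_{after}dup"
    # Not in tandem. Assumption: refer to the nearest upstream copy, if any.
    index = reference.rfind(inserted, 0, after)
    if index != -1:
        return f"g.{after}_{after + 1}ins{index + 1}_{index + n}"
    return f"g.{after}_{after + 1}ins{inserted}"


reference = "ATCGATCGATCGATCGAGGGTCCC"
print(describe_insertion(reference, 17, "ATCGATCGATCG"))  # g.17_18ins5_16
print(describe_insertion("AGAAGTAGAGG", 5, "T"))          # g.6dup
```

In the first call the inserted copy is separated from the original by the A at position 17, so the tandem test fails and the result is an insertion that refers to positions 5 to 16. In the second call the T inserted after local position 5 is identical to the T at position 6, so the result is a duplication.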