Insertion
Insertion: a sequence change where, compared to the reference sequence, one or more nucleotides are inserted and where the insertion is not a copy of a sequence immediately 5'.
Syntax
| Syntax | sequence_identifier ":" coordinate_type "." range "ins" sequence |
|---|---|
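As a rough illustration of the syntax above, the sketch below splits a simple insertion description into its parts. The pattern and the names `Insertion` and `parse_simple_insertion` are hypothetical and not part of the recommendations. It assumes plain integer positions, a DNA coordinate type (g, c, n, m or o), and a literal inserted sequence. Intronic offsets, uncertain ranges, and bracketed inserts are out of scope.

```python
import re
from typing import NamedTuple


class Insertion(NamedTuple):
    sequence_identifier: str
    coordinate_type: str
    start: int
    end: int
    inserted: str


# Hypothetical pattern: sequence_identifier ":" coordinate_type "." range "ins" sequence,
# restricted to integer positions and literal nucleotides (an assumption of this sketch).
_SIMPLE_INSERTION = re.compile(
    r"^(?P<seq_id>[^:]+):(?P<ctype>[gcnmo])\."
    r"(?P<start>\d+)_(?P<end>\d+)ins(?P<ins>[ACGTN]+)$"
)


def parse_simple_insertion(description: str) -> Insertion:
    """Split a simple insertion description into its components."""
    match = _SIMPLE_INSERTION.match(description)
    if match is None:
        raise ValueError(f"not a simple insertion description: {description!r}")
    return Insertion(
        match["seq_id"],
        match["ctype"],
        int(match["start"]),
        int(match["end"]),
        match["ins"],
    )
```

A description with a single position, such as g.123insG, does not match the pattern and raises ValueError.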
Notes
- the positions_flanking should contain two flanking nucleotides, e.g., 123_124, not 123_125.
- the positions_flanking should be listed from 5' to 3', e.g., 123_124, not 124_123.
- tandem duplications are described as a duplication (g.123_456dup), not an insertion (g.456_457ins123_456, see Prioritization).
- inverted duplications are described as an insertion (g.234_235ins123_234inv), not as a duplication (see Inversion).
- two variants separated by one or more nucleotides should be described individually and not as a "delins".
    - exception: two variants separated by one nucleotide, together affecting one amino acid, should be described as a "delins".
    - NOTE: the SVD-WG has prepared a proposal to modify this recommendation (see SVD-WG010). The new proposal is: two variants that are separated by two or fewer intervening nucleotides (that is, not including the variants themselves) should be described as a single "delins" variant.
- for all descriptions, the most 3' position possible of the reference sequence is arbitrarily assigned to have been changed (3'rule).
- the "inserted_sequence" can be given as the nucleotides inserted (e.g., insAGC) or, for larger insert sequences, by referring to the sequence in the reference sequence (e.g., c.849_850ins858_895) or another reference (e.g., NC_000002.11:g.47643464_47643465ins[NC_000022.10:g.35788169_35788352]). When the inserted sequence is not present in the reference genome, it should be submitted to a database (e.g., GenBank); the accession.version number obtained can then be used to describe the variant.
- † = see Uncertain; when the position and/or the sequence of an inserted sequence has not been defined, a description may have a format like g.(100_150)insN[25].
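The first two notes can be sketched as a small check. The function name `check_positions_flanking` is hypothetical, and the sketch assumes plain integer positions on a single coordinate axis (for example g. numbering). Coding DNA positions with offsets, or the step from c.-1 to c.1, would need extra handling.

```python
from typing import Optional


def check_positions_flanking(start: int, end: int) -> Optional[str]:
    """Return a problem message for an invalid flanking pair, or None if it is valid.

    Assumes plain integer positions (an assumption of this sketch).
    """
    if end <= start:
        # e.g. 124_123: positions must be listed from 5' to 3'
        return "positions must be listed from 5' to 3'"
    if end != start + 1:
        # e.g. 123_125: the two positions must be direct neighbours
        return "positions must be two flanking (adjacent) nucleotides"
    return None
```

With this check, 123_124 passes, while 123_125 and 124_123 are each reported with a message.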
Examples
- simple insertions
    - NC_000023.10:g.32867861_32867862insT (NM_004006.2:c.169_170insA)
      the insertion of a T nucleotide between nucleotides g.32867861 and g.32867862.
    - NC_000023.10:g.32862923_32862924insCCT (LRG_199t1:c.240_241insAGG)
      the insertion of nucleotides CCT between nucleotides g.32862923 and g.32862924.
    - NM_004006.2:c.849_850ins858_895
      the insertion of a copy of nucleotides c.858 to c.895 between nucleotides c.849 and c.850.
    - NC_000002.11:g.47643464_47643465ins[NC_000022.10:g.35788169_35788352]
      the insertion of nucleotides g.35788169 to g.35788352 as found in NC_000022.10 between nucleotides g.47643464 and g.47643465.
- complex insertions
    - NM_004006.2:c.419_420ins[T;401_419]
      the insertion of T followed by a copy of the sequence from c.401 to c.419 (a duplication not directly flanking the original sequence).
    - LRG_199t1:c.419_420ins[T;450_470;AGGG]
      the insertion of T followed by a copy of the sequence from c.450 to c.470, followed by AGGG.
    - NC_000006.11:g.10791926_10791927ins[NC_000004.11:g.106370094_106370420;A[26]]
      the insertion of a copy of an Alu-repeat sequence (from chromosome 4 nucleotides g.106370094 to g.106370420), and a stretch of 26 A nucleotides, between nucleotides g.10791926 and g.10791927 on chromosome 6.
- insertion of inverted duplicated copies
    - NM_004006.2:c.849_850ins850_900inv
      a copy of nucleotides c.850 to c.900 is inserted, in inverted orientation, 5' of the original sequence, between nucleotides c.849 and c.850.
    - NM_004006.2:c.900_901ins850_900inv
      a copy of nucleotides c.850 to c.900 is inserted, in inverted orientation, 3' of the original sequence, between nucleotides c.900 and c.901.
    - LRG_199t1:c.940_941ins[885_940inv;A;851_883inv]
      an inverted copy of nucleotides c.851 to c.940, with a G>A substitution of nucleotide c.884, is inserted directly 3' of the original sequence.
    - NM_004006.2:c.940_941ins[903_940inv;851_885inv]
      an inverted copy of nucleotides c.851 to c.940, with a deletion from nucleotides c.886 to c.902, is inserted directly 3' of the original sequence.
- incomplete descriptions, preferably use exact descriptions only
    - NM_004006.2:c.(222_226)insG
      the insertion of a G at an unknown position in the sequence encoding amino acid 75.
    - NC_000004.11:g.(3076562_3076732)insN[12]
      the insertion of 12 nucleotides (not specified) at an unknown position between nucleotides g.3076562 and g.3076732 (exon 1 of the HTT gene containing the Gln/Pro repeat region).
    - NC_000023.10:g.32717298_32717299insN (NM_004006.2:c.761_762insN)
      the insertion of one not specified nucleotide (N) between positions g.32717298 and g.32717299.
    - NM_004006.2:c.761_762insNNNNN (alternatively NM_004006.1:c.761_762insN[5])
      the insertion of 5 not specified nucleotides (NNNNN) between positions c.761 and c.762.
    - NC_000023.10:g.32717298_32717299insN[100]
      the insertion of 100 nucleotides (not specified) between positions g.32717298 and g.32717299.
    - NC_000023.10:g.32717298_32717299insN[(80_120)]
      the insertion of 80 to 120 nucleotides between positions g.32717298 and g.32717299.
    - NC_000023.10:g.32717298_32717299insN[?]
      the insertion of an unknown number of nucleotides between positions g.32717298 and g.32717299.
    - NC_000006.11:g.8897754_8897755ins[N[543];8897743_8897754]
      the insertion of an undefined sequence of 543 nucleotides (N[543]), and a 12 nucleotide target site duplication (g.8897743 to g.8897754), between nucleotides g.8897754 and g.8897755 on chromosome 6.
- other
    - g.?_?ins[NC_000023.10:g.(12345_23456)_(34567_45678)]
      the insertion of a sequence from the X-chromosome (NC_000023.10), maximally involving nucleotides 12345_45678 but certainly nucleotides 23456_34567, at an unknown position (g.?_?) in the genome (see Uncertain).
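The bracketed inserted sequences in the examples above can be expanded mechanically once a reference sequence is at hand. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, positions are 1-based integers on the same reference, "inv" is treated as the reverse complement of the copied range, and parts that point to another reference sequence or use uncertain sizes (such as N[(80_120)] or N[?]) are not handled.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def expand_part(part: str, reference: str) -> str:
    """Expand one element of an ins[...] list against the reference."""
    match = re.fullmatch(r"(\d+)_(\d+)(inv)?", part)
    if match:  # a range such as 401_419 or 850_900inv (1-based, inclusive)
        start, end = int(match[1]), int(match[2])
        if not 1 <= start <= end <= len(reference):
            raise ValueError(f"range outside reference: {part!r}")
        copied = reference[start - 1:end]
        return reverse_complement(copied) if match[3] else copied
    match = re.fullmatch(r"([ACGTN])\[(\d+)\]", part)
    if match:  # a repeated nucleotide such as A[26] or N[543]
        return match[1] * int(match[2])
    if re.fullmatch(r"[ACGTN]+", part):  # literal nucleotides such as AGGG
        return part
    raise ValueError(f"unsupported inserted-sequence part: {part!r}")


def expand_inserted_sequence(inserted: str, reference: str) -> str:
    """Expand the text after 'ins' into the nucleotides that are inserted."""
    if inserted.startswith("[") and inserted.endswith("]"):
        parts = inserted[1:-1].split(";")  # e.g. [T;401_419]
    else:
        parts = [inserted]  # e.g. T, 858_895 or N[5]
    return "".join(expand_part(part, reference) for part in parts)
```

On a toy reference ACGTTGCAAC, the inserted sequence [T;2_4] expands to TCGT and [2_4inv;A[3]] expands to ACGAAA.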
Discussion
Can I describe a variant as g.123insG?
No, since the description is not unequivocal, it is not allowed. What does the description mean: the insertion of a G at position g.123, or the insertion of a G after position g.123? The situation becomes even more complex when, using a coding DNA reference sequence, a "-" character is used, e.g., c.-14insG or c.456-13insG. In the description c.456-13insG, when the insertion is after intronic nucleotide c.456-13, is the other flanking position c.456-12 or c.456-14?
Can I use the "^" character to describe an insertion?
No, insertions cannot be described using the format g.123^124insG or g.123^124G. The recommendations try to restrict the number of different characters used to a minimum. Since a character was already used to indicate a range (the underscore), a new character was not required.
How should I describe the change ATCGATCGATCGATCGAGGGTCCC to ATCGATCGATCGATCGAATCGATCGATCGGGGTCCC? The fact that the inserted sequence (ATCGATCGATCG) is present in the original sequence suggests it derives from a duplicative event.
The variant should be described as an insertion: g.17_18ins5_16. A description using "dup" is not correct since, by definition, a duplication should be directly 3'-flanking of the original copy (in tandem). Note that the description given still makes it clear that the sequence inserted between g.17 and g.18 is probably derived from nearby, i.e. positions g.5 to g.16, and thus likely derived from a duplicative event.
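The answer above can be checked on the sequences given in the question. The sketch below applies g.17_18ins5_16 to the original sequence and compares the result with a tandem copy of g.5 to g.16. The helper name `apply_insertion` is hypothetical, and positions are assumed to be 1-based integers.

```python
def apply_insertion(reference: str, start: int, end: int, inserted: str) -> str:
    """Insert `inserted` between 1-based flanking positions start and end."""
    if end != start + 1:
        raise ValueError("flanking positions must be adjacent and listed 5' to 3'")
    return reference[:start] + inserted + reference[start:]


reference = "ATCGATCGATCGATCGAGGGTCCC"
copied = reference[5 - 1:16]  # nucleotides g.5 to g.16

# g.17_18ins5_16: the copy lands between g.17 and g.18
variant = apply_insertion(reference, 17, 18, copied)

# A tandem duplication would place the copy directly 3' of g.16 instead
tandem = reference[:16] + copied + reference[16:]
```

Here `variant` equals the changed sequence from the question, while `tandem` differs from it, which is why "dup" does not describe this change.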
A variant in the CDKN2A gene, duplicating the first 24 nucleotides of the coding DNA reference sequence, has been described as c.23ins24. My interpretation is it should be described as c.1_24dup, is this correct?
Since the sequence in that region is cagcATGGAGCCGGCGGCGGGGAGCAGCATGGAGCCTTCG, the correct description is c.9_32dup (p.(Ala4_Pro11dup)). c.1_24dup seems correct but neglects the 3'rule: the duplicated region can be shifted 3' to GGCGGCGGGGAGCAGCATGGAGCC (c.9 to c.32). c.23ins24 is not correct since the position of the insertion is not described properly and because "ins24" does not define the sequence inserted.
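The 3' shift in this answer can be reproduced on the sequence fragment quoted above. The sketch below is a minimal illustration: the function name is hypothetical, and it assumes the fragment starts four nucleotides upstream of c.1 (the lower-case cagc) and that only positive c. positions are involved. A result that runs to the end of the fragment would not be reliable, because the real reference continues beyond it.

```python
from typing import Tuple


def shift_duplication_3prime(sequence: str, start: int, end: int) -> Tuple[int, int]:
    """Shift a duplicated range (1-based, inclusive) as far 3' as possible.

    Duplicating start..end gives the same result as duplicating
    start+1..end+1 whenever the first duplicated nucleotide equals
    the nucleotide directly 3' of the range.
    """
    while end < len(sequence) and sequence[start - 1] == sequence[end]:
        start += 1
        end += 1
    return start, end


region = "cagcATGGAGCCGGCGGCGGGGAGCAGCATGGAGCCTTCG".upper()
UTR_LENGTH = 4  # assumption: c.1 is the A of ATG, preceded by cagc

start, end = shift_duplication_3prime(region, 1 + UTR_LENGTH, 24 + UTR_LENGTH)
c_start, c_end = start - UTR_LENGTH, end - UTR_LENGTH  # (9, 32), i.e. c.9_32dup
```

Starting from c.1_24, the range shifts by eight positions and stops at c.9_32, because c.9 (G) differs from c.33 (T).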