Insertion#
Insertion: a sequence change between the translation initiation (start) and termination (stop) codon where, compared to the reference sequence, one or more amino acids are inserted, which is not a frameshift and where the insertion is not a copy of a sequence immediately N-terminal (5').
Syntax#
| Syntax | sequence_identifier ":p." aa_range "ins" sequence |
|---|---|
Notes#
- all variants should be described on the DNA level; descriptions on the RNA and/or protein level may be given in addition.
- predicted consequences, i.e. without experimental evidence (no RNA or protein sequence analysed), should be given in parentheses, e.g., `p.(Arg727_Ser728insTrpCys)`.
- the "positions_flanking" should contain two flanking residues, e.g., `Lys23_Leu24`, not two non-flanking residues (`Lys23_Asn25`).
- an insertion cannot be described using one amino acid position, like `p.Lys23insAsp`.
- for all descriptions, the most C-terminal position possible of the reference sequence is arbitrarily assigned to have been changed (3'rule).
- duplicating insertions should be described as duplications (see Duplication), not as an insertion.
- when the inserted amino acid sequence is large, the insertion may be described by its length, e.g., `p.Lys2_Leu3insX[34]` (open reading frame insertion) or `p.Lys2_Leu3insTer12` (translation stop in inserted sequence).
  NOTE: the inserted amino acid sequence can be derived from the description of the variant on the DNA or RNA level.
- insertions extending the full-length amino acid sequence at the C-terminal end with one or more amino acids are described as an Extension.
- insertions on DNA or RNA level that
  - introduce an immediate translation termination (stop) codon on the protein level, are described as a nonsense variant.
  - encode a translation stop codon in the inserted sequence are on the protein level described as an insertion of this sequence, not as a deletion-insertion removing the entire C-terminal amino acid sequence.
  - encode an open reading frame which after the inserted sequence shifts to another reading frame, are described as a frameshift.
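
The flanking-position rules in the notes above can be checked mechanically. The following is a minimal sketch, not an official validator: the regular expression and the function name `check_insertion` are hypothetical, and the pattern covers only the simple forms shown in this section, not the full nomenclature grammar.

```python
import re

# Hypothetical pattern for illustration only; it accepts an optional
# RefSeq-style identifier, optional parentheses for predicted consequences,
# two flanking residues in three-letter code, and an inserted sequence.
INSERTION_RE = re.compile(
    r"^(?:(?P<ref>[A-Z]{2}_\d+\.\d+):)?"      # optional sequence identifier
    r"p\.(?P<open>\()?"                        # "p." and optional "("
    r"(?P<aa1>[A-Z][a-z]{2})(?P<pos1>\d+)_"    # first flanking residue
    r"(?P<aa2>[A-Z][a-z]{2})(?P<pos2>\d+)"     # second flanking residue
    r"ins(?P<seq>[A-Za-z*\[\]\d]+)"            # inserted sequence
    r"(?P<close>\))?$"                         # optional ")"
)


def check_insertion(description):
    """Return a list of problems; an empty list means none were detected."""
    m = INSERTION_RE.match(description)
    if m is None:
        # Also catches single-position forms such as p.Lys23insAsp.
        return ["not of the form p.Aaa<n>_Bbb<n+1>ins<sequence>"]
    problems = []
    if bool(m.group("open")) != bool(m.group("close")):
        problems.append("unbalanced parentheses")
    if int(m.group("pos2")) != int(m.group("pos1")) + 1:
        problems.append("positions are not flanking (must be n and n+1)")
    return problems


print(check_insertion("p.Lys23_Leu24insAsp"))   # []
print(check_insertion("p.Lys23_Asn25insAsp"))   # non-flanking positions reported
print(check_insertion("p.Lys23insAsp"))         # single position rejected
```

The sketch does not verify that the named residues match a reference sequence, nor does it apply the 3' rule or detect duplications; those checks need the reference protein itself.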
Examples#
- `p.His4_Gln5insAla`
  the insertion of amino acid `Ala` between amino acids `His4` and `Gln5`, changing `MetLysGlyHisGlnGlnCys` to `MetLysGlyHisAlaGlnGlnCys`.
- `p.Lys2_Gly3insGlnSerLys`
  the insertion of amino acids `GlnSerLys` between amino acids `Lys2` and `Gly3`, changing `MetLysGlyHisGlnGlnCys` to `MetLysGlnSerLysGlyHisGlnGlnCys`.
- `p.(Met3_His4insGlyTer)`
  the predicted consequence on the protein level of an insertion on the DNA level (`c.9_10insGGGTAG`) is the insertion of `GlyTer` (alternatively `Gly*`).
  NOTE: this is not described as `p.(Met3_Ile3418delinsGly)`, a deletion-insertion replacing the entire C-terminal protein coding sequence downstream of `Met3` with a `Gly`.
- `NP_004371.2:p.(Pro46_Asn47insSerSerTer)`
  the predicted consequence on the protein level resulting from DNA variant `NM_004380.2:c.138_139insTCATCATGAGCTCCC` is the insertion of `SerSerTer` between amino acids `Pro46` and `Asn47` (alternatively `SerSer*`).
  NOTE: the insertion is not described as `insSerSerTerAlaPro`; amino acids after the translation termination codon are not listed.
- `p.Arg78_Gly79insX[23]`
  the in-frame insertion of a 23 amino acid sequence between amino acids `Arg78` and `Gly79`.
  NOTE: it must be possible to deduce the 23 inserted amino acids from the description given on the DNA or RNA level.
- `NP_060250.2:p.Gln746_Lys747ins*63`
  the in-frame insertion of a 62 amino acid sequence ending at a stop codon at position `*63` between amino acids `Gln746` and `Lys747`.
  NOTE: it must be possible to deduce the inserted amino acid sequence from the description given on the DNA or RNA level.
- incomplete descriptions (preferably use exact descriptions only)
  - `NP_003997.1:p.(Ser332_Ser333insX)`
    the insertion of an unknown amino acid (`insX`) between amino acids `Ser332` and `Ser333`.
    NOTE: the IUPAC code for an unknown amino acid is `X` (see Standards). Note that in the past, `X` has been used to indicate a translation termination codon.
  - `NP_003997.1:p.(Val582_Asn583insX[5])` (alternatively `NP_003997.1:p.(Val582_Asn583insXXXXX)`)
    the insertion of 5 unknown amino acids (`insX[5]`) between amino acids `Val582` and `Asn583`.
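
The two DNA-derived examples above (`c.9_10insGGGTAG` and `c.138_139insTCATCATGAGCTCCC`) can be reproduced with a short translation sketch. It assumes the standard genetic code and an in-frame insertion that falls exactly between two codons, as in both examples; the function names `inserted_residues` and `describe_predicted` are hypothetical and not part of any existing tool.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Ter",
}


def inserted_residues(inserted_dna):
    """Translate an in-frame insertion, listing nothing after the first stop."""
    if len(inserted_dna) % 3 != 0:
        # Assumption: other lengths are frameshifts and out of scope here.
        raise ValueError("inserted length is not a multiple of 3")
    residues = []
    for i in range(0, len(inserted_dna), 3):
        aa = THREE_LETTER[CODON_TABLE[inserted_dna[i:i + 3].upper()]]
        residues.append(aa)
        if aa == "Ter":
            break  # amino acids after the termination codon are not listed
    return "".join(residues)


def describe_predicted(left, right, inserted_dna):
    """Build a predicted (parenthesised) protein-level insertion description."""
    return f"p.({left}_{right}ins{inserted_residues(inserted_dna)})"


print(describe_predicted("Met3", "His4", "GGGTAG"))
# p.(Met3_His4insGlyTer)
print(describe_predicted("Pro46", "Asn47", "TCATCATGAGCTCCC"))
# p.(Pro46_Asn47insSerSerTer)
```

The flanking residues are passed in by the caller; deriving them from the DNA position would need the reference coding sequence, which this sketch does not model.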
Discussion#
Can I describe a variant as `p.His4insAla`?
No, since the description is not unequivocal it is not allowed.
What does the description mean: the insertion of an `Ala` at position 4 or the insertion of an `Ala` after position 4?
Can I use the "^" character to describe an insertion?
No, insertions cannot be described using the format `p.His4^Gln5insAla` or `p.123^124Ala`.
The recommendations try to restrict the number of different characters used to a minimum.
Since a character was already used to indicate a range (the underscore), a new character was not required.
How should I describe the change `MetArgThrGlySerSerHisGlnTrpPhe` to `MetArgThrGlySerSerHisGlySerSerGlnTrpPhe`?
The fact that the inserted sequence (`GlySerSer`) is present in the original sequence suggests it derives from a duplicative event.
The variant should be described as an insertion: `p.His7_Gln8insGly4_Ser6`.
A description using "dup" is not correct since, by definition, a duplication should be directly 3'-flanking of the original copy (in tandem).
Note that the description given still makes it clear that the sequence inserted between `p.His7` and `p.Gln8` is probably derived from nearby, i.e. position `p.Gly4` to `p.Ser6`, and thus likely derived from a duplicative event.
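
The distinction between "dup" and "ins" in this last question can be illustrated with a small check. This is a sketch under stated assumptions: the helper names `split_residues` and `classify` are hypothetical, and the insertion position is assumed to have already been shifted to the most C-terminal position possible (3' rule).

```python
import re


def split_residues(sequence):
    """Split a three-letter amino acid string into a list of residues."""
    return re.findall(r"[A-Z][a-z]{2}", sequence)


def classify(reference, position, inserted):
    """Return "dup" if the inserted residues copy the residues directly
    before the insertion point (tandem copy), otherwise "ins".

    reference: list of residues; position: 1-based number of the residue
    after which the new residues are placed; inserted: list of residues.
    """
    n = len(inserted)
    preceding = reference[max(0, position - n):position]
    return "dup" if preceding == inserted else "ins"


ref = split_residues("MetArgThrGlySerSerHisGlnTrpPhe")
new = split_residues("GlySerSer")

# Inserted after His7: the preceding three residues are SerSerHis.
print(classify(ref, 7, new))  # ins
# Inserted after Ser6: the preceding three residues are GlySerSer.
print(classify(ref, 6, new))  # dup
```

In the case discussed above the inserted `GlySerSer` follows `His7`, so it is not directly adjacent to the original copy at positions 4 to 6 and the check returns "ins".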