
Insertion

Insertion: a sequence change where, compared to the reference sequence, one or more nucleotides are inserted and where the insertion is not a copy of a sequence immediately 5' of the insertion site.

Syntax

Syntax: sequence_identifier ":r." positions "ins" sequence
Example:
  • NM_004006.3:r.123_124insauc
Explanation of symbols:
  • coordinate_type: the coordinate type, indicating the type of numbering used; r
  • ins: the type of change, an insertion
  • positions: the positions of the two nucleotides flanking the insertion site; 123_124
  • sequence: the RNA sequence that is inserted; auc †
  • sequence_identifier: the sequence identifier used; NM_004006.3
See also explanation of grammar used in HGVS Nomenclature.
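The syntax above is regular enough to check mechanically. The Python sketch below parses only the simplest form: integer positions and an inserted sequence written out in the lowercase letters a, c, g, u or n. The names `RnaInsertion` and `parse_rna_insertion` are hypothetical, and intronic offsets, uncertain ranges, `n[25]`-style repeats and references to other positions are deliberately not supported.

```python
import re
from typing import NamedTuple


class RnaInsertion(NamedTuple):
    sequence_identifier: str
    start: int     # nucleotide 5' of the insertion site
    end: int       # nucleotide 3' of the insertion site
    inserted: str  # inserted RNA sequence


# Simplified pattern (assumption): two integer flanking positions, written-out sequence.
_PATTERN = re.compile(
    r"^(?P<id>[^:]+):r\.(?P<start>-?\d+)_(?P<end>-?\d+)ins(?P<seq>[acgun]+)$"
)


def parse_rna_insertion(description: str) -> RnaInsertion:
    """Split a simple RNA insertion description into its syntax elements."""
    m = _PATTERN.match(description)
    if m is None:
        raise ValueError(f"not a simple RNA insertion: {description!r}")
    return RnaInsertion(m["id"], int(m["start"]), int(m["end"]), m["seq"])


print(parse_rna_insertion("NM_004006.3:r.123_124insauc"))
```

A single-position description such as `NM_004006.3:r.123insg` does not match the pattern and raises `ValueError`, in line with the requirement to give both flanking positions.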

Notes

  • the "positions" should be the two nucleotides flanking the insertion site, i.e. two adjacent positions, e.g. 123 and 124 but not 123 and 125.
    • the "positions" should be listed from 5' to 3', e.g. 123_124 not 124_123
  • when a variant can be described as a duplication, it must be described as a duplication and not as an insertion (see Prioritization)
  • for all descriptions, the most 3' position possible of the reference sequence is arbitrarily assigned to have been changed (3' rule)
  • the inserted "sequence" can be given as the nucleotides inserted (e.g. insagc) or, for larger insert sequences, by referring to the sequence in the reference sequence (e.g. r.849_850ins858_895) or in another reference (see Examples).
    • when the inserted sequence is very long, it is best submitted to a database (e.g. GenBank); the accession.version number obtained can then be used to describe the variant, like r.123_124ins[L37425.1:r.23_361].
  • † = see Uncertain; when the position and/or the sequence of an inserted sequence has not been defined, a description may have a format like r.(100_150)insn[25].
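The duplication-first prioritization and the 3' rule can be combined in one small normalization step. The Python sketch below is hypothetical (`describe_insertion` is not part of any HGVS tool) and assumes a plain lowercase RNA string, integer positions and an insertion strictly inside the sequence. It moves the insertion site 3' for as long as the resulting sequence stays the same, then reports a `dup` if the insert equals the nucleotides immediately 5' of the site.

```python
def describe_insertion(ref: str, after: int, inserted: str) -> str:
    """Simplified r. description for inserting `inserted` between 1-based
    positions `after` and `after + 1` of the RNA sequence `ref`."""
    if not inserted:
        raise ValueError("empty insertion")
    if not 1 <= after < len(ref):
        raise ValueError("insertion must lie between two reference nucleotides")
    ins, p = inserted, after
    # 3' rule: while the first inserted nucleotide equals the reference
    # nucleotide just 3' of the site, move the site one nucleotide 3'.
    # ref[p] (0-based) is the 1-based nucleotide p + 1.
    while p < len(ref) and ins[0] == ref[p]:
        ins = ins[1:] + ref[p]
        p += 1
    k = len(ins)
    # Prioritization: an insert identical to the k nucleotides immediately
    # 5' of the (3'-shifted) site is a tandem duplication.
    if p >= k and ref[p - k:p] == ins:
        return f"r.{p}dup" if k == 1 else f"r.{p - k + 1}_{p}dup"
    if p == len(ref):
        raise ValueError("shifted past the last nucleotide; not handled in this sketch")
    return f"r.{p}_{p + 1}ins{ins}"


print(describe_insertion("aaccgg", 2, "u"))    # r.2_3insu
print(describe_insertion("aaccgg", 3, "c"))    # r.4dup (shifted 3' from 3_4)
print(describe_insertion("gcaugg", 1, "cau"))  # r.2_4dup
```

Checking for a duplication only after the 3' shift is sufficient, because shifting a tandem duplication 3' by one nucleotide yields another tandem duplication.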

Examples

  • LRG_199t1:r.426_427insa: the insertion of an "a" nucleotide between nucleotides r.426 and r.427
  • LRG_199t1:r.756_757insuacu: the insertion of nucleotides "uacu" between nucleotides r.756 and r.757
  • NM_004006.2:r.(222_226)insg (p.Asn75fs): the insertion of a "g" at an unknown position in the sequence encoding amino acid 75
  • NM_004006.2:r.549_550insn: the insertion of one unspecified nucleotide (n) between positions r.549 and r.550
  • NM_004006.2:r.761_762insnnnnn (alternatively r.761_762insn[5]): the insertion of 5 unspecified nucleotides (nnnnn) between positions r.761 and r.762
  • LRG_199t1:r.1149_1150insn[100]: the insertion of 100 unspecified nucleotides between positions r.1149 and r.1150
  • NG_012232.1(NM_004006.2):r.2949_2950ins[2950-30_2950-12;2950-4_2950-1]: the insertion of intronic nucleotides r.2950-30 to r.2950-12 and r.2950-4 to r.2950-1 between nucleotides r.2949 and r.2950 (caused by the deletion NC_000023.10(NM_004006.2):c.2950-11_2950-5del). An alternative description is r.2949_2950ins[2950-30_2950-12;uuag].
    • NOTE: for more examples of variants affecting splicing see Splicing
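The examples use both `insnnnnn` and the shorthand `insn[5]` for the same change. A minimal Python sketch that expands the bracketed count, assuming it always applies to the single nucleotide letter written just before it (the page only shows this with `n`); `expand_repeat_notation` is a hypothetical helper, not part of any HGVS library.

```python
import re


def expand_repeat_notation(description: str) -> str:
    """Expand shorthand such as 'n[5]' into 'nnnnn' (single-letter units only)."""
    return re.sub(
        r"([acgun])\[(\d+)\]",
        lambda m: m.group(1) * int(m.group(2)),
        description,
    )


print(expand_repeat_notation("r.761_762insn[5]"))  # r.761_762insnnnnn
```

Bracketed lists such as `ins[2950-30_2950-12;uuag]` are left untouched, because the bracket there is not preceded by a single nucleotide letter.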

Discussion

Can I describe a variant as r.123insg?

No. The description is ambiguous and therefore not allowed: does it mean the insertion of a "g" at position 123, or the insertion of a "g" after position 123? The situation becomes even more complex with a coding RNA reference sequence, where a "-" character is used for nucleotides 5' of the translation start, e.g. r.-14insg: when the insertion is after nucleotide r.-14, is this position r.-13 or r.-15?
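Writing both flanking positions removes this ambiguity. The Python sketch below, with hypothetical helper names, computes the nucleotide directly 3' of a position and validates a flanking pair. It assumes plain integer coding positions, where there is no position 0 (r.-1 is directly followed by r.1), and it ignores intronic offsets and the `*` numbering used 3' of the stop codon.

```python
def next_position(pos: int) -> int:
    """Position immediately 3' of pos in coding numbering (no position 0)."""
    if pos == 0:
        raise ValueError("coding numbering has no position 0")
    return 1 if pos == -1 else pos + 1


def check_flanking(start: int, end: int) -> None:
    """Raise unless start_end are two adjacent positions listed 5' to 3'."""
    if end != next_position(start):
        raise ValueError(f"{start}_{end} are not adjacent 5'-to-3' flanking positions")


def insertion_after(pos: int, inserted: str) -> str:
    """Unambiguous two-position description for an insertion 3' of pos."""
    return f"r.{pos}_{next_position(pos)}ins{inserted}"


print(insertion_after(123, "g"))  # r.123_124insg
print(insertion_after(-14, "g"))  # r.-14_-13insg
print(insertion_after(-1, "g"))   # r.-1_1insg
check_flanking(123, 124)          # passes; (123, 125) or (124, 123) would raise
```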

Can I use the "^" character to describe an insertion?

No, insertions cannot be described using the format r.123^124insu or r.123^124u. The recommendations aim to keep the number of different characters to a minimum, and since a character indicating a range (the underscore) already existed, a new character was not required.

How should I describe the change aucgaucgaucgaucaggguccc to aucgaucgaucgaucaaucgaucgaucggguccc? The fact that the inserted sequence (aucgaucgauc) is present in the original sequence suggests it derives from a duplicative event.

The variant should be described as an insertion: r.16_17ins5_15. A description using "dup" is not correct since, by definition, a duplication must lie directly 3' of the original copy (in tandem), whereas here nucleotide r.16 separates the copy at r.5_15 from the insertion site. Note that the description still makes clear that the sequence inserted between r.16 and r.17 is probably derived from nearby, i.e. from position r.5 to r.15, and thus likely results from a duplicative event.
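The positions in this answer can be recovered by comparing the two sequences directly. The Python sketch below uses hypothetical helpers `locate_insertion` and `copies_in_reference` and assumes plain strings that differ by exactly one insertion. The longest common prefix gives the most 3' insertion site, which is the one the 3' rule requires; the insert is then searched for in the reference.

```python
def locate_insertion(ref: str, var: str) -> tuple[int, str]:
    """Return (p, inserted) with var == ref[:p] + inserted + ref[p:] and p
    as large as possible (3' rule); p is the 1-based 5' flanking nucleotide."""
    k = len(var) - len(ref)
    if k <= 0:
        raise ValueError("variant is not longer than the reference")
    p = 0
    while p < len(ref) and ref[p] == var[p]:  # longest common prefix
        p += 1
    inserted = var[p:p + k]
    if ref[:p] + inserted + ref[p:] != var:
        raise ValueError("difference is not a single insertion")
    return p, inserted


def copies_in_reference(ref: str, seq: str) -> list[int]:
    """1-based start positions of every occurrence of seq in ref."""
    k = len(seq)
    return [i + 1 for i in range(len(ref) - k + 1) if ref[i:i + k] == seq]


ref = "aucgaucgaucgaucaggguccc"
var = "aucgaucgaucgaucaaucgaucgaucggguccc"
p, inserted = locate_insertion(ref, var)
starts = copies_in_reference(ref, inserted)
print(p, inserted, starts)  # 16 aucgaucgauc [1, 5]
k = len(inserted)
s = starts[-1]  # here the copy nearest the insertion site
print(f"r.{p}_{p + 1}ins{s}_{s + k - 1}")  # r.16_17ins5_15
print("tandem duplication:", (p - k + 1) in starts)  # False
```

A tandem duplication would need a copy ending at the 5' flank, i.e. occupying r.6_16; the nearest copy ends at r.15, which is why "dup" does not apply.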