Deletion-Insertion#
Deletion-Insertion (delins): a sequence change where, compared to a reference sequence, one or more nucleotides are replaced by one or more other nucleotides and which is not a substitution or inversion.
Syntax#
| single position | |
|---|---|
| Syntax | sequence_identifier ":" coordinate_type "." position "delins" sequence |
| Examples | |
| position range | |
| Syntax | sequence_identifier ":" coordinate_type "." range "delins" sequence |
| Examples | |
| Explanation of Symbols | |
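The two syntax forms can be checked mechanically. The following is a minimal sketch, not part of the recommendations: `DELINS_PATTERN` and `parse_simple_delins` are hypothetical names, and the pattern assumes plain integer positions, a single-letter coordinate type, and a literal nucleotide sequence. Intronic offsets, `pter`/`qter`, `N[12]` and bracketed references to other sequences are deliberately not handled.

```python
import re

# Assumption: only plain integer positions and literal nucleotide sequences are
# covered; offsets (c.3675-45), pter/qter, N[12] and "[...]" references are not.
DELINS_PATTERN = re.compile(
    r"^(?P<sequence_identifier>[^:]+):"
    r"(?P<coordinate_type>[a-z])\."
    r"(?P<start>\d+)(?:_(?P<end>\d+))?"
    r"delins(?P<sequence>[ACGTUN]+)$"
)


def parse_simple_delins(description):
    """Split a simple delins description into its syntax components."""
    match = DELINS_PATTERN.match(description)
    if match is None:
        raise ValueError(f"not a simple delins description: {description!r}")
    start = int(match["start"])
    # A single-position description has no "_end" part; treat it as a 1-nt range.
    end = int(match["end"]) if match["end"] else start
    if end < start:
        raise ValueError("range end precedes range start")
    return {
        "sequence_identifier": match["sequence_identifier"],
        "coordinate_type": match["coordinate_type"],
        "start": start,
        "end": end,
        "sequence": match["sequence"],
    }


single = parse_simple_delins("NC_000023.11:g.32386323delinsGA")
ranged = parse_simple_delins("NM_004006.2:c.6775_6777delinsC")
```

Here `single` covers one position (`start == end`), while `ranged` spans c.6775 to c.6777. A description that names the deleted sequence, such as `delGAGinsC`, is rejected by this sketch, in line with the recommendation not to describe the deleted nucleotides.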
Notes#
- by definition, when one nucleotide is replaced by one other nucleotide, the change is a substitution.
- changes involving two or more consecutive nucleotides are described as deletion/insertion (delins) variants.
- two variants separated by one or more nucleotides should be described individually and not as a "delins".
- exception: two variants separated by one nucleotide, together affecting one amino acid, should be described as a "delins".
NOTE: this prevents tools that predict the consequences of a variant from making conflicting and incorrect predictions of two different substitutions at one position (e.g., c.235_237delinsTAT (p.Lys79Tyr) versus c.[235A>T;237G>T] (p.[Lys79*;Lys79Asn])).
- conversions, sequence changes where a range of nucleotides is replaced by a sequence from elsewhere in the genome, are described as a "delins". The previous format "con" is no longer used (see Community Consultation SVD-WG009).
- for all descriptions, the most 3' position possible of the reference sequence is arbitrarily assigned to have been changed (3' rule).
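The reason for the one-amino-acid exception can be illustrated by translating the codon involved. The sketch below uses the standard genetic code and assumes the reference codon 79 is `AAG`; that codon is inferred from the descriptions c.235A>T, c.237G>T and Lys, and is not stated explicitly in the notes. The helper `substitute` is a hypothetical name.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def substitute(codon, index, base):
    """Return the codon with the base at 0-based `index` replaced by `base`."""
    return codon[:index] + base + codon[index + 1:]


reference_codon = "AAG"  # assumed codon 79 (c.235 to c.237), encoding Lys

first_only = substitute(reference_codon, 0, "T")   # c.235A>T evaluated alone
third_only = substitute(reference_codon, 2, "T")   # c.237G>T evaluated alone
combined = substitute(first_only, 2, "T")          # c.235_237delinsTAT

print(CODON_TABLE[first_only])  # "*": a stop codon, p.Lys79*
print(CODON_TABLE[third_only])  # "N": p.Lys79Asn
print(CODON_TABLE[combined])    # "Y": p.Lys79Tyr, the actual consequence
```

Evaluated one at a time, the two substitutions give two different and incorrect protein predictions; only the combined "delins" description yields the tyrosine that is actually encoded.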
Examples#
- NC_000023.11:g.32386323delinsGA
  a deletion of nucleotide g.32386323 (a T, not described), replaced by nucleotides GA, changing ..CAGCTCTTT.. to ..CAGCGACTTT... The variant corresponds to LRG_199t1:c.4661delinsTC based on a coding DNA reference sequence.
  NOTE: the recommendation is not to describe the variant as NC_000023.11:g.32386323delTinsGA, i.e. not to include the deleted nucleotide sequence. Such a description is longer, contains redundant information, and increases the chance of making an error (e.g., NC_000023.11:g.32386323delCinsGA).
- NM_004006.2:c.6775_6777delinsC
  a deletion of nucleotides c.6775 to c.6777 (GAG, not described), replaced by a C nucleotide, changing ..GGAAGAGTTGC.. to ..GGAACTTGC...
  NOTE: the recommendation is not to describe the variant as NM_004006.2:c.6775_6777delGAGinsC, i.e. not to include the deleted nucleotide sequence. Such a description is longer, contains redundant information, and increases the chance of making an error (e.g., NM_004006.2:c.6775_6777delGTGinsC).
- LRG_199t1:c.145_147delinsTGG (p.Arg49Trp)
  a deletion replacing nucleotides c.145 to c.147 (CGC, not described) with TGG.
  NOTE: two variants separated by one nucleotide, together affecting one amino acid, should be described as a "delins", so the description c.[145C>T;147C>G] is not correct.
- LRG_199t1:c.9002_9009delinsTTT
  a deletion of nucleotides c.9002 to c.9009, replaced by nucleotides TTT.
- LRG_199t1:c.850_901delinsTTCCTCGATGCCTG
  a deletion of nucleotides c.850 to c.901, replaced by TTCCTCGATGCCTG.
  NOTE: parts of the inserted sequence "align" with the reference sequence, giving an alternative description like c.[850_869del;874_881del;887_897del;901_902insG]. The "delins" format is recommended: it is simpler and prevents software tools from making incorrect predictions of the consequences at the protein level.
- NC_000002.12:g.pter_8247756delins[NC_000011.10:g.pter_15825266]
  nucleotides g.pter to g.8247756 of chromosome 2 are deleted and replaced by nucleotides g.pter to g.15825266 of chromosome 11: the derivative chromosome 2 from an unbalanced translocation between the short arms of chromosomes 2 and 11 (ISCN der(2)t(2;11)(p25.1;p15.2)). Example copied from Complex (HGVS/ISCN).
  NOTE: balanced translocations (see Complex (HGVS/ISCN)) are described as two complementary "delins" variants.
- NC_000022.10:g.42522624_42522669delins42536337_42536382
  conversion in exon 9 of the CYP2D6 gene, replacing exon 9 nucleotides g.42522624 to g.42522669 with those of the 3' flanking CYP2D7P1 gene, nucleotides g.42536337 to g.42536382 from the same genomic reference sequence (NC_000022.10).
- NC_000012.11:g.6128892_6128954delins[NC_000022.10:g.17179029_17179091]
  conversion replacing nucleotides g.6128892 to g.6128954 of the VWF gene (NM_000552.3:c.3675-45_3692) on chromosome 12 with nucleotides g.17179029 to g.17179091 of the VWFP1 pseudogene on chromosome 22.
- NM_000797.3:c.812_829delins908_925
  conversion replacing nucleotides c.812 to c.829 of the DRD4 gene with nucleotides c.908 to c.925 from the same reference sequence.
- NM_004006.2:c.812_829delinsN[12]
  nucleotides c.812 to c.829 have been deleted and replaced by 12 unknown nucleotides (N[12]).
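The sequence changes quoted in the first two examples can be reproduced by applying a "delins" to the displayed snippet. This is a minimal sketch: `apply_delins` is a hypothetical helper, and its positions are 1-based offsets within the short snippet, not real g. or c. coordinates. The snippet offsets used (position 5 for g.32386323, positions 5 to 7 for c.6775 to c.6777) are inferred from the before and after sequences shown in the examples.

```python
def apply_delins(reference, start, end, inserted):
    """Replace 1-based inclusive positions start..end of `reference` with `inserted`."""
    if not 1 <= start <= end <= len(reference):
        raise ValueError("range lies outside the reference snippet")
    # Keep everything before `start`, add the new sequence, keep everything after `end`.
    return reference[:start - 1] + inserted + reference[end:]


# g.32386323delinsGA: the T at snippet position 5 is replaced by GA.
single_position = apply_delins("CAGCTCTTT", 5, 5, "GA")

# c.6775_6777delinsC: GAG at snippet positions 5 to 7 is replaced by C.
position_range = apply_delins("GGAAGAGTTGC", 5, 7, "C")

print(single_position)  # CAGCGACTTT
print(position_range)   # GGAACTTGC
```

The deleted nucleotides never appear as an argument, which mirrors the recommendation: the range alone identifies what is removed, so there is no redundant sequence that could be mistyped.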
Discussion#
What is an "indel"?
The term "indel" is not used in HGVS nomenclature (see Glossary). The term is confusing, having different meanings in different disciplines.
Can I describe a GC to TG variant as a di-nucleotide substitution (g.4GC>TG)?
No, this is not allowed.
By definition, a substitution changes one nucleotide into one other nucleotide (see Substitution).
The change TGTGCCA to TGTTGCA should be described as g.4_5delinsTG, i.e. a deletion/insertion (delins).
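The description in this answer can be derived by trimming the nucleotides that the reference and the observed sequence share at both ends. The sketch below is illustrative only: `describe_change` is a hypothetical function that handles just the substitution and "delins" cases. It does not check for inversions or duplications, does not apply the 3' rule to pure deletions or insertions (it rejects them), and it merges everything between the first and last difference, so it will not split two variants separated by unchanged nucleotides into individual descriptions.

```python
def describe_change(reference, observed, first_position=1, prefix="g."):
    """Describe a change as a substitution or delins (other types are rejected)."""
    if reference == observed:
        raise ValueError("sequences are identical")
    limit = min(len(reference), len(observed))

    # Length of the shared leading part.
    lead = 0
    while lead < limit and reference[lead] == observed[lead]:
        lead += 1

    # Length of the shared trailing part, never overlapping the leading part.
    tail = 0
    while tail < limit - lead and reference[-1 - tail] == observed[-1 - tail]:
        tail += 1

    deleted = reference[lead:len(reference) - tail]
    inserted = observed[lead:len(observed) - tail]
    if not deleted or not inserted:
        raise ValueError("pure deletion or insertion: outside the scope of this sketch")

    start = first_position + lead
    end = start + len(deleted) - 1
    if len(deleted) == 1 and len(inserted) == 1:
        # One nucleotide replaced by one other nucleotide is a substitution.
        return f"{prefix}{start}{deleted}>{inserted}"
    if start == end:
        return f"{prefix}{start}delins{inserted}"
    return f"{prefix}{start}_{end}delins{inserted}"


print(describe_change("TGTGCCA", "TGTTGCA"))  # g.4_5delinsTG
```

The `first_position` and `prefix` parameters are conveniences assumed for this sketch, so that a snippet starting at an arbitrary coordinate can be described in that coordinate system.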
Are there specific recommendations regarding the maximum number of unchanged nucleotides between two single nucleotide variants and whether the change is described as a "delins" or as two separate changes?
Yes, two variants separated by one or more nucleotides should preferably be described individually and not as a "delins" (unless they together affect one amino acid). Why? First, the two variants may have been reported (or might occur) individually. Second, sequence analysis pipelines will describe such variants individually, giving the problem that an overlap with the description of the combined variant ("delins" description) might be missed in the annotation step (database queries).
The BRCA1 coding DNA reference sequence NM_007294.3 from position c.2074 to c.2080 is ..CATGACA... A variant frequently found in the population is ..CATAACA.. (NM_007294.3:c.2077G>A). In a patient I found the sequence ..CATATAACA... Can I describe this variant as NM_007294.3:c.[2077G>A;2077_2078insTA]?
The correct description of this variant is NM_007294.3:c.2077delinsATA.
NOTE: the answer was modified, i.e. the addition "However, since the variant is likely a combination of two other variants, it is acceptable to describe it as NM_007294.3:c.[2077G>A;2077_2078insTA]." was removed.