Checklist
Purpose
A look through the literature quickly shows how often variant descriptions fail to follow HGVS nomenclature. The checklist below covers the most frequently violated rules. Working through it should help when preparing a publication that contains sequence variant descriptions.
Checklist
-
Reference Sequence
Do you clearly mention the reference sequence used for numbering (nucleotides/amino acids)?
A publication should mention, preferably in the Materials & Methods section and/or Figure or Table legend, which reference sequence file was used to describe variants and for numbering of the residues (DNA, RNA, and protein (see Reference Sequences)).
- use a RefSeq accession with its version number.
- do not forget the underscore in the accession number (e.g., NM_004006.2).
- genomic (g.) reference sequences start with nucleotide 1 and can not have nucleotides with additions like a +, -, or *.
- for a coding DNA reference sequence, nucleotide numbering starts with the A of the ATG translation initiation site, as nucleotide 1.
- legacy numbering is only allowed in addition to approved numbering.
- Does your reference sequence contain the residue that you describe as changed?
NOTE: NM reference sequences cover mature transcripts and do not contain intron and gene flanking sequences. They can only be used to describe variants in introns with a c. prefix when a genomic reference sequence is given on which the coding DNA reference sequence is annotated, e.g., NC_000023.10(NM_004006.2):c.94-2A>G or LRG_199t1:c.94-2A>G (see Reference Sequences).
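The reference sequence checks above can be partly automated. The following is a minimal Python sketch, not an official validator: the regular expression is a simplified assumption about RefSeq accessions (it does not cover LRG identifiers), and the function names is_versioned_refseq and needs_genomic_reference are hypothetical.

```python
import re

# Simplified, illustrative pattern (an assumption, not an official grammar):
# two-letter RefSeq prefix, underscore, digits, then a mandatory ".version".
VERSIONED_REFSEQ = re.compile(r"[A-Z]{2}_\d+\.\d+")


def is_versioned_refseq(accession: str) -> bool:
    """Return True for accessions such as NM_004006.2 or NC_000023.10."""
    return VERSIONED_REFSEQ.fullmatch(accession) is not None


def needs_genomic_reference(description: str) -> bool:
    """Return True when an intronic c. position is given on a bare NM_ transcript."""
    reference, _, variant = description.partition(":")
    # An intronic position has an offset after the exon position, e.g. c.94-2.
    is_intronic = re.match(r"c\.[-*]?\d+[+-]\d+", variant) is not None
    return is_intronic and reference.startswith("NM_")
```

With this sketch, NM_004006.2:c.94-2A>G is flagged because the transcript alone contains no intron sequence, while NC_000023.10(NM_004006.2):c.94-2A>G is accepted.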
-
Intronic variants
Are descriptions of intron variants correct and complete?
- descriptions referring to exon or intron numbers instead of nucleotide positions, e.g., c.IVS4-2A>G, are not allowed, these are ambiguous.
- do you properly describe ranges in the introns?
The format c.123-65_123-50 is correct, the format c.123-65_-50 is not, it is incomplete.
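The two intron checks above can be sketched as a small Python helper. This is a heuristic under stated assumptions, not a full HGVS parser: it expects only the part of the description after the reference sequence (starting with c.), and the name intron_description_problems is hypothetical.

```python
import re

# Heuristic (an assumption): the range starts at an intronic position such as
# 123-65, but the end is only a bare offset such as -50 with no exon anchor.
INCOMPLETE_INTRON_RANGE = re.compile(r"c\.[-*]?\d+[+-]\d+_[+-]\d+(?![\d+-])")


def intron_description_problems(description: str) -> list[str]:
    """Return human-readable problems found in a c. intron description."""
    problems = []
    if "IVS" in description.upper():
        problems.append("uses an intron number (IVS) instead of nucleotide positions")
    if INCOMPLETE_INTRON_RANGE.match(description):
        problems.append("range end lacks its anchoring exon position")
    return problems
```

With this sketch, c.123-65_123-50 yields no problems, while c.IVS4-2A>G and c.123-65_-50 are each flagged once.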
-
Insertions
Are descriptions of insertions correct and complete? (see Insertion)
- insertions should be reported using the format c.51_52insT.
The format c.52insT is ambiguous, and not allowed.
- do you provide the inserted sequence?
Describing a variant as c.5439_5440ins6 is not allowed, the inserted sequence (for ins6, e.g., TGCCAT) should be specified.
- is the insertion reported indeed an insertion, or is it in fact a duplication?
Duplicating insertions should be described as duplications, not as insertions (see Duplication).
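The insertion checks above lend themselves to a quick automated test. The Python sketch below is illustrative only: it handles simple exonic positions, treats the reference as a plain string with 1-based positions, and compares the inserted sequence only with the bases directly flanking the insertion site. The function names insertion_problems and is_duplication are hypothetical.

```python
import re


def insertion_problems(description: str) -> list[str]:
    """Check a simple exonic insertion such as c.51_52insT."""
    match = re.fullmatch(r"c\.(\d+)(?:_(\d+))?ins(\w*)", description)
    if match is None:
        return ["not a simple exonic insertion description"]
    start, end, inserted = match.groups()
    problems = []
    if end is None:
        problems.append("give both flanking positions, e.g. c.51_52insT")
    elif int(end) != int(start) + 1:
        problems.append("the two flanking positions must be adjacent")
    if not re.fullmatch(r"[ACGT]+", inserted):
        problems.append("specify the inserted sequence, not its length")
    return problems


def is_duplication(reference: str, start: int, inserted: str) -> bool:
    """Return True if `inserted`, placed after 1-based position `start`,
    copies the bases directly before or directly after the insertion site."""
    size = len(inserted)
    preceding = reference[max(0, start - size):start]
    following = reference[start:start + size]
    return inserted in (preceding, following)
```

A description that passes insertion_problems but for which is_duplication returns True should be rewritten as a duplication.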
-
The 3' rule
Do you correctly apply the 3' rule?
For deletions, duplications, and insertions, the most 3' position possible is arbitrarily assigned to have been changed (see General recommendations). This rule also applies to variants in single residue stretches (mono-nucleotide or amino acid) or tandem repeats.
-
Range
The sign used to indicate a range is the _ (underscore), not a - (hyphen-minus).
The correct description to indicate a deletion of coding DNA nucleotides 12 to 14 is c.12_14del. Not correct is c.12-14del, this describes a deletion of nucleotide -14 in the intron directly 5' of nucleotide c.12 (see Numbering).
-
Deletion
Do you indicate the first and last residue involved in a deletion?
Descriptions like g.123del3 are not allowed, correct is g.123_125del (see Deletion).
-
Describe always on DNA-level
Do you describe all changes reported on DNA-level?
All changes reported must be described on DNA-level.
- when descriptions on protein level are given in the text, upon first appearance, use a format like "c.76G>T (p.(Gly26Cys), RNA not analysed)" or "c.76G>T (r.76g>u p.Gly26Cys)".
-
RNA level descriptions
HGVS nomenclature includes recommendations for the description of changes detected on the RNA level.
- several transcripts derived from one allele are described using the format r.[76a>c,73_88del] (see RNA).
-
Protein level descriptions
- the protein reference sequence should represent the primary translation product, not a processed mature protein, and thus include the starting Methionine and a signal peptide sequence.
- the recommendation is to use the three-letter amino acid code.
Ter or * should be used to indicate a translation stop codon; the X should not be used.
- predicted "silent" protein level variants are described as p.(Leu54=), not as p.Leu54Leu or p.54L/L.
- the description p.(Met1Val) is not allowed (see Protein).
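The protein level rules above can be sketched as a few pattern checks in Python. This is an illustrative helper under simplifying assumptions, not a complete validator: it looks only for the specific mistakes named in the list, and the name protein_description_problems is hypothetical.

```python
import re


def protein_description_problems(description: str) -> list[str]:
    """Flag the protein-level mistakes named in the checklist."""
    problems = []
    # X directly after a position means a stop codon in legacy notation;
    # "Xaa" (unknown amino acid) is excluded by the lookahead.
    if re.search(r"\d+X(?![a-z])", description):
        problems.append("use Ter or * for a stop codon, not X")
    # Same three-letter residue before and after the position, e.g. p.Leu54Leu.
    if re.fullmatch(r"p\.\(?([A-Z][a-z]{2})(\d+)\1\)?", description):
        problems.append("describe a predicted silent variant as p.(Leu54=)")
    if "/" in description:
        problems.append("do not use the / (slash) notation")
    # A substitution of the initiating Methionine, e.g. p.(Met1Val).
    if re.fullmatch(r"p\.\(?Met1[A-Z][a-z]{2}\)?", description):
        problems.append("a substitution of Met1 is not an allowed description")
    return problems
```

With this sketch, p.(Leu54=) and p.(Arg123Ser) pass, while p.Leu54Leu, p.54L/L, p.Trp26X, and p.(Met1Val) are each flagged once.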
-
Mutation / polymorphism
Do not use the terms "mutation" or "polymorphism" (see terminology).
- "polymorphic" variants should not be described using the / (slash), describe them as normal variants, like c.127A>G and p.(Ile43Val).
-
Recessive diseases
Do you clearly describe which changes are found in which combination?
A publication describing variants in patients suffering from a recessive phenotype should, for each individual, explicitly mention in which combination variants were found (per allele).
-
Tabular overview
Is the overview of all changes reported clear and complete?
Preferably, a publication contains a tabular overview of all variants reported. This overview contains columns describing the change on the DNA-level (absolutely essential) and, optionally, on the RNA and protein level.
When data on RNA and/or protein level are provided, it should be made clear whether the data were deduced or experimentally verified (i.e. state explicitly when RNA was analysed, e.g., to study the consequences of a variant affecting splicing).
Make sure predicted consequences on protein level are reported in parentheses, like p.(Arg123Ser).
-
Variant types
When giving numbers regarding the types of variants identified, do not mix numbers on DNA, RNA, and protein level.
Give numbers separately for DNA, RNA, and protein. Where would you list a substitution on DNA level, giving a deletion on RNA level (since it affects splicing), and a frameshift on protein level?
-
Pathogenic
Be careful when using the term "pathogenic" (see terminology).
A variant in itself is not "pathogenic"; whether it can be causally related to a phenotype observed in a patient is determined by other factors (like the patient's genotype).