Alleles#

Allele: a series of variants in a transcript from one chromosome.

Syntax#

Variants in cis
Syntax	`sequence_identifier ":r.[" first_change ";" second_change "]"`
Examples	`NM_004006.3:r.[123c>a;345del]`
Variants in trans
Syntax	`sequence_identifier ":r.[" first_change "];[" second_change "]"`
Examples	`NM_004006.3:r.[123c>a];[345del]`
Variants with unknown or uncertain phase
Syntax	`sequence_identifier ":r." first_change "(;)" second_change`
Examples	`NM_004006.3:r.123c>a(;)345del`
Explanation of Symbols
`coordinate_type`: the coordinate type, indicating the type of numbering used; `r` `first_change`: the description of the first variant; `123c>a` `second_change`: the description of the second variant; `345del` `sequence_identifier`: the sequence identifier used; `NM_004006.3` See also explanation of grammar used in HGVS Nomenclature.

Notes#

all variants should be described on the DNA level; descriptions on the RNA and/or protein level may be given in addition.
humans are diploid organisms and have two alleles at each genetic locus, with one allele inherited from each parent.
when two variants are identified in a transcript that derive from one chromosome (in cis), this should be described as r.[variant1;variant2].
when two variants are identified in transcripts that derive from different chromosomes (in trans), this should be described as r.[variant1];[variant2].
when two variants are identified in a transcript, but when it is not known whether these derive from one chromosome (in cis) or from different chromosomes (in trans), this should be described as variant1(;)variant2, i.e. without using [ ].
NOTE: it is recommended to determine whether the changes are in the same transcript or not.
when two variants are identified in two different transcripts that derive from one variant on the DNA level, the variants are separated using a ,; r.[variant1,variant2].

Examples#

For more examples, see DNA alleles.

variants on one allele
- LRG_199t1:r.[76a>u;103del]
  one transcript contains two different changes, r.76a>u and r.103del. The variants are found in cis.
- LRG_199t1:r.[(578c>u;1339a>g;1680del)]
  one transcript contains three different predicted changes, r.(578c>u), r.(1339a>g), and r.(1680del). The variants are found in cis.
variants on two alleles
- LRG_199t1:r.[76a>u];[103del]
  the two transcript alleles each contain a different change, r.76a>u and r.103del. A heterozygous case (compound heterozygote, e.g., in a recessive disease). The variants are found in trans.
- NM_004006.2:r.[76a>u];[76a>u]
  both transcript alleles contain the same variant, r.76a>u. A homozygous case (e.g., in a recessive disease).
  NOTE: LRG_199t1:r.76a>u(;)(76a>u) indicates analysis detects one variant (r.76a>u), suggesting both transcript alleles contain this variant, but it can not be excluded the other allele is deleted or not expressed.
- LRG_199t1:r.[76a>u];[76=]
  one transcript allele contains a variant, r.76a>u, the other transcript allele contains at position r.76 the reference sequence, r.76= (is wild-type).
  NOTE: the description r.[76a>u];[=], containing r.76a>u and r.=, is different since it indicates the entire coding RNA reference sequence was analysed and the only variant identified was r.76a>u (on one allele).
- NM_004006.2:r.[76a>u];[?]
  one transcript allele contains a variant, r.76a>u, while a variant in the other transcript allele is expected but not yet identified (r.?) (e.g., in individuals affected by a recessive disease).
alleles not certain
- NM_004006.2:r.76a>u(;)103del
  two variants are found in a transcript, r.76a>u and r.103del, but it is not known whether they derive from the same or from different transcript alleles (chromosomes).
  NOTE: when it is not known on which allele a variant is, allele brackets should not be used.
one allele, two transcripts
- LRG_199t1:r.[897u>g,832_960del]
  two different transcripts, r.897u>g and r.832_960del, derive from one variant (LRG_199t1:c.897T>G on the DNA level).

Discussion#

Was originally the recommendation to use the format [r.76a>c+r.83g>c]?

Indeed, originally den Dunnen and Antonarakis, 2000 the suggestion was to describe two changes in a transcript from one chromosome as [r.76a>c+r.83g>c], i.e. using a "+"-character to separate the two changes, while an earlier publication suggested to use a ";" ([r.76a>c;r.83g>c] (Antonarakis and the Nomenclature Working Group, 1998). To prevent confusion with older publications, to improve overall consistency, and to keep descriptions as short as possible, the 2000 proposal was retracted. The recommended format is r.[76a>c;83g>c].

In recessive diseases, is it important I show in which combination variants were found?

When in one individual you find more than one variant, it is essential that you clearly indicate on which transcript allele(s) variant(s) were found.

disease severity will depend on the combination of variants found;
in recessive disease, when two variants are in one transcript, an individual is a carrier or you might not have found the variant on transcripts from the second allele.

I find the notation r.[76a>c] without describing the second transcript allele misleading; not enough researchers know this refers to only one of the two transcripts present. Would using r.[76a>c];[] be OK?

No, the recommended description is r.76[a>c];[=], i.e. r.76= for "no change" at position r.76 on the second transcript.

How should I describe the variants detected in males and females for a transcript from the X-chromosome?

In females, the description is straightforward, like LRG_199t1:r.[76a>c];[76=]. In males, there is no transcript from the second allele (X-chromosome), which can be described as LRG_199t1:r.[76a>c];[0], i.e. using r.0 to indicate the absence of a transcript from the second X-chromosome.