CORRESPONDENCE

In 2000, we elucidated the structure of the RH
locus1 by showing that it is an example of a gene
cluster; RHD and RHCE face each other by their 3'
tail ends, and a third gene, SMP1, was found to be
interspersed between the 2 rhesus genes. Two 9 000 base pair (bp) DNA
segments, dubbed "rhesus boxes," of identical orientation flanked
the RHD gene (Figure 1, top).
Based on this structure of the RH locus, the RHD
gene deletion was parsimoniously explained by an unequal crossing-over
event.1 Furthermore, the opposite orientation of the
RH genes may facilitate gene conversion between the 2 rhesus
genes, which would explain the high frequency of RHD-CE-D or
RHCE-D-CE hybrid alleles.2 However, it remained
unknown which rhesus gene, if any, represented the ancestral
position. The close proximity of RHCE and
SMP1 in humans was also surprising.
The duplication of the rhesus gene is known to have occurred during
primate evolution,3 giving rise to the RHD and
RHCE genes in humans. Hence nonprimate mammals, such as mice,
may reveal the ancestral state of the RH locus. In this
context, an 89 065 bp genomic DNA segment encompassing the mouse
RH locus (Figure 1, bottom), which was recently deposited
in public databases (GenBank entry AL611963), was most
informative. To compare the topology of the mouse RH locus with that of the human
RH locus, we assembled a 315 242 bp DNA segment that
included the human RH locus.
The assembly of this human genomic DNA was complicated by the fact that
the current GenBank entry AL139426 contained sequences representative
of RHD, SMP1, both rhesus boxes, and parts of
RHCE but did not represent their correct topology. To
overcome this limitation we compared the sequence of AL139426 to the
sequences of RHD (X63097) and RHCE (M34015) cDNA,
of RHD (AB035192) and RHCE (AB035191) intron 3,
of RHD (AB035185) and RHCE (AB035184) intron 9, and of the
upstream (AJ252311) and downstream (AJ252312) rhesus boxes.
We identified multiple misassemblies in long
regions between almost identical paralogous sequences (join of
RHD exon 3 to RHCE exon 4, RHCE exon 3 to RHD exon 4, 5' upstream rhesus box to 3' downstream
rhesus box, 3' upstream rhesus box to 5' upstream rhesus box, and
failed assembly of RHCE intron 9). We compiled the 315 242
bp human genomic DNA contig (Figure 1, upper panel) including
both rhesus genes and a stretch of surrounding DNA comprising more than
100 000 bp using AL031432 (5' of RHD), AL031284
(RHCE), AB035185 (RHD intron 9), AB035184
(RHCE intron 9), and a corrected version of AL139426. This
third party annotated human DNA segment was deposited under
GenBank accession number BN000065.
The position and orientation of proteins in the human and
mouse DNA segments were determined by a homology search against the
nonredundant protein database of GenBank (Tblastx) using the
NCBI Blast page. Each possible match was then manually evaluated after
a 2-sequence alignment (Blast 2 sequences).4
Both genomic DNA segments contained the RH gene(s), SMP1, and
the 2 additional genes, GCIP-interacting protein P29 and
NPD014. In addition, the human DNA segment, but not the
mouse DNA segment, contained the 2 rhesus boxes carrying one open
reading frame each and a succinate dehydrogenase pseudogene located in
intron 3 of both RHD and RHCE (Figure 1). The
3' ends of the rhesus boxes carry GC-rich regions that are
typical of some strong promoters. The juxtaposition of this
structure right in front of the SMP1 start codon may modify the
expression of smp1 in primates compared to nonprimate species.
Based on the gene positions and orientations, RHCE was
determined to represent the ancestral state.
The close proximity of SMP1 and RH known in humans1 was also
observed in the mouse RH locus (Figure 2). In the mouse, there were 8 639 bp
between NPD014 and SMP1. This distance
corresponded to the 11 437 bp between NPD014 and
the upstream rhesus box rather than to the 91 136 bp between
NPD014 and SMP1 in humans. The limited conservation of the
noncoding regions did not allow a more detailed analysis of the
RH duplication site at present.
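The distance comparison above can be checked with simple arithmetic. The following is a minimal sketch and not part of the original analysis: the function names fold_difference and closest_interval are hypothetical, and the fold difference is only one simple way of comparing interval lengths. The three distances are those quoted in the text.

```python
from typing import Mapping

# Intergenic distances in base pairs, as quoted in the text.
MOUSE_NPD014_TO_SMP1 = 8_639
HUMAN_DISTANCES = {
    "NPD014 to upstream rhesus box": 11_437,
    "NPD014 to SMP1": 91_136,
}


def fold_difference(a: int, b: int) -> float:
    """Return how many times larger the longer interval is than the shorter one."""
    if a <= 0 or b <= 0:
        raise ValueError("distances must be positive")
    return max(a, b) / min(a, b)


def closest_interval(reference: int, candidates: Mapping[str, int]) -> str:
    """Return the name of the candidate interval with the smallest fold difference."""
    return min(candidates, key=lambda name: fold_difference(reference, candidates[name]))


if __name__ == "__main__":
    for name, length in HUMAN_DISTANCES.items():
        ratio = fold_difference(MOUSE_NPD014_TO_SMP1, length)
        print(f"human {name}: {length} bp, {ratio:.2f}-fold vs mouse")
    print("closest:", closest_interval(MOUSE_NPD014_TO_SMP1, HUMAN_DISTANCES))
```

Under this measure the mouse interval differs from the human NPD014 to upstream rhesus box interval by a factor of roughly 1.3, but from the human NPD014 to SMP1 interval by a factor of more than 10.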
Among the 4 proteins, smp1 was most conserved and Rh was least
conserved (Table 1). There are 2 human
smp1-analogous proteins, smp1 (accession number AAD17754) encoded on
chromosome 1 and c21orf4 (P56557) encoded on chromosome 21.
These 2 human proteins corresponded to 2 different mouse proteins, BAB29242 and
BAB32266, that had 94% and 98% homology to the human proteins,
respectively.
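The text does not state how these percentages were computed. As an illustration only, assuming they are percent identities over a pairwise alignment, such a value could be derived as sketched below. The function name percent_identity is hypothetical, and the sequences in the usage example are made-up toy strings, not smp1 or c21orf4.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over alignment columns without a gap.

    Both inputs must be rows of the same pairwise alignment, so they must
    have equal length. Columns where either row has a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # gapped column: not counted in this sketch
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment (assumption, for illustration): 5 of 6 ungapped columns match.
    print(f"{percent_identity('MKT-LLV', 'MKTALIV'):.1f}% identity")
```

Other conventions exist, for example counting gapped columns in the denominator or scoring conservative substitutions as similar, so values from different tools are not always directly comparable.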
In conclusion, RHD arose by a duplication of RHCE. It is likely that the orientation of RHD was inverted during this event. We propose that the rhesus boxes were instrumental in the duplication. SMP1 is a highly conserved gene that has been located in the immediate proximity of RH during much of mammalian evolution. An understanding of the events shaping the rhesus polymorphism and of the underlying mechanisms will contribute to improving genotyping strategies for rhesus and possibly for a host of other loci with clustered genes in the genome.
Franz F. Wagner and Willy A. Flegel
References
1. Wagner FF, Flegel WA. RHD gene deletion occurred in the Rhesus box. Blood. 2000;95:3662-3682.
2. Wagner FF, Frohmajer A, Flegel WA. RHD positive haplotypes in D negative Europeans. BMC Genet. 2001;2:10.
3. Matassi G, Cherif-Zahar B, Pesole G, Raynal V, Cartron JP. The members of the RH gene family (RH50 and RH30) followed different evolutionary pathways. J Mol Evol. 1999;48:151-159.
4. Tatusova TA, Madden TL. Blast 2 sequences: a new tool for comparing protein and nucleotide sequences. FEMS Microbiol Lett. 1999;174:247-250. http://www.ncbi.nlm.nih.gov/blast/. Accessed November 19, 2001.
Copyright © 2002 by American Society of Hematology. Online ISSN: 1528-0020