Comparative Analysis of Insulin Gene Promoters: Implications for Diabetes Research

Colin W. Hay; Kevin Docherty


Diabetes. 2006;55(12):3201-3213. 

Regulatory Elements Within Insulin Promoters

Regulatory elements within promoters can originate at different times, and species comparisons indicate that promoters evolve through transcription factor binding site turnover and accretion.[54,55] The relative numbers of the principal insulin promoter regulatory elements in the surveyed species are listed in Table 2.

A Boxes

A-box sequences containing the TAAT motif bind homeodomain proteins,[56] the most important of which is pancreatic duodenum homeobox-1 (PDX-1),[57,58,59,60,61] which has been shown to be a potent stimulator of transcription of rat, mouse, and human insulin genes.[62] There are three principal A boxes in the human promoter: A1 (–82), A3 (–216), and A5 (–319) (Fig. 1). PDX-1 stimulates expression at A3,[58,63,64,65] and mutation of A3 has the most significant effect on transcription.[61,65,66] Contrary to the opinion that A3 is not the most conserved,[16] this survey has shown that A3 is the only A box present in all the mammals and, therefore, must be considered to be the most conserved and central to PDX-1 stimulation. PDX-1 bound to A1 has been shown to interact synergistically with E47/β2 in rat insulin 1.[30]

As a 4-bp motif such as TAAT is expected to occur by chance about once every 256 bp, the ability of PDX-1 to differentiate between potential regulatory elements must be influenced by adjacent sequences. The 3-bp flanking sequences have been shown to make an important contribution to the binding affinity of PDX-1 to TAAT core elements with a concomitant effect on activation. However, variations in these sequences are insufficient to completely explain differences in PDX-1 binding affinities.[67] Therefore, the 8-bp flanking regions of all A boxes were assayed for homology (Table 3). The A3 box and 5' flanking region lie within a novel ECR, and this is reflected in the high degree of conservation. The lack of any other regulatory elements within this ECR based on computational analysis raises the intriguing possibility that, while the TAAT motif is symmetrical, binding of PDX-1 to the promoter may be directional. Clear, though less well defined, asymmetrical homology of the other A box flanking regions to the human sequences is also apparent. Regulatory elements present in multiple copies often exist in both orientations,[42] thereby increasing potential phenoplasty.
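The 256-bp figure follows from treating each position as an independent, equally likely draw from four bases, so that one specific k-bp motif is expected once every 4^k positions. A minimal sketch of that arithmetic, under that simplifying assumption (real promoters have biased base composition); the function name is illustrative and not taken from the article:

```python
def expected_spacing(motif_length: int, alphabet_size: int = 4) -> int:
    """Expected number of positions per chance occurrence of one specific
    motif, assuming independent and equally likely bases (a simplification)."""
    if motif_length < 1:
        raise ValueError("motif_length must be at least 1")
    return alphabet_size ** motif_length


# The 4-bp TAAT core: 4**4 positions per expected chance occurrence.
print(expected_spacing(4))  # 256
```

Because a bare TAAT core is this frequent by chance, flanking sequence has to carry most of the specificity, which is the point the paragraph above makes.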

The A3 5' flanking region in rat insulin 1 has two additional TAAT sequences as a consequence of two single base pair changes. This creates the A4 site,[29] which is juxtaposed to A3 to generate an additional regulatory element that has been reported to bind other homeodomain transcription factors, some of which have been shown to affect transcription. One of the best studied is hepatocyte nuclear factor (HNF)-1α, which has been reported to activate the rat insulin 1 gene in the HIT cell line.[68] Similarly, Isl-1 has been found to bind to this site[69] and to interact with islet cell–specific transcription factor β2 to stimulate rat insulin 1 expression.[70] Other transcription factors reported to bind to the A3/A4 box include cdx-3[29] and HMGI(Y).[71] Inspection of all other insulin promoters shows that this homeodomain-binding sequence is unique to rat insulin 1. It would, therefore, seem logical to conclude that these transcription factors play no role in other species. However, HNF-1α provides an example of how the promiscuity of transcription factors creates obstacles in predicting insulin promoter effectors. Although the consensus binding sequence is not present in the human insulin promoter, the A3 region is sufficiently similar for the protein to bind, at least in vitro, and stimulate reporter assays.[72] On the other hand, in vivo chromatin immunoprecipitation (ChIP) assays have shown that HNF-1α is not necessary for either insulin 1 or 2 expression in mice, which lack A4.[73] Surprisingly, both the 5' and 3' flanking regions of each of the A4 TAAT sequences have higher homology to the human A3 region than rat insulin 1 A3, differing by only 1 bp. This evokes the interesting likelihood that, although the rat insulin 1 A3 box seems to be the main binding site,[67] A4 could also bind and be regulated by PDX-1. Regardless of the regulatory capacity of the alternative A boxes, the binding kinetics of PDX-1 to the primary A3 regulatory element could be appreciably different in rat insulin 1 compared with humans and other mammals.

The greatly diverged chicken and zebrafish insulin promoters lack mammalian A boxes; however, several TAAT motifs are present. The chicken has two at –359 and –386, and zebrafish has three at –142, –347, and –359 plus two more further upstream at –473 and –510. The clustering of TAAT motifs is greater than would be expected from random nucleotide arrangements. While TAAT motifs are targets for a large number of homeodomain transcription factors, it is worth noting that the 5' and 3' flanks of the zebrafish A boxes at –359 and –142 have 3-bp sequences associated with strong PDX-1 binding,[67] suggesting a possible role for PDX-1 in regulating these insulin genes. The flanking regions share no homology with human. This is unlikely to reflect divergence of the PDX-1 proteins (rodent, chicken, and zebrafish PDX-1 proteins share 89, 26, and 49% amino acid sequence identity with the human protein, respectively) as the homeodomains are well conserved and there is no evidence of species specificity in DNA binding.
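Locating TAAT cores and reading off their 3-bp flanks, as described above, can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the example sequence is made up rather than taken from any promoter, only the given strand is scanned, and offsets are 0-based within the input string rather than positions relative to the transcription start site.

```python
import re


def find_taat_sites(seq: str, flank: int = 3):
    """Return (offset, 5' flank, 3' flank) for every TAAT core in seq.

    Offsets are 0-based within seq; only the given strand is scanned.
    Flanks are truncated if the core lies near either end of seq.
    """
    seq = seq.upper()
    sites = []
    # A zero-width lookahead lets overlapping cores be reported too.
    for match in re.finditer(r"(?=TAAT)", seq):
        start = match.start()
        left = seq[max(0, start - flank):start]
        right = seq[start + 4:start + 4 + flank]
        sites.append((start, left, right))
    return sites


# Made-up sequence for illustration only.
example = "GCCTAATGGGCCATTAATTACG"
print(find_taat_sites(example))
# [(3, 'GCC', 'GGG'), (14, 'CAT', 'TAC')]
```

Comparing the extracted flanks between species is then a matter of aligning the tuples for corresponding sites.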

GG Boxes

In addition to A boxes, the GGAAAT-containing GG2 motif (–145) is also activated by PDX-1[74] despite its deviation from the homeodomain consensus. The human insulin promoter contains a second GG motif 5 bp downstream of GG2 and commonly referred to as GG1[75] or A2.[28] Mutation of these GG regulatory elements either singly or together has been shown to drastically reduce transcription,[76] and the transcription factor binding to GG1 interacts with a transcription factor binding to the adjacent C1 site.[77] Together, these findings suggest that both of the GG regulatory elements have a function in insulin expression. Of the two, GG2 is by far the more conserved being present in all mammals except the rodent insulin promoters. GG1, on the other hand, is absent from insulin promoters that diverged from human more than 25 million years ago, with the exception of the rat insulin 1 gene and dog. The presence of the highly conserved GG1 and C1 regulatory elements immediately downstream of GG2 and GG1, respectively, precludes useful comparison of flanking regions. The chicken insulin promoter has a GG motif at –130, which is in the same general region as GG1 in human (–133); however, there is no homology with the flanking regions of either human GG1 or GG2. The zebrafish insulin promoter does not contain any GG motifs.

Cyclic AMP Response Element

In the context of the insulin promoter, cAMP responsive elements bind the broadest array of transcription factors. These are generally closely related members of the bZIP CREB/ATF family, which can exist as multiple isoforms[78] that can interact with transcription factors activated by cAMP and diacylglycerol signaling pathways[79,80] to create activators, nonactivators, or repressors. The human insulin gene has four CRE sites: CRE1 at –210; CRE2 at –183; CRE3 at +18; and CRE4 at +61.[81] Although none of the CRE sites contains the consensus CRE sequence of TGACGTCA, mutagenesis experiments have shown that all are transcriptionally active.[82]
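How far a candidate octamer departs from the consensus CRE (TGACGTCA) can be expressed as a simple mismatch count. The sketch below assumes ungapped, equal-length comparison; the function name and the example octamer are hypothetical and are not the human CRE1 to CRE4 sequences, which this section does not list.

```python
CRE_CONSENSUS = "TGACGTCA"  # consensus CRE octamer quoted in the text


def mismatches(candidate: str, consensus: str = CRE_CONSENSUS) -> int:
    """Count positions at which candidate differs from consensus
    (ungapped comparison of two equal-length sequences)."""
    candidate = candidate.upper()
    if len(candidate) != len(consensus):
        raise ValueError("candidate and consensus must have equal length")
    return sum(a != b for a, b in zip(candidate, consensus))


# Hypothetical octamer differing from the consensus at the last base.
print(mismatches("TGACGTCC"))  # 1
```

A mismatch count says nothing about activity by itself; as the mutagenesis results cited above show, nonconsensus sites can still be transcriptionally active.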

Comparison of CRE sites between species (Table 4) reveals that only primates have multiple copies of CREs, with other mammals containing a single CRE corresponding to CRE2. Of these, only the dog CRE is identical to the conserved human CRE2 site. The multiple CRE sites in primates could be due to several factors, the most likely being dietary. It should be noted that while gorillas are often considered to be predominantly folivorous, it has become apparent that they also consume a significant amount of fruit.[83] This is even truer of the Western gorilla (Gorilla gorilla), whose genome is being sequenced for assembly, than of Eastern gorillas (Gorilla beringei). Also, all the primates, especially the great apes, are partly omnivorous since they supplement their diets with birds, eggs, small reptiles, and insects. In comparison to the other mammals studied, only primates consume large quantities of fruit in their diet. However, the number of CRE sites does not correlate simply with the amount of fruit consumed, as all the studied primates eat large quantities. Another possible reason is that while primates are omnivorous to varying degrees, they often gorge themselves on a single food (e.g., ripe fruit when a tree is in season or meat when a whole carcass is consumed quickly), which would give rise to major alterations in metabolic demands. This would be particularly pertinent to early humans and necessitate an insulin promoter that could respond accordingly. The phenomenon of increased numbers of CREs in primates may be expedited by the fact that primate promoters have an increased rate of evolution.[44]

As with other regulatory elements, the chicken and zebrafish insulin promoters do not contain obvious CRE sites. The chicken insulin promoter contains four possible (three overlapping) nonconsensus sequences in the vicinity of the conserved mammalian CRE site, while the zebrafish has two potential nonconsensus octamers at –46 and –226.

It is impossible to draw conclusions on the effects of the numerous minor nucleotide changes on CRE site activity, as most regulatory elements can tolerate one or more substitutions without total loss of function.[84,85] Therefore, it may be very significant that, even with the variability of the octamer in the conserved CRE site, sequences that include the CRE core along with at least 8 bp of both 5' and 3' flanking regions represent one of the most prominent ECRs in all mammalian insulin promoters. This strongly points to the importance of CRE sites in insulin gene regulation.

C Elements

Initial expression studies on the C1 element at –128 (5'TGCAGCCTCAGCC) were carried out on the rat insulin 2 gene, showing that it binds the transcription factor RIPE3b1, which was subsequently identified as the basic leucine zipper (bZIP) protein MafA.[86,87,88,89,90] Mutagenesis of the human C1 MafA binding site reduces promoter activity by 74% in INS-1 β-cells[91] and blocks activation by glucose in MIN6 β-cells.[92] MafA can also interact with β2 and PDX-1.[93] All the mammalian insulin genes show extremely high conservation of the C1 site, and all are identical to human, except dog and pig, which have 1- and 2-bp substitutions at the 3' and 5' regions of the consensus sequence, respectively. As the recognition site is 13 bp long, it is possible that mutations at its extremities would not necessarily eliminate MafA stimulation. Despite the clear conservation pressures on the C1 site, no comparable sequence was detected in the chicken and zebrafish insulin promoters.

The human insulin promoter has a bipartite C2 element (5'CAGGGACAGG) at –252,[94] and rat insulin 1 promoter has been reported to contain a dissimilar, though active, sequence between –329 and –307. The C2 site can bind PAX4 and PAX6, which repress[95] and stimulate,[96] respectively. A search of insulin promoters showed that the human C2 site is present in all primates, although African green monkey has a single base pair substitution between the two CAGG motifs. Among nonprimates, dog has two substitutions between the direct repeat and cow has three repeats with the intervening regions containing 1- and 2-bp deletions. It is not immediately apparent from DNA sequence alone whether these latter sites are functional.

E Boxes

E boxes (5'CANNTG) bind proteins of the basic helix loop helix (bHLH) class of transcription factor with ubiquitous E47 forming a heterodimer with neuroendocrine cell specific NeuroD/β2.[97] Two important E boxes were initially identified in the rat insulin 1 promoter between –104 and –112 (E1 or IEB1) and between –233 and –241 (E2 or IEB2).[98] The E1 box is the more conserved of the two and analysis showed that it is present in all mammal insulin promoters. Mutagenesis of this site in the rat insulin 1 and 2 promoters results in reduced transcription,[98,99] and in the human insulin promoter drastically reduces basal transcription[91] and responsiveness to glucose.[92] The E2 motif is less well conserved and the homologous sequence in the human insulin promoter at –239 (5'GCCACCGG)[75] contains a nonconsensus recognition site. The human E2 sequence can bind the ubiquitous transcription factor USF,[100] but it does not appear to have a measurable effect on the overall activity of the promoter. In addition to the E1 and E2 boxes, a search of the insulin promoters revealed the presence of many other "CANNTG" consensus sequences (Table 2), including two in the negative regulatory element that lie just 23 and 33 bp upstream of the human E2 site. The presence of numerous potential E boxes suggests that regulation of the insulin promoter by bHLH transcription factors remains to be fully elucidated. Chicken and zebrafish insulin promoters contain neither E1 nor E2; however, they possess several consensus E box sites.
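A search for consensus E boxes of the kind described above amounts to matching the degenerate pattern CANNTG, where N is any base. The following is a minimal sketch using a regular expression; the function name is hypothetical and the example sequence is made up for illustration, not taken from any insulin promoter. Offsets are 0-based within the input string.

```python
import re

# N in CANNTG stands for any of the four bases.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_e_boxes(seq: str):
    """Return (offset, matched hexamer) for every CANNTG in seq,
    including overlapping matches (0-based offsets)."""
    return [(m.start(), m.group(1)) for m in E_BOX.finditer(seq.upper())]


# Made-up sequence for illustration only.
example = "AACAGCTGTTCATATGCC"
print(find_e_boxes(example))
# [(2, 'CAGCTG'), (10, 'CATATG')]
```

Because the reverse complement of CANNTG is again of the form CANNTG, scanning one strand is sufficient to find every site that matches the pattern.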

An unnamed sequence at –232 (5'GGGCCC), which we have tentatively termed G2 in Fig. 1, overlaps the 5' end of the E2 box and binds a factor with limited tissue distribution.[101] This sequence, which is known to induce DNA curvature, may serve to bring together proteins that bind at sites flanking this motif. Examination of the other insulin promoters reveals that within the primates, chimpanzee and gorilla contain the G2 sequence at the same location while orangutan, rhesus macaque, and African green monkey share a transition at the first nucleotide. The G2 site is absent from owl monkey; however, this primate has an alternative G2 motif at –453. Among the other mammalian insulin promoters, mouse insulin 2 and cow have a G2 site in the same region while dog, mouse insulin 1, and pig have alternative G2 sequences at –329, –400, and –16, respectively. Since a 6-bp motif would be expected to occur only once every 4,096 bp by chance, the existence of alternative G2 motifs may indicate that G2-facilitated DNA bending abets interactions between proteins binding to the promoter. The G2 motif is absent from the rat insulin paralogues, chicken and zebrafish.

Negative Regulatory Element

The human insulin promoter contains an inhibitory sequence (–279 to –258) referred to as the negative regulatory element (NRE) (5'GAGACATTTGCCCCCAGCTGT)[75,102] that lies within the glucose sensing Z element (–243 to –292).[103,104] It displays contrary properties, acting as both a potent glucose-responsive transcriptional enhancer in primary cultured islet cells and as a transcriptional repressor in immortalized β- and non-β-cells and in primary fibroblasts.[103] Searches of the insulin promoters detected the NRE sequence in all primates; however, it is absent from all other species, which is in agreement with reports that there is no evidence for a β-cell–specific NRE in rat insulin 1.[98,105]

Insulin-linked Polymorphic Region

A hypervariable region containing variable numbers of tandem 14-bp repeats (5'TCTGGGGAGAGGGG) (insulin-linked polymorphic region [ILPR] or variable number of tandem repeats [VNTR]) is located at approximately –360 in the human insulin promoter. The ILPR adopts an altered structure, which has been characterized as a quadriplex involving interactions between the G residues on the top strand.[106] This sequence, which binds the transcription factor Pur-1/MAZ,[107] has a powerful effect on promoter activity in β-cells. Three classes of VNTR alleles have been identified based on the number of repeats of the 14-bp sequence: class I (20–63 repeats), class II (64–139 repeats), and class III (140–210 repeats). There is a correlation between the number of repeats in this region (IDDM2 locus) and susceptibility to type 1 diabetes with the highest risk conferred by class I,[108] while class III has been linked to type 2 diabetes.[109] On the other hand, studies involving large cohorts have shown that this region has no impact on early growth,[110] insulin release, or diabetes.[111] The class I allele is associated with higher levels of insulin mRNA in the pancreas, whereas class III alleles are associated with higher levels of insulin gene transcription in the thymus.[20] The increased levels of insulin in the thymus may promote efficient deletion of autoreactive T-cells for proinsulin and immune tolerance to a key antigen implicated in type 1 diabetes. The ILPR sequence was found in only the chimpanzee promoter.
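The three allele classes are defined purely by repeat count, so the mapping can be written as a small lookup. This is a sketch using only the ranges quoted above; the function name and the "unclassified" label for counts outside those ranges are assumptions made for illustration.

```python
def vntr_class(repeat_count: int) -> str:
    """Map a number of 14-bp repeats to the VNTR allele class ranges
    quoted in the text (I: 20-63, II: 64-139, III: 140-210)."""
    if 20 <= repeat_count <= 63:
        return "I"
    if 64 <= repeat_count <= 139:
        return "II"
    if 140 <= repeat_count <= 210:
        return "III"
    # Counts outside the quoted ranges are not assigned a class here.
    return "unclassified"


print(vntr_class(40))   # I
print(vntr_class(150))  # III
```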

G1 Box

The G1 box (5'GTAGGGGA) at –52 contains a sequence similar to that in the ILPR repeat sequence. The human insulin promoter G1 box binds the transcription factor Pur-1/MAZ.[107,112] Although rat insulin 1 and 2 promoters lack the 5'GTAGGGGA motif, Pur-1/MAZ can bind to the adjacent guanine-rich region that often contains a GAGA box to stimulate transcription.[113] A search of insulin promoters shows that chimpanzee, orangutan, and owl monkey have a G1 sequence identical to human. Gorilla and rhesus macaque share a single nucleotide change; however, like African green monkey, rat insulin 1, and both mouse paralogues, they retain the GAGA box. Therefore, it is likely that Pur-1/MAZ is active in regulation of these insulin promoters. Pig, cow, and dog all contain deletions in this region, and chicken and zebrafish lack homologous motifs.

Enhancer Core

The core element (5'TGTGGAAAG) at –312 has a perfect match to the binding site for the CCAAT/enhancer binding protein (C/EBP) and probably other factors. There is very little known about this regulatory element, although it may act along with the adjacent A5 to mediate MafA-PDX-1 interactions.[104] The enhancer core is present in all the primates for which sequence is available (not gorilla and orangutan). Rat insulin 2, mouse insulins 1 and 2, and dog share a single conservative transition at the most 3' position. Rat insulin 1 has an additional mutation at the most 5' position that may significantly reduce stimulatory potential, and the motif is absent from all other species.

SP1 Site

The SP1 site (5'CCGCCC) at –345 was originally identified as a sequence that could bind a factor present in HIT T15 β-cells. The SP1 site appeared to exhibit powerful transcriptional effects, but mutations that abolished protein binding had no effect on its transcriptional activity (A.R. Clark and K.D., unpublished findings), suggesting possible interactions with adjacent sites. The SP1 site has also been identified as a potential binding site for the SP1-like factor KLF11, variants of which may contribute to the development of diabetes.[114] Examination of primates for which sequence is available (not gorilla and orangutan) shows that all but African green monkey contain the identical SP1 site in the same position. African green monkey has a single nucleotide substitution of a C to T that reduces but does not eliminate KLF11 binding to oligonucleotide in electrophoretic mobility shift assay (EMSA) studies.[114] It is absent from all other species.

Ink Box

The Ink (for insulin kilobase upstream) sequence at –1,030 contains a cluster of potential binding sites comprising a palindromic element with zero spacing overlapping a direct-repeat element with 2 bp pairing (5'AGGTCCCCAGGTCATGCCCTC) and is responsive to both retinoic acid and thyroid hormone.[115] Searches of available insulin promoter sequences upstream to –1,500 show that the Ink box is absent from all nonprimates. Of the primates, distant upstream sequence is available for only chimpanzee and rhesus macaque. These two primates contain the Ink motif at –854 and –947, respectively. Although the positions are quite removed from the human, the immediate 30-bp regions display 95% identity with the human Ink region, suggesting that this regulatory element may be influential in insulin expression, perhaps playing a role in energy homeostasis.
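Percent identity between two already aligned regions, the measure used above for the Ink flanking sequence, is the fraction of matching positions. The sketch below assumes an ungapped comparison of equal-length sequences; the function name is hypothetical, and the two 20-bp example sequences are made up and are not the Ink regions themselves.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length, ungapped
    sequences carry the same base."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Made-up 20-bp sequences that differ at the final position only.
print(percent_identity("ACGTACGTACGTACGTACGT",
                       "ACGTACGTACGTACGTACGA"))  # 95.0
```

Sequences of unequal length or with insertions and deletions would first need a proper alignment, which this sketch does not attempt.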

