ABSTRACT
CTCF is a highly conserved zinc-finger DNA-binding protein that mediates interactions between distant sequences in the genome. As a consequence, CTCF regulates enhancer-promoter interactions and contributes to the three-dimensional organization of the genome. Recent studies indicate that CTCF is developmentally regulated, suggesting that it plays a role in cell type-specific genome organization. Here, we review these studies and discuss how CTCF functions during the development of various cell and tissue types, ranging from embryonic stem cells and gametes, to neural, muscle and cardiac cells. We propose that the lineage-specific control of CTCF levels, and its partnership with lineage-specific transcription factors, allows for the control of cell type-specific gene expression via chromatin looping.
Introduction
CTCF is a zinc-finger protein that was initially described as a transcriptional repressor of the Myc gene (Klenova et al., 1993; Lobanenkov et al., 1990). It is conserved among bilaterians, and phylogenetic analyses suggest an early origin in the evolution of Metazoa (Heger et al., 2012). CTCF is composed of multiple domains (see Box 1) that allow it to bind to different DNA motifs and various regulatory proteins (Fig. 1). CTCF was initially shown to bind to insulator sequences within the α- and β-globin loci (Bell et al., 1999; Chung et al., 1997; Furlan-Magaril et al., 2011; Valadez-Graham et al., 2004), and the imprinted Igf2/H19 locus (Bell and Felsenfeld, 2000; Hark et al., 2000; Kanduri et al., 2000); studies using reporter constructs in different cell types have suggested that CTCF functions as an insulator protein that can block the ability of enhancers to activate promoters when placed between them (Recillas-Targa et al., 2002). Subsequent work revealed a role for CTCF in the mediation of enhancer-promoter interactions (Guo et al., 2015), alternative splicing (Marina et al., 2016; Shukla et al., 2011), recombination (Hu et al., 2015) and DNA repair (Han et al., 2017). These different functions of CTCF are presumably a reflection of its role, together with that of cohesin, in regulating the formation of chromatin loops and, hence, in controlling three-dimensional (3D) chromatin organization (see Box 2 and Fig. 2) (Fudenberg et al., 2016; Haarhuis et al., 2017; Nora et al., 2017; Sanborn et al., 2015).
CTCF is composed of an N-terminal domain, a central zinc-finger domain with 11 C2H2 zinc fingers (ZF) and a C-terminal domain. The zinc-finger domain is responsible for binding to a 15 bp core motif of DNA, employing ZFs 3-7, while the remaining ZFs can modulate CTCF-binding stability by interacting with adjacent DNA modules (Hashimoto et al., 2017; Nakahashi et al., 2013; Rhee and Pugh, 2011; Schmidt et al., 2012). All three domains of CTCF may also interact with other proteins (see Fig. 1) (Chernukhin et al., 2007; Delgado-Olguín et al., 2012; Ishihara et al., 2006; Lee et al., 2017; Uusküla-Reimand et al., 2016; Xiao et al., 2011, 2015) or RNA (Kung et al., 2015; Saldaña-Meyer et al., 2014; Sun et al., 2013), and are susceptible to post-translational modifications that could affect interactions with DNA or other proteins (Klenova et al., 2001; MacPherson et al., 2009; Yu et al., 2004).
CTCF binds to 40,000-80,000 sites in the mammalian genome, which are predominantly located in intergenic regions and introns, overlapping with regulatory sequences such as enhancers and promoters (Chen et al., 2012). CTCF occupancy across cell types is variable (Beagan et al., 2017; Chen et al., 2012; Martin et al., 2011; Maurano et al., 2015; Prickett et al., 2013; Wang et al., 2012). Cells originating from the same precursors tend to have a similar CTCF-binding landscape whereas cells from different lineages can have marked differences in CTCF occupancy (Prickett et al., 2013; Wang et al., 2012). DNA methylation can affect CTCF binding (Ayala-Ortega et al., 2016; Bell and Felsenfeld, 2000; Hark et al., 2000), possibly by regulating the affinity of CTCF for DNA (Hashimoto et al., 2017). However, the true extent to which DNA methylation directly affects CTCF binding is still controversial (Maurano et al., 2015).
Eukaryotic chromosomes are organized in the three-dimensional (3D) nuclear space and this folding is important for processes such as DNA replication, repair, recombination and transcription (Franke et al., 2016; Hnisz et al., 2016; Hu et al., 2015; Lupiáñez et al., 2015; Pope et al., 2014). Chromosomes occupy positions in the nucleus termed chromosome territories (Cremer et al., 2006; Stevens et al., 2017) and each chromosome can be further organized into interaction domains such as compartments, topologically associating domains (TADs) and loop domains (Ay et al., 2014; Crane et al., 2015; Dixon et al., 2012; Galazka et al., 2016; Hou et al., 2012; Hsieh et al., 2015; Jin et al., 2013; Liu et al., 2016; Mizuguchi et al., 2014; Nora et al., 2012; Rao et al., 2014; Vietri Rudan et al., 2015).
The 3D organization of the genome is the result of the interplay between the transcriptional state of genes and the activity of architectural proteins such as cohesin, YY1 and CTCF (see Fig. 2) (Haarhuis et al., 2017; Kubo et al., 2017 preprint; Nora et al., 2017; Rao et al., 2017; Rowley et al., 2017; Schwarzer et al., 2017; Weintraub et al., 2017). Recent reports have provided evidence that the genome is folded inside the nucleus by at least two independent mechanisms. One relies on cohesin and CTCF for the formation of chromatin loops by an extrusion process that stops once the cohesin complex encounters CTCF-bound sites arranged in a convergent or ‘head to head’ orientation (Fudenberg et al., 2016; Nichols and Corces, 2015; Sanborn et al., 2015). This model of CTCF loop formation has been tested experimentally and by computational modeling (Fudenberg et al., 2016; Goloborodko et al., 2016; Haarhuis et al., 2017; Nora et al., 2017; Sanborn et al., 2015; Schwarzer et al., 2017). The second mode of genome organization is independent of these proteins and reflects the tendency of chromatin regions to interact with regions of a similar transcriptional state and histone post-translational modifications (Rao et al., 2017; Rowley et al., 2017; Schwarzer et al., 2017).
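The convergent-orientation rule of the extrusion model can be illustrated with a deliberately simplified sketch. It is not taken from any of the cited studies: the `CtcfSite` type, the `extrude` function and all coordinates are hypothetical, and the sketch assumes that every CTCF site is always occupied and blocks extrusion completely, which real sites do not.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class CtcfSite(NamedTuple):
    pos: int     # coordinate on a toy chromosome (arbitrary units)
    strand: str  # '+' = motif points right, '-' = motif points left


def extrude(sites: Sequence[CtcfSite], load_pos: int
            ) -> Tuple[Optional[CtcfSite], Optional[CtcfSite]]:
    """Return the (left, right) anchors reached by cohesin loaded at load_pos.

    Toy rule (an assumption of this sketch): the left-moving arm is halted
    only by a '+' site and the right-moving arm only by a '-' site, so the
    two anchors of a loop always point toward each other (convergent).
    None means that arm met no blocking site.
    """
    left = max((s for s in sites if s.pos < load_pos and s.strand == "+"),
               key=lambda s: s.pos, default=None)
    right = min((s for s in sites if s.pos > load_pos and s.strand == "-"),
                key=lambda s: s.pos, default=None)
    return left, right


# Four hypothetical sites: two nested convergent pairs.
sites = [CtcfSite(100, "+"), CtcfSite(300, "-"),
         CtcfSite(500, "+"), CtcfSite(800, "-")]

print(extrude(sites, 200))  # loaded between the first convergent pair
print(extrude(sites, 400))  # loaded between two divergent sites
```

In this toy setting, cohesin loaded at position 200 is trapped by the sites at 100 and 300, whereas cohesin loaded at 400 passes the outward-facing sites at 300 and 500 and stops only at 100 and 800. Inverting a site therefore changes which loops can form, even though the site is still bound.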
Although this general role for CTCF in 3D chromatin organization has been studied in great detail (reviewed by Merkenschlager and Nora, 2016; Ong and Corces, 2014), the precise role played by CTCF during organismal development has remained poorly explored. Here, we review evidence linking CTCF to the control of developmental processes (summarized in Table 1). We first provide an overview of how CTCF functions to regulate genome 3D organization and how this role affects gene expression. We then discuss the roles of CTCF in the development and differentiation of various cell and tissue types, ranging from embryonic stem cells (ESCs) to neural, cardiac and muscle cells. We conclude with an integrative view of CTCF as an important determinant of cell lineage specification during vertebrate development.
Mechanisms of CTCF function
In mammals, most transcriptional activity occurs inside chromatin loops that are bound at their base by CTCF (‘CTCF loop anchors’, see Fig. 2). Sequences adjacent to such loop anchors are enriched in active histone modifications, RNA polymerase II (RNAPII), housekeeping genes and transcription start sites (Tang et al., 2015), suggesting that CTCF chromatin loops could represent topological structures within which transcription can take place. In support of this, functional experiments that remove or invert CTCF-binding sites result in changes in gene expression that can be direct or indirect, i.e. the disruption of a chromatin loop can cause changes in the transcriptional regulation of the genes contained inside the CTCF loop or affect the transcription of nearby genes originally located outside the loop (Dowen et al., 2014; Guo et al., 2015; Hanssen et al., 2017; Narendra et al., 2015). Alternatively, the disruption of a specific CTCF site may untether regulatory sequences from their target promoter, resulting in transcriptional downregulation. Therefore, the role of CTCF in transcription, either locally or genome wide, is complex, and it is difficult to predict a priori the effect of a particular CTCF genomic site on transcription or the subset of genes that can change their transcriptional status due to loss of the site (de Wit et al., 2015; Kubo et al., 2017 preprint; Nora et al., 2017).
The full spectrum of genes that can be transcriptionally regulated by CTCF also seems to depend on the cell type being analyzed. For example, inactivation of the Ctcf gene in primary mouse embryonic fibroblasts results in the mis-regulation of 698 genes (Busslinger et al., 2017), whereas similar experiments in post-mitotic embryonic and postnatal neurons result in changes in the expression of about 400 and 800 genes, respectively (Hirayama et al., 2012; Sams et al., 2016). Likewise, removal of CTCF protein in mouse ESCs results in loss of chromatin loops and changes in the expression of hundreds of genes (Nora et al., 2017), whereas experiments using a different mouse ESC line and a similar CTCF depletion approach (Kubo et al., 2017 preprint) found a smaller effect of CTCF depletion on transcription. It is surprising that, despite the presence of CTCF at thousands of genomic sites, complete depletion of this protein does not result in more dramatic changes in gene expression. This could be explained, in part, by the activity of additional architectural proteins such as YY1 in establishing long-range interactions between enhancers and promoters (Weintraub et al., 2017), such that, even in the absence of CTCF, many enhancer-promoter interactions remain and thus the expression of many genes is not affected. It would be of interest to deplete both YY1 and CTCF using the newly employed degron systems and to analyze the effect on genome organization and gene expression in the absence of both proteins. A second possibility is that CTCF and cohesin establish a basal topology of interactions between regulatory elements that is required for timely and dynamic control of gene transcription, e.g. during signal transduction (Oti et al., 2016; Wang et al., 2014). In this scenario, not all CTCF target genes may display a change in gene expression following CTCF depletion, and would only do so under specific cellular conditions.
If this is the case, it would be interesting to investigate whether the disruption of CTCF chromatin loops impairs the transcriptional response of cells to specific stimuli. The study of CTCF during development thus offers a good starting point to understand how 3D genome organization integrates a plethora of stimuli that translate into the establishment of cell-specific transcription programs.
CTCF is required for early vertebrate development
In mice, depletion of CTCF from oocytes results in embryo lethality by the morula stage, whereas homozygous null mutant embryos fail to implant and die by the pre-implantation stage (E3.5) (Heath et al., 2008; Moore et al., 2012; Wan et al., 2008). In zebrafish, CTCF knockdown in the one-cell stage embryo results in lethality by 24 h post fertilization (Delgado-Olguín et al., 2011). In both organisms, lethality is accompanied by widespread apoptosis mediated by downregulation of p53 and upregulation of Puma (Bbc3), both of which are direct targets of CTCF (Gomes and Espinosa, 2010; Moore et al., 2012; Saldaña-Meyer et al., 2014). These observations provide strong evidence of the importance of CTCF for very early development.
CTCF function is also crucial during postnatal and adult development (Gregor et al., 2013; Hori et al., 2017; Kemp et al., 2014; Sams et al., 2016). In particular, studies in mice have demonstrated that postnatal development of an organism is sensitive to CTCF dose (Gregor et al., 2013; Kemp et al., 2014; Marshall et al., 2017). For example, while CTCF heterozygous knockout (KO) mice are viable, a halved dose of CTCF predisposes the mice to develop spontaneous tumors in tissues with high rates of cell proliferation. The molecular mechanisms behind this susceptibility are unclear, although it is known that heterozygous CTCF KO mice show aberrant DNA hypermethylation at specific loci (Kemp et al., 2014), which could, over the course of a lifetime, predispose certain tissues to uncontrolled proliferation by epigenetic silencing of tumor-suppressor genes. CTCF has also been shown to be important for adult cognition (Gregor et al., 2013; Hori et al., 2017). It would thus be of interest to analyze heterozygous CTCF KO mice for cognitive defects, as humans with heterozygous mutations in CTCF are known to display intellectual disability (Gregor et al., 2013; Hori et al., 2017; Sams et al., 2016).
CTCF sets a ground state of genome organization in embryonic stem cells that is crucial for development
Embryonic stem cells (ESCs) have the ability to self-renew and differentiate, and this capacity to either maintain the pluripotent state or start a differentiation program has been associated with a unique chromatin landscape that is greatly influenced by CTCF. The functional relevance of CTCF in ESC biology relies on its ability to sustain cell viability, cell proliferation and the pluripotent state by regulating the expression of multiple genes (Balakrishnan et al., 2012; Dowen et al., 2014; Handoko et al., 2011; Ji et al., 2016; Nora et al., 2017; Phillips-Cremins et al., 2013; Tee and Reinberg, 2014). In line with this, it has been shown that CTCF binds 40,000-80,000 sites genome wide in mouse and human ESCs, with around 10% of them localizing to promoter regions and at least 7000 directly involved in chromatin looping interactions (Handoko et al., 2011; Ji et al., 2016; Teif et al., 2014). In mESCs and hESCs, pluripotency genes such as Oct4 and Nanog, as well as lineage-specifying genes, are located inside chromatin loops termed insulated neighborhoods that are anchored by CTCF and cohesin (Dowen et al., 2014; Ji et al., 2016). This type of chromatin loop physically insulates genes and their regulatory sequences from the rest of the genome. For example, the miR-290-295 gene cluster encodes a miRNA population important for ESC maintenance and survival (Kaspi et al., 2013; Lichner et al., 2011; Zheng et al., 2011). This cluster of miRNA genes and a super-enhancer are located inside a chromatin loop anchored by CTCF at both sides (Fig. 3A). The removal of a CTCF-binding site at one of the loop anchors by CRISPR-Cas9 results in a decrease in pri-miR transcripts, downregulation of endodermal markers and upregulation of Pax6, a target of miR-290-295 involved in ectodermal differentiation (Dowen et al., 2014; Kaspi et al., 2013).
In addition, the promoter region of Nlrp2, a gene originally positioned outside the chromatin loop, engages in long-range interactions with the super-enhancer. These interactions take place between a CTCF site located upstream of Nlrp2 and a CTCF site next to the super-enhancer, resulting in transcriptional activation of the gene. In this scenario, the miR-290-295 locus would still be located inside a larger loop, but the Nlrp2 promoter would compete for the super-enhancer, thereby reducing miR-290-295 transcription. This example highlights two important effects of perturbing CTCF binding. First, the removal of a single CTCF site at a chromatin loop anchor can perturb gene expression, in this case leading to premature expression of differentiation genes. Second, the nature of the dysregulated gene can amplify the transcriptional defects associated with the removal of a CTCF-binding site, as in this case where CTCF controls the expression of miRNAs that, in turn, regulate the levels of a cell type-specific transcription factor (Ayala-Ortega et al., 2016; Dowen et al., 2014). Insulated neighborhoods can also promote the maintenance of transcriptional repression (Fig. 3B). For example, the removal of one CTCF anchor from a loop containing Polycomb-repressed genes is sufficient to induce aberrant gene expression (Dowen et al., 2014).
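The insulation logic of this example can be made concrete with a minimal sketch. All names and coordinates below are hypothetical and are not real genomic positions. In particular, placing the Nlrp2 promoter to the left of the loop is an assumption of the sketch. The model captures only the loss of insulation after an anchor is deleted. It does not capture the new CTCF-CTCF loop that forms between Nlrp2 and the super-enhancer, or the competition for the enhancer.

```python
from typing import Iterable


def can_contact(a: int, b: int, boundaries: Iterable[int]) -> bool:
    """True if no CTCF boundary lies strictly between positions a and b.

    Toy assumption: a bound CTCF anchor fully blocks contacts across it.
    """
    lo, hi = sorted((a, b))
    return not any(lo < x < hi for x in boundaries)


# Arbitrary toy coordinates along one chromosome (hypothetical layout).
NLRP2_PROMOTER = 10   # gene outside the insulated neighborhood
LEFT_ANCHOR = 20      # CTCF site that is deleted in the experiment
MIR_CLUSTER = 30      # miR-290-295 cluster, inside the loop
SUPER_ENHANCER = 40   # super-enhancer, inside the loop
RIGHT_ANCHOR = 50     # CTCF site that is left intact

intact = [LEFT_ANCHOR, RIGHT_ANCHOR]
anchor_deleted = [RIGHT_ANCHOR]

for label, boundaries in (("intact", intact), ("deleted", anchor_deleted)):
    print(label,
          "enhancer-miR:", can_contact(SUPER_ENHANCER, MIR_CLUSTER, boundaries),
          "enhancer-Nlrp2:", can_contact(SUPER_ENHANCER, NLRP2_PROMOTER, boundaries))
```

With both anchors present, the super-enhancer can reach only the miRNA cluster. Once the left anchor is removed, the outside promoter becomes reachable as well, which is the precondition for the competition described above.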
Genome-wide depletion of CTCF from mESCs results in loss of chromatin loops and changes in gene expression of 400 and 5000 genes after 24 and 96 h of depletion, respectively (Nora et al., 2017). Early downregulated genes display a preferential enrichment for CTCF binding 60 bp upstream of their transcription start site (TSS) and, intriguingly, CTCF motif orientation is concordant with the direction of transcription, a trend that is also observed in human cell lines (Tang et al., 2015). In contrast, 80% of upregulated genes in CTCF-depleted mESCs do not have CTCF at their promoter region and may be stimulated by enhancers located in adjacent loop domains normally separated by CTCF (Nora et al., 2017). Restoration of CTCF levels after acute removal rescues transcription defects in downregulated genes but not in upregulated genes, with some of them showing a constant increase in transcription (Nora et al., 2017). This is reminiscent of a positive-feedback mechanism that could have crucial consequences in early embryonic development.
CTCF can also influence gene expression in ESCs by regulating the transcription of epigenetic factors such as WDR5, which is a core member of the MLL complex. In ESCs, CTCF occupies the Wdr5 promoter and is necessary for its transcription. Overexpression of WDR5 is able to rescue the proliferative defects seen in CTCF-deficient ESCs, suggesting an important role for CTCF in ESC maintenance via direct regulation of TrxG components (Wang et al., 2017). In ESCs, CTCF also interacts with chromatin remodeling complexes such as NURF, with transcription factors like Oct4, and with the transcription initiation factor TFIID subunit 3 (TAF3), and perturbing these associations can have important functional consequences (Donohoe et al., 2009; Liu et al., 2011; Qiu et al., 2015). The interaction of CTCF with TAF3 is interesting because, despite being a general transcription factor, TAF3 regulates lineage commitment in ESCs and co-occupies distal promoter regions bound by CTCF, with CTCF being necessary for TAF3 recruitment.
CTCF plays a central role in brain development and neural function
CTCF levels vary during brain development, with the highest level of expression occurring in the embryonic brain and decreasing from birth to adulthood (Beagan et al., 2017; Sams et al., 2016). CTCF protein levels in adults are higher in the amygdala, hippocampus, cerebellum and cortex than in other organs such as the kidney and heart (Sams et al., 2016). In addition, CTCF is highly enriched in neurons but not in astrocytes and oligodendrocytes, suggesting a specialized role for CTCF in neuronal physiology in adult organisms. Furthermore, analyses of the genome-wide occupancy of CTCF in the mouse brain reveal a high number of brain-specific, CTCF-bound sites that could be involved in regulating the expression of neural genes (Prickett et al., 2013). These data suggest a potential function for CTCF in brain development but, importantly, also hint at the existence of a regulatory program to control the expression of CTCF during development. Removal of the Ctcf gene from neural cells during either embryonic or postnatal development results in lethality (Hirayama et al., 2012; Sams et al., 2016; Watson et al., 2014), while depletion of CTCF from telencephalic tissue at E8.5 results in microcephaly, apoptosis by upregulation of Puma, cell loss, and embryonic lethality. The specific inactivation of CTCF in neural precursor cells (NPCs) at E11 also induces massive apoptosis, hypocellularity and lethality at birth by asphyxiation (Watson et al., 2014). In addition, CTCF regulates the pool of NPCs by restricting premature neurogenesis. These data suggest that CTCF is required for the survival, proliferation and controlled differentiation of NPCs during early embryonic development.
CTCF is also important for the survival of post-mitotic neurons (Hirayama et al., 2012; Sams et al., 2016). The conditional removal of CTCF from post-mitotic cortical and hippocampal neurons in mice at E11.5 results in growth retardation, abnormal behavior and reduced lifespan, with mice dying by 5 weeks after birth (Hirayama et al., 2012). In this context, CTCF was found to be important for synapse formation and dendrite development during postnatal development, as mutant neurons in the cortex and hippocampus show a dramatic decrease in dendritic length and arborization, as well as dendritic spine formation. This results in decreased ability to form an appropriate number of synapses, which could affect learning and memory. These defects are evident at postnatal day 14 (P14) but not at P7, suggesting that the effect of CTCF on neuron morphology may be restricted to a particular window of postnatal development. CTCF also contributes to cognitive functions in adult mice. Specific removal of CTCF from postnatal excitatory neurons results in profound defects in memory and learning, as well as reduced lifespan, with mutant mice dying by 14 weeks of age (Sams et al., 2016). In this case, only subtle defects in spine density are observed by 10 weeks of age, in contrast to the strong defects described when removing CTCF from post-mitotic neurons during embryonic development (Hirayama et al., 2012). These phenotypic differences could be the result of a specific regulatory program directed by CTCF at different developmental time points during neuronal maturation, suggesting that the effect of CTCF on cell physiology is highly dependent on the developmental stage and, possibly, on the chromatin organization of its target genes.
In line with these phenotypes, it has been shown that CTCF controls the expression of several genes that are crucial for correct neural development and function (Hirayama et al., 2012; Sams et al., 2016). For example, CTCF depletion from post-mitotic cells, either during embryonic development or in postnatal neurons, results in profound downregulation of the stochastically expressed isoforms of the protocadherin genes (Pcdh), which are essential for the generation of neural diversity (Hirayama and Yagi, 2013). Remarkably, the dendritic arborization defects observed in CTCF mutant neurons are similar to those observed following the knockout of Pcdh genes (Garrett et al., 2012), further suggesting a functional link between CTCF and Pcdh gene expression. Studies in neural cell lines and mouse brain tissue suggest that the molecular mechanisms underlying this link reside mainly in the ability of CTCF to mediate looping interactions along the Pcdh locus (Guo et al., 2012, 2015; Monahan et al., 2012). The Pcdh locus contains three clusters of genes termed Pcdha, Pcdhb and Pcdhg, each coding for variable exons. Each of the variable exons has an upstream promoter sequence that binds CTCF, while enhancer elements located along the locus direct the expression of Pcdh genes specifically in neural cells. The binding of CTCF to the enhancer elements in neural cells promotes long-range chromatin looping interactions with the promoter sequences of variable exons in an orientation-dependent manner, resulting in stochastic expression of Pcdh genes (Guo et al., 2015). Accordingly, inverting the orientation of one CTCF site located at an enhancer element disrupts long-range interactions with target promoters and results in loss of expression of stochastically transcribed isoforms (Guo et al., 2015). However, it is possible that CTCF affects the transcription of Pcdh genes indirectly. 
Indeed, it is known that CTCF physically interacts with mLLP, which is a permeable nuclear protein that regulates dendritic and spine growth, to promote the expression of Pcdh genes and the amyloid β precursor (App) gene (Yu et al., 2016).
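The orientation dependence of the Pcdh enhancer-promoter loops follows from the fact that the CTCF motif is not palindromic, so inverting a DNA segment flips the strand on which the motif reads. The sketch below illustrates this with a short stand-in motif. `MOTIF` is a hypothetical placeholder, not the real CTCF consensus, and the sequences, function names and the left/right assignment of promoter and enhancer are likewise assumptions made for illustration.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical non-palindromic stand-in; NOT the real CTCF core motif.
MOTIF = "CCGCGT"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_strand(seq: str, motif: str = MOTIF) -> Optional[str]:
    """'+' if the motif reads left to right, '-' if only its reverse
    complement is present, None if neither is found."""
    if motif in seq:
        return "+"
    if reverse_complement(motif) in seq:
        return "-"
    return None


def invert_segment(seq: str, start: int, end: int) -> str:
    """Replace seq[start:end] with its reverse complement, as an
    inversion of that segment would."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def is_convergent(left: Optional[str], right: Optional[str]) -> bool:
    """Convergent pair: left site points right ('+'), right site points left ('-')."""
    return left == "+" and right == "-"


promoter = "AAA" + MOTIF + "AAA"                      # left element, '+'
enhancer = "TTT" + reverse_complement(MOTIF) + "TTT"  # right element, '-'

print(is_convergent(motif_strand(promoter), motif_strand(enhancer)))

inverted = invert_segment(enhancer, 0, len(enhancer))
print(is_convergent(motif_strand(promoter), motif_strand(inverted)))
```

Before the inversion the two toy sites face each other. After it, the enhancer site still contains a full motif but points away from the promoter, so the pair is no longer convergent. This mirrors the loss of looping reported when an enhancer CTCF site is inverted without being deleted.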
CTCF is also essential for the transcription of learning-inducible genes such as Arc and Bdnf, as well as for the repression of Hdac3 and Ppc1, which are known suppressors of memory formation (Sams et al., 2016). In the case of the Arc and Bdnf genes, CTCF may promote long-range interactions between their promoters and potential enhancer sequences, as absence of this protein results in loss of chromatin contacts of the Arc and Bdnf promoters with regions enriched for RNAPII in hippocampal neurons. CTCF is also important for brain development in humans (Gregor et al., 2013). For example, individuals with monogenic mutations in CTCF that reduce transcript levels show profound intellectual disability, microcephaly, and feeding and heart defects, which resemble the phenotypes observed in brain-specific Ctcf mutant mice (Gregor et al., 2013; Hirayama et al., 2012). CTCF has also been shown to regulate the expression of genes implicated in neurodegenerative diseases, such as Huntington's disease (De Souza et al., 2016) and Spinocerebellar ataxia type 7 (Sopher et al., 2011), psychiatric diseases such as schizophrenia (Juraeva et al., 2014), and neurodevelopmental disorders such as autism (Meguro-Horike et al., 2011). CTCF could also be indirectly involved in mediating the defects in genome organization and gene expression observed in α-thalassemia X-linked intellectual disability (ATRX) and Rett syndromes (Kernohan et al., 2014).
CTCF is involved in cardiovascular development
The mammalian heart is the first organ to function in the embryo and its development involves complex regulatory processes that instruct the differentiation and assembly of cells (Bruneau, 2013; Delgado-Olguín et al., 2012). A recent study has revealed that CTCF plays key roles during this process. The conditional removal of CTCF from cardiac progenitors in mouse embryos results in embryonic lethality by E12.5 (Gomez-Velazquez et al., 2017). Hearts from mutant animals display abnormalities that are not due to defects in cell proliferation or survival but instead arise due to defective differentiation and maturation, probably because of the downregulation of multiple cardiac transcription factors as well as changes in chromatin long-range interactions (Gomez-Velazquez et al., 2017). For example, loss of expression of Irx4 is accompanied by changes in the chromatin interaction landscape of its promoter region. In addition, key genes whose products participate in important signaling pathways such as BMP, Notch, EGF and TGFβ, also show changes in expression.
CTCF is also important for the development of the vasculature. Genetic inactivation of CTCF in endothelial progenitor cells results in embryonic lethality by E10.5, whereas heterozygous mice display no obvious defects (Roy, 2016). Mutant mice have narrow cerebral vessels and defects in the yolk sac and placenta vasculature. Importantly, vascular defects are not due to problems with cell proliferation or survival but may be attributable to the upregulation of vascular regulatory genes such as Kdr, Vegfa and Erg (Roy, 2016). In line with these findings, it has been demonstrated that CTCF binds to a chromatin insulator sequence located in proximity to the Vegf promoter region and insulates Vegf from the stimulatory effect of nearby enhancers, restraining its angiogenic potential (Tang et al., 2011). In fact, CTCF deficiency leads to increased and uncontrolled angiogenesis in vitro and in vivo due to increased expression of VEGF (Tang et al., 2011). This observation has significant implications for understanding cancer because solid tumors are typically highly vascularized. Furthermore, an important observation stemming from both the heart and vascular phenotypes is that defects are not due to loss of proliferation or survival, which are features of other CTCF knockouts in structures such as the brain and limbs. Thus, it may be possible that CTCF in cardiac and endothelial progenitor cells is important for the control of cell type-specific transcriptional programs, acting either via the regulation of master transcription factors or by modulating signaling pathways.
CTCF promotes limb development
Limb development is accompanied by changes in gene expression that are tissue and time specific (Andrey et al., 2017), and defects in this spatio-temporal regulation of gene expression can cause limb malformations (Franke et al., 2016; Lupiáñez et al., 2015; Symmons et al., 2016; Will et al., 2017). During mouse limb development, promoters of genes involved in limb development engage in static and stage-specific contacts with enhancer sequences that are enriched for the binding of CTCF and cohesin, suggesting that CTCF could have a role in the establishment of a regulatory landscape upon which fine control of gene expression can take place (Andrey et al., 2017). For example, conditional removal of CTCF from developing mouse limbs results in limb truncation with massive apoptosis and changes in the expression of hundreds of genes, including Shh (Soshnikova et al., 2010). Some of the changes in gene expression upon CTCF removal could be attributable to defects in CTCF-mediated looping interactions. This has been shown for the Wnt6/Ihh, Epha4 and Pax3 genes in mouse and human, which are located consecutively in the linear genome but are arranged into different chromatin loop domains (Lupiáñez et al., 2015). Epha4, for example, is located inside a CTCF loop, and long-range interactions between enhancer sequences and the Epha4 promoter are insulated from neighboring domains by CTCF. Structural rearrangements encompassing the CTCF boundaries of the Epha4 domain result in the gain of ectopic interactions between genes in the adjacent domains and the Epha4 enhancers, which results in ectopic transcriptional activation of neighboring genes and limb malformations (Lupiáñez et al., 2015). This work has provided compelling evidence for how the integrity of CTCF loop domains provides a framework for the establishment of specific regulatory interactions and how their disruption results in critical consequences for development.
CTCF is important for the regulation of Hox genes
Hox genes encode transcription factors that direct anterior-posterior patterning in all bilaterians as well as the formation of secondary structures such as limbs and external genitalia (Lonfat and Duboule, 2015; Mallo et al., 2010). Hox genes often exist in gene clusters, and their order of expression along the body axis correlates with their position within the cluster, with the 3′ genes expressed more anteriorly and the 5′ genes expressed in the posterior part of the embryo later in development (Mallo et al., 2010). The presence of CTCF at several sites along the Hox loci, both in D. melanogaster and mammals (Holohan et al., 2007; Narendra et al., 2015; Soshnikova et al., 2010), as well as its ability to mediate enhancer-promoter interactions, suggests that it could contribute to the temporal control of Hox gene expression by establishing chromatin domains. Indeed, a number of studies have now shown that CTCF regulates Hox gene expression in various contexts.
In Drosophila, CTCF depletion early in development results in a homeotic phenotype and decreased expression of the homeotic gene Abd-B (Mohan et al., 2007). The conditional removal of CTCF from developing mouse limbs results in changes in the expression of some posterior HoxD genes but no activation of anterior ones (Soshnikova et al., 2010). In this context, it has recently been shown that the HoxD cluster, which is rich in CTCF-binding sites, acts as a dynamic border between two topological domains that contain enhancer sequences controlling the spatiotemporal transcription of the HoxD genes (Rodríguez-Carballo et al., 2017). The precise location of this border seems to be dependent on the cell type or tissue under analysis, and ultimately on the expression of specific HoxD genes. Removal of the whole gene cluster, which encompasses most of the CTCF-binding sites, is necessary to abrogate the border and to merge the neighboring domains (Rodríguez-Carballo et al., 2017). This means that the CTCF sites located in the HoxD cluster are capable of promoting long-range contacts with convergent CTCF sites at enhancer sequences in neighboring domains. Upon removal of most of the CTCF sites, the two domains outside of the HoxD cluster merge together.
The targeted removal of specific CTCF-binding sites along the HoxA locus has clarified the role of individual binding sites for this protein in the transcriptional regulation of Hox genes (Narendra et al., 2015). ESC-derived motor neurons exposed to retinoic acid (RA) express the rostral part of the HoxA cluster (Hoxa1-Hoxa6) while Hoxa7-Hoxa13 remain repressed. The transition between active and inactive domains is localized between the Hoxa5 and Hoxa7 genes, where two highly conserved and constitutive CTCF-binding sites are located. One of these sites is located at the intergenic region between transcriptionally active Hoxa5 and Hoxa6 (C5|6), and the other is between the active Hoxa6 and the inactive Hoxa7 (C6|7). Importantly, CTCF bound at the C5|6 site establishes at least three strong, long-range interactions with CTCF sites outside the 3′ side of the cluster in a convergent orientation (Narendra et al., 2016). Removal of 9 bp within the core motif of the C5|6 site results in loss of CTCF binding, and strong upregulation of Hoxa7 and, to a lesser extent, of Hoxa9. These transcriptional effects are also accompanied by gain of H3K4me3 at both genes and the establishment of new long-range interactions with the active side of the domain (Hoxa1-Hoxa6) (Narendra et al., 2015). This effect was also recapitulated in the HoxC locus, causing homeotic transformations typical of Hox deregulation in mice (Narendra et al., 2016). The effects observed upon removal of CTCF-binding sites could be attributed to the formation of new long-range interactions between convergent CTCF sites that may bring enhancers located outside the HoxA domain (Langston et al., 1997; Woltering et al., 2014) into close proximity to promoters of the HoxA genes, limiting stimulatory activity to the Hox genes located inside the new CTCF loop. Therefore, in response to developmental signals, a single CTCF-binding site may form long-range interactions that help to establish domains of active transcription.
The association of CTCF with sites within a Hox locus can also be dynamic and modulated by epigenetic mechanisms, such as DNA methylation, or by RA signaling (Ishihara et al., 2016; Min et al., 2016). In mouse embryonic fibroblasts, the expression of both the anterior and posterior HoxC genes correlates with the binding of CTCF between the Hoxc12 and Hoxc11 genes (C12|11). This CTCF-binding site is methylation sensitive and, in a hypomethylated state, is permissive for CTCF association, resulting in long-range interactions between the posterior and the anterior parts of the locus, and the expression of posterior HoxC genes (Min et al., 2016). In a second example, the differentiation of human NT2/D1 cells to neural cells by treatment with RA was shown to induce the expression of anterior Hoxa1-Hoxa5 genes and the eviction of CTCF from a binding site located between the Hoxa5 and Hoxa4 genes; this CTCF site has an RA response element that may compete with CTCF for binding and may function as an enhancer blocker (Ishihara et al., 2016).
Together, these findings highlight the contribution of CTCF to the proper expression of Hox genes during development by promoting long-range contacts with enhancer sequences that stimulate the transcription of target Hox genes. At the same time, CTCF-mediated loops can insulate enhancer elements from inducing the transcription of Hox genes that have to remain inactive in a particular cell type.
The role of CTCF in regulating left-right asymmetry during organ formation
Pitx2 is a transcription factor that directs the transcriptional programs responsible for the left-right asymmetry of internal organs (Shiratori et al., 2001). In the dorsal mesentery of chicken and mouse embryos, Pitx2 regulates the molecular pathways required for looping and vascularization of the gut (Mahadevan et al., 2014). A recent study has shown that, in the left dorsal mesentery, Pitx2 is transcribed whereas its transcriptional repressor, the lncRNA Playrr, is repressed, and the two loci are in close spatial proximity (Welsh et al., 2015). In contrast, in the right dorsal mesentery, Pitx2 and Playrr are no longer in spatial proximity and Playrr is expressed while Pitx2 is repressed. Furthermore, Hi-C data from mESCs, which mirror the pattern of gene expression seen in the left dorsal mesentery, suggest that Pitx2 and Playrr are located inside a TAD (see Box 2) but exist within adjacent sub-TADs; CTCF demarcates the borders of both sub-TADs (Welsh et al., 2015), suggesting that these sub-TADs are equivalent to the CTCF loop domains defined previously (Rao et al., 2014). Prominent CTCF-binding sites are located at the Pitx2 locus and upstream of Playrr, and knockdown of CTCF results in loss of spatial proximity between these loci, giving rise to a configuration similar to the one observed in the right dorsal mesentery where Pitx2 is inactive (Welsh et al., 2015). It is not known whether the loss of proximity after CTCF removal results in Playrr transcriptional activation and Pitx2 repression, but two observations are worth pointing out. First, the Pitx2 protein is important for the spatial proximity between Pitx2 and Playrr. Second, Pitx2 protein is present at the anchors of CTCF loop domains for both Pitx2 and Playrr. This is supported by observations of genome-wide association between Pitx2 and CTCF based on the finding of a Pitx2-binding motif at CTCF-binding sites (Chen et al., 2012).
Based on these observations, one could speculate that CTCF establishes interactions in the left dorsal mesentery that are further stabilized by Pitx2 through its co-binding to CTCF sites. The loss of either protein thus results in loss of the 3D organization of the Pitx2-Playrr locus. As the function of Pitx2 in the control of organ asymmetry is conserved in non-bilaterian organisms, and given that CTCF is a bilaterian innovation (Heger et al., 2012; Watanabe et al., 2014), it is possible that the CTCF-dependent regulation of Pitx2 may be a bilaterian-specific strategy.
The regulation of myogenesis by CTCF
Myogenesis is regulated by the master transcription factor MyoD and additional muscle-specific transcription factors that promote a muscle-specific gene expression program (Tapscott, 2005; Weintraub et al., 1989). CTCF physically interacts with MyoD, an interaction that increases MyoD affinity for some target promoters by an as yet unknown mechanism (Delgado-Olguín et al., 2011). Interestingly, muscle differentiation is also accompanied by the upregulation of CTCF, although the role of this upregulation is still unclear (Delgado-Olguín et al., 2011). The functional relevance of CTCF in muscle development is evident in zebrafish, where CTCF is enriched in somites – the tissues in which myogenic precursors are determined. CTCF depletion in developing zebrafish embryos induces somite disorganization, loss of myogenic markers and reduced muscle fibers, as well as changes in the expression of genes related to muscle development and hematopoiesis (Delgado-Olguín et al., 2011). In line with this, CTCF knockdown in the mouse myoblast cell line C2C12 impairs myogenic differentiation and results in transcriptional downregulation of the myogenin gene (Battistelli et al., 2014; Delgado-Olguín et al., 2011), again suggesting a role for CTCF in the transcriptional activation of muscle-specific genes.
CTCF can also induce chromatin looping interactions that insulate genes that are important for the myogenic differentiation program. For example, it has been shown that the promoter of the Cdk inhibitor p57 (Cdkn1c) physically interacts with the imprinting control region KvDMR1, which is located more than 150 kb away. This looping interaction is important for p57 repression prior to differentiation and depends on the binding of CTCF and Rad21 to both regulatory elements (Battistelli et al., 2014). During skeletal muscle differentiation, p57 is induced and this correlates with disruption of the chromatin loop, loss of Rad21 and the binding of MyoD to the KvDMR1 element (Battistelli et al., 2014; Busanello et al., 2012). In this context, it has been suggested that MyoD binding causes the displacement of cohesin, possibly by physically interacting with CTCF, which induces destabilization of the chromatin loop and allows the expression of p57. Importantly, CTCF binding remains unchanged at interacting regions. Instead, the dynamic association of Rad21 and MyoD determines whether a chromatin loop is formed (Battistelli et al., 2014). Therefore, CTCF shows a complex regulatory behavior during myogenesis by promoting the expression of some muscle-specific genes or by establishing a chromatin loop that constrains the expression of other genes.
CTCF is involved in retinal cell differentiation
Several transcription factors direct the development of the retina. Of these, Pax6 has emerged as a critical regulator that is modulated by CTCF. Pax6 and CTCF display non-overlapping patterns of expression during retinal development in chick embryos, suggesting antagonistic roles for these proteins during eye formation (Canto-Soler et al., 2008). Indeed, CTCF overexpression in mouse embryos results in under-developed eyes, small lenses and reduced populations of cells in the retina, lens and cornea; this phenotype is similar to the one observed in Pax6 mutants (Hill et al., 1992; Li et al., 2004). These CTCF-dependent defects are accompanied by a marked decrease in Pax6 expression, and studies have indeed demonstrated that CTCF binds to a sequence located upstream of the Pax6 promoter and downstream of an enhancer element that is active in eye-derived cells (Li et al., 2004). Therefore, CTCF could act as an enhancer blocker in the Pax6 regulatory region. The developmentally restricted expression patterns of both CTCF and Pax6 in the developing eye suggest that additional signals may affect the expression of these genes, although their nature is not known. Activation of the EGF pathway in rabbit corneal epithelial cells induces CTCF upregulation, which results in repression of Pax6 and its target genes, as well as enhanced cell proliferation (Li and Lu, 2005; Tsui et al., 2016), but whether a similar mechanism exists in other species is unclear.
CTCF contributes to the generation of immune cell diversity
One of the most remarkable properties of the immune system is its ability to differentiate between self and foreign antigens, with T lymphocytes (T-cells) playing a major role in this event. During normal development, T-cells are exposed to a vast number of tissue-restricted antigens that are expressed, processed and presented on the surface of medullary thymic epithelial cells (mTECs) in the thymus, thereby allowing the negative selection of these cells and preventing organ autoimmunity. The expression of these tissue-restricted antigens is regulated by the autoimmune regulator AIRE, which is a transcription factor largely responsible for inducing the expression of around 4000 genes specifically in mTECs (Sansom et al., 2014). In various mouse cell types, the AIRE locus is flanked by two CTCF sites, one on each side, suggesting that CTCF binding at this locus is relatively cell type invariant (Herzig et al., 2017). However, in mTECs, CTCF is evicted from the AIRE TSS in parallel with the binding of cell type-specific transcription factors to the promoter region and an upstream enhancer, suggesting that the presence of CTCF correlates with AIRE repression. In line with this, it has been shown that CTCF knockdown results in AIRE transcription (Herzig et al., 2017). Whether AIRE is located inside a chromatin loop formed by the two flanking CTCF-binding sites remains to be determined, but overall these data suggest that CTCF flanking the AIRE locus is important either for transcriptional repression of AIRE or to insulate the AIRE promoter from active nearby enhancers that could activate AIRE transcription in cell types other than mTECs. The role of CTCF in autoimmunity is further supported by its role in regulating the expression of major histocompatibility complex II (MHC-II), both in humans and mice (Majumder and Boss, 2011; Majumder et al., 2008, 2014).
CTCF, via its regulation of immunoglobulins and T-cell receptor genes, is also crucial for the recognition of foreign antigens. A fundamental feature of the adaptive immune response is the specific recognition of a foreign antigen, which is achieved by B and T cells expressing specific receptors. These receptors originate from seven antigen receptor genes, four T-cell receptor (TCR) genes (Tcrg, Tcrd, Tcrb and Tcra) and three B-cell immunoglobulin genes (Igh, Igk and Igl). During development, B and T cells make use of these genes to generate an almost infinite variety of specific receptors that are created by the somatic rearrangement of consecutive variable (V), diversity (D) and joining (J) coding gene segments at each locus, in a process known as V(D)J recombination (Proudhon et al., 2015). This process is developmentally regulated, highly ordered and, as indicated by recent studies, involves CTCF. For example, CTCF binds to more than one hundred sites in the VH region of the mouse Igh locus (Choi et al., 2013); the distribution of CTCF at this locus is B-cell lineage-specific and it remains stable during B-cell maturation. Importantly, the depletion of CTCF results in loss of locus compaction, which in turn affects the proximity of the VH gene segment to the DJH gene segment, and biases the V(D)J recombination processes (Gerasimova et al., 2015).
The binding of CTCF to specific regions of the Igh locus has also been shown to be important for locus-specific 3D organization. The intergenic control region 1 (IGCR1) located in the VH-DH interval of the Igh locus contains two binding sites for CTCF. The binding of CTCF to these sites mediates chromatin looping interactions between regions of the VH domain with IGCR1, as well as interactions between IGCR1 and the 3′ end of the Igh locus. Accordingly, the removal of these CTCF-binding sites results in disruption of ordered and lineage-specific VH-to-DJH joining, with each of the two CTCF-binding sites contributing in a differential way to the control of V(D)J recombination (Guo et al., 2011; Lin et al., 2015). CTCF is also important for constraining V(D)J recombination to specific loci. This recombination process is known to be dependent on the activity of lymphoid-specific recombinases (RAG1/2). Although these RAGs recognize sequence motifs present along the genome, their activity becomes restricted to the immunoglobulin and TCR loci. This coordinated and on-target activity of RAGs is restricted by chromatin loops flanked by convergent CTCF-binding sites. Notably, IGCR1 is a crucial element for folding the locus into at least two chromosomal domains where regulated recombination occurs (Hu et al., 2015).
CTCF also mediates inter-chromosomal interactions between genes that are important for T-cell development (Kim et al., 2014). In naïve T cells, the locus control region (LCR) of the Th2 cytokine locus on chromosome 11 physically interacts with the promoter of the Il17a gene on chromosome 1. This interaction modulates the transcription of Il17a and is dependent on the association of CTCF and the transcription factor Oct1 with the regulatory elements of both loci (Kim et al., 2014). Interestingly, CTCF and Oct1 physically interact, and the binding of both proteins to the Th2 and Il17a loci is lost during differentiation into Th2 or Th17 cells, which also correlates with the loss of inter-chromosomal interactions between the Th2 and Il17a loci. Despite the presence of binding sites for both CTCF and Oct1 at the Th2 and Il17a loci, knockdown of either one affects the binding of the other at both loci, suggesting that Oct1- and CTCF-dependent inter-chromosomal interactions are contingent on mutual recruitment, which could enhance the physical interaction between co-bound loci located in different chromosomes.
CTCF is involved in the development of gametes
In mouse oocytes, CTCF depletion causes changes in gene expression, mild meiotic defects and embryonic lethality (Wan et al., 2008), probably owing to loss of the CTCF maternal contribution. Likewise, the conditional inactivation of CTCF in mouse spermatocytes results in smaller testes and infertility (Hernández-Hernández et al., 2016). Mutant spermatids, which are eliminated through apoptosis, show defects in nuclear morphology, widespread changes in gene expression, and protrusion of de-compacted chromatin. In line with this, CTCF inactivation results in dramatic defects in histone retention and a profound reduction of the protamine PRM1 (Cho et al., 2001). These data suggest that CTCF is important for chromatin compaction during spermiogenesis, in part by regulating PRM1 protein levels. Remarkably, Hi-C analysis has shown that, despite the high level of chromatin compaction, compartmental and CTCF loop domains are clearly present in mature mouse sperm (Jung et al., 2017). In line with this, genome-wide mapping of CTCF occupancy in mature sperm has shown that CTCF sites present at loop anchors also contain cohesin and are preferentially arranged in a convergent orientation, a finding that in turn confirms that sperm chromatin follows the same rules of 3D organization as diploid somatic cells. Interestingly, chromatin 3D organization is highly similar between sperm and ESCs in terms of compartmental and CTCF loop domains (Jung et al., 2017). This observation, in conjunction with the shared CTCF-binding sites between sperm and ESCs, as well as the identification of enhancers and super-enhancers in sperm that are also present in ESCs, suggests a role for the paternal chromosome in instructing early zygotic transcription (Teperek et al., 2016).
Conclusions and perspectives
Here, we have presented evidence supporting a role for CTCF in cell differentiation and, hence, development, highlighting a crucial function of individual CTCF-binding sites in the developmental control of gene expression (Fig. 4). Together, these studies reveal that the removal or inversion of specific CTCF sites disrupts normal long-range chromatin interactions, in some cases directly affecting the contacts between enhancers and promoters. In other cases, the phenotypic effect of disruption of CTCF sites is a manifestation of the insulating properties of CTCF. Genes located inside a CTCF loop may not change their expression after deletion of a CTCF anchor, but new looping interactions with neighboring CTCF sites result in stimulation of adjacent genes located outside the original loop. These observations indicate that cell type-specific transcription factors recruited to enhancer sequences can stimulate the transcription of non-cognate promoters when these promoters are included in the new CTCF loop. The specificity of an enhancer element might therefore be dictated by the topological constraints imposed by the chromatin looping mediated by CTCF and cohesin. In these cases, the phenotypic consequences of changes in chromatin topology will depend on the nature of the affected gene. CTCF-binding sites are therefore not all equivalent at the functional level, i.e. the transcriptional consequences of the removal of a CTCF-binding site at a particular locus are context dependent.
Developmental signals can also instruct a CTCF site to regulate gene expression by means of establishing functional interactions with either transcription factors or co-factors. The physical interaction of CTCF with cell type-specific transcription factors such as Oct1, Oct4, MyoD and LDB1, or its association with effectors of major signaling factors such as RBPJ or Smad proteins, suggests that a ubiquitous DNA-binding protein such as CTCF can diversify its regulatory potential by cooperating with different regulatory proteins. In this regard, an intriguing observation is the frequent physical association of cell type-specific transcription factors or co-factors with the zinc-finger domain of CTCF, which could potentially modulate CTCF binding but also could promote CTCF recruitment to additional sites or even modify cohesin dynamics at that site. Fine mapping of such physical interactions, coupled with functional assays in which critical residues involved in CTCF interactions with particular proteins are mutated, could reveal the functional significance of these physical interactions.
Based on the findings presented here, we propose that the progressive gain of genome organization during early development is accompanied by the binding of CTCF to the genome, which, together with cohesin and other architectural proteins such as YY1, can result in the establishment of loop domains. During differentiation, the CTCF-binding landscape is dynamic, with some binding sites present in most cell types while others are occupied in several lineages and yet others can be found in only a specific cell type. Importantly, it appears to be not only the presence but also the levels of CTCF at a specific site that determine the frequency of interactions and, therefore, its effect on gene expression. Ubiquitous CTCF-bound sites might represent central locations for genome organization by chromatin looping that affect the expression of housekeeping genes. Some of these stable loops may form a topological regulatory framework upon which rapid control of gene expression can take place, e.g. via the activation of transcriptionally poised or inactive genes. CTCF-binding sites may also mediate the formation of cell type-specific long-range interactions that regulate the transcription of specific genes. In this scenario, CTCF in association with specific transcription factors and in response to post-translational modifications can mediate looping among newly active enhancer sequences. Thus, by mediating long-range interactions, CTCF can establish loops that can insulate a regulatory sequence from nearby non-target genes and it can promote transcription by shortening the distance separating enhancers and promoters. The binding of CTCF to some sites close to promoters can also protect against DNA methylation, therefore protecting the associated gene from transcriptional repression.
Overall, these findings suggest that the transcriptional outcomes of removing CTCF will be highly dependent on context, on its relative location within the 1D genome and on the underlying topology mediated by this protein. For example, in a particular context, CTCF removal may immediately affect genes for which CTCF directly promotes transcription. By contrast, the removal of CTCF from loops containing transcriptionally poised genes in a particular cell type may not result in transcriptional activation because of the absence of a signal that is able to promote transcription in that cell type. However, the presence of an active enhancer sequence nearby could result in activation of transcription. Furthermore, as CTCF binding may signal cohesin to stop at specific sites during loop extrusion, the loss of CTCF could result in cohesin failing to stop at the sites originally occupied by CTCF, and instead continuing to extrude until a loop is formed at a new anchor. In this scenario, regulatory sequences such as enhancers may not be insulated by the original loop, resulting in transcriptional changes. Consequently, the effect of CTCF on gene expression programs will depend on the type of affected genes.
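The loop-extrusion scenario described above can be illustrated with a deliberately simplified sketch. It assumes a one-dimensional toy genome, a single cohesin complex whose two sides extrude outward from its loading position, and CTCF sites that halt extrusion only when their motif points toward the approaching cohesin (convergent anchors). All coordinates, site names (A, B, C) and elements (enhancer, gene1, gene2) are hypothetical and chosen only for illustration; real extrusion is stochastic and CTCF barriers are not absolute.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CtcfSite:
    name: str
    position: int  # coordinate on a 1D toy genome, arbitrary units
    strand: str    # "+" motif points rightward, "-" motif points leftward


def extrude(load_pos: int, sites: List[CtcfSite], genome_end: int) -> Tuple[int, int]:
    """Return the (left, right) anchors of the loop formed by one cohesin.

    Assumption: the leftward-moving side halts at the nearest "+" site and
    the rightward-moving side at the nearest "-" site. If no barrier is met,
    extrusion runs to the end of the toy genome.
    """
    left = max(
        (s.position for s in sites if s.strand == "+" and s.position < load_pos),
        default=0,
    )
    right = min(
        (s.position for s in sites if s.strand == "-" and s.position > load_pos),
        default=genome_end,
    )
    return left, right


def elements_in_loop(loop: Tuple[int, int], elements: Dict[str, int]) -> List[str]:
    """List the regulatory elements and genes located inside the loop."""
    left, right = loop
    return sorted(name for name, pos in elements.items() if left < pos < right)


# Hypothetical locus: three CTCF sites, one enhancer and two genes.
SITES = [CtcfSite("A", 10, "+"), CtcfSite("B", 50, "-"), CtcfSite("C", 90, "-")]
ELEMENTS = {"enhancer": 30, "gene1": 40, "gene2": 70}

# Cohesin loads at the enhancer in both cases; the mutant lacks site B.
wild_type = extrude(30, SITES, genome_end=100)
mutant = extrude(30, [s for s in SITES if s.name != "B"], genome_end=100)

print("wild type:", wild_type, elements_in_loop(wild_type, ELEMENTS))
print("site B deleted:", mutant, elements_in_loop(mutant, ELEMENTS))
```

In this toy locus, the wild-type loop spans sites A and B and contains only the enhancer and gene1. Deleting site B lets extrusion continue to site C, so gene2 now shares a loop with the enhancer. Whether gene2 is actually activated would still depend on the enhancer being active in that cell type, as discussed above.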
Results from several avenues of research have greatly informed our understanding of the mechanisms by which CTCF exerts its regulatory functions and have highlighted how this protein can regulate a variety of cellular processes. However, we still have much to learn. The precise mechanisms by which CTCF integrates cell signals, for example, are still unclear. In addition, the modes by which CTCF regulates the expression of specific genes during differentiation are complex, and it is currently unknown how the interaction of CTCF with cell type-specific transcription factors or other architectural proteins impacts gene expression and genome organization. Detailed knowledge of these functional properties of CTCF will no doubt shed light on the processes controlling gene expression during development at the level of cell types, tissues and the whole organism.
Acknowledgements
The content of this Review is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Footnotes
Funding
Work in the authors' laboratories is supported by a US Public Health Service Award (R01 GM035463) from the National Institutes of Health to V.G.C., by Dirección General de Asuntos del Personal Académico, Universidad Nacional Autónoma de México and Programa de Apoyo a Proyectos de Investigación e Innovación Tecnológica (IN201114 and IN203917), and by Consejo Nacional de Ciencia y Tecnología (220503) and Fronteras de la Ciencia 2015-290 to F.R.T. R.G.A.M. is a doctoral student from Programa de Doctorado en Ciencias Biomédicas, Universidad Nacional Autónoma de México and received fellowships 288814 and 25590 from Consejo Nacional de Ciencia y Tecnología. Deposited in PMC for release after 12 months.
References
Competing interests
The authors declare no competing or financial interests.