Human development is regulated by spatiotemporally restricted molecular programmes and is pertinent to many areas of basic biology and human medicine, such as stem cell biology, reproductive medicine and childhood cancer. Mapping human development has presented significant technological, logistical and ethical challenges. The availability of established human developmental biorepositories and the advent of cutting-edge single-cell technologies provide new opportunities to study human development. Here, we present a working framework for the establishment of a human developmental cell atlas exploiting single-cell genomics and spatial analysis. We discuss how the development atlas will benefit the scientific and clinical communities to advance our understanding of basic biology, health and disease.
Understanding human development is a fundamental quest that involves diverse areas of basic biology ranging from regulation of gene expression and epigenetics, cellular decision-making and differentiation to organogenesis and growth. A deep understanding of human development has direct translational applications in stem cell biology and regenerative medicine, reproductive medicine, developmental disorders, and childhood cancers – some of which have their origin during prenatal development. Despite the central importance of this area of biology, given the experimental inaccessibility of the human embryo and fetus much of our understanding of human development has been gained by inference from in vitro and animal model systems and morphological studies of human fetuses and their tissues. Model systems (particularly, from a mammalian perspective, the mouse) have therefore defined the rules and principles that govern development but there are important differences across species (Fogarty et al., 2017; Blakeley et al., 2015). Morphological studies, including recent application of three-dimensional high-resolution imaging techniques, provide cellular spatial insights into human development (Casoni et al., 2016; Belle et al., 2017). However, with the exception of pre-implantation stages where the embryo can be cultured in vitro (Deglincerti et al., 2016; Shahbazi et al., 2016), human development cannot be manipulated experimentally.
An alternative route to obtaining a snapshot of human development at a molecular and functional level is via transcriptomic studies of human fetuses. Such approaches enable genome-wide interrogation of fetal mRNA, providing high parameter data measurements. Two recent studies of human fetal organs exemplify the utility of fetal transcriptomics to study organogenesis during human development and using in vitro culture systems (Gerrard et al., 2016; Roost et al., 2015). Although insightful, such analyses provide measurements of average values within complex cell ensembles and cannot decode cellular/molecular heterogeneity and sublineage restriction, which are central to understanding the molecular pathways underpinning human development.
To dissect tissues in more detail in terms of their cellular composition and high-resolution architecture, we need to drill down to resolving fetal tissues at the single-cell level. Historically, cytometry and imaging technologies have pioneered single-cell analyses, but these only allow limited parameters to be measured simultaneously. Although newer flow cytometry platforms and mass cytometry have increased the number of measurable parameters to ∼40, selection of these markers relies on legacy knowledge which might bias data analysis and downstream findings. Recent technological advances have enabled the interrogation of the genome and transcriptome at the single-cell level, providing new opportunities to dissect human development at unprecedented resolution (Macaulay et al., 2016).
Single-cell RNA-sequencing (scRNA-seq) is the most widespread, scalable and robust technology available among the single-cell genomics methods (Kolodziejczyk et al., 2015). Although technology development is still evolving rapidly, scRNA-seq approaches are currently sufficiently robust, precise and sensitive to enable detection of even low-level transcripts in single cells (Svensson et al., 2017). As an illustration of the tremendous discovery potential of these methods, single-cell transcriptomics has recently unearthed new cell states in tissues as well studied as blood (Villani et al., 2017). scRNA-seq technology has also provided insights into the molecular regulation of early human development during preimplantation embryonic stages (Petropoulos et al., 2016; Blakeley et al., 2015; Yan et al., 2013; Xue et al., 2013).
In this context of such powerful new technologies, a dedicated effort has arisen to chart the cell types of the developing human entitled the Human Developmental Cell Atlas (HDCA), which is embedded within the Human Cell Atlas (HCA) project. The HCA is an international collaborative effort to map all human cells (www.humancellatlas.org), the aim of which is to generate a cellular ontology using unbiased genomics methods (Regev et al., 2017). Specifically, the two pillars will be cellular (e.g. scRNA-seq) and spatial methods to integrate the transcriptomics profiles within their representative tissue context (e.g. spatial transcriptomics, highly multiplex RNA in situ hybridisation methods) (Ståhl et al., 2016; Chen et al., 2015). By computationally integrating these high-throughput datasets, the HCA will create a comprehensive atlas of cells and tissues at cellular resolution (Fig. 1). The HCA was officially launched in autumn 2016 and is led by a committee representing researchers from around the world. The HCA is an open, non-exclusive endeavour with global meetings as its main form of organisation. While funding for HCA projects may come from various sources, the Chan Zuckerberg Initiative provides dedicated funding streams for HCA projects.
The big picture
In and among the considerable excitement that the prospect of an HDCA creates, one might question its raison d’être. The HDCA will be an expensive effort, in terms of financial and intellectual resources, that is descriptive at its heart: anatomy at the resolution of single cells. The HDCA will help to provide a transcriptome-based definition of cells in gestational time and histological space, creating a knowledge framework for developmental biologists to leverage for mechanistic studies and clinical translation. The developmental datasets will enable accurate comparisons across species and help guide animal studies for enhanced relevance to human biology.
By way of comparison with previous studies at single-cell resolution, a plethora of additional insights has arisen by profiling embryonic and fetal tissues using scRNA-seq approaches. These range from studies on early human embryos (Petropoulos et al., 2016) to human fetal brain (Pollen et al., 2015; Camp et al., 2015; La Manno et al., 2016; Liu et al., 2016) and liver (Camp et al., 2017; Notta et al., 2016). In each case, the data have provided insights into unexpected cell states and the transcription factors regulating the molecular pathways and developmental switches for lineage specification and cell differentiation. This provides a strong rationale to expand single-cell genomics studies to human fetal cells and tissues in an organic way. In practice, therefore, the HDCA will start as a single-cell transcriptomic effort, the progress of which is dependent on the enthusiasm of individuals, organisations and funders involved in the HCA project.
Besides unravelling cell states, a further key question relates to their developmental and lineage relationships. While much of this can be inferred from transcriptomic data, tracing the precursor-progeny relationships to the level of individual clones is more difficult. It is feasible to reconstruct cellular lineage from mosaic mutations of an organism (Behjati et al., 2014; Ju et al., 2017), using epigenetic marks (Mooijman et al., 2016) or leveraging VDJ recombination for tracing lymphocyte populations (Stubbington et al., 2016). It will be compelling to integrate these methods with single-cell transcriptomics, although the scalability of combined single-cell DNA, RNA, and epigenetic assays is limited (Macaulay et al., 2017).
What else can be learnt from an atlas of human development at single-cell resolution? One might argue that a detailed insight into human embryology in itself justifies the effort. However, the HDCA will generate a reference map that is relevant to a wide range of biological questions beyond basic developmental biology, including regenerative medicine, ageing, cancer and reproduction. An encyclopaedia of the genes that define the development of organs through space and time will provide an invaluable resource for researchers looking to recapitulate cell differentiation, or organogenesis, in a Petri dish for cell therapy or replacement purposes.
Moreover, carcinogenesis exploits early embryonic and fetal molecular pathways to overcome the growth constraints of terminally differentiated cells. A thorough understanding of the parallels between human development and neoplasia will provide fertile ground for the advancement of cancer therapies. How the developing immune system is licensed and gains functional competence impacts all aspects of human biology. Further utility of the HDCA will derive from comparative analyses of adult and fetal cell maps to understand development and ageing. These will additionally define fetal analogues of adult transcripts, which could be exploited for therapy, as in the induction of fetal haemoglobin by hydroxyurea in the treatment of sickle cell disease (Pule et al., 2015).
At this point, with the project still at a very early stage, there are many unanswered questions and undefined standards in generating a comprehensive map of human development, from tissue acquisition to data analysis. It should be stressed that, like the HCA, the HDCA is an open, decentralised and open-ended project that has no formal membership. Hence, multiple groups around the world using samples from established developmental bioresources or independent sources will likely contribute to the project. The experimental, technology and data analysis standards set by the HCA initiative can be capitalised to guide harmonisation of standards for the development atlas. Reproducibility of data within groups and across groups will prove challenging, with tissue quality and processing being important variables.
One such existing human developmental repository is the almost 20-year-old MRC-Wellcome Trust-funded Human Developmental Biology Resource (HDBR) in the UK (www.hdbr.org) (Gerrelli et al., 2015). The HDBR provides tissue access, with consent and ethical approval, including for open access nucleic acid sequence data publication, to researchers in the UK and, with further ethical approval, to researchers worldwide. The lower and upper age limit of embryos and fetuses that can be studied will vary between repositories and legislations. Analysis time points will need consideration of a specific organ's development timeline. There are no established power calculation methods for scRNA-seq analysis in terms of the number of individuals needed to establish a complete profile of a given tissue. However, our experience with adult tissue analysis suggests donor-to-donor variability for the major cellular features is captured with as few individuals as four [based on analysis of lung and peripheral blood (S.A.T., unpublished)].
A single scRNA-seq technique is unlikely to construct a global picture yet retain fine-grain resolution of human development from ‘snapshot’ views across gestational ages. High-throughput techniques such as droplet encapsulation-based methods (Drop-Seq, inDrop, Chromium System from 10X Genomics), massively parallel RNA-seq (MARS-Seq), CEL-Seq and combinatorial barcoding strategy can provide a broad overview of tens of thousands of cells (Kolodziejczyk et al., 2015; Stubbington et al., 2017). However, these approaches rely on short sequencing of the 3′ or 5′ end of the mRNA combined with the use of unique molecular identifiers as a surrogate for mRNA count. By contrast, protocols such as Smart-Seq2 (Picelli et al., 2014; Trombetta et al., 2014) provide full-length mRNA transcripts at the expense of scalability. We envisage combined use of high- and lower-throughput approaches for mapping human development, with the former providing a global view of the organ at a lower resolution and the latter the capacity to zoom in on cellular regions of interest for fine-grained analysis. Nuclear sequencing protocols (Habib et al., 2017) might be required for some tissues or cells that are sensitive to tissue dissociation, isolation and capture. These genomics strategies will be combined with conventional microscopy and imaging cytometry for protein-level information in situ, and will form the basis of the studies on the developing human immune system undertaken by the authors.
Making sense of the data
A significant challenge to any cell atlas initiative relates to the logistics of data management, including storage, processing, analysis, curation and measures to enable wider data dissemination and use. With transcriptomics and genomics methodologies dominating the early phase of HDCA, data generation and management will need to be centralised and led by international genome and bioinformatics institutes. Examples of such databases are the EMBL-EBI Expression Atlas (Petryszak et al., 2014), and the dedicated HCA Data Coordination Platforms based in North America and the UK.
Processing single-cell transcriptomics data poses new challenges (Stegle et al., 2015). In the context of the HDCA (and the HCA more broadly), scalable and standardisable methods will have to be developed to handle large datasets. Given the magnitude of data that will be generated, the need for machine learning algorithms becomes crucial to aid in cell clustering and annotation, which currently requires significant manual human input. Semi-automated analytical methods need to be balanced with more bespoke algorithms for defining cellular lineages, which will involve temporal reconstruction across developmental stages and interrogation of specific biological questions.
Applying such algorithms to analyse snapshot human data, where the continuum of cellular developmental states should exist but at variable frequency, can be challenging for two main reasons. First, cells at the earliest stage of development down a particular lineage need to be identified at a point when canonical markers used to identify cells in postnatal and adult life may not be faithfully expressed in these primitive precursors. This might seem intuitive in cases such as the well-characterised haematopoietic stem cell system (Notta et al., 2016), but not necessarily for other cellular lineages and during fetal life where alternative and/or lesser-known progenitors may be dominant. Second, some cell types and intermediate states may be found at very low frequency – and/or only very transiently – necessitating enrichment for adequate capture.
Making use of the data
Harnessing the initial fruits of the HDCA will necessitate engagement of the research community with the encyclopaedia that will be generated. Similar large-scale systematic cataloguing efforts, such as cancer genome sequencing, had a natural scientific and clinical audience. The potential relevance and applications of HDCA-derived knowledge will appeal to a diverse community and research areas not necessarily rooted in developmental biology per se. The other users of the research tissue repositories and the data generation groups will provide ‘seed points’ for disseminating information, and the HDCA will have to advertise its existence and relevance widely and make its curated data publicly available through browsable portals with an intuitively designed user interface. Data access cannot be an afterthought of the HDCA, but a priority at the outset. Data structures must be built to facilitate access, navigation and integration of conventional morphological, histological, experimental and multi-omics datasets. Access to the HDCA will be a single portal for all branches of the HCA, which will require maintenance and continued update.
A major challenge will be to integrate the single-cell transcriptomic data into a spatial atlas, ranging all the way to the anatomical structures of the fetus. There are 3D atlases based on imaging and coarse-grained gene expression data for both mouse (Richardson et al., 2014) and human (Kerwin et al., 2010). These will be valuable resources, which the HDCA will build upon to link cellular and tissue architecture to anatomy.
Looking forward, the HDCA must not languish as a static catalogue but work with the wider scientific community to rapidly translate the findings into practical and economical ways to isolate and study the new cell types identified, revise cellular taxonomy and derive consensus nomenclature, interrogate the cellular and molecular pathways identified to unravel new biology and advance the application of organoids as a testable model system (Clevers, 2016). Data generation and biological enquiry should be conducted as an integrated effort to make the HDCA relevant and maintain the enthusiasm of researchers and funders alike. Whether HDCA requires the study of a limited number of fetuses, or requires a monumental effort lasting a decade, is unclear at this point. We do not know how many fetal cell types exist, or have credible means of estimating their number.
Funding and ethical considerations
In pursuing HDCA (and broader HCA) projects, relevant permissions, in accordance with local ethical and legal frameworks, will be required to generate the data and to publish raw sequencing reads, as, for example, the aforementioned UK HDBR has implemented. Another obvious debate is who should be funding encyclopaedic cell atlas endeavours and how this will impact on intellectual property and any potential commercial rights arising from these efforts. How existing academic, funding and commercial stakeholders engage with each other in the current landscape and how these relationships evolve throughout the HDCA initiative will require nuanced navigation through unchartered territory to maximise benefit for all parties, including the general public.
The HDCA endeavour, by its nature and ambition, will need to be responsive and adaptable, learning from a phase of pilot experiments to develop efficient, robust and standardised experimental protocols, data generation and management pipelines. Feedback and involvement from the developmental biology community are crucial to ensure its success and future utility as a shared open resource. Participation in the HDCA (and broader HCA) projects can be established in the following ways: direct email contact with the HCA organising team (email@example.com); discussion forums within the HCA Slack channel (https://join-hca-slack.herokuapp.com); and through links at the HCA website on HCA meetings (www.humancellatlas.org/news) and membership and project registry (www.humancellatlas/joinHCA). HCA protocols are also available to share at www.protocols.io.
A philosophical consideration in HDCA or any similar attempt is in deciding when such an atlas serves its purpose and is complete. In contrast to Borges' infinite but stagnant Library of Babel, any effort to map human development, such as HDCA, must remain dynamic, forward facing and continually evolve to achieve its ultimate aim in generating a transformative encyclopaedia to benefit human health and advance clinical practice.
We thank Susana Chouva de Sousa Lopes for critical comments on this article.
We acknowledge funding from the Wellcome Trust (S.B., 110104/Z/15/Z; S.A.T. and M.H., WT107931/Z/15/Z), The Lister Institute of Preventative Medicine (S.A.T. and M.H.), National Institute for Health Research and Newcastle-Biomedical Research Centre (M.H.) and St. Baldrick's Foundation (S.B.). The HDBR is funded by the Medical Research Council and Wellcome Trust (S.L., 099175/Z/12/Z).
The authors declare no competing or financial interests.