Place recognition is a complex process involving idiothetic and allothetic information. In mammals, evidence suggests that visual information stemming from the temporal and parietal cortical areas (‘what’ and ‘where’ information) is merged at the level of the entorhinal cortex (EC) to build a compact code of a place. Local views extracted from specific feature points can provide information important for view cells (in primates) and place cells (in rodents) even when the environment changes dramatically. Robotics experiments using conjunctive cells merging ‘what’ and ‘where’ information related to different local views show their important role for obtaining place cells with strong generalization capabilities. This convergence of information may also explain the formation of grid cells in the medial EC if we suppose that: (1) path integration information is computed outside the EC, (2) this information is compressed at the level of the EC owing to projection (which follows a modulo principle) of cortical activities associated with discretized vector fields representing angles and/or path integration, and (3) conjunctive cells merge the projections of different modalities to build grid cell activities. Applying modulo projection to visual information allows an interesting compression of information and could explain more recent results on grid cells related to visual exploration. In conclusion, the EC could be dedicated to the build-up of a robust yet compact code of cortical activity whereas the hippocampus proper recognizes these complex codes and learns to predict the transition from one state to another.
Navigation is a critical task for most species, and it requires complex representations (target objects, landmarks, places, etc.) and strategies. Many species use path integration (PI) or dead reckoning (a navigation strategy relying on idiothetic information useful in the dark or in a poor visual environment) in conjunction with allothetic information-based navigation (vision, odors, sounds, touch). Starting from a robotics and system modeling point of view, we investigate how mammals can use visual and proprioceptive information to perform homing behaviors or to reach specific goal places.
Many studies related to mammals focus on the hippocampal system (HS), which is composed of the hippocampus and the entorhinal cortex (EC), two brain structures highlighted by the discovery of place cells (O'Keefe and Nadel, 1978) and grid cells, respectively (Hafting et al., 2005), as well as by their involvement in episodic memory (Eichenbaum et al., 1992). Indeed, hippocampal destruction results in a severe anterograde amnesia in humans (Scoville and Milner, 1957). The HS is connected to the associative cortical areas and performs multimodal fusion, as well as fast detection and recall of complex events (McClelland et al., 1995). The HS can also be thought of as a generic indexing tool for episodic memories (Teyler and DiScenna, 1986; McClelland et al., 1995; Cohen and Eichenbaum, 1993). The main inputs to the hippocampus come from the EC, which contains grid cells. Grid cells have been recorded in the dorsal medial EC (dMEC) in rodents (Hafting et al., 2005), bats (Yartsev et al., 2011; Geva-Sagiv et al., 2015) and even in humans (Doeller et al., 2010; Killian et al., 2012; Jacobs et al., 2013). In contrast, the lateral EC (LEC) shows little spatial selectivity (Hargreaves et al., 2005) but is hypothesized to perform joint processing of spatial and non-spatial information (Yoganarasimha et al., 2011; Deshmukh and Knierim, 2011; Van Cauter et al., 2012; Tsao et al., 2013; Knierim et al., 2014; Save and Sargolini, 2017). The linear combination of dMEC grid cells with different spacings and orientations provides an efficient way to build place cell activities (Rolls et al., 2006; Solstad et al., 2006). Yet, how grid cell activity is generated (and what grid cells code for) is intensely debated (Giocomo et al., 2011; Zilli, 2012; Moser et al., 2014).
The hexagonal firing patterns of grid cells can be modeled either by oscillatory interference models (Burgess et al., 2007) or by continuous attractor models (Wan et al., 1994; Touretzky and Redish, 1996; Redish and Touretzky, 1997; Samsonovich and McNaughton, 1997). These models have been instrumental for designing neurobiological experiments to either validate their theoretical predictions or modify their assumptions (Giocomo et al., 2011; Zilli, 2012). Prior robotics experiments and simulations showed that: (1) visual place cells can be easily obtained from the merging of ‘what’ and ‘where’ information [we refer to the ‘what’ and ‘where’ visual pathways (Ungerleider and Haxby, 1994) for primates, but it looks as if rodents could also perform some detailed visual tasks (Lashley, 1938; Kolb and Tees, 1990)] and (2) homing behavior can be achieved through the competition between a small number of place–action associations (Gaussier and Zrehen, 1995; Gaussier et al., 1997, 2000) in a way similar to what has been shown for insects (Wittmann and Schwegler, 1995; Collett and Baron, 1995; Etienne, 1998; Schwarz et al., 2017). In addition, our model of PI shows that it can be performed based on a one-dimensional (1D) field of neurons in a way also quite similar to insects. In mammals, this computation might be performed in different brain structures, including the retrosplenial cortex (RSC) or the parietal cortex (Cooper and Mizumori, 2001; Save et al., 2001; Parron and Save, 2004; Vann et al., 2009; Elduayen and Save, 2014; P.G. and J.L.K., unpublished data).
dMEC: dorsal medial EC
HD: head direction (cells)
mod: modulo operation (i mod M is the remainder of the Euclidean division of i by M)
proj: projection according to a modulo value
In this paper, we first summarize results showing how place cells can be built from visual information or PI. Second, we show how grid cell activity can be explained as a general ‘modulo projection’ property of the cortical activity (Gaussier et al., 2007) if we suppose the existence of PI traces in the RSC. Third, we show how to obtain visual grid cells again using a modulo operator on the ‘where’ component of visual information. We conclude that the spatial specificity of the rodent HS reflects a more general role in memory formation through its capability to detect novelty and to learn transitions between complex multimodal states that cannot be easily detected at the cortical level. Hence, we question whether the hippocampus is the brain area where PI is computed (so as to build a Cartesian map of the environment as proposed by O'Keefe and Nadel, 1978) or, rather, where the results of different spatial computations are projected.
In this paper, we show how a simple robotics model of mammal navigation is useful to interpret neurobiological recordings. We question the current models of the dMEC as a path integrator. Instead, we propose that the EC is a generic merging tool that builds a compact representation of the cortical activity (a kind of hash table). We summarize experiments and simulations showing that grid cells related to PI could be explained as a modulo projection of cortical activity computed in the RSC, where PI could take place. Furthermore, we suggest that the visual grid cells recorded in the human EC could also be explained by the same mechanism.
Place cells from visual information
A place can be characterized by the identity of the landmarks (‘what’), and their azimuth, distance or elevation (‘where’). It can be a direct visual snapshot from a given position (as proposed for insects). Yet, if the image is decomposed into local views related to ‘important’ visual features or landmarks, then conjunctive cells (i.e. cells combining some different input types on common postsynaptic targets to produce ‘conjunctive’ activities) can be used to build a place code merging ‘what’ and ‘where’ information with better generalization capabilities over long distances (Insausti et al., 1987; Zola-Morgan et al., 1989; Gaussier and Zrehen, 1995; Gaussier et al., 1997). This increases the robustness of the visual system to occlusions, and the displacement of objects or displacement of the animal by making it possible to recognize unaltered local areas (landmarks) and to measure their apparent displacement in the visual scene for place recognition and navigation. The merging of ‘what’ and ‘where’ information stemming from the perirhinal and parahippocampal cortex, respectively, is a good candidate for such a conjunctive function. If we assume the animal has access to an allocentric reference frame corresponding to a kind of internal compass (built from vestibular, tactile or visual information, for instance), it may easily obtain information about the angular displacement of a given landmark while moving in the environment. The multimodal head direction (HD) cells can represent such a visual compass (Lepretre et al., 2000; Delarboulas et al., 2014).
Fig. 1 shows a simplified view of the neural network, which was tested on a robot that learned visual places. The robot performed a serial exploration of the visual scene focusing on the local maxima of off-center cells (difference of Gaussians) applied on the gradient of the input image. The robot's visual scan path was controlled by the intensity of the feature points (a winner-takes-all mechanism with an inhibition of return allows the selection of feature points such as corners, end of line, etc., of decreasing activity). This simulated ocular saccades and an attentional spotlight mechanism. One local view is extracted around each saccade after a log/polar transformation of the image, mimicking the projection from the retinal to the primary visual areas (Schwartz, 1980). This transformation provides a robust signature to small-scale variation and rotation. We proposed that the azimuth and elevation of the focus point are coded as a spatial activity bump on two different 1D maps of neurons representing the associated angles. Next, conjunctive cells could be triggered when a local view was recognized under a specific azimuth or elevation (Fig. 1). As a result, if the landmark was perceived from a different position, its azimuth would change. The activity bump on the ‘where’ field would move and would provide lower activity on the associated connection. This mechanism provided a direct way to measure the angular variation of a given landmark between its learned angular position and its current position in the visual field. When the local views or landmarks were explored sequentially (focus of attention or ocular saccades), adding a short-term memory on the conjunctive activities resulted in an activity pattern insensitive to the exploration order of the panorama. 
At the end of exploration, the short-term memory had a copy of all the activated conjunctive cells building a spatial code (a constellation of landmark×azimuth×elevation) easy to manage for pattern matching (i.e. place recognition).
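The conjunctive ‘what’ × ‘where’ code and its short-term memory can be illustrated with a minimal sketch. This is not the network run on the robot: the linear bump shape, its 60 deg width, the max-based short-term memory, the averaging used for place recognition and all function names are assumptions made for illustration, and elevation is omitted.

```python
def angular_match(learned_deg, seen_deg, width_deg=60.0):
    """Activity in [0, 1] that decays linearly with the angular shift
    between the learned and the current azimuth (assumed bump shape)."""
    diff = abs((seen_deg - learned_deg + 180.0) % 360.0 - 180.0)
    return max(0.0, 1.0 - diff / width_deg)


def learn_place(local_views):
    """Store one conjunctive cell per (landmark, azimuth) pair."""
    return dict(local_views)


def place_activity(place_code, saccades):
    """Accumulate conjunctive activities over a sequence of saccades.

    The short-term memory keeps the best activity of each conjunctive
    cell, so the result does not depend on the exploration order.
    """
    memory = {name: 0.0 for name in place_code}  # short-term memory
    for name, azimuth in saccades:
        if name in place_code:  # the 'what' part is recognized
            act = angular_match(place_code[name], azimuth)
            memory[name] = max(memory[name], act)
    return sum(memory.values()) / len(memory)


# Hypothetical landmarks and azimuths (in degrees) seen from the learned place.
learned_views = [("door", 10.0), ("window", 120.0), ("table", 250.0)]
place_code = learn_place(learned_views)

same_place = place_activity(place_code, learned_views)
reordered = place_activity(place_code, list(reversed(learned_views)))
shifted = place_activity(place_code, [(n, a + 15.0) for n, a in learned_views])
occluded = place_activity(place_code, learned_views[:2])
print(same_place, reordered, shifted, occluded)
```

In this sketch, the activity decreases gradually when all azimuths shift (as when the robot moves away from the learned place) and drops by one-third when one of the three landmarks is occluded, which is the kind of graceful degradation described above.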
To test the performance of this model, the robot was put in different locations in a room (7×5 m) to learn 5×5=25 places regularly spaced every 1 m. For each location, the learning of one place cell was triggered. Next, the robot was moved everywhere in the room and the activity of each cell was recorded.
As shown in Fig. 2A, we obtained place fields that generalize over long distances in the room. Each rectangle represents the recording of one cell according to the location of the robot in the room. We can see, for instance, that the neuron that has learned the upper left corner of the room (upper-left rectangle) responds more in that part of the environment. Similarly, the neuron associated with the position in the middle of the room responds more in that location. The neuronal activity is related to the different places. It decreases slowly and spreads over more than 2 m in the experimental room. This interesting result means that we can create large attraction basins around a given location that allow very good generalization capabilities to unvisited places, and the capability to learn a path from place–action association and competition (Giovannangeli et al., 2006). Yet, it is also a puzzling finding, because place fields recorded in the dorsal hippocampus (in the CA3 or CA1 regions) are clearly sharper than the model's place fields. The diameter of the real fields in rodents is at least 10 times smaller (10 to 20 cm) (Mizuseki et al., 2012). This discrepancy may be resolved by adding a competition between place cells (thus leading to sharper place fields, as shown in Fig. 2B) or by supposing that our cells correspond more to the place cells recorded in the ventral hippocampus (Jung et al., 1994; Poucet et al., 1994). A second question is whether the cells we recorded in the robot would be classified as place cells by neurobiologists. The answer is clearly no, because the cells are somewhat active everywhere in the environment. Neurobiologists would term these ‘diffuse place cells’ (Quirk et al., 1992; Song et al., 2012), much like some of the unclassified medial EC (MEC) cells [e.g. see Savelli et al. (2008) and also the discussion in Poucet et al. (2014) on ventral MEC ‘place-like cells’].
Is it possible that place cells useful for navigation in an open environment (with long-distance generalization capabilities) exist outside the hippocampus proper? If we suppose that entorhinal cells merge multimodal information, they could be a good candidate to explain why rats with a lesioned hippocampus can still perform some navigation tasks. Even though such large visual place cells could be located in the LEC, results from Deshmukh and Knierim (2011) and Tsao et al. (2013) do not support their existence. Yet, many cells react to objects and their locations, and their weak spatial correlates could correspond to our large place cells. Place-related activity in the LEC in the presence of objects or goals could also correspond to the kind of cells proposed in our model because their learning would be related to the presence of specific distant landmarks and their azimuth or apparent distance. The cells recorded in the ventral MEC could also correspond to some low-resolution grid cells, which we will discuss in the next section, or to the parahippocampal place area, where neurons were found to react to the visual recognition of places (Epstein and Kanwisher, 1998; Epstein, 2008). A last question related to our simple model of visual place cells is what happens if the field of view is limited to 180 deg, as in primates, who have a smaller field of view than rodents. By using the same neural network as that used for place cells with a field of view of 180 deg, we obtained an activity pattern similar to the one recorded in the hippocampus of monkeys by Rolls and O'Mara (1995) (Fig. 2). It should be noted that the landmarks were tables, chairs, windows, cabinets, etc., in the room. Yet, no parallax issue related to the apparent displacement of proximal landmarks can be seen in the results because the robot was not using two or three landmarks but more than 20 local views for the learning and recognition of the view cells.
This provides an averaging effect, reducing the parallax effect regarding any particular landmark. Moreover, landmarks perceived from two different points of view are not recognized and do not provide activity to the view cells, thus also reducing parallax issues in this case. In more complex cases, the solution proposed by Bicanski and Burgess (2016) may apply as a robust input to our network.
Grid cells from path integration
Although place cells occur in the HS and grid cells have been observed in the EC, we propose that PI in a two-dimensional environment is primarily performed outside the EC and the HS. PI can be performed on a 1D field of neurons on which activity bumps related to the current direction and speed of the animal are integrated. The summation of different bumps of activity having a cosine shape results in a single bump of activity in which the most active neuron corresponds to the direction of the global motion (PI vector) and the amplitude of its activity represents the distance traveled in a straight line. This model, inspired by insect navigation (Mittelstaedt and Mittelstaedt, 1980; Wehner and Srinivasan, 1981; Hartmann and Wehner, 1995; Collett et al., 1996; Heinze and Homberg, 2007; Stone et al., 2017), can be generalized to mammals if the cosine shape is replaced by a Gaussian-like shape (or a Von Mises function) representing the activity of the HD cells (built from a 1D attractor model using the angular speed as an input; see McNaughton et al., 1996). Even if the HD cell activity bump is limited to 120 deg, it is sufficient to perform a good approximation of PI, and a leaky integrator or synaptic learning is sufficient to perform PI (P.G. and J.L.K., unpublished data). Field activity can be recalibrated according to egocentric (the entrance of a maze) or allocentric information (a distant landmark). As a result, we can obtain allocentric information related to either turns or the route based on the time constant used in the leaky integrators. The resulting activity looks similar to cell activity recorded in the rat's RSC (Alexander and Nitz, 2015, 2017) and parietal cortex (Nitz, 2012), two of the main inputs to the EC.
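The summation of cosine-shaped bumps on a 1D field can be sketched as follows. This is a minimal illustration of the insect-inspired version only (pure cosine bumps, no leak, no recalibration); the field size of 360 neurons, the function names and the example path are assumptions.

```python
import math

N = 360  # assumed field size: neuron k prefers the direction k * 360 / N deg


def integrate_path(steps, n=N):
    """Sum one cosine bump per step; each bump is centered on the heading
    of the step and scaled by the distance covered during that step."""
    field = [0.0] * n
    for heading_deg, distance in steps:
        for k in range(n):
            preferred = 360.0 * k / n
            field[k] += distance * math.cos(math.radians(preferred - heading_deg))
    return field


def read_out(field):
    """Return (direction in deg, amplitude) of the most active neuron."""
    k = max(range(len(field)), key=field.__getitem__)
    return 360.0 * k / len(field), field[k]


# Hypothetical path: 3 units along 0 deg, then 4 units along 90 deg.
steps = [(0.0, 3.0), (90.0, 4.0)]
field = integrate_path(steps)
direction, distance = read_out(field)
print(direction, distance)
```

For this path, the net displacement has a length of 5 units in a direction of about 53.13 deg. The most active neuron is the one preferring 53 deg and its activity is very close to 5, so the position and the height of the bump give the direction and the length of the PI vector up to the angular resolution of the field.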
At the level of the RSC and posterior parietal cortex, several 1D fields associated with different preferred directions may be maintained and used to perform navigation in the direction of one target according to some population code. The interesting point is that the coding of the direction on these maps would be similar to the one used in the motor cortex (Georgopoulos, 1988) allowing direct control of the action from those fields (Hasson and Gaussier, 2010). Next, if we assume that the activity of some neurons on the field is discretized and folded onto EC cells using a modulo projection, then we can obtain grid cells (Gaussier et al., 2007) (as shown in Fig. 3A).
On a field of neurons indexed with the variable i from 1 to N, we define a modulo projection (based on the mathematical modulo operator) as a parallel projection of N neurons on a group of M neurons. Every M neurons, a folding is performed so that the neuron j in the output group receives the projection of the activities of all the neurons i such that i mod M=j, where mod is the modulo operation and j is the remainder of the Euclidean division of i by M. In our case, grid cell activity results from a projection of PI and should not be seen as an interference model (Giocomo et al., 2011; Burgess et al., 2007), even if the process is somewhat similar to a spatial interference. The difference in relative direction preference of two neurons on the PI field defines the grid orientation whereas the modulo value determines the grid spacing.
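The folding defined above can be written in a few lines. In this sketch the projected activities are summed on the output neuron, which is an assumption (the text only states that neuron j receives the projections), and indices start at 0 rather than 1; the function name and the values of N and M are illustrative.

```python
def modulo_projection(activities, M):
    """Fold a field of N activities onto M output neurons.

    Output neuron j accumulates the activity of every input neuron i
    such that i % M == j (0-based indices).
    """
    out = [0.0] * M
    for i, a in enumerate(activities):
        out[i % M] += a
    return out


# A single active neuron moving along a field of N = 12 neurons
# re-activates the same output neuron every M = 4 steps.
N, M = 12, 4
winners = []
for position in range(N):
    field = [1.0 if i == position else 0.0 for i in range(N)]
    folded = modulo_projection(field, M)
    winners.append(folded.index(max(folded)))
print(winners)  # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
```

The periodic reactivation of each output neuron is what produces a spatially periodic response when the input field codes a discretized distance along one direction; a smaller value of M gives a shorter period, in line with the statement that the modulo value determines the grid spacing.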
The drift of PI over time requires us to introduce a recalibration procedure to allow spatial stabilization of the grid cells according to the animal position in its environment (Gaussier et al., 2007). The recalibration can be performed by visual place cells or from tactile information such as that stemming from environmental boundaries (Hardcastle et al., 2015). Robotics experiments (Jauffret et al., 2015) confirm that the sharpness of grid maps is better if the recalibration takes place in a corner of the arena rather than in the middle of the arena because the walls stop the robot and provide a more reliable positioning for the reset or recalibration of the PI field (see Fig. 3B for an example of grid cells obtained from the projection of the discretized activity of a 1D PI field using a modulo projection). From a spectrum of grid cells with different spatial frequencies, it becomes possible to build place cells (Solstad et al., 2006; Gaussier et al., 2007). Yet to obtain useful place recognition for navigation, it is important to avoid the merging of too sharp grid cells. As a matter of fact, the use of binary grid codes provides place cells with a weak generalization property. If the contribution of each grid is the same for the recognition of a place, this means that a change in the smallest grid resolution has the same effect as a change in the largest grid resolution, resulting in very poor navigation (and generalization) capabilities. To overcome this difficulty, a simple technical solution would be to smooth the grid activity pattern using positive lateral interaction between the neurons having the same grid spacing. Hence, there is a need to control the size of the peak activity of grid cells to obtain a good generalization. 
The need for both a competition mechanism, to learn a sparse code, and positive lateral interactions (or at least a positive effect of interaction), to allow better generalization capabilities, is reminiscent of the properties of the attractor networks. Yet, there is no particular constraint on the lateral interaction weights in terms of regularity of the connectivity or homogeneity in the weight values (e.g. to avoid bias in a given direction).
For these reasons, the grid cell system appears to be a better candidate to represent and categorize the displacements than a system for the computation of PI. This seems to be confirmed by the recordings of grid-like neuronal activity during human spatial navigation in a virtual environment (Jacobs et al., 2013) and in functional magnetic resonance imaging studies (Doeller et al., 2010; Stangl et al., 2018). It must be emphasized that in a multi-compartment environment, where the open field is broken up by barriers, the grid cell firing pattern is fragmented [the grid map is reset when the animal enters a new alley (Derdikman et al., 2009)]. Moreover, grid cells mature more slowly than place cells (Langston et al., 2010; Wills et al., 2010). This could support our initial intuition of a PI performed outside the HS for our grid cell model in the MEC (Gaussier et al., 2007).
Finally, it is interesting to note that grid cell firing is also observed when animals are not moving and when people are performing ocular saccades on a visual scene (Killian et al., 2012; Wilming et al., 2018), a task with no obvious relationship to navigation and PI. Some direction-sensitive cells may also relate to viewing particular landmarks (Rolls and O'Mara, 1995; Rolls, 1999; Ekstrom et al., 2003). In the next section, we examine whether grid cells can be built from visual information only (without the need of idiothetic information and PI).
Grid cells from still images
In the second section (‘Place cells from visual information’), ‘what’ and ‘where’ information were merged in the EC to obtain large visual place fields, whereas in the third section (‘Grid cells from path integration’), different projections of PI were merged after a modulo operation to obtain grid fields. What would happen if the modulo projection was applied to the azimuth or to the apparent distance of different landmarks connected to the same conjunctive cell used for building visual place cells?
We show here that simulated grid cell activity can be obtained from visual information only (similar to Killian et al., 2012; Wilming et al., 2018). For the sake of simplicity, we propose that the modulo operation is only performed on the angular or distance information related to some visual landmark (the ‘where’ information). If a cell learns at a given position the conjunction of the modulo projection of two distinct landmark fields (see Fig. 4A), it will be reactivated for all the angular positions associated with the conjunctive cell [i.e. for all the angles θi such that proj(θi)=proj(θlearned), where proj is a projection following the modulo principle (see Fig. 4A) and θlearned is the azimuth associated with the neuron activated for the learned position]. The places where the two modulo patterns coincide correspond to grid activity as shown in Fig. 4B–D.
First, we consider the minimal case of an environment with two different landmarks (i.e. landmarks associated with the recognition of different visual features) L1 and L2 located at the coordinates (0,0) and (0,50) in a simulated environment. The azimuths are computed from landmark angular positions relative to an absolute direction obtained from idiothetic information (vestibular compass or even a visual compass; Delarboulas et al., 2014) measured at the center of the environment. We assume here that one conjunctive cell in the EC has learned the landmark configuration at the center of the environment [for the coordinates (25,25)]. Fig. 4C shows that grid patterns are directly obtained from the conjunction of the modulo of the azimuth (according to Eqn A1 in the Appendix). A high thresholding of cell activity, in our case set to 0.995, is necessary for contrast enhancement. Neurons with an activity level lower than the average activity are set to zero and a high gain is applied so that the maximum of activity is set to 1. Without this strong competition, all neurons would be very active.
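The two-landmark case can be approximated with a simplified, binary version of the mechanism. Eqn A1 is not reproduced here: instead of a graded activity followed by a high threshold, the sketch below declares the cell active only when the folded azimuth of both landmarks matches the folded azimuth learned at (25,25). The angular resolution (5 deg), the modulo value (6 bins, i.e. a folding every 30 deg), the use of the x-axis as the absolute direction and all names are assumptions.

```python
import math

LANDMARKS = [(0.0, 0.0), (0.0, 50.0)]  # L1 and L2, as in the text
LEARNED_POS = (25.0, 25.0)             # place learned by the conjunctive cell
BIN_DEG = 5.0                          # assumed resolution of the 'where' field
N_BINS = int(360 / BIN_DEG)            # 72 neurons on the azimuth field
M = 6                                  # assumed modulo value (fold every 30 deg)


def azimuth_bin(pos, landmark):
    """Index of the 'where' neuron coding the azimuth of a landmark,
    measured from the x-axis (assumed absolute direction)."""
    dx, dy = landmark[0] - pos[0], landmark[1] - pos[1]
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return round(angle / BIN_DEG) % N_BINS


def folded_code(pos):
    """Modulo projection of the azimuth of each landmark."""
    return tuple(azimuth_bin(pos, lm) % M for lm in LANDMARKS)


LEARNED_CODE = folded_code(LEARNED_POS)


def cell_active(pos):
    """Binary conjunction: both folded azimuths match the learned ones."""
    return folded_code(pos) == LEARNED_CODE


# Coarse text map of the simulated environment ('#' marks an active location).
for y in range(49, 0, -2):
    print("".join("#" if cell_active((float(x), float(y))) else "."
                  for x in range(1, 50)))
```

The cell fires at the learned position and again wherever both azimuths have rotated by a multiple of the folding period, for instance near (39.43, 10.57), where L1 is seen 30 deg away from its learned azimuth while L2 is unchanged. Locations in between, such as (30, 20), remain silent. The resulting fields are arranged along the intersections of two fans of rays centered on the landmarks, so this sketch shows the repetition of the fields but not the regularity reported in Fig. 4.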
Fig. 4E shows the results with four different identifiable landmarks located in the four corners of the visible environment. The grid distribution is much more regular in this case. Using the apparent size of the landmarks instead of their azimuth produces slightly different results as the apparent size varies as the arc tangent of the distance and induces non-homogeneous deformations (see Appendix for more details).
Yet, to obtain hexagonal grid maps, several constraints must be verified. (1) The ‘where’ information must be preserved when the animal's displacements induce view changes. Either the recognition of ‘what’ information has to be powerful enough to be invariant to the animal's displacements, or each piece of ‘where’ information has to be stored in an attractor network, allowing a smooth update of the landmark heading when the animal is moving. (2) The conjunctive cells merge the activity of few landmarks. If the number of landmarks used to build one grid cell becomes too high (for instance, using 30 landmarks), the probability of obtaining a grid cell decreases because the landmarks allow a complete place discrimination in spite of the folding (sharp place cells are obtained instead of grid cells). (3) It is very important to merge landmarks associated with very different orientations relative to the position where the learning was triggered. The best results will be obtained for an angular distance of approximately 60 deg between two landmarks. This value does not need to be precise, as shown in our simulations (e.g. see Fig. 4E), but still the use of the cell activity will be more relevant if the angular distance between the landmarks allows a good differentiation of the grid pattern. If we suppose that the ‘where’ information comes from an activity bump in the RSC or the parietal cortex, its shape could be similar to that of HD cells. If their activity covers 90 deg and we consider that a given conjunctive cell tries to maximize the independence between its inputs, then two landmarks with angular differences less than 45 deg will present an overlap and will have a lower independence score for learning (so their conjunction should not be selected for the learning of a grid cell). However, an angular landmark distance of approximately 60 deg will present the best independence score.
If a modulo operator is used, then there will be coincidence for larger angular differences. Hence, if we assume the cells are conjunctive cells using some kind of maximum of independence to build the modulo operator and to select landmarks, then grid cell patterns should be obtained. (4) To obtain effective grid activity, an important thresholding needs to be performed as well as a strong competition between the cells, especially if the bump activity on the ‘where’ field is large, allowing high activity for a distant azimuth, especially after the modulo projection.
Even if azimuth information seems more reliable than visual distance information, integration of both types of information in visual grid cells can provide more accurate discrimination (conjunction of the landmark identity×azimuth×elevation). In the brain, the projection of the cortical areas coding for the landmark identity or features could also be compressed using the same folding mechanism proposed for azimuth and elevation without changing the main properties under study here.
Discussion and conclusions
Our work supports the hypothesis that the EC is able to build compact representations of cortical activity. Assuming that the HS tries to detect spatio-temporal configurations of the brain activity, it is important that the EC builds a compact code representing the complex instantaneous configuration of the brain activity (including sensory-motor activities, internal body state, emotions, motivations, etc.). To differentiate a huge number of states presumably largely superior to the number of neurons in the EC, each LEC or MEC neuron must receive projections from a large number of cortical neurons through the postrhinal and perirhinal cortices. If neighbor neurons in the cortex code for neighbor states and are projected onto the same EC neuron, then the capability to discriminate between neighbor states in the hippocampus would be very poor. A modulo projection (cyclic projection using a modulo operator) of cortical information on a set of EC neurons minimizes the risk that the same EC neuron will be activated from neighbor situations. More precisely, the modulo projections using prime numbers are a good way to build a hash code of the cortical activity. This projection minimizes collisions similar to how a correspondence table, used in computer science, represents a complex state (e.g. words in a dictionary) with a short reference code. Although in computer science, collisions are an important issue that needs to be avoided, in our model, local collisions can be accepted for states belonging to different environments because other projections may provide a way to avoid confusion. The modulo projection could be explained by biological self-organization mechanisms following the Turing diffusion equation as proposed by McNaughton et al. (2006), by the competition between cells in an attractor network (Giocomo et al., 2011) or by the hexagonal tiling of place cells in a competitive structure [if the recruitment of visual place cells is performed when the activity of the most activated place cell is below a given threshold, then the result is a hexagonal tiling of the environment (Gaussier et al., 2007) that could be used as a way to modulate EC learning]. Another solution could be based on synaptic learning rules using the hypothesis that neighboring neurons in the cortex cannot project onto the same neuron in the EC, or that EC neurons try to maximize the diversity of their inputs across their dendritic tree. If the input comes from some cortical HD cells with bumps of activity having a radius of approximately 60 deg, it would be possible that selecting landmarks with as little overlap as possible on the conjunctive cells would result in triangular grid activity of approximately 60 deg. Hence, it looks as if several different mechanisms could contribute, alone or in a mixed way, to explain the properties of the modulo mapping required in our model.
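The hash-code argument can be checked on a toy example. The sketch below assumes cortical states indexed by consecutive integers and two modulo projections with the prime values 5 and 7; these values and the function name are chosen for illustration only.

```python
def hash_code(state, moduli=(5, 7)):
    """Compact code of a state: one active neuron per modulo projection."""
    return tuple(state % m for m in moduli)


N_STATES = 5 * 7  # number of states coded by 5 + 7 = 12 output neurons
codes = [hash_code(s) for s in range(N_STATES)]

# No two states share the same compact code.
all_distinct = len(set(codes)) == N_STATES

# Neighbor states never activate the same neuron in any projection.
neighbors_separated = all(
    a != b
    for s in range(N_STATES - 1)
    for a, b in zip(hash_code(s), hash_code(s + 1))
)

print(all_distinct, neighbors_separated)
print(hash_code(3), hash_code(8))  # (3, 3) (3, 1)
```

States 3 and 8 collide in the projection with modulo 5 but are separated by the projection with modulo 7, which illustrates how a local collision in one projection can be resolved by another one. Because 5 and 7 share no common factor, the 12 output neurons code 35 states without ambiguity.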
Starting from homing behaviors and detour problems in navigation (Gaussier et al., 2000), we conclude as others (Spiers and Gilbert, 2015) that ‘path integration’ or ‘dead reckoning’ (Etienne and Jeffery, 2004) may (Worsley et al., 2001; Wolbers et al., 2007; Stangl et al., 2018) or may not (Shrager et al., 2008; Kim et al., 2013) involve hippocampal–parahippocampal structures, and that this involvement perhaps depends on the distance navigated (Gaussier et al., 2007; Arnold et al., 2014). Our different studies, as well as work on insect modeling (Stone et al., 2017; Schwarz et al., 2017; Goldschmidt et al., 2017), support the hypothesis that PI can be obtained without the need of EC grid cells. HD cells or any kind of internal compass associated with local memories (to support temporal integration) can be used to perform PI in different brain structures [different kinds of reset/preset and different time constants are sufficient to explain many different cells found in the RSC and the parietal cortex (P.G. and J.L.K., unpublished data)]. Grid cells from PI can be explained from the projection of the discretized activity of a 1D field presumably located in the RSC without the need of a two-dimensional attractor network in the EC (Gaussier et al., 2007; P.G. and J.L.K., unpublished data). Some EC cells would be dedicated to representing in a concise way different cortical PI vectors thanks to a modulo projection. Yet, the PI mechanism required recalibration based on recognition of some specific places or boundaries/corners in the environment. This could explain why the disruption of the MEC seems to induce an incapacity to perform PI (Gil et al., 2018) and why young rats can recognize places and navigate correctly despite their immature grid cells (Bjerknes et al., 2018). Indeed, our model suggests that grid-like activity in the MEC is not necessary to build place cells, but place cells are important for the recalibration of PI.
Applying the modulo projection to the ‘where’ information in an image (i.e. relative azimuth and elevation) can explain other forms of grid activity found in the human brain (Killian et al., 2012, 2015; Killian and Buffalo, 2018). Our results on grid cells also relate to results from Krupic et al. (2015) showing that grid cell symmetry is shaped by environmental geometry. In the same way, non-spatial information would also take advantage of the modulo projection and conjunctive cells could maximize the independence between input codes. They could correspond to some stripe activity or some grid activity in complex sensory space, explaining why the activity of these neurons is not easy to decode or associate with a specific situation. From our model, we predict that the activities related to non-spatial modalities should also provide some periodic folding. Projection of tactile information at the cortical level could generate interesting grid patterns. For audio information, or the perception of time, the information being mainly unidimensional, the modulo folding should induce some stripe-like activity where the same neuron would be activated for increasing delays (perhaps non-periodic because the time perception is not linear with the clock time). Even some trace of temporal activities (if discretized) could periodically activate the same EC conjunctive cell.
To obtain grid patterns as predicted by our model, several constraints need to be verified. For visual grid cells, the competition in the EC superficial layer (EC2) has to be strong enough to help isolate the maxima associated with the grid activity. Yet, if the number of coherent landmark azimuths or distances used by a given conjunctive cell is too large, our simulations show that this cell becomes a place cell because a high response can only be obtained for a single location. This is not a strong constraint if we suppose that the time window for the temporal integration on conjunctive cells in the dMEC is limited or that information related to visual local views is merged before it reaches the EC, so that the EC receives only a limited number of signatures characterizing the visual environment (conjunction of visual signatures separated by an angle of ∼60 deg). Next, because the modulo value controls the scale of the grid, the modulo value should be related to the position on the dorso-ventral axis of the EC. More work needs to be performed to determine whether the √2 scale variation found by Stensola et al. (2012) in the field spacing of grid cells could be related to the distance integration properties on the PI fields (leaky integrators) used as input. Our robotic experiments show that the projections of visual information without the modulo operator produce place cells or view cells with large fields. The modulo operator could help in building precise place recognition, and this would be useful for the hippocampus to recognize transitions from one location to another. Yet this modulo mapping can induce the loss of a priori generalization capabilities to neighboring locations and the incapacity to perform a shortcut or a detour to reach a goal when starting from a place never visited but belonging to a known visual environment.
If the projection on the ventral MEC is not performed using a modulo projection or with a very large modulo, then the ventral MEC could code for the large visual place cells used in our robots for visual navigation in an open environment (large place cells allow some competition mechanism to find a shortcut and reach a homing location from a long distance). These cells could correspond to those found previously (Quirk et al., 1992; Song et al., 2012) or the unclassified MEC cells (Savelli et al., 2008).
Finally, in this paper, we focused only on the feedforward integration of information coming from associative cortical areas onto the EC with the hippocampus proper as a merging output to obtain state categorization. In previous work (Gaussier et al., 2002; Banquet et al., 2005), we proposed that the hippocampus learned to predict transitions between places or any state characterizing the current behavior (and its timing) for the building of cognitive graphs. In the fronto-parietal network, the feedback loop from neurons in CA1 to the subiculum and the deep layers of EC (EC5/6) can be used as a way to predict the next states (see Fig. 5). Our modulo projection model could take advantage of any feedback from the hippocampal activity to stabilize the grid activity. Because our model shows that grid cells may be obtained from different modalities, we face the issue of the alignment of these different grids. In navigation tasks, if grid cells are primarily obtained from the modulo projection of a discretized PI code (or landmarks) built outside the HS (presumably in the RSC), maintaining the alignment of the grid pattern when one goes from one environment to another and then back again requires recalibrating the PI and/or the head orientation always in the same way as in the original environment. Assuming the recognition of specific places or transitions between places is performed in the hippocampus proper, impairment at the hippocampal level should suppress the coherent recalibration of the different PI fields inducing drift in the grid patterns from one experiment to the next, as shown by Stangl et al. (2018) for Alzheimer's disease.
However, this PI outside the HS does not suppress the capabilities of the hippocampal loop to maintain and update grid patterns in the EC. As seen in Fig. 5, movement integration activity coming from the subiculum and its thalamic input, the anterodorsal nucleus of the thalamus (where HD cells are found), can be easily associated with the transition between ‘important’ places to build conjunctive grid×HD cells (Sargolini et al., 2006). It would be interesting to classify CA1 neurons as ‘transition cells’ instead of classical ‘place cells’ because one transition can trigger a unique action whereas a place cell can be associated with multiple actions. Hence, while selecting an action from a winning place cell is not direct because several actions may be associated with the same place, a unique action will be associated with a winning transition (Gaussier et al., 2002; Banquet et al., 2005; Hirel et al., 2013). Moreover, the transitions should predict the future state of the system and can be very useful for self-assessment [novelty detection, detection of deadlock, etc. (Jauffret et al., 2013)] and the building of a cognitive map in the fronto-parietal network. As a result, the hippocampus can be seen as a predictor of complex spatio-temporal and multimodal events allowing the detection of novelty and the coding of future transitions (and timing).
Results showing the importance of theta oscillations (Brandon et al., 2011; Koenig et al., 2011) and the excitatory drive from the hippocampus (Bonnevie et al., 2013) or the existence of grid cells without theta oscillations in the EC of bats (Yartsev et al., 2011) could be interpreted in light of the hippocampal loop and the interaction between the hippocampus and the septum (Hasselmo and Schnell, 1994; Hasselmo, 2006). Our model would also be consistent with Sanders et al.’s (2015) questioning of whether PI occurs in the EC and their proposal about mind-travel in the hippocampus (see also Gorchetchnikov and Grossberg, 2007; Mhatre et al., 2012; Grossberg and Pilly, 2014). The formation of grid cells from an extra-hippocampal PI does not exclude the capacity of the EC to combine their ‘stripe cells’ [note that the proposed ‘stripe cells’ could correspond to the result of the modulo projection of the discretized activity of our one-dimensional PI field in a given direction (Gaussier et al., 2007)] to form grid cells and/or to use the hippocampal loop to predict the evolution of the grid pattern, especially if we suppose that EC neurons learn the correlations between different input signals. One important point in the future will be to determine which pathway(s) control the formation of the grid cells and their shape. It is likely that some pathways are very important for the maturation and organization of grid cells and that other sources of information could provide sufficient information to maintain the grid patterns (and to exploit them) in the absence of the first ones. These are very challenging questions both for cognitive sciences and for the development of robust navigation systems.
where i is the index of the neuron on the Fk field, the kth input field, and Fk (p,i) is the activity of the neuron i on the field when the animal is at position p. Each field is composed of Nk neurons (the size of the field). mk is the value of the modulo operator applied on the projection of the field, and fk() is a function representing the activity bump on the input field k.
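The definitions above can be made concrete with a minimal sketch of a modulo projection. It is one plausible reading only: the Gaussian bump, the folding by maximum (a sum would be another option) and the names `bump_field`, `modulo_projection` and `winner` are assumptions introduced for illustration.

```python
import math

def bump_field(value, n_neurons, value_max, width=1.0):
    """Activity of a 1D field of n_neurons coding `value` in [0, value_max).

    The Gaussian bump stands in for the bump function of the field (assumed shape).
    """
    center = value / value_max * n_neurons
    return [math.exp(-((i - center) ** 2) / (2.0 * width ** 2))
            for i in range(n_neurons)]

def modulo_projection(field, modulo):
    """Fold the field onto `modulo` target units: input i projects to i % modulo."""
    folded = [0.0] * modulo
    for i, a in enumerate(field):
        folded[i % modulo] = max(folded[i % modulo], a)  # assumed merge rule
    return folded

def winner(folded):
    """Index of the most active target unit."""
    return max(range(len(folded)), key=folded.__getitem__)

# A field of 60 neurons coding a 6 m range, folded with a modulo of 10:
# the same target unit wins every 1 m.
winners = [winner(modulo_projection(bump_field(v, 60, 6.0), 10))
           for v in (0.5, 1.5, 2.5, 3.5)]
```

The folded code needs only `modulo` units instead of `n_neurons`, at the cost of becoming periodic in the coded variable; the period is the modulo multiplied by the resolution of the input field.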
Place cell and not grid cell activity
If the number of landmarks used increases, the probability of obtaining grid cells decreases. Yet, place discrimination increases owing to the increase of independent information. In the following example (Fig. S1), with 10 random landmarks located on the four different sides of the environment, we still have some kind of grid or wave activity. However, with 30 random landmarks, there is a peak of activity at the center of the arena: the location used to build the code. We can also obtain cells that look like place cells with modulo π/4 when using 30 ‘landmarks’ randomly distributed around the simulated room (see Fig. S2). In this case, the ‘place field’ is much larger because of the choice of a higher value for the modulo.
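A simulation of this kind can be set up along the following lines. This is a minimal sketch under stated assumptions, not the code used for Figs S1 and S2: the periodic cosine match, the averaging over landmarks, the grid resolution and all function names are hypothetical, and the modulo of π/4 is simply taken from the second example above.

```python
import math
import random

def folded_azimuths(pos, landmarks, modulo):
    """Azimuth of each landmark seen from `pos`, folded by `modulo` (radians)."""
    return [math.atan2(ly - pos[1], lx - pos[0]) % modulo for lx, ly in landmarks]

def conjunctive_activity(pos, landmarks, learned, modulo):
    """Mean periodic match between current and learned folded azimuths, in [0, 1]."""
    current = folded_azimuths(pos, landmarks, modulo)
    match = [0.5 * (1.0 + math.cos(2.0 * math.pi * (c - l) / modulo))
             for c, l in zip(current, learned)]
    return sum(match) / len(match)

def wall_landmarks(n, size, rng):
    """n landmarks drawn at random on the four walls of a square arena."""
    points = []
    for _ in range(n):
        t = rng.uniform(0.0, size)
        points.append(rng.choice([(t, 0.0), (t, size), (0.0, t), (size, t)]))
    return points

def activity_map(n_landmarks, size=5.0, modulo=math.pi / 4, steps=25, seed=0):
    """Activity of one conjunctive cell, learned at the arena center, on a grid."""
    rng = random.Random(seed)
    landmarks = wall_landmarks(n_landmarks, size, rng)
    center = (size / 2.0, size / 2.0)
    learned = folded_azimuths(center, landmarks, modulo)
    cell = size / steps
    return [[conjunctive_activity(((i + 0.5) * cell, (j + 0.5) * cell),
                                  landmarks, learned, modulo)
             for i in range(steps)] for j in range(steps)]

map_10 = activity_map(10)   # few landmarks
map_30 = activity_map(30)   # many landmarks
```

Plotting `map_10` and `map_30` as images allows the two regimes to be compared; the exact patterns depend on the random landmark layout and on the matching rule assumed here.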
Visual grid cells from apparent landmark size
Distances can be obtained from the apparent size of a given landmark in the camera image. This apparent size corresponds to the elevation of the landmark. Hence, for a landmark of height h located at a distance d from the animal, the visual angle is φ=arctan(h/d). This information has to be scaled according to the number of neurons or pixels in the vertical dimension of the camera. In our case, we consider that an angle of π/2 corresponds to the maximal distance dmax to allow comparisons between azimuth and apparent size (or elevation). The visual distance is computed as dv=dmax·(φ/2π), where φ is the elevation. In our case, we simulate a room of 5×5 m with dmax=2√5 and a visual landmark of height h=1 m. We see that the activity decreases less than with the classical distance. With a threshold of 0.995, there are still plenty of activated locations.
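The two formulas above can be written as a short sketch. The function names and the `full_scale` parameter are assumptions of this example; `full_scale` defaults to 2π, as in the expression for dv given in the text, and is exposed because the appropriate normalization depends on the angular range covered by the camera.

```python
import math

def elevation(height, distance):
    """Visual angle phi = arctan(h / d) of a landmark of height h at distance d."""
    return math.atan(height / distance)

def visual_distance(phi, d_max, full_scale=2.0 * math.pi):
    """Scale the elevation phi to a distance-like value d_v = d_max * phi / full_scale."""
    return d_max * phi / full_scale

# Landmark of height 1 m seen from increasing distances (hypothetical values).
d_max = 2.0 * math.sqrt(5.0)
for d in (0.5, 1.0, 2.0, 4.0):
    phi = elevation(1.0, d)
    print(f"d={d:.1f} m  phi={phi:.3f} rad  d_v={visual_distance(phi, d_max):.3f}")
```

The elevation decreases monotonically with distance and flattens for distant landmarks, so the code derived from apparent size varies slowly far from the landmark.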
For comparison purposes, in the following simulations, the distance is used explicitly as it can be provided by a regular PI starting from each landmark. With two landmarks in the corners and a modulo value equal to the size of the environment divided by 6, we still obtain grid cells, as shown in Fig. S3A,B. With 30 landmarks and a modulo equal to the size of the environment divided by 10, the grid pattern disappears as with the apparent size information (Fig. S3C,D). With four landmarks taken at the corners, we obtain quite regular grid cells, as shown in Fig. S4. The activity pattern looks very regular because of the specific geometrical properties of this configuration. The distortions are not visible in this case.
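The distance-based case can be sketched in the same spirit. The code below is a minimal illustration, assuming a periodic cosine match on the landmark distances and an average over landmarks; the function name `distance_activity` and the choice of the arena center as the learned location are hypothetical, and the sketch is not the simulation behind Figs S3 and S4.

```python
import math

def distance_activity(pos, landmarks, learned, modulo):
    """Mean periodic match between current and learned landmark distances, in [0, 1]."""
    match = []
    for (lx, ly), d0 in zip(landmarks, learned):
        d = math.hypot(pos[0] - lx, pos[1] - ly)
        match.append(0.5 * (1.0 + math.cos(2.0 * math.pi * (d - d0) / modulo)))
    return sum(match) / len(match)

size = 5.0
modulo = size / 6.0                       # modulo = environment size / 6
corners = [(0.0, 0.0), (size, 0.0)]       # two landmarks in the corners
reference = (size / 2.0, size / 2.0)      # assumed learned location
learned = [math.hypot(reference[0] - lx, reference[1] - ly) for lx, ly in corners]

at_reference = distance_activity(reference, corners, learned, modulo)

# Activity sampled on a coarse grid of the arena.
steps = 25
cell = size / steps
grid = [[distance_activity(((i + 0.5) * cell, (j + 0.5) * cell),
                           corners, learned, modulo)
         for i in range(steps)] for j in range(steps)]
```

With a single landmark, the folded distance code is maximal on concentric rings spaced by the modulo; the conjunction of several landmarks keeps only the locations where these rings intersect, which is what shapes the resulting pattern.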
Many thanks to Adrien Jauffret for his work on the grid cells and Arnaud Blanchard and Frederic Demelo for their technical support. A special thanks to G. Gerlach, S. Heinze, J. Knierim, T. Wolbers, W. Warren and B. Webb for the very interesting discussions we had during the workshop on ‘Linking brain and behavior in animal navigation’ organized by the Journal of Experimental Biology.
P.G. was supported by Equipex Robotex and the VEDECOM institute, and a Centre National de la Recherche Scientifique scholarship for his sabbatical leave at the University of California, Irvine. J.K. was supported by the National Science Foundation Division of Information and Intelligent Systems (award IIS-1302125) and the Intel Corporation.
Videos of robotics experiments are available at https://perso-etis.ensea.fr/neurocyber/Videos/homing/
Supplementary information available online at http://jeb.biologists.org/lookup/doi/10.1242/jeb.186932.supplemental
The authors declare no competing or financial interests.