Hippocampus and striatum show distinct contributions to longitudinal changes in value-based learning in middle childhood

elifesciences.org

Longitudinal brain-cognition links

Cognitive and brain measures with cross-sectional and longitudinal links. (A) Recognition memory (corrected recognition = hits - false alarms) for objects presented during delayed feedback was enhanced only at trend level. (B) Learning scores depicted here were used in the LCS analyses. Learning scores were the model-derived choice probability of the contingent choice using fitted posterior parameters. (C) Hippocampal and striatal volumes increased between waves, with hippocampal volume increasing the most. (D) A four-variate latent change score (LCS) model that included striatal and hippocampal volumes as well as immediate and delayed learning scores. Depicted are significant cross-domain paths (brain-cognition, dashed lines) and within-domain paths (brain or cognition, solid lines); other paths are omitted for visual clarity and are summarized in Table 4. Depicted brain-cognition links included $\phi_{\mathrm{STR}_{w1},\,\mathrm{LS}_{ime,w1}}$ (covariance between striatal volume and immediate learning score at wave 1), as well as $\phi_{\mathrm{HPC}_{w1},\,\mathrm{LS}_{del,w1}}$ and $\phi_{\mathrm{STR}_{w1},\,\mathrm{LS}_{del,w1}}$ (covariances between hippocampal and striatal volumes and delayed learning score at wave 1). Brain links included $\phi_{\mathrm{STR}_{w1},\,\mathrm{HPC}_{w1}}$ and $\rho_{\Delta\mathrm{STR},\,\Delta\mathrm{HPC}}$ (wave 1 covariance and change-change covariance), and similarly, cognition links included $\phi_{\mathrm{LS}_{ime,w1},\,\mathrm{LS}_{del,w1}}$ and $\rho_{\Delta\mathrm{LS}_{ime},\,\Delta\mathrm{LS}_{del}}$. Covariates included age, sex and estimated total intracranial volume. ** denotes significance at α < 0.001, * at α < 0.05.

In this study, we examined the longitudinal development of value-based learning in middle childhood and its associations with striatal and hippocampal volumes that were predicted to differ by feedback timing. Children improved their learning in the 2-year study period. Behaviorally, learning was improved by an increase in accuracy and a reduction in reaction time (i.e. faster responses). Further, children’s switching behavior improved by an increase in win-stay and a decrease in lose-shift behavior. Computationally, learning was enhanced by an increase in learning rate and inverse temperature, which together constituted more optimal value-based learning. Further, feedback timing modulated specifically the inverse temperature. In terms of brain structures, we found that longitudinal changes in hippocampal volume were larger compared to striatal volume, which suggests more protracted hippocampal maturation. The brain-cognition links were longitudinally stable and partially confirmed our hypotheses. In line with previous adult literature and our assumption, hippocampal volume was more strongly associated with delayed feedback learning. Contrary to our expectations, episodic memory performance was not enhanced under delayed feedback compared to immediate feedback. Furthermore, striatal volume unexpectedly was associated with both immediate and delayed feedback learning, suggesting a common involvement of the striatum during value-based learning in middle childhood across timescales.

Children’s learning improvement between waves was described behaviorally by increased win-stay and decreased lose-shift behavior. Our finding is in line with cross-sectional studies in the developmental literature that reported increased learning accuracy and win-stay behavior (Chierchia et al., 2023; Habicht et al., 2022). Our longitudinal dataset with younger children further suggests that learning change is not only accompanied by increased win-stay, but also decreased lose-shift behavior. We found lower learning performance and less optimal switching behavior in girls compared to boys, which could point to sex differences for reinforcement learning during middle childhood (Appendix 2). Previous studies have found both male and female advantages depending on their age and the type of learning task (Mandolesi et al., 2009; Overman, 2004; Evans and Hampson, 2015). Alternatively, sex differences may have been driven by confounding variables not included in the analysis.
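
For readers unfamiliar with the switching measures, the following is a minimal sketch of how win-stay and lose-shift proportions are commonly computed from a trial sequence. The function name, the 0/1 coding of choices and rewards, and the per-cue sequencing are illustrative assumptions, not the scoring procedure used in this study.

```python
def switching_rates(choices, rewards):
    """Return (win_stay, lose_shift) proportions for one trial sequence.

    choices: list of chosen options (hypothetical 0/1 coding)
    rewards: list of outcomes, 1 = rewarded, 0 = not rewarded
    """
    win_stay = win_total = lose_shift = lose_total = 0
    # Pair each trial's choice and outcome with the choice on the next trial.
    for prev_choice, prev_reward, choice in zip(choices, rewards, choices[1:]):
        if prev_reward:
            win_total += 1
            win_stay += (choice == prev_choice)   # repeated a rewarded choice
        else:
            lose_total += 1
            lose_shift += (choice != prev_choice)  # switched after no reward
    ws = win_stay / win_total if win_total else float("nan")
    ls = lose_shift / lose_total if lose_total else float("nan")
    return ws, ls


# Toy sequence: two wins followed by stays, two losses of which one is followed by a shift.
print(switching_rates([0, 0, 1, 1, 1], [1, 0, 1, 0, 0]))
```

In a probabilistic task with a stable contingent choice, a high win-stay proportion combined with a low lose-shift proportion indicates that occasional non-rewarded trials are not over-weighted.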

Computationally, we found that learning rate and inverse temperature increased longitudinally and became more optimal, as shown by simulation data, which adds to the growing literature on developmental reinforcement learning (Nussenbaum and Hartley, 2019). Adult studies that examined feedback timing during reinforcement learning reported average learning rates ranging from 0.12 to 0.34 (Foerde and Shohamy, 2011; Höltje and Mecklinger, 2020; Lighthall et al., 2018), which are much closer to the simulated optimal learning rate of 0.29 than children’s average learning rates of 0.02 and 0.05 at waves 1 and 2 in our study. Therefore, it is likely that individuals approach adult-like optimal learning rates later, during adolescence. However, the differences in learning rate across studies have to be interpreted with caution, because differences in the task and the analysis approach may limit their comparability (Zhang et al., 2020; Eckstein et al., 2021). Task properties such as the trial number per condition differed across studies. Our study included 32 trials per cue in each condition, while in adult studies, the trials per condition ranged from 28 to 100 (Foerde and Shohamy, 2011; Höltje and Mecklinger, 2020; Lighthall et al., 2018). Optimal learning rates in a stable learning environment were around 0.25 for 10–30 trials (Zhang et al., 2020), whereas another study reported a lower optimal learning rate of around 0.08 for 120 trials (Behrens et al., 2007). This may partly explain why, in our case of 32 trials per condition and cue, the simulations called for a relatively high optimal learning rate of 0.29, while in other studies, optimal learning rates may be lower.
Regarding differences in the analysis approach, the hierarchical Bayesian estimation approach used in our study produces more reliable results than maximum likelihood estimation (Brown et al., 2020), which was used in some of the previous adult studies and may have biased results toward extreme values. Taken together, our study underscores the importance of using longitudinal data to examine developmental change as well as the importance of simulation-based optimal parameters to interpret the direction of developmental change.
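
To illustrate why a short task can favor a relatively high learning rate, the sketch below simulates a generic two-option learner with a prediction-error value update and a softmax choice rule. It is not the model fitted in this study: the reward probability of 0.8, the initial values of 0.5, the inverse temperature of 5, and all function names are assumptions chosen only for illustration.

```python
import math
import random


def softmax_p_first(q_first, q_second, beta):
    """Probability of choosing the first of two options under a softmax rule."""
    return 1.0 / (1.0 + math.exp(-beta * (q_first - q_second)))


def simulate_agent(alpha, beta, n_trials=32, p_reward=0.8, rng=None):
    """Return the fraction of trials on which the better option was chosen.

    alpha: learning rate, beta: inverse temperature.
    p_reward is an assumed reward probability for the better option.
    """
    rng = rng or random.Random()
    q = [0.5, 0.5]  # q[0]: better option, q[1]: worse option (assumed start values)
    n_correct = 0
    for _ in range(n_trials):
        choice = 0 if rng.random() < softmax_p_first(q[0], q[1], beta) else 1
        p = p_reward if choice == 0 else 1.0 - p_reward
        reward = 1.0 if rng.random() < p else 0.0
        q[choice] += alpha * (reward - q[choice])  # prediction-error update
        n_correct += (choice == 0)
    return n_correct / n_trials


def mean_accuracy(alpha, beta, n_agents=2000, seed=0):
    """Average accuracy over many simulated agents with a fixed seed."""
    rng = random.Random(seed)
    total = sum(simulate_agent(alpha, beta, rng=rng) for _ in range(n_agents))
    return total / n_agents


# Compare a child-like and an adult-like learning rate over 32 trials.
for alpha in (0.02, 0.29):
    print(alpha, round(mean_accuracy(alpha, beta=5.0), 3))
```

With only 32 trials, a very low learning rate leaves the two values close together for most of the task, so choices stay near chance even when the inverse temperature is high. Sweeping `alpha` and `beta` over a grid in the same way is one simple route to simulation-based optimal parameters.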

Despite a relatively immature hippocampal structure in middle childhood, our results confirmed a longitudinally stable association between hippocampal volume and delayed feedback learning. However, episodic memory in this learning condition was not enhanced. This suggests a developmentally early hippocampal contribution to value-based learning during delayed feedback, which does not modulate episodic memory as much as it does in adults. Therefore, our study partially extends the findings from the adult literature to middle childhood (Foerde and Shohamy, 2011; Foerde et al., 2013; Höltje and Mecklinger, 2020; Lighthall et al., 2018). The reduced effect of delayed feedback on episodic memory may be due to the protracted maturation of the hippocampus. In an aging study with a similar task, older adults failed to exhibit enhanced episodic memory for objects presented during delayed feedback trials, and they showed no enhanced hippocampal activation during delayed feedback (Lighthall et al., 2018). Therefore, the findings converge for childhood and older adulthood, periods during which the structural and functional integrity of the hippocampus is known to be less optimal than in younger adulthood (Shing et al., 2010; Keresztes et al., 2017; Ghetti and Bunge, 2012).

Our brain-cognition links were only partially confirmed, as striatal volumes exhibited associations not just with immediate learning scores, as we predicted, but also with delayed learning scores. This result suggests that the striatum may be important for value-based learning in general rather than exhibiting a selective association with immediate feedback learning. This is also what we found in an exploratory analysis that related the striatum to learning rate in general and further predicted longitudinal change in learning rate (Appendix 5). This overall reduced brain-behavior specificity could reflect less differentiated memory systems during development, similar to findings from aging research. There, older adults exhibited stronger striatal and hippocampal co-activation during both implicit and explicit learning, compared to more dissociable brain-behavior relationships in younger adults (Dennis and Cabeza, 2011). Interestingly, even in young adults, clear dissociations between memory systems such as those seen in non-human lesion studies are uncommon, and factors like stress modulate their cooperative interaction (Packard and Goodman, 2013; Packard et al., 2018; Schwabe and Wolf, 2013; Ferbinteanu, 2016; White and McDonald, 2002). Further, there are methodological differences from previous studies that could explain why striatal volumes were not uniquely associated with immediate learning in our study. For example, previous studies related reward prediction errors to striatal and hippocampal activation (Foerde and Shohamy, 2011; Höltje and Mecklinger, 2020; Lighthall et al., 2018), whereas we examined individual differences in brain structure and the model-derived learning scores. Future functional neuroimaging studies with children could further clarify whether children’s memory systems are indeed less differentiated and explain the attenuated modulation by feedback timing.
Taken together, compared to the adult literature, our results with children showed that the hippocampal structure was associated with delayed feedback learning, but did not enhance episodic memory encoding, while the striatum generally supported value-based learning. These findings point towards a developmental effect of less differentiated and more cooperative memory systems in middle childhood.

Our computational modeling results revealed a separable effect of feedback timing on inverse temperature, which suggests that the memory systems modulated learning during decision-making. The reported behavioral differences in reaction time and their correlation with the inverse temperature further support the idea of a decision-related mechanism: children responded faster during delayed feedback trials, and faster-responding children also exhibited more value-guided choice behavior (i.e. higher inverse temperature) during delayed compared to immediate feedback. The hippocampus may contribute to a decision-related effect in the delayed feedback condition by facilitating the encoding and retrieval of learned values (Shadlen and Shohamy, 2016). This is in contrast to previous event-related fMRI and EEG studies reporting feedback timing modulations at value update (Foerde and Shohamy, 2011; Höltje and Mecklinger, 2020; Lighthall et al., 2018), which may be due to at least two reasons. First, we did not include a functional brain measure to examine its differential engagement during the choice and feedback phases. Second, in such a reinforcement learning task, disentangling model parameters from the choice and feedback phases can be challenging, such as for the inverse temperature and outcome sensitivity (Browning et al., 2023). Taken together, hippocampal engagement at delayed feedback may enhance outcome sensitivity as well as facilitate choice behavior through improved retrieval of action-outcome associations. A mechanism facilitating retrieval seems especially relevant in our paradigm, where multiple cues were learned and presented in a mixed order, thus creating a high memory load. To summarize, our study results suggest that feedback timing could modulate decision-making in addition to, or as an alternative to, a mechanism at value update.
However, disentangling the effects of inverse temperature and outcome sensitivity is challenging and warrants careful interpretation. Future studies might shed new light by examining neural activations at both task phases, by additionally modeling reaction times using a drift-diffusion approach, or by choosing a task design that allows independent manipulations of these phases and associated model parameters, for example, by using different reward magnitudes during reinforcement learning, or by studying outcome sensitivity without decision-making.
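
The difficulty of separating inverse temperature from outcome sensitivity can be seen in a generic formulation of such a model. The equations below are a sketch under stated assumptions, not the model fitted in this study: the symbol $s$ for outcome sensitivity, the zero initial values, and the simple delta-rule update are illustrative choices.

```latex
% Value update with learning rate \alpha and outcome sensitivity s (assumed form)
Q_{t+1}(a_t) = Q_t(a_t) + \alpha \bigl( s\, r_t - Q_t(a_t) \bigr), \qquad Q_0(a) = 0

% Softmax choice rule with inverse temperature \beta
P(a_t = a) = \frac{\exp\bigl(\beta\, Q_t(a)\bigr)}{\sum_{a'} \exp\bigl(\beta\, Q_t(a')\bigr)}

% The update is linear in s, so for a given choice and reward history
% Q_t(a) = s\, \tilde{Q}_t(a), where \tilde{Q}_t is the value learned with s = 1. Hence
P(a_t = a) = \frac{\exp\bigl(\beta s\, \tilde{Q}_t(a)\bigr)}{\sum_{a'} \exp\bigl(\beta s\, \tilde{Q}_t(a')\bigr)}
```

Under these assumptions, choices depend on $\beta$ and $s$ only through their product, so choice data alone cannot distinguish a decision-phase effect from a feedback-phase effect. Varying reward magnitudes or adding reaction-time models, as suggested above, supplies the additional constraints needed to tell them apart.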

One aim of developmental investigations is to identify the emergence of brain and cognition dynamics, such as the hippocampal-dependent and striatal-dependent memory systems, which have been shown to engage during reinforcement learning depending on the delay in feedback delivery. Our longitudinal study partially confirmed these brain-cognition links in middle childhood, but with less specificity than previously found in adults.

An early-emerging memory system dynamic, similar to that of adults, is relevant for applying reinforcement learning principles at different timescales. In scenarios such as the classroom, a teacher may comment on a child’s behavior immediately after the action or some moments later, comparable to our experimental manipulation of 1 s versus 5 s. Within such a short range of delays in teachers’ feedback, children’s learning ability during the first years of schooling may function equally well and depend on the striatal-dependent memory system. However, we anticipate that the reliance on the hippocampus will become even more pronounced when feedback is delayed for longer periods. Children’s capacity for learning over longer timescales relies on the hippocampal-dependent memory system, which is still under development. This knowledge could help to structure learning in a way that better matches children’s developmental stage. Furthermore, probabilistic learning from delayed feedback may be a potential diagnostic tool to examine the hippocampal-dependent memory system during learning in children at risk. Environmental factors such as stress (Schwabe and Wolf, 2013) and socioeconomic status (Raffington et al., 2019; Hackman et al., 2010) have been shown to affect hippocampal structure and function and may contribute to a heightened risk for psychopathology in the long term (Frodl et al., 2010; Lucassen et al., 2017; Rahman et al., 2016). Deficits in hippocampal-dependent learning may be particularly relevant to psychopathology since dysfunctional behavior may arise from a tendency to prioritize short-term consequences over long-term ones (Levin et al., 2018; Von Siebenthal et al., 2017) and from the maladaptive application of previously learned behavior in inappropriate contexts (Maren et al., 2013).
Interestingly, poor learners showed relatively less value-based learning in favor of a stronger reliance on simple heuristic strategies, and excluding them modulated the hippocampal-dependent associations with learning and memory in our results. More studies are needed to further clarify the relationship between hippocampus and psychopathology during cognitive and brain development.

Another key question is whether developmental trajectories observed cross-sectionally are also confirmed by longitudinal results, such as for the learning rate and inverse temperature. Our results show developmental improvements in these learning parameters in only 2 years. This suggests that the initial 2 years of schooling constitute a dynamic period for feedback-based learning, in which contingent feedback is important in shaping behavior and development.
