Updating Prospective Self-Efficacy Beliefs About Cardiac Interoception in Anorexia Nervosa: An Experimental and Computational Study

Patients with anorexia nervosa (AN) typically hold altered beliefs about their body that they struggle to update, including global, prospective beliefs about their ability to know and regulate their body and particularly their interoceptive states. While clinical questionnaire studies have provided ample evidence on the role of such beliefs in the onset, maintenance, and treatment of AN, psychophysical studies have typically focused on perceptual and ‘local’ beliefs. Across two experiments, we examined how women at the acute (N = 86) and post-acute (N = 87) stages of AN, compared to matched healthy controls (N = 180), formed and updated their self-efficacy beliefs retrospectively (Experiment 1) and prospectively (Experiment 2) about their heartbeat counting abilities in an adapted heartbeat counting task. As preregistered, while AN patients did not differ from controls in interoceptive accuracy per se, they held and maintained ‘pessimistic’ interoceptive, metacognitive self-efficacy beliefs after performance. Modelling using a simplified computational Bayesian learning framework showed that neither local evidence from performance, nor retrospective beliefs following that performance (which were themselves suboptimally updated), seemed to be sufficient to counter and update pessimistic self-efficacy beliefs in AN. AN patients showed lower learning rates than controls, revealing a tendency to base their posterior beliefs more on prior beliefs than on prediction errors in both retrospective and prospective belief updating. Further explorations showed that while these differences in both explicit beliefs and the latent mechanisms of belief updating were not explained by general cognitive flexibility differences, they were explained by negative mood comorbidity, even after the acute stage of illness.

However, while such psychophysical studies have focused on 'local' and 'retrospective' measures (e.g., trial-by-trial confidence-accuracy correspondence; Fleming & Lau, 2014; Garfinkel et al., 2015; Rouault, McWilliams et al., 2018), clinical traditions usually employ questionnaires (e.g., the Metacognitions Questionnaire; Cartwright-Hatton & Wells, 1997; Wells & Cartwright-Hatton, 2004) to sample explicit global, retrospective and prospective, metacognitive beliefs that have been found to be critical for the onset and maintenance of AN (for systematic reviews see: Palmieri et al., 2021; Sun et al., 2017). For example, metacognitive beliefs such as positive beliefs about worry and negative beliefs about thought uncontrollability and danger predict the drive for thinness in AN (Davenport et al., 2015; McDermott & Rushford, 2011; Palomba et al., 2017). Moreover, metacognitive dysfunctions in the form of ruminations over distorted cognitions pertaining to food, weight, and shape hinder the ability to engage in helpful cognitive processes such as problem solving (Safdari et al., 2013; Tchanturia et al., 2013). Patients may also show aberrant explicit beliefs about their illness and its causes (termed clinical insight; David, 2004). Insight deficits, common in restrictive AN (Greenfeld et al., 1991; Konstantakopoulos et al., 2011, 2012), indicate a specific metacognitive basis (Arbel et al., 2013). Additionally, beliefs about one's capacity to succeed in, or cope with, different situations and contexts (termed self-efficacy; Bandura, 1977) can be affected in EDs (Goodrick et al., 1999; O'Leary, 1985). More recently, network analysis studies suggest that metacognitive beliefs, such as 'body mistrust', may also determine the association between interoceptive ability and ED symptomatology (Olatunji et al., 2018; Monteleone & Cascino, 2021).
Yet, despite the frequent association of interoception and metacognition deficits with AN (Cooper et al., 2021; Jenkinson et al., 2018; Khalsa et al., 2022), theoretical insights from psychophysical paradigms have not been integrated with insights from clinical studies on explicit, clinically relevant beliefs. Moreover, while the importance of global (mostly retrospective) metacognition in mental health is gaining some recognition among experimental traditions (Seow et al., 2021), such insights have not been extended to prospective beliefs, or to interoception research in EDs (see Stephan et al., 2016 for a first theoretical proposal and model in relation to depression and self-efficacy). Bridging these gaps was the central aim of our interdisciplinary study.
Specifically, we used a unifying Bayesian, computational approach (Friston, 2010; Petzschner et al., 2017; Smith et al., 2020; Smith, Mayeli et al., 2021; Stephan et al., 2016) to study under one continuous framework how interoceptive perception and local metacognition influence the updating of explicit global, prospective beliefs about interoception. We have previously utilised this approach in different contexts to characterise the continuity between perceptual and metacognitive beliefs in self-awareness (Kirsch et al., 2021; Krahé et al., 2024). Here, we applied this unifying Bayesian approach to explicit, prospective and retrospective capability beliefs in the interoceptive domain. We assessed how information from one's performance on a heartbeat counting task (HCT; Schandry, 1981) and related local and global retrospective beliefs about this performance (the level of accuracy-confidence correspondence) are combined to inform the updating of explicit, prospective metacognitive beliefs about one's ability to monitor cardiac signals in AN. These investigations deepen our understanding of AN, shedding light not only on the perception and evaluation of bodily signals in the here-and-now experience of the patient ("How well did I feel my internal sensations?"), but also on the processes that allow patients to use such 'local' perception and evaluation to update their 'global' interoceptive ability beliefs ("How well do I perceive my internal sensations in general?").
Across two experiments we investigated how women at the acute AN stage, women at the post-acute AN phase (p-AN), and age-, ethnicity-, and gender-matched healthy controls (HC) update their prospective (self-efficacy) beliefs about their heartbeat detection abilities after engaging in a modified HCT (Schandry, 1981). Comparisons across these three groups allowed us to disentangle state mechanisms (e.g., changes present only during the acute AN phase as secondary neurocognitive, psychological, and physiological consequences of prolonged malnutrition) from trait mechanisms (premorbid deficits present in at-risk individuals, or deficits that endure beyond the acute phase and are present during remission). While some theoretical (Barca & Pezzulo, 2020) and formal (Smith et al., 2020; Smith, Mayeli et al., 2021) approaches in EDs have used a similar Bayesian framework to characterise disruptions in interoceptive Bayesian inference in the perceptual domain (see Smith, Kirlic, et al., 2021 for a recent computational study showing precision-weighting differences between clinical groups, including a small (N = 14) ED sample, and healthy controls; and see Lavalley et al., 2023 for a recent replication), to our knowledge this approach has not been applied to explicit and prospective, or counterfactual, metacognitive beliefs, typically identified as aberrant in EDs (see above). Thus, we developed a simplified interoceptive belief updating task and a corresponding Bayesian modelling approach to examine the key parameters involved in belief updating in the cardiac domain. When such metacognitive beliefs need to be updated, various sources of evidence, and corresponding precision and learning rate parameters, are involved, and these include not only sensory signals and related beliefs but also cognitive beliefs about the underlying sensory beliefs and their precision (e.g., Kirsch et al., 2021; Krahé et al., 2024; Lavalley et al., 2023; Smith et al., 2020). Specifically, here we considered that the updating of explicit metacognitive beliefs can be influenced by at least two key sources of 'evidence': first, the perceptual performance itself (e.g., one's actual accuracy) and second, one's global retrospective beliefs about such performance (e.g., how accurate one thought they were after the task ends). Similarly, there can be different sources of 'evidence precision', and here we considered two experimental measures as 'proxies' for such precision, namely, individuals' confidence about their performance during the task (which can be regarded as state-like beliefs about the accuracy of their interoceptive abilities) and individuals' self-reported interoceptive abilities in everyday life as measured via a standardised questionnaire (which are more likely to be trait constructs, reflecting global beliefs about interoceptive abilities). This comparison of proxy measures of 'evidence' and of 'precision of evidence' allowed us to identify which combination best approximated the actual posterior beliefs of the participants. Crucially, we were able to explore whether the clinical groups differ from controls in how much they take the 'evidence' into account (i.e., how much precision goes to the evidence vs. the prior) when updating their prospective beliefs regarding cardiac interoceptive abilities. Moreover, although these two measures used as evidence precision proxies may appear different at face value, they are the two measures we had of how people subjectively and retrospectively evaluate their interoceptive abilities, either in everyday life (a trait-like measure of everyday, subjective evaluation of one's interoceptive abilities), or in the lab. Thus,

METHODS: EXPERIMENT 1

PARTICIPANTS
The sample consisted of $N_{\mathrm{AN}} = 51$, $N_{\mathrm{p\text{-}AN}} = 47$, and $N_{\mathrm{HC}} = 63$ women aged between 18 and 45 (full details on eligibility criteria, participant characteristics, and recruitment sites in Supplementary Material and Table S1). AN patients met the DSM-5 criteria for restrictive-subtype AN (American Psychiatric Association, 2013) and had a BMI < 18.5. Given growing concerns around weight-restoration criteria (e.g., Harrop et al., 2021; Khalsa et al., 2017; Lebow et al., 2018; Ralph et al., 2022), we chose a combination of objective and clinical criteria to best represent the patients' clinical reality (see also Jenkinson et al., 2023). Therefore, instead of relying only on BMI criteria, which may inadequately reflect the clinical complexity of the AN recovery stages and symptom evolution, p-AN participants were eligible if they no longer met the DSM-5 criteria for restrictive-subtype AN according to their psychiatrist and met at least two of the following: BMI > 16.5, clinical and behavioural signs of AN recovery (e.g., no restrictive eating patterns) for at least 6 months, and/or a global Eating Disorders Examination Questionnaire (EDE-Q; Fairburn & Beglin, 1994) score < 4. Additionally, if a p-AN participant had a BMI between 16.5 and 18.5, their clinical status was further confirmed by their experienced clinical team. HCs had a BMI between 18.5 and 25 and were excluded if they or a first-degree relative had an ED history.

DESIGN AND DATA ANALYSIS
We used a revised version of the existing Heartbeat Counting Task (HCT; Schandry, 1981) to measure interoceptive belief updating. The task included the traditional measure of interoceptive accuracy, hereafter referred to as Performance, and three additional measures to examine participants' beliefs about their performance before and after completing the HCT. These measures were participants' (1) Prior Prospective Self-Efficacy Beliefs (i.e., how well they think they will do on the HCT), (2) Posterior Retrospective Self-Efficacy Beliefs (i.e., how well they think they performed on the task), and (3) Post-False Feedback Retrospective Self-Efficacy Beliefs (i.e., participants' second retrospective evaluation of their Performance after receiving arbitrary feedback). In other words, after participants gave their Posterior Retrospective Self-Efficacy Belief, half of the participants were told they did much better than the others, while the other half were told they did much worse. Participants then rated their Performance retrospectively a second time. These measures allowed us to examine how beliefs about the HCT are generated prospectively and how they are updated retrospectively, after completing the HCT.
In a linear regression we first examined the effect of Group (independent variable; IV) on Prior Prospective Self-Efficacy Beliefs (dependent variable; DV), expecting the AN group to be significantly more pessimistic about their heartbeat counting abilities than the HCs. Next, using a linear regression, we examined differences in Performance scores, expecting to not find significant group differences. To obtain a Performance percentage score we used the following Schandry (1981) transformation:

$$\text{Performance} = \frac{1}{3} \sum \left( 1 - \frac{\left| \text{Recorded Heartbeats} - \text{Counted Heartbeats} \right|}{\text{Recorded Heartbeats}} \right) \times 100 \quad (1)$$

Typically, Performance scores range from 0 (worst Performance) to 1 (best Performance), but here we multiplied the score by 100 to maintain consistency with our other measures, namely participants' Prior Prospective and Posterior Retrospective Self-Efficacy Beliefs.
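As a minimal sketch of the Performance transformation described above, the following Python function averages the per-trial Schandry-style accuracy and rescales it to a percentage. The function name, argument names, and the example heartbeat counts are hypothetical and for illustration only; the use of the absolute difference follows the standard form of the Schandry (1981) score and is assumed here.

```python
from typing import Sequence


def hct_performance(recorded: Sequence[float], counted: Sequence[float]) -> float:
    """Mean per-trial accuracy, 1 - |recorded - counted| / recorded, scaled by 100."""
    if len(recorded) != len(counted) or len(recorded) == 0:
        raise ValueError("need one counted value per recorded value")
    # One accuracy value per counting trial.
    per_trial = [1.0 - abs(r - c) / r for r, c in zip(recorded, counted)]
    # Average across trials, then express as a percentage.
    return 100.0 * sum(per_trial) / len(per_trial)


# Hypothetical counts for three counting trials (illustrative values only).
score = hct_performance(recorded=[30, 54, 78], counted=[24, 45, 65])
print(round(score, 2))
```

Note that a trial in which the participant reports more than twice the recorded number of heartbeats yields a negative per-trial value under this formula, so the averaged score is not strictly bounded below by 0.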
Then, in a linear regression we assessed whether the three Groups (IV) differed in their Posterior Retrospective Self-Efficacy Beliefs when controlling for the Prior Prospective Self-Efficacy Beliefs and Performance. We also calculated the difference between Performance and Prior Prospective Self-Efficacy Belief (i.e., Prediction Error) and examined how Prediction Error explained the AN group's Posterior Retrospective Self-Efficacy Beliefs, and the between-group differences in Prediction Error. Finally, in within-group analyses we assessed whether false feedback influenced participants' Post False-Feedback Retrospective Self-Efficacy Beliefs (after first checking that positive false feedback was not randomly given only to participants with higher Performance scores and negative false feedback to participants with low Performance scores; see Supplementary Material, Table S4).
Exploratory regressions with psychometric traits and clinical characteristics (e.g., illness duration and severity) were run but these are presented in detail in the Supplementary Material (see Tables S5-S7) given that the purposes of Experiment 1 (also preprinted and available here: https://psyarxiv.com/rntsf/; Saramandi et al., 2022) were to present our key results upon which we contextualised and based our preregistered Experiment 2.

MAIN EXPERIMENTAL MEASURES AND PROCEDURE
Following baseline, demographic, and psychometric assessments (see below), participants wore the Polar heart rate monitor (model RS 800CX; see Emanuelsen et al., 2015; Fischer et al., 2016) on their left wrist and the heart rate monitoring throughout the experiment was explained to them.
Next, participants silently sat on a chair with their legs uncrossed and their wrist gently resting on the table in front of them to obtain a 5-min baseline recording of their heart rate (used in Experiment 1 as a control measure; see Supplementary Material, Tables S1 and S3).
Then, participants provided a Prior Prospective Self-Efficacy Belief estimate and proceeded to complete the HCT in the same, relaxed position, with their eyes open or closed (depending on what felt comfortable to them), as they were during the baseline heart rate measurement. Participants were asked to not attempt any physical manipulation to facilitate heartbeat detection and only report the number of heartbeats they actually felt rather than guess how many heartbeats they think they felt. Participants completed three heartbeat counting trials (25s, 45s, and 65s, with 30s rest breaks in between) in a randomised order between participants, and information about the length of counting phases or participant Performance was not given. Participants were prompted with 'Go' and 'Stop' signals at the start and end of each counting phase, respectively, and then verbally reported the number of felt heartbeats. After completing the HCT, participants rated their Performance (Posterior Retrospective Self-Efficacy Belief). Finally, participants were given arbitrary false feedback regarding their Performance and were asked to provide a further estimate, namely the Post False-Feedback Retrospective Self-Efficacy Belief (due to clinical time constraints this measure is missing from N = 32 participants). Participants were fully debriefed at the end of the task and told that the feedback was for experimental purposes only, and not a reflection of their actual Performance during the task.

SUMMARY OF EXPERIMENT 1 RESULTS WHICH LED TO EXPERIMENT 2 AIMS AND DESIGN
Broadly, our preliminary findings suggest that AN patients have low self-efficacy about their cardiac, interoceptive abilities before they even engage with a task (prospectively), and do not seem to update their self-efficacy beliefs retrospectively, even though we found no evidence of group differences in Performance. Instead, they appear to adhere to their prospective self-efficacy beliefs. Based on these results, we enhanced our task and preregistered the following experimental and computational study to investigate how the AN and p-AN groups, compared to HCs, use nested, local and global retrospective beliefs about HCT performance to update their explicit, prospective beliefs about their related abilities.

METHODS: EXPERIMENT 2

PARTICIPANTS
This experiment had a non-overlapping sample to Experiment 1 and the same eligibility criteria (details on participant characteristics and recruitment sites in Supplementary Material). A total of $N_{\mathrm{AN}} = 40$, $N_{\mathrm{p\text{-}AN}} = 40$, and $N_{\mathrm{HC}} = 121$ participants were screened. Following exclusions (see Supplementary Material), the final sample consisted of $N_{\mathrm{AN}} = 35$, $N_{\mathrm{p\text{-}AN}} = 40$, and $N_{\mathrm{HC}} = 117$ participants (see Tables 1 and S8 for details on demographics and clinical characteristics).

BELIEF UPDATING TASK DESIGN AND MEASURES
Building upon the findings of Experiment 1, in Experiment 2 we examined how these self-efficacy beliefs are updated prospectively, when participants had to estimate their cardiac interoceptive abilities about a future HCT performance. Therefore, Experiment 2 examined how AN and p-AN, compared to HC women, form and update Prospective Self-Efficacy Beliefs (see Figure 1 and below) about interoception before and after completing a modified version of the HCT (Schandry, 1981). We used the measure of Performance as in Experiment 1, and three additional self-efficacy measures, each with corresponding subjective confidence ratings. These measures included participants' (1) Prior Prospective Self-Efficacy Beliefs (i.e., how well they think they will do on the HCT, as in Experiment 1), (2) Posterior Retrospective Self-Efficacy Beliefs (i.e., how well they think they performed on the task, as in Experiment 1), and (3) Posterior Prospective Self-Efficacy Beliefs (i.e., how well they think they would do in the future in the HCT; Figure 1). These latter self-efficacy beliefs about performance, explicitly sampled here for the first time, are identical to the sampled Prior Prospective Self-Efficacy Beliefs in that they require the participant to assess their future performance abilities, and hence allow us to examine how such prospective, global beliefs are updated after performance and after local and global retrospective beliefs are generated. The Posterior Prospective Self-Efficacy Belief was the average of two scores: the first was a prospective rating of how well participants thought they would do if the four trials they had just completed were of half the duration in the future, while the second one asked them to rate how well they would do if the trials were of double the duration. In Experiment 2 participants did not receive arbitrary false feedback (as they did in Experiment 1) and thus we did not obtain a measure of Post False-Feedback Retrospective Self-Efficacy Beliefs. We also introduced various control measures (see below).

Behavioural
After Experiment 1, we examined why AN patients struggle to update their 'pessimistic' prospective self-efficacy beliefs despite a comparable performance to HCs on formal interoceptive tasks. Thus, the updating of these prospective, self-efficacy beliefs was the main measure of interest in Experiment 2 and the primary focus of our computational modelling analyses (see below). As preregistered, we predicted that AN and p-AN participants would have lower Posterior Prospective Self-Efficacy Beliefs than HCs, despite predicting that we would find no evidence of group differences on Performance (as in Experiment 1).
To examine Group differences in Performance (calculated using the aforementioned Performance score transformation; see equation (1); Schandry, 1981), we ran a preregistered multilevel model analysis (MLM). As preregistered, we also controlled for knowledge about heartbeats, time
[Table note: Linear regressions were run to examine group differences, with HC as the intercept. As expected, we found between-group differences on BMI and EDE-Q; such differences are axiomatic to our groups and consistent with our inclusion criteria. As we also observed expected group differences in psychometric traits, these were taken into account in our main analyses.]
In addition to a frequentist approach, we supplemented our analysis with a preregistered Bayesian analysis, which presents the ratio of the likelihood of the alternative hypothesis relative to the likelihood of the null hypothesis. A Bayes Factor ($\mathrm{BF}_{10}$) > 3 indicates evidence for the alternative hypothesis, whereas a $\mathrm{BF}_{10}$ < 0.3 indicates evidence for the null hypothesis. A $\mathrm{BF}_{10}$ between 0.3 and 3 indicates an inconclusive result which is not in favour of either hypothesis (Carey et al., 2021; Kass & Raftery, 1995).
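The Bayes Factor decision rule can be written as a small helper. This is a sketch only: the function name and labels are hypothetical, and the cut-offs assume that values above 3 favour the alternative hypothesis, values below 0.3 favour the null, and the band between 0.3 and 3 is inconclusive, as the text describes.

```python
def interpret_bf10(bf10: float, upper: float = 3.0, lower: float = 0.3) -> str:
    """Classify a Bayes Factor (alternative over null) using the assumed cut-offs."""
    if bf10 <= 0:
        raise ValueError("a Bayes Factor must be positive")
    if bf10 > upper:
        return "evidence for the alternative hypothesis"
    if bf10 < lower:
        return "evidence for the null hypothesis"
    # Anything inside [lower, upper] favours neither hypothesis.
    return "inconclusive"


for value in (0.28, 1.0, 5.0):
    print(value, "->", interpret_bf10(value))
```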
Next, we tested between-group differences in self-efficacy beliefs after completing the HCT. To do this, and as preregistered, we assessed the effect of Group on Posterior Prospective Self-Efficacy Beliefs, using Age as a control variable, and Study Site as a random effect (given our multi-site testing, see Supplementary Material). In preregistered, exploratory analyses we tested whether differences in traits and behaviours often seen in the AN population and found in the present study (e.g., depression and anxiety; Table 1) explained the group differences in self-efficacy beliefs (details in Supplementary Material, see Figures S1 and S2, and Table S11). The analyses were conducted following the Baron and Kenny (1986) mediation analysis steps, as outlined in detail in the Supplementary Material. In non-preregistered analyses we explored the role of set-shifting difficulties, as measured via the Wisconsin Card Sorting Test (WCST; Grant & Berg, 1948; results in Supplementary Material), on Posterior Prospective Self-Efficacy Beliefs. The WCST is used to assess cognitive flexibility and set-shifting: individuals need to categorise response cards based on different, shifting criteria, e.g., colour and shape (Kopp et al., 2021; Westwood et al., 2016).

Computational Modelling
The behavioural analyses were complemented with preregistered modelling analyses to account for the role of the nested nature of prior prospective and retrospective beliefs in the updating of such posterior prospective beliefs (see Introduction) and other parameters such as precision and learning rate. We first examined which model best predicted our key measure (Posterior Prospective Self-Efficacy Beliefs) by constructing and comparing models that included the scores of different proxy measures for evidence and for the precision of this evidence. We then compared how the winning model predicted our groups' actual posterior prospective beliefs. Furthermore, we examined our clinical groups' learning rate (using the winning model's measures of 'precision of prior beliefs' and 'precision of evidence'; see below), expecting it to be lower than the HCs' when controlling for more general mental flexibility deficits. In a preregistered analysis, we then compared the learning rates expected by the equations of the winning model against the actual learning rates exhibited by the groups (i.e., the absolute difference between the actual and precision-weighted learning rates) to assess whether there are statistically significant group differences. Therefore, we were able to examine the Bayesian optimality of our groups' learning rates based on the assumption, under a Bayesian belief updating mechanism, that an actual learning rate closer to the precision-weighted learning rate (where the latter describes the relative importance of evidence versus prior beliefs) is suggestive of more Bayesian optimal learning (see below).
Specifically, we computed a posterior self-efficacy belief ($m_{q|y}$) using a generic Bayesian equation for belief updating when receiving new information (or, evidence) under a Gaussian model with conjugate prior (Friston, 2017; Mathys et al., 2014; Kirsch et al., 2021):

$$m_{q|y} = m_q + \frac{p_e}{p_q + p_e}\,(y - m_q) \quad (2)$$

where $m_{q|y}$ was the posterior self-efficacy belief, $m_q$ was the prior prospective self-efficacy belief, $y$ was the evidence (different measures per model; see below), $p_q$ was the proxy for the precision of the prior prospective self-efficacy belief, and $p_e$ was the proxy for the precision of the evidence (different measures per model; see below). Specifically, this equation allowed us to create a set of four target models to examine our hypotheses (see Supplementary Material Table S16 for full model description; and see Figures S7-S9 for model description and validation). For all four models we used participants' Prior Prospective Self-Efficacy Beliefs as the prior ($m_q$), and respective Prior Prospective Confidence estimates as a precision proxy of the prior ($p_q$). These models differed in the measures that were used as evidence ($y$; Performance versus Posterior Retrospective Self-Efficacy Beliefs) and as evidence precision proxies ($p_e$; Performance Confidence versus EDI-3-ID; Garner, 2004; the EDI-3-ID scores were rescaled as a success percentage rate to maintain consistency with the scoring of the other measures, e.g., Performance Confidence and Performance). For each model, we computed the Learning Rate ($\lambda$; also known as the Bayesian precision ratio) per participant:

$$\lambda = \frac{p_e}{p_q + p_e} \quad (3)$$

We also created two sets of baseline models (to validate our target models); the first two models assumed a perfect learning rate ($\lambda = 1$, the participant uses the evidence as their posterior belief) and the third model assumed no learning ($\lambda = 0$, the participant uses the prior as their posterior belief). These baseline models were created as validation for the four main models, representing the boundary/extreme cases of learning rates being 1, or 0, instead of being modelled using the precision proxies for the prior and the evidence. They are presented in full in the Supplementary Material (Table S17).
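The belief update and precision-weighted Learning Rate described above can be sketched as follows. This assumes the standard Gaussian conjugate form, in which the posterior equals the prior plus the Learning Rate times the prediction error, with the Learning Rate given by the evidence precision divided by the sum of the prior and evidence precisions. Function names and the example ratings are hypothetical.

```python
def learning_rate(p_prior: float, p_evidence: float) -> float:
    """Precision ratio: share of total precision carried by the evidence."""
    total = p_prior + p_evidence
    if total <= 0:
        raise ValueError("precision proxies must sum to a positive value")
    return p_evidence / total


def posterior_belief(prior: float, evidence: float,
                     p_prior: float, p_evidence: float) -> float:
    """Shift the prior towards the evidence by the precision-weighted learning rate."""
    lam = learning_rate(p_prior, p_evidence)
    return prior + lam * (evidence - prior)


# Hypothetical participant: prior belief 40, evidence 70 (both on a 0-100 scale),
# prior confidence 60 and evidence precision proxy 20.
lam = learning_rate(p_prior=60, p_evidence=20)
post = posterior_belief(prior=40, evidence=70, p_prior=60, p_evidence=20)
print(lam, post)

# Boundary cases matching the baseline models: full learning and no learning.
full_learning = 40 + 1.0 * (70 - 40)  # posterior equals the evidence
no_learning = 40 + 0.0 * (70 - 40)    # posterior equals the prior
```

In this illustration the prior is held with three times the precision of the evidence, so the posterior moves only a quarter of the way from the prior towards the evidence.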
For all the models we computed the Bayesian Information Criterion (BIC) and Mean Absolute Error (MAE) to measure model fit. As preregistered, we initially examined which variable was the best measure of evidence, in the HCs only, by comparing the fit of the model that used Performance Confidence as an evidence precision proxy and 'Posterior Retrospective Self-Efficacy Beliefs' as evidence with the alternative learning model that used the same precision proxy but Performance as evidence, predicting that the former model would show a better fit than the alternative learning model, particularly in the HCs (Prediction A). We repeated this across groups, and within each clinical group separately, and in a non-preregistered analysis we explored the winning model's validity (details in Supplementary Material and Table S18a).
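For illustration, the two fit measures could be computed as in the sketch below. The text does not state how the BIC was derived, so the Gaussian-residual form used here (n times the log of the mean squared residual, plus a penalty of k times log n) is an assumption, as are the function names, the `n_params` argument, and the example values.

```python
import math
from typing import Sequence


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute gap between observed and model-predicted posteriors."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def gaussian_bic(observed: Sequence[float], predicted: Sequence[float],
                 n_params: int = 0) -> float:
    """BIC under Gaussian residuals (assumed form): n*ln(RSS/n) + k*ln(n)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal length")
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    if rss == 0:
        raise ValueError("perfect fit: BIC is undefined under this formula")
    return n * math.log(rss / n) + n_params * math.log(n)


# Hypothetical observed posteriors and predictions from two competing models.
observed = [50.0, 60.0, 70.0]
model_a = [55.0, 60.0, 65.0]
model_b = [30.0, 80.0, 50.0]

mae_a = mean_absolute_error(observed, model_a)
bic_a = gaussian_bic(observed, model_a)
bic_b = gaussian_bic(observed, model_b)
print(mae_a, bic_a < bic_b)  # lower BIC and MAE indicate better fit
```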
For the measure of evidence which was associated with the best model fit from the previous analyses, we performed a preregistered further modelling step wherein we examined whether subjective confidence ratings (Performance Confidence) or trait measures of interoceptive sensibility (EDI-3-ID; Garner, 2004), when used as precision proxies, best captured our groups' belief updating, expecting the clinical groups to be more influenced by the trait measures than HCs (Prediction B). We also complemented this precision-proxy comparison with two non-preregistered analyses to explore the winning model's validity (details in Supplementary Material and Table S18b).
Next, we examined the between-group differences on precision-weighted Learning Rates. Assuming that the Bayesian Model of choice is a representation of actual learning, then a higher Learning Rate would suggest that participants consider the Prediction Error to a greater degree when updating their beliefs. We calculated one precision-weighted Learning Rate per precision proxy and looked at group differences in two separate analyses, expecting the clinical groups' Learning Rates to be lower and less Bayesian optimal (i.e., greater absolute difference between participants' actual and precision-weighted Learning Rates) than the HCs' (preregistered Prediction C). Actual Learning Rates ($\lambda_{\mathrm{Actual}}$) were calculated using the following equation:

$$\lambda_{\mathrm{Actual}} = \frac{m_{q|y} - m_q}{y - m_q} \quad (4)$$

where $m_{q|y}$ represents participants' Posterior Prospective Self-Efficacy Beliefs, $m_q$ represents participants' Prior Prospective Self-Efficacy Beliefs, and $y$ represents participants' Posterior Retrospective Self-Efficacy Beliefs. Given the effect of Depression and Stress on self-efficacy beliefs (see behavioural results), in non-preregistered analyses we explored whether Depression and Stress scores mediated participants' precision-weighted Learning Rates.
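A minimal sketch of the actual Learning Rate and the Bayesian optimality measure (the absolute difference between actual and precision-weighted Learning Rates) is given below. The function names and example ratings are hypothetical, and returning `None` when the evidence equals the prior is an assumption made here to avoid division by zero; the text does not say how such cases were handled.

```python
from typing import Optional


def actual_learning_rate(prior: float, posterior: float,
                         evidence: float) -> Optional[float]:
    """Observed belief shift divided by the prediction error (evidence - prior)."""
    prediction_error = evidence - prior
    if prediction_error == 0:
        # Assumption: the ratio is undefined when there is no prediction error.
        return None
    return (posterior - prior) / prediction_error


def optimality_gap(actual: float, precision_weighted: float) -> float:
    """Absolute distance between actual and precision-weighted learning rates."""
    return abs(actual - precision_weighted)


# Hypothetical participant: prior 40, retrospective belief (evidence) 70,
# reported posterior prospective belief 46, precision-weighted rate 0.25.
lam_actual = actual_learning_rate(prior=40, posterior=46, evidence=70)
gap = optimality_gap(lam_actual, precision_weighted=0.25)
print(lam_actual, gap)
```

Under the assumption stated in the text, a smaller gap indicates updating that is closer to what the precision proxies would predict.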
Finally, as preregistered, we examined group differences on actual Learning Rates, also expecting the Learning Rates of the clinical groups to be lower than those of the HCs (Prediction D; results from this analysis are presented in full in the Supplementary Material, Table S20).

EXPERIMENTAL MATERIALS AND PROCEDURE
Following baseline and demographic assessments (see below), participants were given an Empatica E4 watch (a medical-grade wearable device that records real-time physiological data; Empatica Srl, Italy; see https://www.empatica.com/research/e4/) to wear on their left wrist. The rest of the procedures were identical to those described in Experiment 1 with the following exceptions. In addition to a 5-minute baseline recording of heart rate, we also obtained a recording of heart rate variability (HRV; used in Experiment 2 as control measures; see Table 1, Supplementary Material and Table S3). The HCT instructions were the same, but here we also added one more counting phase of 35s, and after participants reported how many heartbeats they felt they also provided a confidence estimate (ranging from 0, not at all confident, to 100, extremely confident) on the accuracy of each response (hereafter referred to as Performance Confidence). Participants also completed a time-estimation task before completing the heartbeat counting trials; they silently counted seconds until prompted to stop and then verbally reported how many seconds they counted (used in control analyses; see Supplementary Material). The duration of the time-estimation trials matched that of the heartbeat counting trials and they were also presented in a random order between participants.
At the end of the HCT participants reported how many heartbeats they think they typically have when at rest, and the general population average (per minute). The answers were used in control analyses (see Supplementary Material). Participants were not given any feedback on trial length or performance at any point. Finally, participants completed a series of psychometric questionnaires and a cognitive flexibility task (see Supplementary Material) and were fully debriefed at the end.

HCT Performance did not differ significantly between our three groups
As predicted, the frequentist analysis did not yield a significant result (Figure 2; Table 2) on group differences in Performance. Moreover, the Bayes Factor analysis suggested that there is moderate evidence for equivalence regarding our groups' Performance ($\mathrm{BF}_{10}$ = 0.28), indicating that as tested here, the AN and p-AN groups did not perform differently than HCs. We then ran preregistered control analyses to account for potential confounding variables and the pattern of results remained the same (see Supplementary Material and Table S10).

UPDATING PROSPECTIVE BELIEFS: BEHAVIOURAL AND COMPUTATIONAL ANALYSIS
AN participants reported significantly lower Posterior Prospective Self-Efficacy Beliefs compared to HCs, and the same effect was present as a statistical trend in the p-AN group (Figure 3; Table 3). That is, both clinical groups expected, on average, to perform worse in a future HCT with half or double the available time, compared to HCs. To further examine what explained this observed 'pessimism' in our clinical groups, we examined the potentially mediating effect of comorbid traits and behaviours (using all the variables in which we found a significant group difference in Table 1).
Only depression and stress, as measured via the Depression, Anxiety and Stress Scale (DASS-21; Lovibond & Lovibond, 1995), explained the clinical populations' pessimistic Posterior Prospective Self-Efficacy Beliefs (see Supplementary Material and Table S11a for details, and see the follow-up exploratory analyses below). We also explored whether a more general set-shifting difficulty could have explained the pessimistic beliefs of the clinical groups (in comparison to the HCs') but found no significant effect of WCST performance on Posterior Prospective Self-Efficacy Beliefs (Table S11b). We then examined, via computational modelling, the parameters which could affect the formation of these posterior prospective beliefs.

Identifying the best measure for precision proxy of the evidence (Preregistered Prediction B)
In a further modelling step, we examined which precision proxy best captured our groups' belief updating when Posterior Retrospective Self-Efficacy Beliefs were used as the evidence (winning model from above). We found that the model which used EDI-3-ID as a precision proxy (vs. our other competing precision proxy, Performance Confidence) was our winning model (both across groups and within each group; Figure 4), as predicted (see Supplementary Material and Table S18b for non-preregistered parameter validation analysis).
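The model comparison relies on MAE and BIC, with smaller values indicating better fit. As an illustration only, the sketch below computes both for two hypothetical sets of model predictions. The BIC form used here assumes Gaussian residuals (n ln(RSS/n) + k ln(n)); the likelihood actually used in the paper is not stated in this section, and all data values and names are invented for the example.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted beliefs."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("Inputs must be non-empty and of equal length.")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def gaussian_bic(observed, predicted, n_params):
    """BIC under an assumed Gaussian error model: n*ln(RSS/n) + k*ln(n)."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    if rss == 0:
        raise ValueError("Perfect fit: BIC is undefined under this form.")
    return n * math.log(rss / n) + n_params * math.log(n)


# Hypothetical posterior beliefs (0-100 scale) and two candidate models.
observed_posteriors = [60, 40, 75, 55]
model_a_predictions = [58, 45, 70, 50]  # e.g. retrospective beliefs as evidence
model_b_predictions = [30, 70, 40, 90]  # e.g. performance as evidence

mae_a = mean_absolute_error(observed_posteriors, model_a_predictions)
mae_b = mean_absolute_error(observed_posteriors, model_b_predictions)
bic_a = gaussian_bic(observed_posteriors, model_a_predictions, n_params=1)
bic_b = gaussian_bic(observed_posteriors, model_b_predictions, n_params=1)

# The candidate with the smaller MAE and BIC would be preferred.
winning_model = "A" if (mae_a < mae_b and bic_a < bic_b) else "undecided or B"
```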

Group differences on precision-weighted Learning Rates and Bayesian Optimality with winning precision proxy of evidence (Preregistered Prediction C) and with the alternate precision proxy of evidence
Finally, for prediction C, we performed two further steps. Firstly, we looked at between-group differences in the precision-weighted Learning Rates (for both precision proxies of the evidence, given the inconclusive results of prediction B, but we Holm-corrected the p values due to the multiple comparisons). Secondly, we examined whether the Bayesian optimality of this rate differed between groups, by assessing the absolute difference between our groups' actual and precision-weighted Learning Rates (|λ_Actual - λ|). Specifically, we ran two separate linear regressions to explore the Group effect on the precision-weighted Learning Rate. When using Performance Confidence as an evidence precision proxy to compute participants' Learning Rates, we found that AN participants, but not p-AN participants, had a significantly lower Learning Rate than HCs (Table 4a; Prediction C). We found no significant between-group differences on Learning Rates when using EDI-3-ID as a precision proxy (Table 4a). We then ran two separate linear regressions to examine the optimality of the groups' Learning Rates (calculated as the absolute difference between their actual and precision-weighted Learning Rates). However, when examining how closely the actual Learning Rates approximated the precision-weighted Learning Rates (with Performance Confidence as the precision proxy of the evidence), we did not find a significant group difference (Table 4b). This result suggests that, while there are significant group differences on precision-weighted Learning Rates, we cannot conclude that the learning rate mechanism of the AN patients is less Bayesian optimal than that of the HCs.
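The quantities compared in this step can be illustrated with a minimal sketch. It assumes the standard Gaussian belief-updating forms, in which the precision-weighted Learning Rate is the share of total precision carried by the evidence, and the actual Learning Rate is the observed belief shift divided by the prediction error. The paper's exact definitions are given in sections not reproduced here, so the formulas, function names and numbers below are assumptions for illustration.

```python
def precision_weighted_learning_rate(prior_precision, evidence_precision):
    """Assumed form: lambda = pi_evidence / (pi_prior + pi_evidence)."""
    total = prior_precision + evidence_precision
    if total <= 0:
        raise ValueError("Total precision must be positive.")
    return evidence_precision / total


def actual_learning_rate(prior, evidence, posterior):
    """Assumed form: (posterior - prior) / (evidence - prior)."""
    prediction_error = evidence - prior
    if prediction_error == 0:
        raise ValueError("Undefined when the evidence equals the prior.")
    return (posterior - prior) / prediction_error


def optimality_gap(actual_rate, weighted_rate):
    """Absolute difference |lambda_Actual - lambda| used as the optimality index."""
    return abs(actual_rate - weighted_rate)


# Hypothetical participant: a confident prior and low-confidence evidence.
weighted = precision_weighted_learning_rate(prior_precision=3.0,
                                            evidence_precision=1.0)
actual = actual_learning_rate(prior=40.0, evidence=60.0, posterior=45.0)
gap = optimality_gap(actual, weighted)
```

In this toy case both rates equal 0.25, so the gap is zero: the participant updates little, but exactly as much as the low evidence precision warrants. This is the distinction drawn in the text between a low Learning Rate and a suboptimal one.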

Exploring the Role of Depression and Stress
Given the effects of Depression and Stress on the clinical groups' Posterior Prospective Self-Efficacy Beliefs (see Supplementary Material and Table S11), as well as the group differences on these psychometric trait measures (step 1 of the mediation analysis; see results in Table 1 and mediation analysis steps in the Supplementary Material), in non-preregistered analyses we explored whether Depression and Stress also explained the difference between the AN and HC groups' precision-weighted Learning Rates (when using Performance Confidence as an evidence precision proxy). First, we ran a linear regression using Depression and Stress as predictor variables (step 2 of the mediation analysis; see Supplementary Material): Learning Rates decreased as Depression and Stress scores increased (Table 4c). We then explored whether Depression and Stress explained the group effect (step 3 of the mediation analysis; see Supplementary Material): we found a non-significant trend of Depression and a significant effect of Stress on Learning Rates, which would explain the difference between the AN and HC groups' Learning Rates. We complemented these analyses with Holm-corrected linear regressions on the AN and p-AN groups separately, using Depression and Stress scores as predictors to examine trait and state effects. Higher Depression scores significantly influenced the p-AN group's Learning Rates. Although we found a non-significant effect in the AN group, we suggest this is due to the already high Depression scores within the entire sample (as with the self-efficacy beliefs), which we discuss in detail in the Discussion. Next, given that the precision-weighted Learning Rates were computed using Performance Confidence as a precision proxy of evidence and that the AN group had lower Performance Confidence than HCs (Table S15b), we also examined whether Depression and Stress mediated the group differences on Performance Confidence. Higher Depression and Stress scores explained lower Performance Confidence estimates. This suggests that as the uncertainty on the evidence increases, there is less evidence-weighting and more prior-weighting, explaining the observed lower Learning Rate of the AN group in comparison to that of HCs. For control purposes, we examined whether these findings also applied when looking at the effects of Depression and Stress on the actual Learning Rates, and not only on the modelled, precision-weighted Learning Rates, but we found a non-significant trend of Depression scores on actual Learning Rates, and no significant effect of Stress. Although Depression and Stress influenced the precision-weighted Learning Rates, they did not significantly influence the Learning Rates' Bayesian optimality (Table 4c).
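The Holm correction applied to the p values of these follow-up regressions is a standard step-down procedure, sketched below for illustration. The p values in the example are invented and do not come from the paper; the function name is hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p values in the original order.

    The i-th smallest p value (0-based rank) is multiplied by (m - rank),
    capped at 1, and forced to be non-decreasing across ranks.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Hypothetical uncorrected p values from three regressions.
raw_p = [0.01, 0.04, 0.03]
adjusted_p = holm_adjust(raw_p)
significant = [p < 0.05 for p in adjusted_p]
```

With these invented inputs only the smallest p value survives correction at the 0.05 level, which shows how an effect that is nominally significant can become a non-significant trend once multiple comparisons are taken into account.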

GENERAL DISCUSSION
We examined how patients in the acute and post-acute anorexia nervosa phase (AN and p-AN, respectively), compared to HCs, update their self-efficacy beliefs about their heartbeat counting abilities. In Experiment 1, AN patients showed lower self-efficacy beliefs than HCs before (prospectively) and after the task (retrospectively), despite performing comparably to HCs. Our preregistered Experiment 2 aimed to examine specifically how such pessimistic prospective beliefs are formed and maintained. As predicted, although AN patients performed comparably to HCs, they were more pessimistic than HCs when asked how they would do in a similar task in the future. Furthermore, computational analyses revealed that AN patients seem to rely more on their pessimistic retrospective beliefs about interoceptive performance than on their actual performance when updating their beliefs prospectively. AN patients also showed low confidence in the accuracy of their performance, which, when accounted for, reveals a smaller learning rate in AN patients than in controls, indicating that they make less use of prediction errors. Interestingly, the critical parameters revealed by our computational analyses were associated more with mood than with cognitive traits.
As expected, our groups did not perform differently from each other in the actual HCT (also supported by Bayesian equivalence testing) in either experiment (in line with most studies of similar populations using the HCT; e.g., Kinnaird et al., 2020; Lutz et al., 2019; Fischer et al., 2016; although contrary findings also exist, e.g., Pollatos et al., 2008, 2016). Study discrepancies may stem from the HCT's noted low validity and reliability and its confounds (Brener & Ring, 2016; Desmedt et al., 2018, 2020, 2022; Legrand et al., 2022). For example, performance (interoceptive accuracy) may differ depending on task demands, time estimation abilities and heartbeat knowledge.
Here, control analyses on these confounding variables (see Supplementary Material) suggest these factors are unlikely to have influenced Performance in our sample (in line with Ferentzi et al., 2022). Sampling differences (e.g., comorbidities, treatment type, illness stage; Fischer et al., 2016; Pollatos et al., 2008; Richard et al., 2019) could also account for the between-study differences on interoceptive accuracy. For example, here we only included individuals who met restrictive AN criteria and had no other ED diagnosis (American Psychiatric Association, 2013), unlike others. Notwithstanding heterogeneity in treatment stage and type, BMI and clinical profiles did not affect our findings (see Supplementary Material), suggesting that these factors were unlikely to have influenced our sample's Performance. Despite controlling for some of the possible task confounds, future studies on interoceptive metacognition in AN could use the more recently developed tasks to capture interoceptive accuracy (see Desmedt et al., 2023; Garfinkel et al., 2022; Harrison et al., 2021; Legrand et al., 2022; Palmer et al., 2019; Plans et al., 2021), or use pharmacological (Khalsa et al., 2009, 2015) or behavioural (Fitz-Clarke, 2007; Smith, Feinstein et al., 2021) heart rate manipulations to increase the signal strength. We note, however, that as our emphasis here was on testing explicit belief updating and not interoceptive accuracy per se, the HCT has the advantage of good patient acceptability and was easily understood as a brief task that one can have meaningful self-efficacy beliefs about. These self-efficacy beliefs were focused on participants' interoceptive abilities (i.e., 'How well will you feel your heartbeat'), rather than self-efficacy beliefs about everyday scenarios as typically assessed via questionnaires (e.g., "If I am in trouble I can usually think of a solution"; General Self-Efficacy Scale; Schwarzer & Jerusalem, 1995).
A key finding here is that AN patients (and p-AN patients at trend levels) do not seem to place sufficient emphasis on their performance when forming prospective self-efficacy beliefs. Hitherto, questionnaire studies have consistently revealed that AN patients are characterised by higher levels of worry, rumination, and other maladaptive, metacognitive beliefs (e.g., Berman, 2006; Davenport et al., 2015; Palmieri et al., 2021), but these studies have not examined beliefs about interoceptive accuracy, or the various interwoven parameters that may underlie these beliefs, as done in our computational modelling. Moreover, we demonstrated that our experimental measure of 'Posterior Prospective Self-Efficacy Beliefs' was associated with beliefs about everyday life, such as fear of gaining weight, and less hope for symptom improvement. These findings suggest that our task had good ecological validity and that examining computationally the various parameters that influence such beliefs in our task could provide insights regarding patients' everyday negative metacognitive beliefs about interoception.
Specifically, our study leads to three main insights regarding such beliefs. First, AN patients' updating of explicit, prospective beliefs was influenced more by "pessimistic" (i.e., lower) global retrospective beliefs about such performance (e.g., how accurate one thought they were after the task ended) than by the perceptual performance itself (e.g., how accurate one was), as detailed below. Specifically, model comparisons revealed that retrospective beliefs were better at predicting posterior prospective beliefs than performance, with the winning model (the one using retrospective beliefs as evidence) being better at predicting the posterior prospective beliefs of the HCs than those of our clinical populations. This observation was also supported by poorer retrospective, interoceptive awareness (as indexed by the Interoceptive Trait Prediction Error (ITPE) z-score analyses; see Supplementary Material) in the clinical groups, compared to HCs, who evaluated their performance as better than it was. Moreover, the AN group had lower Performance Confidence than HCs, and, combined with lower, explicit global retrospective beliefs about performance, this suggests that AN patients struggle more than HCs in incorporating new evidence (from the HCT) to update beliefs retrospectively. This pattern of findings suggests that, irrespective of any individual differences in interoceptive accuracy per se, AN patients may face difficulties in drawing metacognitive conclusions about their cardiac interoceptive performance retrospectively.
Moreover, these retrospective metacognition difficulties appear to extend to prospective metacognition. Specifically, when participants needed to then use such retrospective beliefs to update their self-efficacy prospectively, AN patients showed lower precision-weighted Learning Rates (with Performance Confidence as a precision proxy of evidence) than HCs, suggesting that patients consider the 'evidence' (here, Posterior Retrospective Self-Efficacy Beliefs) less than HCs when updating such beliefs. Consequently, it is plausible that AN patients rely more on prior prospective beliefs regarding their cardiac interoceptive abilities, and this ultimately influences their explicit, global (posterior) prospective beliefs. Therefore, neither local evidence from performance, nor retrospective beliefs following that performance (that themselves were poorly updated by prediction errors, and hence may be hard to target therapeutically) seem to be sufficient to counter and update pessimistic, self-efficacy beliefs in AN. Instead, it may be the 'precision-based' mechanism of belief updating itself that requires therapeutic targeting. Indeed, it has been previously hypothesised in two transdiagnostic studies (Lavalley et al., 2023; Smith, Kirlic, et al., 2021; see Introduction) that an inappropriate weighting of prior beliefs versus new evidence, especially regarding interoception, may lead to the manifestation of several psychiatric symptoms (Barca & Pezzulo, 2020; Barrett & Simmons, 2015; Owens et al., 2018; Paulus et al., 2019; Petzschner et al., 2017). Indeed, while pessimistic estimates about interoceptive abilities have been documented by existing studies on an anticipatory and perceptual level (e.g., Crucianelli et al., 2016, 2021; Khalsa et al., 2015; Lutz et al., 2019), few have envisioned faulty updating at metacognitive and prospective cognition levels (e.g., Stephan et al., 2016).
Second, we explored the potential role of other higher-order cognitive difficulties previously noted in AN, such as disruptions in mental flexibility, abstraction or set-shifting (Lang et al., 2014; Miles et al., 2020, 2022; Tchanturia et al., 2012), which could have explained the reduced belief updating in AN. However, we found that cognitive, set-shifting abilities, as tested here by the Wisconsin Card Sorting Task (WCST) in a sample subset, did not explain group differences in posterior beliefs or learning rates. Indeed, prior studies have shown that such domain-general, 'frontal' functions cannot explain some of the delusionality in AN (see Konstantakopoulos et al., 2011, 2012 for delusional body image beliefs and insight correlates in AN; see also Introduction), and Experiment 1 showed that AN patients can update their beliefs based on external (albeit random) feedback (see also Kube et al., 2022). Notwithstanding, our study did not systematically test belief updating across different modalities, and thus no conclusions about the interoceptive or more general domain-specificity of our findings are warranted.
Third, depression and stress scores predicted pessimism in self-efficacy beliefs, explaining why the clinical groups, who notably had greater depression and stress levels, also had more pessimistic Posterior Prospective Self-Efficacy Beliefs compared to HCs. Crucially, depression (but not stress) significantly influenced the p-AN group's (but not the AN group's) posterior self-efficacy beliefs (i.e., the more severe the depression, the more pessimistic the beliefs). The AN group was overall more pessimistic and had more severe depression and stress than the HCs, making it difficult to disentangle which of these factors is more predictive of their lower self-efficacy beliefs, but our results showed the overall effect of depression and stress across all groups. Depression, anxiety and stress are key comorbidities with EDs, including AN (Andrés-Pepiñá et al., 2020; Eskild-Jensen et al., 2020; Himmerich et al., 2019); our findings also support previously noted associations between depression, negative metacognitive thoughts, and ED symptoms (e.g., Cooper et al., 2007; Palomba et al., 2017; see Introduction). Notably, while some research traditions could interpret these findings as though negative mood could 'explain away' the pessimistic beliefs about interoception, from a more transdiagnostic, computational psychiatry perspective, our study can elucidate the mechanisms by which depression and stress symptoms can influence belief updating (e.g., Aylward et al., 2020; Katyal et al., 2023; Kim et al., 2020; Rupprechter et al., 2018; Stankevicius et al., 2014). Specifically, we explored if and how these two traits explained the observed group effect on the precision-weighted learning rates (when using Performance Confidence as a precision proxy of evidence). Interestingly, higher uncertainty on the precision of the evidence (as indexed by lower Performance Confidence estimates) was influenced by higher depression and stress scores, which in turn also explained the lower precision-weighted learning rates of the AN group compared to HCs, adding to previously found associations between local confidence and mood disorder symptom severity (Benwell et al., 2022; Rouault, Seow et al., 2018; Seow et al., 2021). Our findings, however, suggest that mood may relate to a dysfunctional precision-weighting mechanism extending beyond local metacognition to global, prospective beliefs about one's performance. Moreover, a recent theoretical framework has posited depression as a consequence of chronic dysregulation in interoception, suggesting a chronic low self-efficacy regarding homeostatic/allostatic control (Stephan et al., 2016). Although our study was cross-sectional and did not target interoceptive control specifically (i.e., we cannot characterise the hypothesised chronic mechanisms here), the association between depressive symptoms and pessimistic posterior beliefs of the p-AN group compared to controls (at trend level) suggests that this dimension of illness may be an enduring AN trait, beyond the acute state.
In addition to some of the aforementioned experimental methodology limitations, there were also clinical methodology limitations. Firstly, multi-site research is subject to between-site variability relating to different assessor, clinical, cultural, and practical restrictions. To address this, meticulous protocol translations and experimenter training were ensured, and study site as a random effect did not add significant variance or affect the main results. Moreover, the multi-site approach could be regarded as a strength of our study, in line with recent efforts for more representative and diverse samples (Naddaf, 2023). Secondly, patient populations had varying pharmacotherapy based on national guidelines, making standardised control for its effects beyond assessment via self-report questionnaires and medical records impossible. Future studies should nevertheless consider controlling for the role of pharmacotherapy in belief updating, in line with evidence on the effects of neuromodulators (e.g., dopamine and serotonin) in prediction errors, active inference, and precision-weighting mechanisms (Auksztulewicz & Friston, 2016; Daw & Doya, 2006; Haarsma et al., 2021; Iglesias et al., 2013; Schultz et al., 1997). Finally, the study focussed mostly on white women with AN due to the availability of these patients in the collaborating clinics. We suggest that future research should consider more diverse populations when studying interoception in AN, while accounting for the noted sex differences in AN diagnostic criteria, hormones and subsequent effect on interoception (e.g., Culbert et al., 2021; Gorrell & Murray, 2019; Grabauskaitė et al., 2017; Murphy et al., 2019; Suschinsky & Lalumière, 2012).

CONCLUSION
To conclude, we investigated explicit, interoceptive belief updating in AN, focussing on cardiac awareness and related, prospective self-efficacy beliefs using a computational Bayesian Learning Framework. Despite comparable HCT performance, AN participants failed to update their interoceptive, metacognitive self-efficacy beliefs after performance. Computational modelling 1

Figure 4
Main model comparison. MAE (Mean Absolute Error) and BIC (Bayesian Information Criterion) are two measures used to examine model fit, with smaller values suggesting better model fit. Panel A shows the model comparison across all participants (N = 183). Panel B shows the model comparison in the healthy control group only (N_HC = 114). Panel C shows the model comparison in the acute Anorexia Nervosa group only (N_AN = 34). Panel D shows the model comparison in the post-acute Anorexia Nervosa group only (N_p-AN = 35).

Table 3
Analysis for Group Differences on Posterior Prospective Self-Efficacy Beliefs, with Age as the Control Variable and Study Site as Random Effect.

Table 4
Main output of Bayesian Belief Updating Analyses for Experiment 2.
4a: Precision-Weighted Learning Rate Comparisons (Learning Rate with Performance Confidence as a Precision Proxy; Learning Rate with EDI-3-ID as a Precision Proxy).
Mediation Analyses to Explore the Role of Depression and Stress on Learning Rates.