Introduction

There are accumulating reports of alterations in functional connectivity (FC) and neural excitability in autism spectrum disorder (ASD). However, studies that have examined these phenomena have also reported inconsistent results. These inconsistencies may be due in part to the fact that FC and neural excitability interact in a complex way to affect ASD symptoms.

For such issues, computational psychiatry is expected to contribute by examining, at the system level, the alterations in information processing that cause psychiatric symptoms. In computational psychiatry, predictive processing (or predictive coding) theory is one of the most promising computational theories of perception and cognition. Hypotheses and explanations developed with a neural network modeling approach link predictive processing theory to findings of altered neural activity in ASD, such as altered FC and neural excitability. However, biological validation of these hypotheses remains challenging. Although some attempts have been made, there is no well-established methodology for validating simulations of a developmental learning process in which multiple parameters interact to cause ASD symptoms.

In the current study, we propose a novel framework in which neural network modeling research is followed by a biological application study. We first perform a neural network simulation focusing on model parameters that are assumed to be related to ASD, i.e., FC and neural excitability. We then investigate whether these parameters interact to cause ASD-like performance, i.e., failures in facial emotion recognition or generalization. Next, results of the neural network simulation are applied to actual functional magnetic resonance imaging (fMRI) datasets from ASD patients. By assuming that the FC and neural excitability parameters of the neural network model correspond to functional connectivity and regional homogeneity (ReHo) in fMRI, a neural network model with particular parameter conditions is taken to represent a particular subject with the corresponding fMRI characteristics. Finally, we examine whether subject subgroups corresponding to neural networks that exhibit ASD-like performance actually show ASD diagnoses/symptoms.

Methods

Overview of neural network simulation

The framework of the study is described in Figure 1. The neural network model is trained to minimize the precision-weighted prediction error of the next step value in the target sequence obtained from facial expression videos, based on predictive processing theory. It is important to note that no emotion labels are given to this model. Whether this model is able to recognize emotions is evaluated by observing the “self-organization” of clusters for each emotion in the higher-level neuron, i.e., parametric bias (PB) space, which will be explained later. The experimental procedure for evaluating ASD-like performance consists of a training phase, which is analogous to the developmental learning process, and a test phase, which is analogous to the emotion recognition process from unknown facial expressions.

Figure 1 

Framework of the current study and network architecture.

The research framework consists of a simulation study and an fMRI study. First, we performed a simulation study using S-CTRNNPB, which models a biological brain based on predictive processing theory. Effects of S-CTRNNPB parameters (experimental conditions) on ASD-like performance were investigated. Second, we mapped the S-CTRNNPB parameter set to the fMRI parameter set. We then examined whether the relationship between the S-CTRNNPB parameters and ASD-related performance measures could also be found between fMRI parameters and ASD diagnosis/symptoms.

Experimental conditions for the simulation study are FCmodel and neural excitability homogeneity. FCmodel is the proportion of synaptic connections between neurons of different hierarchical levels, set at 20–100%. In the figure, solid and dashed blue (orange) arrows represent the presence and absence of synaptic connections between neurons, respectively. Synaptic connections between neurons of the same hierarchical level are the same in all experiments: there are no connections between PBs, and lower-level neurons are fully connected. Neural excitability homogeneity is defined by the variance of the activity threshold of the lower-level neurons. See the Methods for details on experimental conditions.

For the model architecture, the number of PBs and lower-level neurons are set to 2 and 30, respectively. Note that these neurons model the firing frequency of a population of neurons, not the activity of individual neurons in the biological brain. See the Supplementary Methods for details on parameter setting.

Abbreviations. S-CTRNNPB, stochastic continuous time recurrent neural network with parametric bias; PB, parametric bias; PE, prediction error; ASD, autism spectrum disorder; MRI, magnetic resonance imaging; FC, functional connectivity; ReHo, regional homogeneity.

Neural network model

The main component of the neural network model used in the simulation is the stochastic continuous time recurrent neural network with parametric bias (S-CTRNNPB), which is analogous to the biological brain, based on predictive processing theory. In the feed-forward prediction of S-CTRNNPB, the internal state of the ith neuron at time step t for the sth sequence is calculated as follows:

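(1)
$$
u_{t,i}^{(s)} =
\begin{cases}
u_{t-1,i}^{(s)} & i \in I_P \\
\dfrac{1}{\tau_i}\left(\sum_{j \in I_I} w_{ij}\, x_{t,j}^{(s)} + \sum_{j \in I_L} w_{ij}\, l_{t-1,j}^{(s)} + \sum_{j \in I_P} w_{ij}\, p_{t,j}^{(s)} + a_i\right) + \left(1 - \dfrac{1}{\tau_i}\right) u_{t-1,i}^{(s)} & i \in I_L \\
\sum_{j \in I_L} w_{ij}\, l_{t,j}^{(s)} + a_i & i \in I_M, I_V
\end{cases}
$$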

where $I_I$, $I_P$, $I_L$, $I_M$, and $I_V$ are the index sets of the input, parametric bias (PB), lower-level, predicted mean, and predicted variance neurons, respectively; $w_{ij}$ is the synaptic connection weight from the jth neuron to the ith neuron; $x_{t,j}^{(s)}$ is the jth external input value at time step t of the sth sequence; $l_{t,j}^{(s)}$ is the activity of the jth lower-level neuron; $p_{t,j}^{(s)}$ is the jth PB activity; $\tau_i$ is the time constant of the ith neuron; and $a_i$ is the activity threshold of the ith neuron. As Equation (1) shows, S-CTRNNPB is a hierarchical neural network model whose higher-level neurons are called PBs. Equation (1) also indicates that the internal state of PB neurons does not change with time.
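To make the update in Equation (1) concrete, the following is a minimal NumPy sketch of a single feed-forward step. It is an illustration under stated assumptions, not the authors' implementation: the function name `sctrnnpb_step`, the split of the weights into `W_in`, `W_rec`, `W_pb`, and `W_out`, and the use of `tanh` as the activation function are all hypothetical (the actual activation function is described in the Supplementary Methods).

```python
import numpy as np

def sctrnnpb_step(u_prev, l_prev, pb, x_t, W_in, W_rec, W_pb, W_out,
                  a_low, a_out, tau):
    """One feed-forward step following Equation (1).

    u_prev : (n_low,) previous internal states of lower-level neurons
    l_prev : (n_low,) previous activities of lower-level neurons
    pb     : (n_pb,)  PB activities (constant in time within a sequence)
    x_t    : (n_in,)  external input at time step t
    """
    # Lower-level neurons: leaky integration with time constant tau.
    drive = W_in @ x_t + W_rec @ l_prev + W_pb @ pb + a_low
    u_low = drive / tau + (1.0 - 1.0 / tau) * u_prev
    # ASSUMPTION: tanh activation, chosen only for illustration.
    l_t = np.tanh(u_low)
    # Mean and variance neurons read out the current lower-level activity.
    u_out = W_out @ l_t + a_out
    return u_low, l_t, u_out

# Example with the sizes used in the paper (2 PBs, 30 lower-level neurons)
# and 9 input features; 18 outputs = 9 means + 9 variances (assumed layout).
rng = np.random.default_rng(0)
n_in, n_low, n_pb, n_out = 9, 30, 2, 18
u_low, l_t, u_out = sctrnnpb_step(
    u_prev=np.zeros(n_low), l_prev=np.zeros(n_low),
    pb=rng.normal(size=n_pb), x_t=rng.normal(size=n_in),
    W_in=rng.normal(size=(n_low, n_in)), W_rec=rng.normal(size=(n_low, n_low)),
    W_pb=rng.normal(size=(n_low, n_pb)), W_out=rng.normal(size=(n_out, n_low)),
    a_low=rng.normal(size=n_low), a_out=np.zeros(n_out), tau=2.0)
```

Because the PB vector is passed in unchanged at every step, the sketch reflects the first case of Equation (1): PB internal states are constant within a sequence and are changed only by optimization.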

Parameter optimization is performed by minimizing the negative log-likelihood, which assumes a Gaussian distribution for observations, as shown in the following equation,

(2)
$$
L_{t,i}^{(s)} = \frac{\ln\left(2\pi v_{t,i}^{(s)}\right)}{2} + \frac{\left(\hat{y}_{t,i}^{(s)} - y_{t,i}^{(s)}\right)^2}{2 v_{t,i}^{(s)}}
$$

where $v_{t,i}^{(s)}$ and $y_{t,i}^{(s)}$ are the predicted variance and mean, respectively, and $\hat{y}_{t,i}^{(s)}$ is the target value (the input value at the next time step). Minimizing this negative log-likelihood can be regarded as minimizing the precision-weighted prediction error, because the squared error is weighted by the inverse of the predicted variance. Details of feed-forward prediction, activation function, and parameter optimization are described in the Supplementary Methods.
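The loss in Equation (2) can be written in a few lines of NumPy. The sketch below is illustrative only; the function name `gaussian_nll` and its argument names are hypothetical, and the variance is the quantity that appears in the denominator of Equation (2).

```python
import numpy as np

def gaussian_nll(target, mean, var):
    """Per-element Gaussian negative log-likelihood, as in Equation (2)."""
    target, mean, var = map(np.asarray, (target, mean, var))
    # First term penalizes large predicted variance; second term is the
    # squared prediction error weighted by the precision 1 / var.
    return 0.5 * np.log(2.0 * np.pi * var) + (target - mean) ** 2 / (2.0 * var)

# The same prediction error costs less when the predicted variance is larger.
squared_term_confident = (1.0 - 0.0) ** 2 / (2.0 * 0.5)   # var = 0.5
squared_term_uncertain = (1.0 - 0.0) ** 2 / (2.0 * 4.0)   # var = 4.0
```

The two terms pull in opposite directions: raising the predicted variance shrinks the weighted error term but is penalized by the log term, which is what allows the network to estimate how reliable each prediction is.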

Experimental procedures and ASD-like performance measures

In the training phase, parameter optimization is performed on the network structure, e.g., synaptic weights, and PB activities corresponding to each target sequence. Target sequences consisted of 6 basic emotions × 16 subjects and were prepared by extracting 9 features from facial expression videos (Supplementary Methods). After training, each target sequence was associated with a specific PB activity, and the relationships (similarities and differences) between target sequences were expected to be “self-organized” in the state space of the PB activity (referred to as “PB space” below). In the test phase, the neural network was required to predict unseen test target sequences. In this test phase, parameter optimization is only applied to PB activities such that the prediction error for test target sequences is minimized while the network structure remains fixed.

This PB update process for the test target sequence was considered “emotion recognition”, based on the similarity of the PB activity for the test sequence to the PB activity clusters for training sequences of the same emotion. This similarity of PB activities within the same emotion is quantitatively evaluated by a clustering index, an average silhouette width, called an “emotion recognition index” herein (Supplementary Methods). In addition to the emotion recognition index, we also included the prediction error of the test target sequence (test error) as an ASD-like performance measure. Based on predictive processing theory, the test error reflects impairments in generalization, a cognitive trait of ASD. This corresponds to the cognitive tendency in ASD to focus on details of sensory information, making it difficult to extract abstract meaning, resulting in overfitting. Eight-fold cross validation was used to calculate ASD-like performance.
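As a rough illustration of the clustering index, the sketch below computes the standard average silhouette width for labeled points in a two-dimensional PB space. It is a generic textbook version written under assumptions: the function name `average_silhouette_width` is hypothetical, Euclidean distance is assumed, and the exact way test and training PB activities enter the paper's emotion recognition index is given only in the Supplementary Methods.

```python
import numpy as np

def average_silhouette_width(points, labels):
    """Mean silhouette width over all points (Euclidean distance assumed)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    widths = []
    for i in range(len(points)):
        same = labels == labels[i]
        same[i] = False                      # exclude the point itself
        if not same.any():
            widths.append(0.0)               # convention for singleton clusters
            continue
        a = dist[i, same].mean()             # mean distance within own cluster
        b = min(dist[i, labels == other].mean()   # nearest other cluster
                for other in np.unique(labels) if other != labels[i])
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return float(np.mean(widths))

# Toy PB space: two tight, well-separated "emotion" clusters.
pb_points = [[0.0, 0.0], [0.0, 0.1], [10.0, 0.0], [10.0, 0.1]]
well_clustered = average_silhouette_width(pb_points, [0, 0, 1, 1])
badly_clustered = average_silhouette_width(pb_points, [0, 1, 0, 1])
```

Values near 1 indicate that PB activities of the same emotion lie close together and far from other emotions, whereas values near or below 0 indicate overlapping or misassigned clusters.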

Experimental conditions in neural network simulation

Based on previous studies, we examined effects of the following two parameters on the ASD-like performance in the neural network model (Figure 1).

The first parameter is the FCmodel, which indicates the proportion of synaptic connections between groups of neurons at different hierarchical levels. FCmodel is assumed to correspond to FC as measured by fMRI (hereafter referred to as FCMRI in comparison to FCmodel). As shown in Figure 1, the FCmodel between the PB and the lower-level neurons and the FCmodel between the lower-level neurons and the input/prediction neurons are referred to as higher-level FCmodel and lower-level FCmodel, respectively, which were set to one of the following values: 100, 80, 60, 40, or 20%.

The second parameter is network excitability homogeneity. The intrinsic homogeneity of network excitability is important for efficient information processing, and its alterations are thought to be related to ASD, i.e., altered “excitatory-inhibitory balance”. In the current experiment, as shown in Equation (3), the activity threshold of lower-level neurons, i.e., $a_i$ in Equation (1), is initialized to follow a Gaussian distribution and fixed without being updated by learning. Note that in the training phase, the network structure (weight and PB) is optimized based on the distribution of these activity thresholds.

(3)
$$
a_i \sim \mathcal{N}(0, K), \qquad K = 0.1,\ 1,\ 10
$$

The K parameter in Equation (3) determines the homogeneity of intrinsic neuronal excitability, and as parameter K decreases, excitability of the network becomes more homogeneous. In the current analysis, the experimental conditions of K = 0.1, 1, and 10 are referred to as the highly homogeneous, modestly homogeneous, and heterogeneous network conditions, respectively.
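The two experimental conditions can be sketched as follows. This is a hedged illustration rather than the authors' code: the helper names `sample_thresholds` and `connection_mask` are hypothetical, K is treated as the variance of the Gaussian (consistent with the Figure 1 caption), and retained connections are assumed to be chosen uniformly at random, which the text does not specify.

```python
import numpy as np

def sample_thresholds(n_neurons, K, rng):
    """Draw fixed activity thresholds a_i ~ N(0, K), as in Equation (3)."""
    # K is treated as a variance, so the standard deviation is sqrt(K).
    return rng.normal(0.0, np.sqrt(K), size=n_neurons)

def connection_mask(n_post, n_pre, proportion, rng):
    """Boolean mask keeping the given proportion of connections.

    ASSUMPTION: retained connections are chosen uniformly at random.
    """
    n_total = n_post * n_pre
    n_keep = int(round(proportion * n_total))
    mask = np.zeros(n_total, dtype=bool)
    mask[rng.choice(n_total, size=n_keep, replace=False)] = True
    return mask.reshape(n_post, n_pre)

rng = np.random.default_rng(0)
# Highly homogeneous condition: small K, thresholds tightly clustered near 0.
a_homo = sample_thresholds(30, K=0.1, rng=rng)
# Heterogeneous condition: large K, thresholds widely spread.
a_hetero = sample_thresholds(30, K=10.0, rng=rng)
# Higher-level FCmodel = 20%: 30 lower-level neurons x 2 PBs.
mask_high = connection_mask(30, 2, proportion=0.2, rng=rng)
# Masked weights would then be used as W_pb * mask_high in the forward pass.
```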

Examining the association between simulation results and fMRI datasets

Based on the assumption that there is a correspondence between neural network parameters and fMRI parameters, we examined whether the relationship between neural network parameters and ASD-like performance in the neural network simulation could be validated between subject subgroups allocated on the basis of fMRI parameters and ASD diagnosis/symptoms in the fMRI dataset. As an fMRI dataset, we used resting-state fMRI datasets for 849 subjects (410 ASD and 439 TD) from the publicly available Autism Brain Imaging Data Exchange. See Supplementary Methods, Supplementary Table 1, and Supplementary Table 2 for details.

We determined the fMRI parameters corresponding to the neural network parameters by the following procedure. First, referring to previous neural circuit studies, the insula, the middle temporal gyrus (temporo-occipital part), and the visual cortex (intracalcarine cortex) were selected as key anatomical regions for facial emotion recognition, and the three levels of the hierarchical neural network model, i.e., PB, lower-level, and input neurons, were assumed to correspond to these brain regions, respectively. Based on these assumptions, higher-level FCmodel and lower-level FCmodel were assumed to correspond to the higher-level FCMRI, i.e., FCMRI between the insula and middle temporal gyrus, and the lower-level FCMRI, i.e., FCMRI between the middle temporal gyrus and visual cortex, respectively. In addition, network excitability homogeneity was assumed to correspond to the ReHo of the middle temporal gyrus (Figure 1).

It should be noted that there are other choices regarding selection of anatomical regions. For example, other regions related to facial emotion recognition include the prefrontal cortex (7–9), amygdala (7–10), fusiform gyrus (7–9), thalamus (8–10), and parahippocampal gyrus (8, 9). Furthermore, it is sometimes difficult to determine whether a given region serves as a higher-level or lower-level neuron. Application of the simulation results to the fMRI dataset in this study is not intended to be exhaustive and is still at the proof-of-concept stage.

The Harvard Oxford atlas was used to identify the above anatomical regions. The aforementioned FCMRIs and ReHo were transformed to normal distributions using Fisher r-z and Box-Cox transformations, respectively, and were adjusted for covariates, i.e., age, sex, full-scale IQ (FIQ), mean framewise displacement, and sites.
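The preprocessing of the FCMRI values can be illustrated with a short sketch. It covers only the Fisher r-to-z step and a covariate adjustment; the Box-Cox step for ReHo is omitted. The function names `fisher_r_to_z` and `residualize` are hypothetical, and adjusting by ordinary least-squares residualization is an assumption, since the text does not state how the covariate adjustment was carried out. Categorical covariates such as site would need dummy coding before being passed in.

```python
import numpy as np

def fisher_r_to_z(r):
    """Fisher r-to-z transform of correlation coefficients."""
    return np.arctanh(np.asarray(r, dtype=float))

def residualize(y, covariates):
    """Remove the linear effect of covariates from y (OLS residuals).

    ASSUMPTION: linear adjustment; the paper's exact method is not given here.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), covariates])   # add intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Synthetic example: FC values that partly depend on age.
rng = np.random.default_rng(0)
age = rng.uniform(7, 40, size=200)
fc_r = np.tanh(0.01 * age + rng.normal(0, 0.2, size=200))   # correlations in (-1, 1)
fc_adjusted = residualize(fisher_r_to_z(fc_r), age)
```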

Next, each experimental condition of neural network simulation was mapped to a subject subgroup on the basis of fMRI parameters as follows. In fMRI datasets, 849 subjects were divided into 5 × 5 × 3 = 75 subgroups by dividing them into 5, 5, and 3 groups of approximately the same numbers of subjects, according to their higher/lower-level FCMRI, and ReHo ranks, respectively. Each subgroup was assigned to one of 5 × 5 × 3 = 75 neural network parameter settings with higher/lower-level FCmodel of 20, 40, 60, 80, and 100% and network excitability homogeneity of highly homogeneous, modestly homogeneous, and heterogeneous, respectively.
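The subgroup allocation described above can be sketched as rank-based binning. This is an illustration under assumptions: the helper names `rank_bins` and `assign_subgroups` are hypothetical, each fMRI parameter is binned independently of the others (the text does not say whether the split is marginal or nested), and the direction of the mapping from bins to conditions (lowest FCMRI bin to FCmodel = 20%, lowest ReHo bin to the heterogeneous condition) is an assumed reading of the correspondence.

```python
import numpy as np

def rank_bins(values, n_bins):
    """Split subjects into n_bins near-equal groups by rank (0 = lowest)."""
    ranks = np.argsort(np.argsort(values))        # rank of each subject
    return (ranks * n_bins) // len(values)

def assign_subgroups(fc_high, fc_low, reho):
    """Return an (n_subjects, 3) array of bin indices for the 5 x 5 x 3 grid."""
    return np.stack([rank_bins(fc_high, 5),
                     rank_bins(fc_low, 5),
                     rank_bins(reho, 3)], axis=1)

# ASSUMED direction of the mapping from bins to simulation conditions.
FC_LEVELS = [20, 40, 60, 80, 100]                             # percent
HOMOGENEITY = ["heterogeneous", "modestly homogeneous", "highly homogeneous"]

rng = np.random.default_rng(0)
n_subjects = 849                                              # as in the dataset
bins = assign_subgroups(rng.normal(size=n_subjects),
                        rng.normal(size=n_subjects),
                        rng.normal(size=n_subjects))
first = bins[0]
condition = (FC_LEVELS[first[0]], FC_LEVELS[first[1]], HOMOGENEITY[first[2]])
```

With independent binning, the marginal groups are balanced but the 75 cells themselves can contain unequal numbers of subjects when the fMRI parameters are correlated.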

Finally, we examined whether the subject group corresponding to the experimental condition that exhibited ASD-like performance in the neural network simulation actually had an ASD diagnosis/symptoms.

Results

Performance of a typical development (TD) model in neural network simulation

The network with the experimental condition in which network parameters were set at a heterogeneous network excitability and no reduction in FC, i.e. higher/lower-level FCmodel = 100%, is called a TD model. The learning curve for the TD model shows that both training and test error substantially decrease (Figure 2A). This indicates that the model not only successfully reproduces the training target sequence, but also generalizes to an unknown test target sequence.

Figure 2 

Analytical results for the TD model.

(A) Learning curves. (B) Examples of target, prediction, lower-level neuron, PB activity sequences. (C) PB activities corresponding to all target sequences (PB space).

Abbreviation. MSE, mean squared error; PB, parametric bias.

Examples of target, prediction, lower-level neuron and PB activity sequences in the test phase are shown in Figure 2B. The target sequence varies with emotion, and the prediction reproduces the target sequence well. Lower-level neuron activity corresponds to dynamics of the short-term target sequence, whereas PB activity corresponds to a more abstract level of characteristics of the target sequence.

PB activities corresponding to all target sequences, i.e., PB space, are shown in Figure 2C. In Figure 2C, PB activities corresponding to training target sequences seem to form emotion clusters, indicating that emotion clusters are self-organized in the predictive processing framework, even though no emotion labels are given in training. Furthermore, we can see that PB activities corresponding to the test sequence are located near PB clusters of the same emotion as the training sequences. This indicates that the model successfully recognizes facial emotions even for unknown test data based on a predictive processing framework.

Influence of neural network parameters on ASD-like performance and the interaction between parameters

The results of prediction error, one of the ASD-like performance indicators, are shown in Figure 3A (training error) and Figure 3B (test error). These figures show the change in prediction error when varying network excitability homogeneity and the higher/lower-level FCmodel. In the highly homogeneous network with the high FCmodel condition, the training error was small, but the test error was large, indicating that the model was overfitted to training sequences and failed to acquire generalization capability, i.e., ASD-like performance. On the other hand, in the other network excitability homogeneity conditions (modestly homogeneous and heterogeneous), both training and test errors were small, regardless of the FCmodel condition, indicating that the model successfully acquired generalization capability to predict emotional facial expressions, i.e., TD-like performance.

Figure 3 

Investigation of prediction error under various experimental conditions (A) Training error. (B) Test error.

Abbreviations. Highly Homo, highly homogeneous network condition; modestly Homo, modestly homogeneous network condition; hetero, heterogeneous network condition; FCmodel, functional connectivity in neural network model.

Next, to examine the other ASD-like performance measure, emotion recognition, i.e., the similarity of PB activity for test sequences to training PB activity clusters of the same emotion, we illustrated PB spaces under different experimental conditions (Figure 4A–D and Supplementary Figure 1). Comparing these four PB spaces, in the heterogeneous network with FCmodel = 100% (Figure 4C), PB activities for test sequences were most clearly located within training PB activity clusters of the same emotion, i.e., emotion recognition was most successful. In the highly homogeneous and heterogeneous networks with FCmodel = 20% (Figure 4B and 4D), PB activities for test sequences were not far from training PB activity clusters, but boundaries between emotion clusters were somewhat unclear, i.e., emotion recognition was almost successful. In the highly homogeneous network with FCmodel = 100% (Figure 4A), PB activities for test sequences were located farthest from training PB activity clusters, i.e., emotion recognition was least successful.

From these findings, regarding network excitability homogeneity, the homogeneous network condition appears to show less successful emotion recognition, i.e., ASD-like performance. Regarding FCmodel, on the other hand, the effect on emotion recognition appears to reverse depending on network excitability homogeneity. That is, in a heterogeneous network, the high FCmodel condition shows more successful emotion recognition, i.e., TD-like performance, than the low FCmodel condition, while in a highly homogeneous network, the high FCmodel condition shows less successful emotion recognition, i.e., ASD-like performance, than the low FCmodel condition.

These intuitive findings are illustrated more clearly in Figure 4E by a quantitative analysis using a clustering measure of PB activity by emotion, the “emotion recognition index”, i.e., the average silhouette width described in Methods. Specifically, regarding network excitability homogeneity, the homogeneous network showed a lower emotion recognition index, i.e., less successful emotion recognition. Regarding FCmodel, in a heterogeneous network, the high FCmodel condition showed a higher emotion recognition index than the low FCmodel condition, while conversely, in a highly homogeneous network, the high FCmodel condition showed a lower emotion recognition index than the low FCmodel condition.

Figure 4 

Investigation of emotion recognition performance under various experimental conditions.

(A–D) Illustration of PB space when network excitability homogeneity and FCmodel differ. Note that FCmodel = 100% and FCmodel = 20% indicate that both higher-level FCmodel and lower-level FCmodel are 100% and 20%, respectively. Experimental conditions in (C) are identical to those in Figure 2C, and these figures are identical. (E) Emotion recognition index. The emotion recognition index, i.e., average silhouette width, is a measure of the similarity between a test PB activity and a training PB activity of the same emotion. See Supplementary Methods for details.

Abbreviations. Highly Homo, highly homogeneous network condition; modestly Homo, modestly homogeneous network condition; hetero, heterogeneous network condition; FCmodel, functional connectivity in neural network model; PB, parametric bias.

From the above analysis, we found that the two ASD-like performance measures, test error and emotion recognition, share common trends. First, ASD-like performance is exacerbated as network excitability becomes more homogeneous. Second, the effect of FCmodel on ASD-like performance depends on network excitability homogeneity, i.e., FCmodel and network excitability homogeneity interact.

The mechanism by which changes in neural network parameter cause ASD-like performance

In order to clarify the mechanism by which network excitability homogeneity and FCmodel interact to cause ASD-like performance, we examined how each parameter affects neural representations obtained in developmental learning.

Regarding network excitability homogeneity, the aforementioned analysis revealed that heterogeneous network conditions tended to acquire generalization capability in that they had lower test error and were more successful in emotion recognition. We then hypothesized that the PB space in a heterogeneous network condition would tolerate subtle differences among target sequences for prediction, while the homogeneous network would be fragile to subtle differences due to overfitting to a particular sequence of the training sequences. To investigate this hypothesis, we evaluated the number of training sequences that could be well predicted, i.e., average mean squared error <0.005, based on a particular value of PB activity, i.e., closed loop analysis. See Supplementary Methods for details. This number of successfully predicted sequences would reflect the tolerability of PB space for application to different sequences. As expected, it is clear that individual PB activities in the heterogeneous network condition (Figure 5B) are able to predict more sequences with small errors than PB activities in the highly homogeneous network condition (Figure 5A), and this is confirmed by quantitative analysis, which calculated the average per network condition (Figure 5C).

Figure 5 

Mechanisms by which changes in neural network parameters cause ASD-like performance.

(A)(B) Example of a scatterplot of the tolerance of PB space. Positions of points in the scatterplot represent the PB activity obtained by training, as shown in Figures 4A and 4C. Colors of dots indicate the number of training target sequences that can be predicted with low error (PE < 0.01) by providing that PB activity. Both higher-level and lower-level FCmodel were set to 100%. (C) Bar chart of the average number of training target sequences that can be reproduced with small error (PE < 0.01). (D) Prediction error on sensory input-driven generation by setting unreliable (random) PBs. (E) Prediction error on top-down only generation by closed-loop generation.

Abbreviations. Highly Homo, highly homogeneous network condition; modestly Homo, modestly homogeneous network condition; hetero, heterogeneous network condition; PB, parametric bias; PE, prediction error; FCmodel, connectivity proportion.

As mentioned above, network excitability and FCmodel interact, and network excitability affected the tolerability of PB space, which influenced the generalization of top-down prediction. Because prediction in a predictive processing system is based on both top-down prediction and bottom-up sensory input, we hypothesized that FCmodel determines whether information processing depends on top-down prediction or on sensory input (hypo-prior). To investigate this hypothesis, prediction errors for unreliable (random) PBs, i.e., sensory input-driven generation (Figure 5D), and prediction errors for closed-loop generation, i.e., top-down only generation (Figure 5E), were calculated. In the low FCmodel condition, the prediction error for sensory input-driven generation was small (Figure 5D), but the prediction error for top-down only generation was large (Figure 5E), indicating that the low FCmodel condition induced sensory input-dependent information processing. In the high FCmodel condition, the opposite was true, indicating top-down prediction-dependent information processing.

The above examination provides the following explanation for the interaction of neural network parameters to exhibit ASD-like performance. Under the homogeneous network condition, PB space is intolerant, and top-down predictions are overfitted to subtle differences among sequences. As such, when top-down prediction is not accurate for test sequences, high FCmodel conditions with top-down prediction-dependence exhibit more ASD-like performance than low FCmodel conditions. On the contrary, under a heterogeneous network condition, PB space is tolerant, and the network provides accurate top-down predictions for test sequences. In this case, high FCmodel conditions with top-down prediction-dependence exhibit more TD-like performance than low FCmodel conditions.

In addition, there is a debate as to whether ASD information processing is top-down prediction-dependent or sensory input-dependent (hypo-prior). This study suggests that both top-down prediction-dependent (high FCmodel) and sensory input-dependent (low FCmodel) information processing may exhibit outwardly ASD-like performance, i.e., impaired emotion recognition, depending on the network excitability homogeneity.

Applicability of neural network simulation results to fMRI data

Neural network simulation showed that parameters interacted to cause ASD-like performance, and provided an explanation of information processing mechanisms. Of these findings, the relationship between parameters and ASD-like performance can be examined for biological reproducibility using the following procedure. First, by assuming a correspondence between the neural network parameter, i.e., network excitability homogeneity and FCmodel, and the fMRI parameter, i.e., regional homogeneity and FCMRI, each experimental condition of the neural network is mapped to a subject subgroup identified by fMRI parameters (see Methods for details). As in the Methods, it should be noted that a significant simplification in the selection of anatomical regions has been made in this mapping. Second, we examined whether subject subgroups corresponding to neural networks that exhibited ASD-like performance actually showed more ASD diagnoses/symptoms.

Following the above procedure, in order to examine the correspondence between ASD-like performance of the neural network and ASD diagnosis in each subject subgroup, Figure 6A shows the proportion of ASD diagnoses for each subject subgroup by arranging fMRI parameters for each axis in a way that corresponds to Figure 3B (test error) and Figure 4E (emotion recognition index). Despite variations across subgroups, the trend in ASD diagnosis in the fMRI dataset (Figure 6A) appears to be similar to ASD-like performance of neural network simulations (Figures 3B and 4E). Specifically, the overall trend is for ASD diagnoses to be more common in subgroups with higher ReHo (P = 0.021 for the main effect of logistic regression analysis). Furthermore, the impact of FCMRI on ASD diagnosis appears to differ between the high and low ReHo subject groups. That is, in the high ReHo group there seemed to be more ASD patients with high FCMRIs, while in the low ReHo group there seemed to be more ASD patients with low FCMRIs (P = 0.025 for interaction effect in logistic regression analysis).

Figure 6 

Validation of neural network simulation results using fMRI data.

(A) Proportion of ASDs belonging to each subgroup. Labels in this figure (fMRI parameters) correspond to labels in Figures 3A, 3B, and 4E (neural network parameters). Specifically, neural network parameters, i.e., network excitability homogeneity and higher/lower-level FCmodel, are mapped to subgroups in fMRI datasets based on fMRI parameters, i.e., regional homogeneity and higher/lower-level FCMRI, respectively. (B) Scatterplot of test error and emotion recognition index. (C) The scatterplot in (B) with the ASD diagnostic information of the subjects corresponding to the neural network of each point added. (In the scatterplot, a small amount of Gaussian noise is added to each point on the X- and Y-axes for visibility.) Histograms of ASD diagnosis are added for test error (X-axis) and emotion recognition index (Y-axis). (D) Histogram of the number of ASDs and TDs in groups A, B, and C.

To further investigate the correspondence between the ASD-like performance of the neural network and ASD diagnosis in fMRI datasets, the scatterplot of test error and emotion recognition index created from the results of neural network simulation (Figure 6B) was colored to represent the ASD diagnosis of corresponding subjects based on the fMRI parameter (Figure 6C). In the histogram along the X-axis in Figure 6C, there appears to be a statistically significant trend (P = 0.003) toward more ASD among subjects corresponding to neural networks with larger test errors. Next, in the histogram in the Y-axis direction, the subject subgroups corresponding to neural networks with a large emotion recognition index tend to have more TDs, and this is statistically significant (P = 0.001). In the scatter plot in Figure 6C, clusters appear to be divided into the three groups, A, B, and C, shown in the figure, and the number of ASDs and TDs in each of these groups is shown in Figure 6D. There was a significant difference among the three groups (P = 0.007), with more TDs in the group with a high emotion recognition index and low test error (Group A) and more ASDs in the group with a low emotion recognition index and high test error (Group C).

While the above analysis showed an association between the ASD-like performance measure in neural networks and ASD diagnosis in fMRI datasets, we next hypothesized that emotion recognition performance in neural networks would be associated with impaired social interaction symptoms in ASD patients, because emotion recognition is essential for social interaction. To examine this hypothesis, we performed correlation analyses between ASD-like performance measures and each ASD symptom in the ASD population, evaluated by the Autism Diagnostic Interview-Revised (ADI-R) (Table 1). As we hypothesized, the emotion recognition index was negatively correlated with impaired social interaction symptoms (r = –0.123; 95% CI = –0.233 to –0.008) with a P-value of 0.034. This result suggests similarities between characteristics of neural network performance and corresponding characteristics of subject ASD symptoms.

Table 1

Relationship between ASD-like performance of the neural network and ADI-R scores of corresponding subjects.


| ADI-R scale | Number | Correlation coefficient^a with test error (95% CI) | Correlation coefficient^a with emotion recognition index (95% CI) |
|---|---|---|---|
| Language/Communication | 297 | 0.014 (–0.105, 0.120) | –0.088 (–0.195, 0.017) |
| Reciprocal Social Interactions | 296 | –0.009 (–0.118, 0.092) | –0.123 (–0.233, –0.008) |
| Restricted, Repetitive, and Stereotyped Behaviors and Interests | 296 | 0.080 (–0.032, 0.206) | –0.111 (–0.225, 0.016) |

a Spearman’s correlation coefficient.

Before the correlation analysis, mapping from the neural network parameter to the subject subgroup in fMRI datasets is adjusted by covariates, i.e., age, gender, FIQ, mean framewise displacement, and sites.

Abbreviations. S-CTRNNPB, stochastic continuous time recurrent neural network with parametric bias; ADI-R, Autism Diagnostic Interview-Revised; CI, confidence interval.

Discussion

In the current study, neural network simulation, which is analogous to the developmental learning process, showed that FC, i.e., FCmodel, and neural excitability, i.e., network excitability homogeneity, interacted to cause ASD-like performance. Behind this interaction, FC determines whether information processing depends on top-down prediction or sensory input, and which of these two types of information processing causes more ASD-like performance depends upon the generalization capability of top-down prediction determined by neural excitability homogeneity. Furthermore, the relationship among FCmodel, neural excitability homogeneity and ASD-like performance in network simulation was biologically validated using fMRI datasets as the relationship among FCMRI, ReHo and ASD diagnosis.

In the neural network simulation, by taking an approach that embodies predictive processing theory as system-level neural dynamics, we gained new insights into the relationship among neural excitability, FC, and ASD symptoms. While previous reports have shown that neural excitability alone affects ASD-like performance (; ), we found that neural excitability not only influences ASD-like performance directly, but also interacts with FC, i.e., it changes the direction of the effect of FC on ASD-like performance. Most computational simulation studies to date have examined the relationship between an individual parameter, e.g., the strength of the prior distribution, and ASD-like performance (; ); future research is expected to focus on interactions between multiple parameters.

Although the current study suggests that the interaction between neural excitability and FC underlies ASD symptoms, only a few biological studies have investigated the relationship between these parameters in ASD. Since network excitability homogeneity corresponds to the homogeneity (or synchronization) of neural activities in a local network (; ), we assumed similarity with ReHo in fMRI datasets (). As another biological approach to examining the relationship between network excitability and FC, several studies have measured glutamate/glutamine concentration and FC simultaneously, based on magnetic resonance spectroscopy and fMRI (; ). In these studies, the direction of the correlation between neural excitability and FC was reported to differ between ASD and TD (; ), which may be due to the interaction between FC and neural excitability, consistent with our findings. These are still small pilot studies, and larger biological studies are needed to examine the relationship between neural excitability and FC.

The current study had limitations in that it required a simplification of imaging test results to bridge the gap between simulation studies at an abstract level and complex biological studies. The simulation study used a simple model for clear depiction of the computational theory, but the actual biological brain is of course more complex. Specifically, the biological neural network involved in facial emotion recognition is likely formed from a larger number of hierarchical levels and regions (; ; ). Since there are multiple candidate brain regions for mapping neural networks, bias may occur in this process. The straightforward way to reduce this selection bias would be to repeat the simulations to cover all candidate brain region combinations. However, such an exhaustive simulation is difficult owing to its huge computational cost. Another way to reduce the bias while avoiding excessive computational cost may be a data-driven approach such as machine learning (). However, if the model becomes too complex, it becomes difficult to illustrate information processing in an explainable manner. A possible future direction would be to map imaging data to neural network parameters while avoiding excessive model complexity by using latent variables obtained from unsupervised feature extraction, such as variational autoencoders.

In this study, we used a static PB. The possibility that this simplification affected the final results needs to be discussed. For example, implementing a static PB might have reduced performance in predicting the dynamics of very complex sensory data compared to a model in which the higher-level neurons are dynamic. However, we believe that the impact of the static PB setting on the final results is negligible, because PBs can bring functional hierarchy to neural networks in the same way as dynamic neurons. Functional hierarchy has been shown to emerge when a multiple-timescale property is set in a neural network (). Static PBs have also been shown to act as higher-level neurons in a manner similar to dynamic higher-level neurons, as the limiting case of infinitely large time constants (; ; ). Furthermore, this multiple-timescale property has been shown to be present in actual brain activity, as evidenced by recent biological and computational studies (). Thus, it is biologically plausible to model higher-level brain areas with neurons that have larger time constants.
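
The claim that a static unit is the limiting case of a dynamic neuron with an infinitely large time constant can be illustrated with a generic discrete-time leaky-integrator update, u(t+1) = (1 − 1/τ) u(t) + (1/τ) · drive(t). This is a common textbook form for continuous-time recurrent networks and is assumed here for illustration; it is not taken from the S-CTRNNPB equations of this study, and the function names, the sinusoidal input, and the τ values are hypothetical. The sketch only shows constancy within a sequence; it does not model how PB values are obtained through learning.

```python
import math


def ctrnn_step(u, drive, tau):
    """One leaky-integrator update of an internal state u with time constant tau."""
    return (1.0 - 1.0 / tau) * u + (1.0 / tau) * drive


def simulate(tau, steps=100, u0=0.5):
    """Run the unit under an arbitrary fluctuating input and return its trace."""
    u = u0
    trace = [u]
    for t in range(steps):
        drive = math.sin(0.3 * t)  # hypothetical input signal
        u = ctrnn_step(u, drive, tau)
        trace.append(u)
    return trace


def spread(trace):
    """Range of the state over the sequence: how much the unit moved."""
    return max(trace) - min(trace)


fast = simulate(tau=2.0)     # tracks the input closely
slow = simulate(tau=50.0)    # changes slowly, like a higher-level unit
static = simulate(tau=1e9)   # effectively constant within the sequence

for name, trace in [("fast", fast), ("slow", slow), ("static", static)]:
    print(f"{name:>6}: spread of state = {spread(trace):.3g}")
```

As τ grows, the factor 1/τ multiplying the input shrinks toward zero, so the state stays near its initial value for the whole sequence, which is the sense in which a static unit sits at the slow end of a multiple-timescale hierarchy.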

The fact that the current study design involved learning and testing only once each may be a limitation in simulating the development of ASD. What this study succeeded in showing is that innate parameters (FC and neural excitability homogeneity) affect a single learning session and cause ASD-like performance. However, in the actual developmental process of ASD, learning and testing may be repeated from the beginning of development. As a result of these processes, ASD-specific perception and cognition may emerge, as well as compensatory behavioral changes. For example, in ASD, the impact of previous learning on subsequent learning may change during repeated learning (; ). Future computational modeling research on the developmental process based on repeated learning should provide a better understanding of ASD.

Other studies have examined the association between complex cognitive processes, e.g., emotion recognition, and fMRI using models of the underlying computation, e.g., reinforcement learning and Bayesian inference (; ). A number of these studies have examined the relationship between hidden variable information obtained from the model and specific brain regions (). However, while reinforcement learning and Bayesian inference models can be expressed with few parameters, which makes it easy to find corresponding brain regions, it has been increasingly recognized that neural networks may be better suited to explaining complex cognitive or learning processes (; ; ). Nevertheless, frameworks for biological validation of neural network models have only just begun to be explored. For example, the similarity between representations in the hidden layers of a Deep Q network and fMRI of healthy subjects was recently examined during a video game task (). Compared with these previous studies, the current neural network approach is novel and significant in that it prepares a subject-specific neural network to clarify the correspondence between the model and the original subjects, and uses a model that can examine the developmental learning process behind the emergence of cognitive alterations in ASD.

Additional Files

The additional files for this article can be found as follows:

Supplementary File 1

Supplementary Methods. DOI: https://doi.org/10.5334/cpsy.93.s1

Supplementary File 2

Supplementary Tables 1 and 2. DOI: https://doi.org/10.5334/cpsy.93.s2

Supplementary File 3

Supplementary Figure 1. DOI: https://doi.org/10.5334/cpsy.93.s3