Modules for Automated Validation and Comparison of Models of Neurophysiological and Neurocognitive Biomarkers of Psychiatric Disorders: ASSRUnit—A Case Study

Christoph Metzner; Tuomo Mäki-Marttunen; Bartosz Zurowski; Volker Steuber

INTRODUCTION

Psychiatric nosology, for centuries widely untouched by findings from clinical neuroscience, is at the beginning of a transformation process (Friston, Redish, & Gordon, ) toward an interactive evolution of diagnostic and biological categories. This change of focus stems from the hope that biomarkers and endophenotypic measures show a better correspondence with genetic alterations identified by large genome-wide association studies (Meyer-Lindenberg & Weinberger, ) and promises to more readily shed light on the mechanisms underlying these disorders and to facilitate the discovery of novel therapeutic interventions (Siekmeier, ). Naturally, much effort has been put into translating these measures into practice using human studies (Perlis, ) and animal models (Markou, Chiamulera, Geyer, Tricklebank, & Steckler, ).

Computational approaches have also gained considerably more attention in recent years, and this has led to the emergence of computational psychiatry as a novel multidisciplinary and integrative discipline (see, e.g., Adams, Huys, & Roiser, ; Corlett & Fletcher, ; Friston, Stephan, Montague, & Dolan, ; Montague, Dolan, Friston, & Dayan, ; Stephan & Mathys, ; Wang & Krystal, ). This emergence can be attributed to three main factors: First, the earlier mentioned increase in experimental studies has provided a wealth of neuroscientific (including neurochemical, molecular, anatomic, and neurophysiological) data that are essential to building computational models; second, methodological and infrastructural advances, such as the various atlases, databases, and online tools from the Allen Brain Institute (http://brain-map.org/) or the BRAIN initiative (https://www.braininitiative.nih.gov/), have made it possible to analyze and process this enormous amount of data; third, the increase in computing power of high-performance computers as well as standard personal computers has made it possible (and affordable) to build and use models of increasingly high computational complexity. Therefore, the rapid growth of the field of computational psychiatry comes as no surprise. However, to fully exploit the potential that computational modeling offers, we have to identify systemic weaknesses in current approaches and learn from other disciplines that use computational models (and have used them for much longer than psychiatry), as well as from disciplines, such as software development, that face similar challenges.

At the core of computational modeling lies the concept of validation, that is, the rigorous comparison of model predictions against experimental findings. Furthermore, for a model to be useful and provide a true contribution to knowledge, the validation has to use sound criteria and the experimental observations need to sufficiently characterize the phenomenon the model tries to reproduce. Hence, to develop a computational model, scientists need to have an in-depth understanding of the current, relevant experimental data; the current state of computational modeling in the given area; and the state of the art of statistical testing to choose the appropriate criteria with which the model predictions and experimental observations will be compared (Gerkin & Omar, ; Sarma et al., ). In a field where the number of both experimental and computational studies grows rapidly, as is the case for psychiatry, this becomes more and more impracticable. Furthermore, the increase in the number of modeling and experimental studies has made it harder for reviewers to judge not only whether a new model adequately replicates the full range of experimental observations but also how it compares to competing models. Again, reviewers need an in-depth knowledge of the modeling and experimental literature as well as profound statistical knowledge. Finally, because computational modeling aims to generate predictions that can be experimentally tested, experimental neuroscientists must be able to extract and assess predictions from a rapidly growing body of computational models, a task that is becoming more and more impracticable.

The problems described herein are not unique to the field of computational psychiatry but occur in all scientific areas that use computational models. Furthermore, building a computational model is in the end a software development project of sorts. Omar, Aldrich, & Gerkin () have therefore proposed a framework for automated validation of scientific models, SciUnit, which is based on unit testing, a technique commonly used in software development. SciUnit addresses the problems mentioned earlier by making the scope (i.e., the set of observable quantities about which it can generate predictions) of the model explicit and by allowing its validity (i.e., the extent to which its predictions agree with available experimental observations of those quantities) to be automatically tested (Omar et al., ).

In this article, we propose to adopt this framework for the computational psychiatry community and to collaboratively build common repositories of computational models, tests, test suites, and tools. As a case in point, we have implemented a Python module (ASSRUnit) for auditory steady-state response (ASSR) deficits in schizophrenic patients, which are based on observations from several experimental studies (Krishnan et al., ; Kwon et al., ; Vierling-Claassen, Siekmeier, Stufflebeam, & Kopell, ), and we demonstrate how existing computational models (Beeman, ; Metzner, ; Metzner, Schweikard, & Zurowski, ; Vierling-Claassen et al., ) can be validated against these observations and compared with each other.

THE SciUnit FRAMEWORK

The module we present here is based on the general SciUnit framework for the validation of scientific models against experimental observations (Omar et al., ; see Figure 1).

Figure 1.

Schematic of the SciUnit framework. Models can be tested against experimental observations using specific tests. These tests incorporate an experimental observation and interface with the model through capabilities. Tests can be grouped into so-called test suites. The execution of a test produces a score, which describes how well the model captures the experimental observations. SciUnit also provides methods to visualize the resulting score(s), for example, in a table.

In SciUnit, models declare and implement so-called capabilities, which the validation tests then use to interact with those models. By a capability of the model, we mean the ability of the model to describe certain biological phenomena that are possible to assess using physical quantities. Furthermore, the declaration and implementation of capabilities are separated, which allows for testing two different models that share the same capabilities on the same experimental observations using the same test. Tests then take the model, use its capabilities to generate data, compare these data to the experimental observations that are linked to the test, and create a score. This score, which can simply be a Boolean (pass/fail) or another more complex score type, describes if and to what extent the model data and the experimental observation(s) match.
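The separation of capability, model, test, and score described above can be illustrated with a minimal, schematic sketch. It deliberately does not import SciUnit; all class and method names (ProducesNumber, ConstantModel, EqualsTest, judge) are hypothetical stand-ins chosen for illustration and should not be read as the SciUnit API.

```python
from abc import ABC, abstractmethod


class ProducesNumber(ABC):
    """Capability: declares WHAT a model can do, not how it does it."""

    @abstractmethod
    def produce_number(self) -> float:
        """Return the quantity that tests will compare to an observation."""


class ConstantModel(ProducesNumber):
    """A toy model that implements the capability."""

    def __init__(self, name: str, value: float):
        self.name = name
        self.value = value

    def produce_number(self) -> float:
        return self.value


class EqualsTest:
    """A test holds an observation and talks to models only via the capability."""

    required_capability = ProducesNumber

    def __init__(self, observation: float, tolerance: float = 1e-9):
        self.observation = observation
        self.tolerance = tolerance

    def judge(self, model) -> bool:
        # Refuse models that do not declare the required capability.
        if not isinstance(model, self.required_capability):
            raise TypeError(f"{model!r} lacks {self.required_capability.__name__}")
        prediction = model.produce_number()
        # Boolean (pass/fail) score.
        return abs(prediction - self.observation) <= self.tolerance


test = EqualsTest(observation=42.0)
models = (ConstantModel("A", 42.0), ConstantModel("B", 41.0))
results = {m.name: test.judge(m) for m in models}
print(results)  # {'A': True, 'B': False}
```

Because the test only relies on the declared capability, any two models implementing it can be judged by the same test against the same observation.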

Before we describe the actual implementations of capabilities, models, tests, and scores in our framework for ASSRs in schizophrenia, we first start with a summary of the experimental observations we included in the database, and then we describe the computational models that were realized.

THE ASSRUnit MODULE

The structure of the ASSRUnit module proposed here is shown schematically in Figure 2. As outlined earlier, the proposed module aims to provide three main functionalities: (a) a simple way of getting an overview of the experimental literature, (b) an easy and flexible way to automatically test computational models against experimental observations, and (c) an automated way of generating predictions from computational models. Functionality a is fully covered by the experimental database and its methods to query the database and visualize the results. Functionality b is provided by linking both the experimental database and the computational models to the SciUnit tests that cover the relevant experimental observations. The only action required from the user, if the computational model has not yet been included in the model repository of the module, is to provide an interfacing Python class (i.e., a class that allows the original model to be run and analyzed from within Python) for the model that implements all the required capabilities. Note that the model itself does not have to be written in Python; it only has to be executable from a shell. Once the model is included, the SciUnit framework allows for automated testing, and the visualization methods provided in the proposed module allow for a comprehensive and clear presentation of the results. Functionality c can be achieved by a set of SciUnit tests and capabilities that, instead of covering experimental observations, cover experiments that have not yet been performed. By running the computational models with these tests, the module can be used to generate new predictions from the models, which can then be used to populate a prediction database similar to the experimental database. The module is available on GitHub (https://github.com/ChristophMetzner/ASSRUnit/tree/CompPsychArticle).

Figure 2.

Schematic of the proposed framework highlighting the three main functions: (a) overview of experimental observations; (b) validation of computational models; (c) creation of a predictions database. At its core lies the SciUnit module, which provides the infrastructure for the automated validation of the computational models. In particular, through a set of suitable tests, the computational models can be compared against experimental observations queried from the experimental database. Another set of tests, the so-called prediction tests, are then employed to extract predictions from the computational models, thus populating the predictions database.

Experimental Observations Database

In patients suffering from schizophrenia, oscillatory deficits in general and ASSR deficits in particular have been extensively studied using electroencephalography (EEG) and magnetoencephalography (MEG; e.g., Brenner, Sporns, Lysaker, & O’Donnell, ; Hamm et al., ; Krishnan et al., ; Kwon et al., ; Light et al., ; Mulert, Kirsch, Pascual-Marqui, McCarley, & Spencer, ; O’Connell et al., ; Spencer, ; Spencer, Niznikiewicz, Nestor, Shenton, & McCarley, ; Spencer, Salisbury, Shenton, & McCarley, ; Vierling-Claassen et al., ; Zhang, Ma, Li, Yang, & Qin, ).

Neural oscillations have been hypothesized to subserve important functions in the brain and are critically involved in cognitive processes (see, e.g., the review of Başar, Başar-Eroglu, Karakaş, & Schürmann, ). Gamma oscillations, for example, have been demonstrated to underlie the formation of coherent percepts in different sensory modalities (e.g., Engel, Kreiter, König, & Singer, ; Jokeit & Makeig, ). Interestingly, abnormal power and synchrony in the gamma band have been found in schizophrenic patients in a number of different tasks and paradigms (e.g., Cho, Konecky, & Carter, ; Kwon et al., ; Spencer et al., ) and linked to the schizophrenic symptom profile (e.g., Gordon, Williams, Haig, Wright, & Meares, ; K.-H. Lee, Williams, Haig, & Gordon, ). Although deficits in the generation and maintenance of gamma oscillations are not a classical symptom of schizophrenia, given the importance of gamma oscillations in sensory processing and cognition and the link between deficits and symptoms, these biomarkers might reflect a characteristic trait of the disorder.

Here we focus on three of these studies, looking at entrainment deficits in the gamma and beta ranges. Kwon et al. () used a click train paradigm to study ASSRs at 20, 30, and 40 Hz in schizophrenic patients using EEG and found a prominent reduction of power at the driving frequency for 40 Hz drive but no changes of power at the driving frequency for 30 Hz and 20 Hz. Although Figure 3 in Kwon et al. () seems to show an increase of the subharmonic 20 Hz component for 40 Hz drive, no statistical comparison is presented in the article. Vierling-Claassen et al. () reproduced this reduction of power at the driving frequency for 40 Hz drive using the same paradigm with MEG. Additionally, they found an increase in power at the driving frequency during 20 Hz drive and changes of power at certain harmonic/subharmonic frequencies, namely, an increase of power at 20 Hz for 40 Hz drive and a decrease of power at 40 Hz for 20 Hz drive. Krishnan et al. () used a slightly different paradigm, which employed amplitude-modulated tones instead of click trains, and tested a wide range of driving frequencies from 5 to 50 Hz. They found reduction of power at the driving frequency in the gamma range (i.e., at 40, 45, and 50 Hz) and no changes at other frequencies. Furthermore, they did not find any changes of power at harmonic or subharmonic frequencies.

The experimental database is realized as a nested Python dictionary, with an entry for each study included (a dictionary is a data structure in which values are accessed by a key; in a nested dictionary, the values themselves can be dictionaries). Each study entry consists of two entries (i.e., two key-value pairs), which describe the study observations, one in a quantitative way and the other in a qualitative way. We have included the qualitative description because computational models often do not allow for a strict quantitative comparison with experimental data, or publications of experimental studies do not provide enough detail on the results; in these cases, only a qualitative comparison is possible.

Together with the database, ASSRUnit provides basic methods to query and visualize the content of the database. These methods include commands to retrieve all studies or observations in the database and a method to display an overview of the results for the whole database or for certain studies or observations. Finally, the metadata associated with each study (e.g., the number of participants, the modality, the patient group) can also be retrieved and displayed.
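The nested-dictionary layout and the kind of query methods described above might look roughly as follows. This is a sketch under stated assumptions, not the module's code: the key names "Qualitative" and "Meta", the observation labels (read as power at X Hz for drive at Y Hz, e.g., "4020"), and the functions list_studies, list_observations, and overview are hypothetical; "Full" mirrors the entrytype value mentioned in the Figure 4 caption. Qualitative entries follow Table 1, and quantitative values are left as None because they are not reproduced in this article.

```python
# Hypothetical layout: study -> {"Qualitative": ..., "Full": ..., "Meta": ...}
database = {
    "Kwon_et_al": {
        "Qualitative": {"4040": "lower", "3030": "no change", "2020": "no change",
                        "4020": "no change", "2040": "no change"},
        "Full": None,  # quantitative values not given here
        "Meta": {"modality": "EEG"},
    },
    "Vierling-Claassen_et_al": {
        "Qualitative": {"4040": "lower", "3030": "no change", "2020": "higher",
                        "4020": "lower", "2040": "higher"},
        "Full": None,
        "Meta": {"modality": "MEG"},
    },
    "Krishnan_et_al": {
        "Qualitative": {"4040": "lower", "3030": "no change", "2020": "no change",
                        "4020": "no change", "2040": "no change"},
        "Full": None,
        "Meta": {"modality": None},  # not stated in this section
    },
}


def list_studies(db):
    """Return the names of all studies in the database."""
    return sorted(db)


def list_observations(db, entrytype="Qualitative"):
    """Return the names of all observations that occur in any study."""
    names = set()
    for study in db.values():
        names.update(study[entrytype] or {})
    return sorted(names)


def overview(db, studies=None, observations=None, entrytype="Qualitative", meta=False):
    """Build one row per study, optionally restricted to some studies/observations."""
    studies = studies or list_studies(db)
    observations = observations or list_observations(db, entrytype)
    rows = []
    for name in studies:
        entry = db[name][entrytype] or {}
        row = {"study": name}
        row.update({obs: entry.get(obs) for obs in observations})
        if meta:
            row["meta"] = db[name]["Meta"]
        rows.append(row)
    return rows


for row in overview(database, observations=["4040", "2020"], meta=True):
    print(row)
```

Restricting the overview to named studies or observations keeps the resulting table small, which becomes more important as the database grows.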

Prediction Database

The prediction database is also implemented as a nested Python dictionary. Similar to the experimental observation database, methods that retrieve and visualize the content of the database are included in ASSRUnit.

Models, Capabilities, Tests, and More

Models

To demonstrate the flexibility of the proposed framework, we included three different neural models of ASSR deficits.

The first model is based on a biophysically detailed model of primary auditory cortex by Beeman (). Our group has recently used it to study ASSR deficits (Metzner et al., ). The model was implemented using the neural simulator GENESIS (Bower, ; Bower & Beeman, ). Not only is this model a good example of a biophysically detailed model of ASSR deficits but its inclusion also demonstrates how models that are not written in Python can be used.

The second model is a reimplementation of the model of Beeman in NeuroML2, a simulator-independent markup language to describe neural network models developed by the NeuroML project (Cannon et al., ), which is featured in the open source brain model database (Gleeson et al., ). We included this model to demonstrate the ability of the proposed framework to incorporate state-of-the-art tools and databases for the design, implementation, and simulation of network models.

The last model we included is the model presented by Vierling-Claassen et al. (), a simple network of two populations of theta neurons. The theta neuron model is a simple oscillator, where a single variable θ describes the phase angle of a point traveling around the unit circle. A detailed description of the model and its usefulness in the study of neural oscillations can be found in Börgers & Kopell (). We reimplemented the model in Python (for more details on the model and the replication, see Metzner, ). The model was included primarily to demonstrate that the framework is not limited to biophysically detailed models but can also be used with simpler, more abstract models. Additionally, it demonstrates the simplest way of including a model, namely, implementing the model directly in Python. This might not be the most common scenario, but because it is the simplest, we included it here.

We do not discuss the models in more detail here, because they have been described elsewhere (Beeman, ; Metzner, ; Metzner et al., ; Vierling-Claassen et al., ). Furthermore, our focus lies on the framework with which to use, validate, and compare models, not on the models themselves.

The three models mentioned herein are included in the SciUnit framework by wrapper classes (i.e., classes that encapsulate the functionality of the original model but are implemented in Python) that implement the necessary capabilities and make the models available to the tests. One important thing to note here is that, because we are dealing with models of neurofunctional deficits found in individuals with a particular disorder, a model as used in the module always means two configurations of a computational model, one representing the control configuration and one the disorder configuration. Therefore, all wrapper classes take two sets of parameters as arguments, one for each configuration.
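A wrapper of this kind, for a model that is only executable from a shell, could be sketched as follows. The class name ExternalModelWrapper, the parameter name gain, and the one-line stand-in "simulator" are all hypothetical; a real wrapper would launch the actual simulator and parse its output files instead.

```python
import subprocess
import sys

# Stand-in for an external simulator: a one-line script that reads one
# parameter from the command line and prints a number. It is NOT a neural
# model; it only lets the sketch run without any simulator installed.
TOY_SIMULATOR = "import sys; print(float(sys.argv[1]) * 2.0)"


class ExternalModelWrapper:
    """Python-side wrapper holding a control and a disorder configuration."""

    def __init__(self, name, control_params, disorder_params):
        self.name = name
        self.params = {"control": control_params, "disorder": disorder_params}

    def _run(self, params):
        # Launch the external program and capture what it prints.
        completed = subprocess.run(
            [sys.executable, "-c", TOY_SIMULATOR, str(params["gain"])],
            capture_output=True,
            text=True,
            check=True,
        )
        return float(completed.stdout.strip())

    def produce_output(self):
        # One simulation per configuration, returned side by side.
        return {config: self._run(p) for config, p in self.params.items()}


model = ExternalModelWrapper("toy", {"gain": 1.0}, {"gain": 0.5})
outputs = model.produce_output()
print(outputs)  # {'control': 2.0, 'disorder': 1.0}
```

Keeping both configurations inside one wrapper means that a test can always request the control and the disorder output from a single model object.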

In addition to the standard model classes, we also implemented a second version of the model classes, which can simulate a certain number n of subjects and a certain number m of trials (realized by their …_plus methods). This allows for assessing the robustness of the results and can contribute substantially to statistical rigor. The way in which these subjects and trials are implemented strongly depends on the model and its complexity. For example, for the simple model from Vierling-Claassen et al. (), which has all-to-all connectivity for all possible connection types, it is not possible to simulate different subjects (so n = 1), but different trials are simulated by changing the seed for the random number generator (RNG) that generates the background noise. In the case of the model from Beeman (), different subjects are realized by a change in seed for the RNG that is responsible for the formation of individual connections. As a result, each subject has a different connectivity at the level of individual connections, while the connection probabilities for each connection type are preserved. Furthermore, by also changing the RNG seed that generates the background noise, several different trials for each subject can be realized.
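The seed-based scheme for subjects and trials can be sketched with a toy example. Nothing here is a neural simulation: the "output" is simply the realized connection density plus Gaussian noise, and the function names (build_connectivity, run_trial, produce_plus) and parameters are hypothetical. The point is only that one seed controls the connectivity (subject) and a separate seed controls the noise (trial).

```python
import random
import statistics


def build_connectivity(n_cells, p_connect, subject_seed):
    """Draw a random connectivity matrix; the seed defines the 'subject'."""
    rng = random.Random(subject_seed)
    return [[rng.random() < p_connect for _ in range(n_cells)] for _ in range(n_cells)]


def run_trial(connectivity, noise_sd, noise_seed):
    """Toy output: connection density plus noise; the seed defines the 'trial'."""
    rng = random.Random(noise_seed)
    n_cells = len(connectivity)
    density = sum(sum(row) for row in connectivity) / (n_cells * n_cells)
    return density + rng.gauss(0.0, noise_sd)


def produce_plus(p_connect, noise_sd, subject_seeds, noise_seeds, n_cells=20):
    """Run every trial for every subject; return the grand mean and all values."""
    per_subject = []
    for subject_seed in subject_seeds:
        connectivity = build_connectivity(n_cells, p_connect, subject_seed)
        per_subject.append([run_trial(connectivity, noise_sd, s) for s in noise_seeds])
    all_values = [v for trials in per_subject for v in trials]
    return statistics.mean(all_values), per_subject


mean_output, outputs = produce_plus(
    p_connect=0.3, noise_sd=0.01, subject_seeds=[1, 2, 3], noise_seeds=[10, 11, 12, 13]
)
print(len(outputs), "subjects x", len(outputs[0]), "trials, mean =", round(mean_output, 3))
```

Because every random draw is tied to an explicit seed, rerunning the call with the same seeds reproduces the same set of subjects and trials.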

Capabilities

Table 1 summarizes the experimental observations included in the module at this stage. All observations are similar in nature: the power value of the EEG/MEG at a certain frequency in response to auditory entrainment at a certain frequency. Therefore the only capability necessary for a model to produce output that can be compared to these observations is a method that produces the power at a certain frequency X of a simulated EEG/MEG signal in response to drive at a frequency Y. This capability, Produce XY, is included in ASSRUnit, and all models must implement it.
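To make the capability concrete, the following sketch computes the power at a frequency X of a toy signal driven at a frequency Y. The signal is a pure sinusoid at the drive frequency, not the output of any of the models discussed here, and the signature of produce_XY as well as the helpers power_at and toy_response are assumptions made for illustration.

```python
import cmath
import math


def power_at(signal, sampling_rate, freq):
    """Power of `signal` at `freq` (Hz), from a single-frequency Fourier sum."""
    n = len(signal)
    coeff = sum(
        x * cmath.exp(-2j * math.pi * freq * k / sampling_rate)
        for k, x in enumerate(signal)
    )
    return abs(coeff / n) ** 2


def toy_response(drive_freq, amplitude, sampling_rate=1000, duration=1.0):
    """Toy 'EEG/MEG' response: a sinusoid entrained at the drive frequency."""
    n = int(sampling_rate * duration)
    return [
        amplitude * math.sin(2 * math.pi * drive_freq * k / sampling_rate)
        for k in range(n)
    ]


def produce_XY(amplitude, power_freq, drive_freq, sampling_rate=1000):
    """Power at `power_freq` (X) in response to drive at `drive_freq` (Y)."""
    signal = toy_response(drive_freq, amplitude, sampling_rate)
    return power_at(signal, sampling_rate, power_freq)


p4040 = produce_XY(amplitude=1.0, power_freq=40, drive_freq=40)
p2040 = produce_XY(amplitude=1.0, power_freq=20, drive_freq=40)
print(round(p4040, 6), round(p2040, 6))  # 0.25 0.0
```

For a sinusoid of amplitude A with a whole number of cycles in the analysis window, this definition yields a power of A²/4 at the drive frequency and essentially zero elsewhere, which is why the toy signal shows no subharmonic component.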

Table 1.

Summary of ASSR deficits in schizophrenic patients in the three studies considered here

	Fundamental			Harmonic	Subharmonic
Drive	40 Hz	30 Hz	20 Hz	20 Hz	40 Hz
Kwon et al.	↓	–	–	–	–
Vierling-Claassen et al.	↓	–	↑	↓	↑
Krishnan et al.	↓	–	–	–	–

Note. ↓ = significantly lower in patients; ↑ = significantly higher in patients; – = no significant difference between controls and patients. The tests included in the ASSRUnit module are based on this table. Krishnan et al. () tested more driving frequencies than the ones shown in the table. The table only shows measures that are common to all three studies.

Tests and scores

The five tests we implemented examine the five observations summarized in Table 1 individually. Furthermore, we implemented one prediction test, which tests 10 Hz power at 10 Hz drive. For the sake of simplicity, the test scores implemented so far are simple Boolean scores, indicating whether a model output fails or passes a test, that is, whether the difference between model output for the control and the schizophrenia-like network matches the experimental observation. In the case of the model classes implementing sets of outputs, the mean difference is compared to the experimental observations. For the prediction test, we have chosen a RatioScore instead of a Boolean score; it returns the ratio of the power for the schizophrenia-like configuration to the power for the control configuration.
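The two score types can be sketched as plain functions. The function names and, in particular, the tolerance used to decide what counts as "no change" are assumptions made for this illustration; the text above does not specify how the module draws that line.

```python
def classify_change(control_power, schiz_power, tolerance=0.05):
    """Label the schizophrenia-like output relative to control.

    `tolerance` is a hypothetical relative threshold below which the two
    outputs are treated as not different.
    """
    ratio = schiz_power / control_power
    if ratio < 1.0 - tolerance:
        return "lower"
    if ratio > 1.0 + tolerance:
        return "higher"
    return "no change"


def boolean_score(control_power, schiz_power, observation, tolerance=0.05):
    """Pass (True) if the direction of change matches the qualitative observation."""
    return classify_change(control_power, schiz_power, tolerance) == observation


def ratio_score(control_power, schiz_power):
    """Ratio of schizophrenia-like power to control power."""
    return schiz_power / control_power


print(boolean_score(1.0, 0.5, "lower"))       # True
print(boolean_score(1.0, 1.02, "no change"))  # True
print(ratio_score(2.0, 1.0))                  # 0.5
```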

Visualization, statistics, additional data

In addition to the main features of the SciUnit framework for the analysis and comparison of the models, we use the fact that SciUnit allows for passing additional data, beyond the test scores, to provide a class that offers tools for the visualization of the results. This class includes functions to display the test results in a table, plot the results from a set of model outputs as a box plot, and perform and visualize a Student’s t test of the differences between control and schizophrenia-like networks.
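For reference, the statistic behind a two-sample Student’s t test (equal-variance form) can be computed with the standard library alone, as in the sketch below. The function name student_t and the input numbers are purely illustrative and synthetic; this is not the module's implementation, and in practice a statistics package such as SciPy (scipy.stats.ttest_ind) would typically be used, which also returns a p value.

```python
import math
import statistics


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance, plus degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error
    return t, dof


# Synthetic numbers for illustration only (not model or experimental data).
control = [1.0, 2.0, 3.0, 4.0, 5.0]
schiz = [2.0, 3.0, 4.0, 5.0, 6.0]
t_value, dof = student_t(control, schiz)
print(t_value, dof)  # -1.0 8
```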

Next, we describe three different use cases that show how the proposed module can be used for different purposes by experimentalists, modelers, and reviewers.

Use Case 1: Overview of the Experimental Literature

The first use case demonstrates how the experimental database can be used to get a comprehensive overview of the current experimental literature related to a neurophysiological or neurocognitive biomarker, in our case, ASSR deficits in patients suffering from schizophrenia. Figure 3 shows that with two simple commands, one can retrieve the names of all studies and all observations present in the database. These names will have to be used for all further queries of the database.

Figure 3.

Display all studies and all observations included in the database.

Figure 4 then shows how to get a complete overview of all observations of all studies in the database. As we can see in Figure 5, simply adding the parameter meta=True to the command will additionally output the metadata associated with each study. This contains information on the subjects, modality, and so on. The overview command presents the data in a simple table and can be used to see which studies provided which observation and what the results were. However, as we can already see for our small demonstration database containing only three studies, a full overview is likely to become very large and therefore hard to grasp fully. By explicitly stating the studies and/or the observations in which one is interested, one can reduce the complexity of the table and get a clear and simple overview, as depicted in Figure 6. Note that in the examples, we have only used the qualitative description of the observations; the same functionality also applies to the quantitative descriptions. The functionality described here, along with more examples, can be explored in the accompanying Jupyter notebooks (https://github.com/ChristophMetzner/ASSRUnit/blob/CompPsychArticle/assrunit/Notebooks/Example_Experimental_Database.ipynb).

Figure 4.

Overview of the observations in the experimental literature. The command experimental_overview prints a table summarizing the results for all studies and all observations in the database. Note that by default, the qualitative study results are presented. This can be changed to the quantitative results by setting the parameter entrytype to Full.

Figure 5.

By setting the meta flag to True, additional information on the studies is displayed.

Figure 6.

The experimental_overview command allows for querying for specific studies and observations using the names retrieved with the get_studies and get_observations commands.

This simple querying functionality allows the user to get a quick, clean, and comprehensive overview of the experimental literature, to identify observations that are supported by many studies (see, in our case, the reduction of gamma power for stimulation at gamma frequency) but also to detect controversial findings. Furthermore, the display of the associated metadata allows for checking, for example, whether identified common observations extend over different modalities and postprocessing techniques and also whether controversial findings might be explained by differences in the experimental setup or other related aspects. In the future, it will also be possible to look at more than one database and compare the same observations across different patient groups to highlight commonalities and differences between disorders.

Use Case 2: Model Comparisons

While our first use case only exploited the experimental database, we now show the additional benefits of joining experimental and modeling data.

Simple model comparison

By creating tests based on the model capabilities and grouping them into test suites, we can easily compare models against experimental data and against each other. Figure 7 demonstrates how we can use the module to create two different models along with several tests, run the models to produce the data relevant for the tests, and then judge the model outputs against experimental data and display the results together. Note that in this context, we use the term model to denote the in silico instantiation of a theoretical/conceptual model. Two different models may therefore share the same code and differ only in parameter values. Again, the functionality described here, along with more examples, can be explored in an accompanying Jupyter notebook (https://github.com/ChristophMetzner/ASSRUnit/blob/CompPsychArticle/assrunit/Notebooks/Example_Model_Comparison.ipynb).

Figure 7.

Contrasting the results of comparing two models against experimental observations. First, the model instances are created and the parameters for the control network and the schizophrenia-like network are passed on together with a name. Then, appropriate tests are created and experimental observations are passed on. In this particular example, the observation is passed on as a “ratio,” which means that the value of the output of the schizophrenia-like simulation is divided by the value of the output of the control simulation. Afterward, the tests are grouped together to form a test suite, and the two example models are run against the test suite. The results of this run are stored in the matrix score_matrix, and by invoking the view method of the SciUnit score matrix, a comparison table is shown displaying the performance of each model against each test. Note that in this example, the two models and their resulting performance are purely hypothetical and do not reflect any actual model, and furthermore, the experimental observations do not reflect any actual findings.

Advanced modeling data and visualization

As already described in the Models subsection of Models, Capabilities, Tests, and More above, there is a second version of each model class that contains not only the standard methods that implement the necessary capabilities but also so-called …_plus methods, which can generate model data for different trials and/or subjects, depending on the type of model. Together with the methods from the visualization class, this additional model data can be used to better understand the model behavior, to judge the robustness of findings, and to statistically analyze model output. Figure 8 shows a simple example demonstrating the use of these classes/methods. When creating an instance of this class for the simple model from Vierling-Claassen et al. (), an additional parameter containing a number of RNG seeds is passed on. When the model is then run, a simulation is executed for each RNG seed, and the model output is a list containing the result for each simulation.

Figure 8.

Generating additional data. First, model instances are created and the produce_XY_plus method is used to run the simulation. The additional seed parameter contains a list of 20 RNG seeds, and a simulation is executed for each seed in that list. Thus, each simulation differs in background noise. The produce_XY_plus methods return the mean values of the outputs for the simulation runs (mcontrol4040 and mschiz4040 above), which can be used analogously to the output of the standard produce_XY methods. In addition, the output values of each individual simulation run are returned (control4040 and schiz4040 above) and can then be visualized or further analyzed statistically. Note that the model parameters used in this example are not based on any actual experimental findings in schizophrenic patients and that they do not aim to reproduce any experimental observations; they are only used for demonstration purposes (for model parameters of this model that reproduce experimental observations, see the original article by Vierling-Claassen et al. []).

Use Case 3: Overview of Model Predictions

Finally, we show how predictions can be generated from existing models (see Figure 9). To generate the predictions, a set of prediction tests needs to be created, along with prediction capabilities, that is, capabilities the models must have to generate the relevant data. For demonstration purposes, we have chosen to implement a single, simple prediction test. Because ASSRUnit has so far only covered experimental observations and computational models of gamma- and beta-range entrainment, this test simply generates a prediction about how, in a given model, power in the alpha band (here at 10 Hz) differs between the control network and the schizophrenia-like network at 10 Hz drive. Note that the quantity addressed by this prediction test has already been studied in the experimental literature, which means that it could have been included in the experimental database and therefore does not represent a true prediction. However, we have chosen to include it for the purpose of demonstration. As before, more detailed information can be found in the accompanying Jupyter notebooks (https://github.com/ChristophMetzner/ASSRUnit/blob/CompPsychArticle/assrunit/Notebooks/Example_Prediction.ipynb).

Figure 9.

An overview of the workflow to generate predictions from a model. As before, a model is instantiated and the necessary parameters are passed on. Afterward, a prediction test is created in the same way as a standard test would be created, with the exception that prediction tests do not take experimental observations as arguments, because it is assumed that no experimental data exist yet. The test is then executed and returns a score. However, in the case of prediction tests, this score only contains the result of the model simulations (in this example, a ratio of the values of the output schizophrenia-like network and the control network). This score could, for example, be added to a prediction database.
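The prediction workflow described above can be sketched in a few lines. The sketch does not use SciUnit or ASSRUnit; ToyModel, PredictionTest, generate_prediction, the gain parameter, and the layout of prediction_db are hypothetical, and the numbers are arbitrary demonstration values rather than model results.

```python
class ToyModel:
    """Toy model with a control and a schizophrenia-like configuration."""

    def __init__(self, name, control_params, schiz_params):
        self.name = name
        self.control_params = control_params
        self.schiz_params = schiz_params

    def produce_XY(self, power_freq, drive_freq):
        # Toy rule: the 'power' is just the configured gain, whatever the
        # frequencies are. A real model would run a simulation here.
        return self.control_params["gain"], self.schiz_params["gain"]


class PredictionTest:
    """Like a standard test, but created without an experimental observation."""

    def __init__(self, name, power_freq, drive_freq):
        self.name = name
        self.power_freq = power_freq
        self.drive_freq = drive_freq

    def generate_prediction(self, model):
        control, schiz = model.produce_XY(self.power_freq, self.drive_freq)
        # The 'score' only carries the model result: a schiz/control ratio.
        return schiz / control


prediction_db = {}  # nested dictionary: model name -> test name -> prediction
model = ToyModel("toy", {"gain": 2.0}, {"gain": 1.5})
test = PredictionTest("1010", power_freq=10, drive_freq=10)
prediction_db.setdefault(model.name, {})[test.name] = test.generate_prediction(model)
print(prediction_db)  # {'toy': {'1010': 0.75}}
```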

DISCUSSION

The Potential Role of the Framework Within Computational Psychiatry

The use of computational approaches has seen a significant increase over the last decades in almost all areas of medicine and life sciences. Especially in psychiatry, it has become clear that the complex and often polygenic nature of psychiatric disorders might only be understood with the help of computational models (Adams et al., ; Corlett & Fletcher, ; Friston et al., ; Montague et al., ; Siekmeier, ; Stephan & Mathys, ; Wang & Krystal, ). Naturally, the number of computational models in the field of psychiatry has also increased significantly over the last years, and it has been argued that in silico instantiations of biomarkers are a crucial step toward understanding underlying disease mechanisms (Siekmeier, ). While this large increase in the number of modeling studies shows the importance of computational methods in the field, it also raises several issues that impede the community in exploiting these approaches to their full potential. For a computational model to be a substantial contribution to knowledge, it has to adequately instantiate experimental observations, correctly implement the mathematical equations of the model, and generate experimentally testable predictions. The approach presented here addresses two of these three requirements, namely, the instantiation of experimental observations and the generation of testable predictions. While correctness of the code is an equally important requirement, it was out of the scope of the current work, because it very strongly depends on the type of computational model and on the programming language used to implement the model. Nevertheless, the approach presented here offers significant benefits for not only the computational psychiatry community but the psychiatry community as a whole, while imposing little additional effort on users and contributors. 
It gives modelers a tool to query experimental observations on neurophysiological and neurocognitive biomarkers and therefore helps them include current relevant experimental data in their modeling efforts. It further enables them to validate their modeling output against experimental observations during model construction and to demonstrate the performance of their models, both with respect to the experimental literature and with respect to other competing models. In addition to the benefits it offers modelers, it also enables experimentalists to quickly gain insight into the current state of modeling and to extract experimentally testable predictions from the models. Last, but not least, it offers a tool to reviewers that allows them to judge a newly proposed model by making explicit its performance against experimental data and competing models.

The concept of automated code testing and validation has been successfully applied in computer science for many years now; however, it is only slowly finding its way into the computational branches of scientific fields. SciUnit attempts to satisfy this demand by providing a simple, flexible, yet powerful framework to address the earlier mentioned issues. The computational neuroscience community has started to adopt this framework for the automatic validation of single neuron models (NeuronUnit; Gerkin & Omar, ). We are not aware of any similar efforts in the field of psychiatry.

Because schizophrenia is a polygenic, multifactorial, and very heterogeneous disorder, it has been argued that the usefulness of biomarkers lies in their potential to dissect the disorder into subtypes, which might even be linked more closely to findings on the genetic level (Markou et al., ; Meyer-Lindenberg & Weinberger, ; Perlis, ). The proposed ASSRUnit module together with computational models of biomarkers and specifically designed test suites could strongly facilitate this process by providing mechanistic links between neurophysiological or neurocognitive biomarkers and changes at the synaptic, cellular, and/or network level.

Future Directions for ASSRUnit

The presented ASSRUnit module can be easily extended and modified by others to fit their needs (e.g., to include more specialized visualization tools). Our efforts to establish ASSRUnit as a widely used tool will focus on three main areas. (a) We aim to cover the majority of existing experimental studies with our experimental database in the future. Furthermore, we hope to convince experimentalists to provide more detailed experimental data or, ideally, to create database entries themselves. (b) We also aim to cover the majority of current computational models that describe the cortical circuitry responsible for the ASSR. Again, we hope to encourage modelers to contribute actively to ASSRUnit. (c) We aim to extend our set of prediction tests and thus our prediction database.

The most straightforward extension, in our view, is to include information on phase locking in addition to pure power in certain frequency bands. Several studies have reported, in addition to a reduction in gamma power, a reduction in the phase-locking factor for patients suffering from schizophrenia (e.g., Brenner et al., ; Krishnan et al., ; Kwon et al., ; Light et al., ; Vierling-Claassen et al., ). These observations can be incorporated easily into the existing module by including the experimental observations in the database, adding the necessary capabilities to the model classes, and adding the appropriate tests that link the experimental observations to the model capabilities.
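To make the quantity concrete, the following sketch computes a phase-locking factor in its common form, the length of the mean unit phase vector across trials, which ranges from 0 (no consistent phase) to 1 (identical phase in every trial). It is an illustration only and not code from ASSRUnit: the function names are hypothetical, the trials are synthetic sine waves, and the phase is estimated with a single-frequency Fourier sum, a simplification of the time-frequency methods typically used in experimental studies.

```python
import cmath
import math


def phase_at(trial, freq_hz, sample_rate_hz):
    """Phase of one trial at freq_hz, from a single-frequency Fourier sum."""
    coeff = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate_hz)
        for n, x in enumerate(trial)
    )
    return cmath.phase(coeff)


def phase_locking_factor(trials, freq_hz, sample_rate_hz):
    """Length of the mean unit phase vector across trials (between 0 and 1)."""
    unit_vectors = [
        cmath.exp(1j * phase_at(trial, freq_hz, sample_rate_hz)) for trial in trials
    ]
    return abs(sum(unit_vectors) / len(unit_vectors))


def sine_trial(freq_hz, phase, sample_rate_hz=1000, n_samples=1000):
    """Synthetic trial: one second of a sine wave with a given phase offset."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate_hz + phase)
        for n in range(n_samples)
    ]


# Four trials with identical phase: perfectly locked to the 40 Hz drive.
locked = [sine_trial(40, 0.0) for _ in range(4)]
# Four trials whose phases are spread evenly around the circle: no locking.
spread = [sine_trial(40, k * math.pi / 2) for k in range(4)]

plf_locked = phase_locking_factor(locked, 40, 1000)
plf_spread = phase_locking_factor(spread, 40, 1000)
```

In a module such as ASSRUnit, a function of this kind would sit behind a model capability, so that a test could compare the model's phase-locking factor with the corresponding experimental observation.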

Furthermore, the changes in oscillatory activity upon auditory stimulation are not limited to the gamma and the beta ranges for schizophrenic patients but also extend to lower-frequency bands, such as alpha, theta, and delta. For example, Brockhaus-Dumke, Mueller, Faigle, and Klosterkoetter () found reduced phase locking in the alpha and theta bands for schizophrenic patients in an auditory paired-click paradigm, and Ford, Roach, Hoffman, and Mathalon () found a reduction of phase locking in the delta and theta ranges for schizophrenic patients in an auditory oddball task. Abnormalities in these frequency bands have also been found in many other paradigms outside of the auditory system (see Basar & Guntekin []). To the best of our knowledge, ASSRs to entrainment stimuli in the theta and delta ranges have not yet been examined in schizophrenia. Therefore, ASSRUnit could be used to generate predictions in these frequency ranges, as demonstrated in use case 3.

However, including the earlier mentioned observations together with computational models explaining these deficits is not straightforward, because the paradigms differ from those used to elicit ASSRs and/or the mechanisms underlying the effect differ, so that the computational models are substantially different from models of ASSRs. Therefore, these deficits are better explored in separate modules, each focusing solely on one paradigm or deficit. Nevertheless, it would be very interesting to co-explore computational models that have the capabilities to explain both ASSR gamma/beta band and delta/theta/alpha phase-locking deficits. Such an analysis could highlight interactions between different mechanisms underlying different symptoms or biomarkers.

Another promising extension of the current module would be to include data and models from different psychiatric disorders, because schizophrenia is not the only disorder in which patients show entrainment deficits. Wilson et al. () explored gamma power in adolescents with psychosis and found reductions compared to normally developing controls. Their patient group consisted of patients suffering from schizophrenia and also from schizoaffective disorder and bipolar disorder. Interestingly, these disorders show overlapping symptoms, neurobiological substrates, and predisposing gene loci. Other studies have found reduced power and phase locking in the gamma range in patients with bipolar disorder (O’Donnell et al., ; Rass et al., ; Spencer et al., ). The presented module is well suited to highlight commonalities and differences across disorders and to link those to mechanistic explanations via different theoretical and computational models.

Other Modules Beyond ASSRUnit

The approach presented here, combining an experimental database with a collection of models, tests, prediction tests, and a resulting predictions database, can be readily applied to a number of other neurophysiological biomarkers of schizophrenia as well as other psychiatric disorders.

In patients suffering from schizophrenia, a dysfunction of the auditory system has long been suspected. In fact, a large number of biomarkers for schizophrenia, other than ASSR deficits, involve auditory processing. Several alterations of event-related potentials (ERPs), such as mismatch negativity (MMN), N100, and P50, have been described in the literature (see Shi, ; Siekmeier, , for reviews of potential biomarkers and computational models thereof).

Naturally, our approach can also be adapted to brain circuits outside of the auditory system. Working memory deficits are probably among the most robust and best described cognitive deficits in schizophrenic patients (reviewed in J. Lee & Park, ; Piskulic, Olver, Norman, & Maruff, ). Patients show a decrease in working memory capacity, that is, the capacity to maintain, manipulate, and use information online for a relatively short period of time, across a broad range of paradigms. Again, several theoretical and computational models have been proposed that aim to provide mechanistic explanations of these deficits (e.g., Cano-Colino & Compte, ; Compte, Brunel, Goldman-Rakic, & Wang, ; Durstewitz, Seamans, & Sejnowski, ; Singh & Eliasmith, ; Wang, ; Wang, Tegnér, Constantinidis, & Goldman-Rakic, ).

All these deficits and alterations, along with relevant computational models, could be integrated into packages similar to the proposed ASSRUnit package. Such a unified framework would be of great benefit for the study of schizophrenia pathology, given the diversity of symptoms, biomarkers, and experimental observations linked to the disorder.

CONCLUSION

We have proposed a framework for automated validation and comparison of computational models of neurophysiological and neurocognitive biomarkers of psychiatric disorders. The approach builds on SciUnit, a Python framework for scientific model comparison. As a case in point, we used this framework to develop ASSRUnit, a module comprising an experimental observations database, computational models, capabilities, tests/test suites, and visualization functions for ASSR deficits in schizophrenia.

Our approach will facilitate the development, validation, and comparison of computational models of neurophysiological and neurocognitive biomarkers of psychiatric disorders by making the scope of models explicit and by making it easy for the user to assess a model’s validity and to compare a model against competing models. Furthermore, it is easy to use; straightforward to extend to more experimental observations, computational models, and analyses; and ready to apply to other biomarkers. Therefore the adoption of the proposed framework could be of great use for modelers, reviewers, and experimentalists in the field of computational psychiatry.

AUTHOR CONTRIBUTIONS

Christoph Metzner: Conceptualization, Methodology, Software, Writing - original draft, Writing - review & editing. Tuomo Mäki-Marttunen: Software, Writing - review & editing. Bartosz Zurowski: Conceptualization, Writing - review & editing. Volker Steuber: Conceptualization, Writing - review & editing.

FUNDING INFORMATION

Christoph Metzner, Deutsche Forschungsgemeinschaft (http://dx.doi.org/10.13039/501100001659), Award ID: ME 4391/1-1. Tuomo Mäki-Marttunen, Norges Forskningsråd (http://dx.doi.org/10.13039/501100005416), Award ID: 248828.

Computational Psychiatry
