Anomalies in Language As a Biomarker for Schizophrenia

Janna N. de Boer; Sanne G. Brederoo; Alban E. Voppel; Iris E.C. Sommer


Curr Opin Psychiatry. 2020;33(3):212-218. 

Food for Thought

Language Versus Speech

Two important and notably distinct concepts in this line of research are 'language' and 'speech'. Language is the mental system underlying verbal behaviour, which includes meaning, grammar and form. Speech is the spoken output, or medium, of language: the way it is produced by the speech organs. Language can of course also be produced in writing or in gestures (sign language), which requires similar cognitive processes to formulate sentences but does not involve the vocal tract (i.e. there is no articulation). Although communication difficulties in schizophrenia are currently described as 'disorganized speech', the literature discussed in this review clearly demonstrates that patients with schizophrenia display a wide variety of language disorders, including broad disturbances in semantics, pragmatics and grammatical structures.[12,15] 'Disorganized speech'[77] would therefore be better described as 'disturbed language', which may include, but is not limited to, speech.


The term biomarker is classically used for analytes of a human biological system (e.g. plasma, urine, cerebrospinal fluid) or for biological properties (e.g. mass concentration). However, the Biomarkers Definitions Working Group and other initiatives have advocated a broader, less ambiguous definition of biomarkers, namely: 'a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes or pharmacological response'.[78,79] Language output fully meets this definition and can thus serve as a true biomarker for schizophrenia-spectrum disorders.

Current State of Research

Of note, in most classification models discussed in this review, the final model included one or several variables that are not specific to language. Examples of such general features are task duration (reading 400 words aloud)[76] and response time to a question.[75] These variables most likely reflect general cognitive deficits, such as reduced attention or working memory, or general fatigue, all of which are common in schizophrenia.[80] The decision to add less specific measures to a model is presumably motivated by the aspiration to build models with high diagnostic or prognostic accuracy and by the pursuit of clinically valuable tools. However, although general cognitive measures may have high discriminatory power, employing them at an early stage forecloses improvement of our knowledge of language-related disturbances in schizophrenia. Further, including nonspecific measures in classification models reduces their power to detect early or subtle symptoms in spoken language that are specific to schizophrenia and may be used for differential diagnosis. While we endorse the ultimate goal of developing highly accurate diagnostic and prognostic tools, the aim of assessing the value of purely linguistic measures should not be neglected. To this end, results of models with only linguistic features should be reported as well, even if they are less accurate than models that additionally include nonspecific factors.
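As a rough illustration of this reporting practice, the sketch below evaluates the same simple classifier twice, once on linguistic features only and once with nonspecific features added, and returns both accuracies side by side. Everything in it is an assumption made for illustration: the data are synthetic, the feature names and effect sizes are hypothetical, and the nearest-centroid classifier with leave-one-out evaluation stands in for whatever model a given study actually used.

```python
import random

# Hypothetical, already z-scored features; the names are illustrative only.
FEATURES = ["semantic_coherence", "syntactic_complexity",  # linguistic
            "task_duration", "response_time"]              # nonspecific
LINGUISTIC = [0, 1]
ALL_FEATURES = [0, 1, 2, 3]


def make_synthetic(n_per_group, seed=0):
    """Generate synthetic participants (label 0 = control, 1 = patient)."""
    rng = random.Random(seed)
    shifts = [0.8, 0.6, 1.0, 0.9]  # arbitrary group differences, not real data
    rows, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            rows.append([rng.gauss(label * s, 1.0) for s in shifts])
            labels.append(label)
    return rows, labels


def loo_accuracy(rows, labels, columns):
    """Leave-one-out accuracy of a nearest-centroid classifier on `columns`."""
    correct = 0
    for i, held_out in enumerate(rows):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(rows, labels)):
            if j == i:
                continue  # the held-out participant never informs the centroids
            vec = sums.setdefault(label, [0.0] * len(columns))
            for k, c in enumerate(columns):
                vec[k] += row[c]
            counts[label] = counts.get(label, 0) + 1
        centroids = {lab: [s / counts[lab] for s in vec]
                     for lab, vec in sums.items()}

        def dist(lab):
            return sum((held_out[c] - centroids[lab][k]) ** 2
                       for k, c in enumerate(columns))

        correct += min(centroids, key=dist) == labels[i]
    return correct / len(rows)


def report_both(rows, labels):
    """Report the linguistic-only model alongside the combined model."""
    return {
        "linguistic_only": loo_accuracy(rows, labels, LINGUISTIC),
        "linguistic_plus_nonspecific": loo_accuracy(rows, labels, ALL_FEATURES),
    }


if __name__ == "__main__":
    rows, labels = make_synthetic(n_per_group=30)
    for name, acc in report_both(rows, labels).items():
        print(f"{name}: {acc:.2f}")
```

The point of the design is only that both numbers come from the same evaluation procedure, so that any gain from the nonspecific features is visible rather than folded into a single headline accuracy.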

A related point of discussion is that in extensive machine learning and deep learning models, features become abstract and a very large number of features is fed to the model (e.g. 40 526 speech features were used in a model to detect post-traumatic stress disorder[81]), which makes it difficult to trace a classification model back to clinically recognizable symptoms or signs. A word of caution about this development is therefore in order. In an extreme example, such tendencies could lead to a model that bases its classification of patients and healthy controls only on their use of antipsychotic medication. This would of course lead to (near) perfect classification scores, but such a model would have no diagnostic value. Similarly, algorithms might 'overfit' predictions due to, for example, multicollinearity or correlated predictors, producing unstable estimates. Such problems can be overcome by validation in a truly independent dataset; problems in the model-fitting stage will then show up as poor performance in the validation process. However, most of the studies reviewed here use cross-validation to assess the generalizability of their models, which does not fully overcome this risk of overfitting. Few studies validated their models in a separate subset of their data[70] or in an independent dataset;[63] the latter should become the standard in this field of research.
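One way cross-validation can overstate generalizability is sketched below, under assumptions that are ours rather than the reviewed studies': the features are pure synthetic noise, the ten "best" of 2000 features are selected on the full sample before cross-validation, and a nearest-centroid classifier is used. Because the selection step has already seen every participant, leave-one-out accuracy can look well above chance even though no real group difference exists, whereas accuracy in an independently generated dataset should stay near chance.

```python
import random


def make_noise(n_per_group, n_features, rng):
    """Two groups whose features are pure noise: no true group difference."""
    rows = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for _ in range(2 * n_per_group)]
    labels = [0] * n_per_group + [1] * n_per_group
    return rows, labels


def group_means(rows, labels, columns, skip=None):
    """Per-group centroids over `columns`, optionally leaving one row out."""
    sums = {0: [0.0] * len(columns), 1: [0.0] * len(columns)}
    counts = {0: 0, 1: 0}
    for j, (row, label) in enumerate(zip(rows, labels)):
        if j == skip:
            continue
        counts[label] += 1
        for k, c in enumerate(columns):
            sums[label][k] += row[c]
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict(row, centroids, columns):
    """Assign the label of the nearest centroid (squared Euclidean distance)."""
    def dist(lab):
        return sum((row[c] - centroids[lab][k]) ** 2
                   for k, c in enumerate(columns))
    return min(centroids, key=dist)


def select_top(rows, labels, n_features, k):
    """Pick the k features with the largest absolute group-mean difference."""
    cols = list(range(n_features))
    m = group_means(rows, labels, cols)
    return sorted(cols, key=lambda c: abs(m[0][c] - m[1][c]), reverse=True)[:k]


def loo_accuracy(rows, labels, columns):
    """Leave-one-out accuracy; feature selection is NOT repeated per fold."""
    hits = 0
    for i, row in enumerate(rows):
        centroids = group_means(rows, labels, columns, skip=i)
        hits += predict(row, centroids, columns) == labels[i]
    return hits / len(rows)


def external_accuracy(train_rows, train_labels, test_rows, test_labels, columns):
    """Fit on the training sample, then score on an independent sample."""
    centroids = group_means(train_rows, train_labels, columns)
    hits = sum(predict(r, centroids, columns) == y
               for r, y in zip(test_rows, test_labels))
    return hits / len(test_rows)


def demo(seed=0, n_features=2000):
    rng = random.Random(seed)
    rows, labels = make_noise(20, n_features, rng)           # study sample
    ext_rows, ext_labels = make_noise(100, n_features, rng)  # independent sample
    # Selection on the full study sample leaks information into every fold.
    columns = select_top(rows, labels, n_features, k=10)
    cv_acc = loo_accuracy(rows, labels, columns)
    ext_acc = external_accuracy(rows, labels, ext_rows, ext_labels, columns)
    return cv_acc, ext_acc


if __name__ == "__main__":
    cv_acc, ext_acc = demo()
    print(f"cross-validated accuracy: {cv_acc:.2f}")
    print(f"independent-sample accuracy: {ext_acc:.2f}")
```

Repeating the selection step inside each fold would remove this particular leak, but only a truly independent dataset also guards against confounds shared by the whole study sample, such as the medication example above.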