Detection of COVID-19 by Machine Learning Using Routine Laboratory Tests

Hikmet Can Çubukçu, MD; Deniz İlhan Topcu, MD, PhD; Nilüfer Bayraktar, MD; Murat Gülşen, MD; Nuran Sarı, MD; Ayşe Hande Arslan, MD


Am J Clin Pathol. 2022;157(5):758-766. 


Abstract and Introduction


Objectives: The present study aimed to develop a clinical decision support tool to assist coronavirus disease 2019 (COVID-19) diagnoses with machine learning (ML) models using routine laboratory test results.

Methods: We developed ML models using laboratory data (n = 1,391) comprising results for six clinical chemistry (CC) analytes and 14 CBC parameters, with severe acute respiratory syndrome coronavirus 2 real-time reverse transcription–polymerase chain reaction results as the gold standard. Four ML algorithms (random forest [RF], gradient boosting [XGBoost], support vector machine [SVM], and logistic regression) were used to build eight ML models, each trained on either CBC parameters alone or a combination of CC and CBC parameters. Performance was evaluated on the test data set and on an external validation data set from Brazil.
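The eight models follow from crossing the four algorithms with the two feature sets. The sketch below only enumerates that design; it does not train anything. The variable names are illustrative, and the feature counts assume that each reported parameter enters the model as a single input feature, which the abstract does not state.

```python
from itertools import product

# Algorithm names as listed in the Methods paragraph.
ALGORITHMS = [
    "random forest",
    "XGBoost",
    "support vector machine",
    "logistic regression",
]

# Feature sets and their sizes (assumption: one feature per parameter).
FEATURE_SETS = {
    "CBC": 14,           # 14 CBC parameters
    "CC + CBC": 6 + 14,  # six CC analytes plus 14 CBC parameters
}

# One entry per (algorithm, feature set) pair.
model_grid = [
    {"algorithm": algo, "features": name, "n_features": size}
    for algo, (name, size) in product(ALGORITHMS, FEATURE_SETS.items())
]

for entry in model_grid:
    print(f'{entry["algorithm"]:<24} {entry["features"]:<9} {entry["n_features"]} features')
```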

Results: The accuracy values of all models ranged from 74% to 91%. The RF model trained on CC and CBC analytes showed the best performance on the present study's data set (accuracy, 85.3%; sensitivity, 79.6%; specificity, 91.2%). The RF model trained on CBC parameters alone detected COVID-19 cases with 82.8% accuracy. The best performance on the external validation data set belonged to the SVM model trained on CC and CBC parameters (accuracy, 91.18%; sensitivity, 100%; specificity, 84.21%).
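For readers less familiar with these metrics, the following sketch shows how accuracy, sensitivity, and specificity are derived from a confusion matrix. The function name and the counts are hypothetical: the counts were chosen only because they reproduce the external-validation percentages quoted above. The abstract does not report the underlying confusion matrix.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, and specificity as fractions.

    tp/fn: rRT-PCR-positive cases the model called positive/negative.
    tn/fp: rRT-PCR-negative cases the model called negative/positive.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,   # true-positive rate
        "specificity": tn / negatives,   # true-negative rate
    }


# Hypothetical counts (not taken from the study).
metrics = diagnostic_metrics(tp=15, fp=3, tn=16, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
# accuracy: 91.18%
# sensitivity: 100.00%
# specificity: 84.21%
```

With no false negatives, sensitivity is 100% regardless of how many false positives occur, so sensitivity and specificity should be read together rather than separately.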

Conclusions: ML models presented in this study can be used as clinical decision support tools to contribute to physicians' clinical judgment for COVID-19 diagnoses.


Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was first reported in China in December 2019. The disease became a global health problem and was declared a pandemic by the World Health Organization on March 11, 2020.[1]

Clinical chemistry (CC), hematology, coagulation, and specific protein test results of patients with COVID-19 changed during the illness. Zhang et al[2] reported that patients with COVID-19 could have lymphopenia (75.4%) and eosinopenia (52.9%). In that study, high C-reactive protein, procalcitonin, and D-dimer were associated with disease severity. Lippi and Plebani[3] stated that procalcitonin levels increased in patients with COVID-19 who had bacterial coinfection. In a meta-analysis by Lippi et al,[4] thrombocytopenia was associated with increased mortality risk in patients with COVID-19. Furthermore, higher neutrophil, D-dimer, prothrombin time, alanine aminotransferase (ALT), lactate dehydrogenase (LDH), total bilirubin, and high-sensitivity troponin I levels and lower lymphocyte and albumin levels were detected in patients with COVID-19 who were hospitalized in an intensive care unit (ICU).[5] Prothrombin time, fibrin degradation products, and D-dimer were higher in patients who died of COVID-19 pneumonia than in surviving patients.[6] The recommended laboratory tests and their changes during COVID-19 were extensively described by the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) Taskforce on COVID-19.[7] Although the literature covered which analytes were valuable in diagnosis, monitoring, and estimating prognosis, it did not quantify the overall contribution of these analytes to SARS-CoV-2 detection in terms of accuracy, sensitivity, and specificity.

Real-time reverse transcription–polymerase chain reaction (rRT-PCR) is the gold standard method for detecting SARS-CoV-2 RNA.[8] Errors originating from the preanalytical phase, such as improper handling and transportation of specimens, contamination, inadequate sample quality, the presence of PCR inhibitors, and misidentifications, can lead to false-negative test results.[9–11] Thus, a negative test result cannot exclude infection where strong clinical suspicion exists.[12]

Machine learning (ML) models using routine laboratory results can provide valuable tools that support clinical decisions for detecting COVID-19 cases. The present study aimed to detect SARS-CoV-2 cases with high accuracy, sensitivity, and specificity with ML models using routine laboratory test results.