BACKGROUND: Helicobacter pylori (H. pylori) is a bacterium that colonizes the stomach and is a major risk factor for gastric cancer; an estimated 89% of non-cardia gastric cancer cases worldwide are attributable to H. pylori. Prospective studies provide reliable evidence for quantifying the association between H. pylori and gastric cancer because they avoid false-negative results caused by a possible decline in antibody levels before cancer develops.

METHODS: In a large-scale prospective study within the China Kadoorie Biobank, H. pylori infection is being analysed as a risk factor for gastric cancer. Infection status is typically determined by serological tests. The well-established immunoblot test is more labour intensive and uses more plasma than the alternative, high-throughput multiplex serology test. Immunoblot outputs a binary positive/negative serostatus classification, whereas multiplex serology outputs a vector of continuous antigen measurements. Mapping such multidimensional continuous measurements onto a binary classification raises statistical challenges: defining classification cut-offs and accounting for differences in the strength of infection evidence provided by different antigens. We discuss these challenges and propose a novel solution that optimizes the translation of the continuous multiplex serology measurements into probabilities of H. pylori infection using classification algorithms (Bayesian additive regression trees (BART), multidimensional monotone BART, logistic regression, random forest and elastic net). We (i) calibrate and apply classification models to predict probabilities of H. pylori infection given multiplex measurements, (ii) compare the predictive performance of the models using immunoblot as the reference, (iii) discuss reasons for the differences in predictive performance and (iv) apply the calibrated models to gain insights into the relative strength of infection evidence provided by the various antigens.

RESULTS: All models showed high discriminative ability, with area under the curve (AUC) estimates of at least 95% on both the training and test data. There was no substantial difference between model performance on the training and test data.

CONCLUSIONS: Classification algorithms can be used to calibrate the H. pylori multiplex serology test to the immunoblot test in the China Kadoorie Biobank. This study furthers our understanding of how classification algorithms can be applied to serologic tests.
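To make the calibration idea concrete, here is a minimal sketch of the simplest model listed above, logistic regression, mapping a vector of continuous antigen readings to a probability of infection, with immunoblot serostatus used as the binary reference label. It is not the authors' pipeline. The data are simulated for three hypothetical antigens of differing evidence strength (no real study values are used), the fit is plain batch gradient descent, and AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative (the Mann-Whitney form). The function names (`fit_logistic`, `predict_proba`, `auc`, `simulate`) are illustrative only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit logistic regression by batch gradient descent.

    X: feature vectors (hypothetical continuous antigen readings).
    y: 0/1 labels (reference serostatus, e.g. from immunoblot).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gwj / n for wj, gwj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(w, b, X):
    # Predicted probability of infection for each feature vector.
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def auc(y, scores):
    """AUC: P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))


def simulate(n, seed):
    # Simulated (not real) data: infected samples have shifted means,
    # with the shift scaled by a hypothetical per-antigen evidence strength.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        infected = 1 if rng.random() < 0.5 else 0
        shift = 2.0 if infected else 0.0
        X.append([rng.gauss(shift * s, 1.0) for s in (1.0, 0.6, 0.2)])
        y.append(infected)
    return X, y


X_train, y_train = simulate(200, seed=1)
X_test, y_test = simulate(200, seed=2)
w, b = fit_logistic(X_train, y_train)
train_auc = auc(y_train, predict_proba(w, b, X_train))
test_auc = auc(y_test, predict_proba(w, b, X_test))
print("weights per antigen:", [round(v, 2) for v in w])
print("train AUC:", round(train_auc, 3), "test AUC:", round(test_auc, 3))
```

In this toy setting the fitted weights give a rough sense of how much each simulated antigen contributes to the infection probability, loosely mirroring aim (iv) above; the tree-based and penalized models named in the abstract would replace `fit_logistic` while the AUC comparison against the reference labels stays the same.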

Original publication

DOI

10.1186/s41512-025-00202-x

Type

Journal article

Journal

Diagn Progn Res

Publication Date

11/08/2025

Volume

9

Keywords

Helicobacter pylori, Classification algorithms, Immunoblot, Multiplex serology, Prediction, Supervised learning