Interpretable machine learning models to predict cadmium in wheat for safe production and soil management
Fundamental Research
Article 2025 English
Authors
Qixin Lü
Zhi-Xian Tang
Zhong Tang
Abstract
Accurate prediction of cadmium (Cd) concentrations in wheat grain is essential for ensuring food safety and sustainable agriculture. Here, we developed predictive models using nine machine learning (ML) algorithms trained on a dataset of 1,339 paired soil and wheat grain samples, with a focus on soil properties. The eXtreme Gradient Boosting (XGBoost) model outperformed the others, achieving higher predictive accuracy (R² = 0.90) than multiple linear regression (R² = 0.69). Shapley Additive Explanations (SHAP) analysis identified soil total Cd (mean |SHAP| value, 0.20) and pH (0.08) as key determinants of wheat grain Cd, and soil Mn (0.06) and Zn (0.03) concentrations as minor determinants. Soil Cd had a positive effect on grain Cd concentration, whereas soil pH, Mn and Zn had negative effects. Extending the XGBoost model with 373 nationwide paired samples confirmed its robustness (R² = 0.86) and identified high-risk areas for Cd accumulation in southwest China and northwestern Henan province. An online application (https://wheat.cdpredict.cn) was developed for rapid prediction of Cd in wheat. To ensure compliance with the wheat grain Cd limit of 0.1 mg/kg, soil Cd safety thresholds were established for different soil pH ranges. We further recommend that approximately 3.4% and 10.5% of cultivated soils should keep Cd levels within 0.30 and 0.34 mg/kg, respectively. This interpretable ML model provides an actionable tool for managing Cd-contaminated soils to ensure the safe production of wheat.
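The two summary statistics quoted in the abstract, the coefficient of determination (R²) used to compare XGBoost with multiple linear regression and the mean |SHAP| value used to rank soil properties, can be illustrated with the minimal standard-library Python sketch below. The model `toy_grain_cd`, its coefficients, the baseline and the sample values are hypothetical placeholders chosen only to reproduce the reported signs of the effects; they are not the authors' fitted XGBoost model or data. Shapley values are computed exactly by enumerating feature coalitions against a single baseline point, which is only practical for a handful of features; in practice, libraries such as `shap` (e.g. its `TreeExplainer`) compute SHAP values for tree ensembles far more efficiently.

```python
from itertools import combinations
from math import factorial
from statistics import fmean

FEATURES = ["soil_Cd", "pH", "Mn", "Zn"]

def toy_grain_cd(z):
    """Hypothetical grain-Cd model (mg/kg); NOT the published XGBoost model.

    Rises with soil Cd and falls with pH, Mn and Zn, matching only the signs
    reported in the abstract. The cd * (9 - pH) term is a Cd x pH interaction.
    """
    cd, ph, mn, zn = z
    return 0.12 * cd * (9.0 - ph) - 0.00005 * mn - 0.0002 * zn

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) against one baseline point.

    Features outside coalition S take their baseline value, so
    v(S) = model(x_S, baseline_rest). Cost grows as 2**len(x).
    """
    n = len(x)

    def value(subset):
        return model([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def mean_abs_shap(shap_rows, names):
    """Global importance: mean |SHAP| per feature, largest first."""
    scores = {name: fmean(abs(row[j]) for row in shap_rows)
              for j, name in enumerate(names)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical samples: [soil Cd (mg/kg), pH, soil Mn (mg/kg), soil Zn (mg/kg)]
samples = [[0.8, 6.0, 300.0, 60.0], [0.1, 7.0, 600.0, 90.0], [0.3, 6.5, 450.0, 75.0]]
baseline = [fmean(col) for col in zip(*samples)]  # feature means as the reference point

shap_rows = [shapley_values(toy_grain_cd, x, baseline) for x in samples]
ranking = mean_abs_shap(shap_rows, FEATURES)
for name, score in ranking:
    print(f"{name}: mean |SHAP| = {score:.4f}")
```

`r2_score` takes observed and predicted grain Cd values for the same samples. By construction, each sample's Shapley values sum to `model(x) - model(baseline)`, which is a quick sanity check on any SHAP implementation; the toy magnitudes are not comparable to the values reported above.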
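The abstract reports pH-specific soil Cd thresholds that keep wheat grain Cd at or below 0.1 mg/kg but does not describe the procedure here. One plausible approach, sketched below under stated assumptions, is to invert a fitted model: for a given pH, search for the largest soil Cd whose predicted grain Cd stays within the limit. The model `toy_grain_cd`, its coefficients, the fixed Mn and Zn values and the helper `soil_cd_threshold` are hypothetical, and the printed thresholds are illustrative only, not the authors' published values.

```python
GRAIN_CD_LIMIT = 0.1  # mg/kg, wheat grain Cd limit cited in the abstract

def toy_grain_cd(soil_cd, ph, mn=450.0, zn=75.0):
    """Hypothetical stand-in for a fitted model; NOT the published XGBoost model."""
    return 0.12 * soil_cd * (9.0 - ph) - 0.00005 * mn - 0.0002 * zn

def soil_cd_threshold(model, ph, limit=GRAIN_CD_LIMIT, hi=10.0, tol=1e-6):
    """Largest soil Cd (mg/kg) in [0, hi] whose predicted grain Cd stays <= limit.

    Assumes predictions are non-decreasing in soil Cd (the abstract reports a
    positive effect of soil Cd) and narrows the boundary by bisection.
    """
    lo = 0.0
    if model(lo, ph) > limit:
        return 0.0  # even Cd-free soil is predicted to exceed the limit
    if model(hi, ph) <= limit:
        return hi  # limit is never reached inside the search range
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if model(mid, ph) <= limit:
            lo = mid  # mid is still safe: move the lower bound up
        else:
            hi = mid  # mid exceeds the limit: move the upper bound down
    return lo

for ph in (5.5, 6.5, 7.5):
    print(f"pH {ph}: soil Cd <= {soil_cd_threshold(toy_grain_cd, ph):.3f} mg/kg")
```

Because the toy prediction falls as pH rises, a conservative threshold for a whole pH range would be the value computed at the range's lower (more acidic) bound. A fitted tree ensemble is piecewise constant and not guaranteed to be monotone in soil Cd; XGBoost's `monotone_constraints` training parameter is one way to enforce that assumption, and otherwise a grid scan over soil Cd is safer than bisection.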