Cite this article: LIU Nanjian, ZHOU Weijian, LI Guohui. 2026. Machine learning-based prediction of O3 concentration from 2018 to 2020 in Xi’an [J]. Journal of Earth Environment, (1): 116-127
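Machine learning-based prediction of O3 concentration from 2018 to 2020 in Xi’an
LIU Nanjian1,2, ZHOU Weijian1,3, LI Guohui1,4
1. State Key Laboratory of Loess Science, Institute of Earth Environment, Chinese Academy of Sciences, Xi’an 710061, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
3. Shaanxi Key Laboratory of Accelerator Mass Spectrometry Technology and Application, Xi’an AMS Center, Xi’an 710061, China
4. Key Laboratory of Aerosol Chemistry and Physics, Chinese Academy of Sciences, Xi’an 710061, China
Abstract:
Ozone (O3) concentration is affected by natural factors and human activities and evolves in a complex, nonlinear way, so accurate prediction of its concentration is essential for environmental management and decision-making. Taking Xi’an as the study area, this paper uses hourly air pollutant data from 2018 to 2020 and ERA5 meteorological reanalysis data for the same period to build four models for 24 h single-step prediction of O3 concentration: a convolutional neural network (CNN), extreme gradient boosting (XGBoost), random forest (RF), and multiple linear regression (MLR). The results show that the tree-based XGBoost and RF models performed well overall, particularly for the full year 2020 and for summer 2020, with XGBoost performing best; by contrast, the classic CNN model did not show the expected advantage, and the MLR model performed worst for both 2020 and summer 2020. All models overestimated or underestimated O3 concentration to some degree, and in particular generally underestimated the higher concentrations in the afternoon, but the tree-based models (XGBoost and RF) kept the magnitude of the prediction bias smaller. SHAP values were then used to interpret the 2020 predictions: historical O3 concentration, solar radiation (SOL), and pressure (PRS) were the three most important features affecting model output, and in the summer 2020 predictions, O3 concentration and radiation-related factors contributed especially strongly to model decisions. The study indicates that tree ensemble models have an advantage in handling the nonlinear characteristics of O3 concentration prediction and provides a useful technical reference for air quality forecasting in related regions.
Key words:  O3 concentration prediction  machine learning  tree-based model  neural network
DOI:10.7515/JEE2023074
CSTR:32259.14.JEE2023074
Document code: A
Funding: Strategic Priority Research Program of the Chinese Academy of Sciences (Category B) (XDB40000000)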
Abstract:
Background, aim, and scope: Ozone (O3) concentration is influenced by both natural processes and human activities and changes in a complex, nonlinear way, so accurate prediction of O3 concentration is of great significance for the decision-making and management of environmental protection departments. This study aims to develop and compare four machine learning models for 24 h prediction of O3 concentration in Xi’an and to identify the key influencing factors through model interpretability. The scope is limited to Xi’an, using hourly air pollutant data and ERA5 meteorological data from 2018 to 2020; performance is evaluated for the full year and for the summer of 2020, with a focus on prediction accuracy and interpretability.
Materials and methods: O3 in Xi’an was taken as the research object. Using hourly air-quality monitoring data from 2018 to 2020 and ERA5 meteorological reanalysis data, we constructed a convolutional neural network (CNN), an extreme gradient boosting model (XGBoost), a random forest (RF), and a multiple linear regression model (MLR) to perform single-step prediction of O3 concentration for the next 24 h.
Results: The tree-based models (XGBoost and RF) showed strong prediction performance for 2020 and for the summer of 2020, with XGBoost performing best. The classic convolutional neural network did not perform especially well, and the MLR model had the worst performance for both 2020 and the summer of 2020. Both the linear and the nonlinear models overestimated or underestimated O3 concentration in the study area to varying degrees, and high O3 concentrations in the afternoon were generally underestimated. However, the tree-based models kept the deviation of their estimates smaller.
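The 24 h single-step setup and the hour-of-day bias check described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the series is synthetic, the persistence baseline is a stand-in and is not one of the four models in the study, and all function names (`synthetic_o3`, `make_lagged_samples`, `mean_bias_by_hour`) are hypothetical.

```python
import math
from collections import defaultdict


def synthetic_o3(days):
    """Build a made-up hourly O3-like series (illustrative values only)."""
    values = []
    for day in range(days):
        for hour in range(24):
            # Daytime bump between 09:00 and 21:00, peaking at 15:00.
            daytime = max(0.0, math.sin(math.pi * (hour - 9) / 12))
            values.append(50.0 + 5.0 * day + (30.0 + 2.0 * day) * daytime)
    return values


def make_lagged_samples(series, horizon=24):
    """Return (hour_of_day, feature, target) with target = series[t + horizon]."""
    return [
        ((t + horizon) % 24, series[t], series[t + horizon])
        for t in range(len(series) - horizon)
    ]


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mean_bias_by_hour(hours, observed, predicted):
    """Mean of (predicted - observed) per hour of day; negative = underestimate."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for h, o, p in zip(hours, observed, predicted):
        totals[h] += p - o
        counts[h] += 1
    return {h: totals[h] / counts[h] for h in sorted(totals)}


series = synthetic_o3(days=10)
samples = make_lagged_samples(series, horizon=24)
hours = [h for h, _, _ in samples]
observed = [target for _, _, target in samples]
# Persistence baseline (assumption): the value 24 h ahead equals the current one.
predicted = [feature for _, feature, _ in samples]

bias = mean_bias_by_hour(hours, observed, predicted)
print(f"RMSE: {rmse(observed, predicted):.2f}")
print(f"bias at 03:00: {bias[3]:.2f}, bias at 15:00: {bias[15]:.2f}")
```

In this toy series the afternoon peak grows from day to day, so the baseline underestimates most strongly at 15:00. With real data, any fitted model's predictions could be passed to `mean_bias_by_hour` in the same way to see where the bias concentrates.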
Finally, in the 2020 prediction, SHAP plots of the two tree-based models (XGBoost and RF) revealed that O3 concentration, solar radiation (SOL), and pressure (PRS) at the previous 24 h timestep were the three most important factors affecting the model output, while in the summer prediction, O3 concentration and radiation factors at the previous 24 h timestep made a critical contribution to model decisions.
Discussion: Accurately predicting O3 concentration is challenging because it is influenced by complex human activities and weather conditions. The influencing factors used in this study were mainly dynamic. Future research should therefore consider not only dynamic factors such as meteorological conditions but also static variables such as terrain; adding more variables is expected to improve model prediction performance. In addition, this study focused on time-series prediction of O3 concentration, but air pollutants are generally distributed regionally, so the spatial dimension should be considered alongside temporal prediction. Convolutional neural networks are well known for processing image signals, in particular for extracting abstract features through hidden-layer operations; in spatial prediction tasks, deep learning models represented by convolutional neural networks may therefore have great application potential, although computational and time costs must also be considered. Finally, all the machine learning models in this paper underestimated O3 concentration in the afternoon, a time of day when human activity is very high. This may be due partly to the models themselves and partly to the limited set of features used.
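SHAP attributions are built on Shapley values, and the idea can be illustrated with a brute-force computation on a tiny hand-written function. The sketch below is an assumption-laden toy: `toy_model`, its coefficients, and the feature values are invented for illustration, and the exhaustive enumeration is not the algorithm used by the `shap` package for tree models.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical stand-in for a trained model (made-up coefficients)."""
    o3_lag, sol, prs = features
    return 200.0 + 0.5 * o3_lag + 0.05 * sol + 0.0005 * o3_lag * sol - 0.2 * prs


# Made-up inputs: [O3 24 h earlier, solar radiation, pressure].
x = [80.0, 600.0, 955.0]
baseline = [40.0, 200.0, 965.0]

phi = shapley_values(toy_model, x, baseline)
for name, contribution in zip(["O3_lag24", "SOL", "PRS"], phi):
    print(f"{name}: {contribution:+.2f}")
print(f"sum of contributions: {sum(phi):.2f}")
print(f"model(x) - model(baseline): {toy_model(x) - toy_model(baseline):.2f}")
```

The contributions add up to the difference between the prediction and the baseline prediction, which is the property that makes a ranking of features by mean absolute SHAP value interpretable. The interaction term between lagged O3 and radiation is split equally between the two features involved.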
Conclusions: Both the tree-based machine learning models and the deep learning model overestimated or underestimated O3 concentration in the study area to varying degrees. Overall, the XGBoost model had the best predictive ability, the CNN model was not particularly outstanding, and the MLR model had the worst predictive performance.
Recommendations and perspectives: The results of this study can serve as a scientific basis for the prediction and early warning of O3 concentration in Xi’an. In future work, deep learning models could be used for prediction in the spatial dimension. In addition, embedding the physically based chemical evolution of air pollutants into machine learning models would greatly increase decision-makers’ confidence in applying them.
Key words:  O3 concentration prediction  machine learning  tree-based model  neural network