RESEARCH PAPER
Improvement in classification capabilities of surface water samples based on analysis of multidimensional data from gas sensor array
1 Department of Applied Mathematics, Lublin University of Technology, Lublin, Poland
2 Department of Water Supply and Wastewater Disposal, Lublin University of Technology, Lublin, Poland
Corresponding author
Magdalena Piłat-Rożek
Department of Applied Mathematics, Lublin University of Technology, Nadbystrzycka 38, 20-618, Lublin, Poland
Ann Agric Environ Med. 2025;32(2):222-229
ABSTRACT
Introduction and objective:
Electronic noses (e-noses) have been shown to differentiate successfully between drainage and river water samples. However, the classification accuracy reported in the previous article in this series appeared to leave room for improvement. The aim of this study was to improve the classification accuracy of surface water samples analyzed with a gas sensor array.
Material and methods:
The multidimensional data used to train the machine learning models were obtained from river water, drainage water and synthetic air samples measured with an array of 17 gas sensors. The unsupervised methods t-SNE and k-medians were used for dimensionality reduction, visualization on a two-dimensional plane, and clustering. Subsequently, the supervised classifiers XGBoost and AdaBoost.M1 were trained and compared in terms of how accurately they assigned observations to the correct classes.
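To illustrate the clustering step, the following is a minimal sketch of a k-medians procedure in plain Python. It is not the implementation used in the study: it assumes the common variant with Manhattan distance and coordinate-wise medians, whereas the R package cited by the authors may compute cluster centers differently. The function names, the toy points and the simple deterministic choice of initial centers are all assumptions made for this example.

```python
from statistics import median


def manhattan(a, b):
    """Manhattan (L1) distance between two points of equal dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_medians(points, k, n_iter=100):
    """Cluster points into k groups using coordinate-wise medians.

    Assumption: initial centers are taken at evenly spaced positions in
    the input list, a deliberately simple choice made for reproducibility.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    centers = [list(points[i * n // k]) for i in range(k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center (L1 distance).
        new_labels = [
            min(range(k), key=lambda j: manhattan(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the procedure has converged
        labels = new_labels
        # Update step: each center becomes the coordinate-wise median
        # of the points currently assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [median(col) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    # Toy 2-D points standing in for an embedding; not data from the study.
    toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centers = k_medians(toy, k=2)
    print(labels, centers)
```

In a workflow like the one described above, such a procedure would be applied to the two-dimensional coordinates produced by t-SNE rather than to hand-written toy points.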
Results:
The t-SNE visualization and k-medians clustering clearly separated the observations from the water sample and the different drainage samples. On the test set, the XGBoost and AdaBoost.M1 models achieved 88.8% and 89.2% correct classifications, respectively.
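The percentage of correct classifications quoted above is the share of test-set observations whose predicted class matches the true class. A minimal sketch of that calculation follows; the class names and labels are hypothetical toy values chosen for illustration and are not results from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of observations whose predicted class equals the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count how often each (true class, predicted class) pair occurs."""
    return Counter(zip(y_true, y_pred))


if __name__ == "__main__":
    # Hypothetical labels for five test observations (toy data only).
    y_true = ["river", "river", "drainage", "drainage", "air"]
    y_pred = ["river", "drainage", "drainage", "drainage", "air"]
    print(f"accuracy: {accuracy(y_true, y_pred):.1%}")
    print(confusion_counts(y_true, y_pred))
```

The pair counts show which classes are confused with each other, which a single accuracy figure does not reveal.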
Conclusions:
For all the classical indicators, most of the multiple comparisons between sample groups showed no statistically significant differences in medians. Nevertheless, the electronic nose made it possible to differentiate and correctly classify surface water samples with high accuracy.