Osh, Russian Federation
UDC 004
UDC 62
CSCSTI 20.00
CSCSTI 20.15
Russian Classification of Professions by Education 02.07.01
Russian Library and Bibliographic Classification 745
Russian Trade and Bibliographic Classification 5
Russian Trade and Bibliographic Classification 51
BISAC COM COMPUTERS
With information volumes growing rapidly, big data analysis has become a pressing problem. This article examines how machine learning (ML) methods and fundamental mathematical models combine to support effective knowledge extraction from large datasets. The aim of the work is to develop and comparatively evaluate a set of ML methods, supported by a mathematical apparatus, for classification and clustering tasks. In an experiment on a dataset from the UCI Machine Learning Repository, a comparative analysis of algorithms was conducted, including Logistic Regression, Support Vector Machine (SVM), Random Forest, and Multilayer Perceptron. The results show that neural networks (accuracy: 0.92, F1-score: 0.89) and ensemble methods outperform classical algorithms on heterogeneous data. The article emphasizes that mathematical models from optimization, linear algebra, and probability theory form an integral foundation that ensures the correctness and efficiency of ML algorithms. It concludes that an integrated approach, combining the computational power of ML with the rigor of mathematical models, is advisable.
big data, machine learning, mathematical models, classification, clustering, neural networks, optimization
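The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' experimental setup: the abstract does not name the UCI dataset, so the sketch assumes the Breast Cancer Wisconsin (Diagnostic) data bundled with scikit-learn, and the function name `compare_classifiers`, the 70/30 split, and all hyperparameters are hypothetical choices made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(random_state=0):
    """Fit four classifiers on one train/test split and score each of them."""
    # Assumption: a stand-in dataset, since the article's dataset is not named.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )

    # Scale-sensitive models get a StandardScaler; the tree ensemble does not need one.
    models = {
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "Random Forest": RandomForestClassifier(
            n_estimators=200, random_state=random_state
        ),
        "Multilayer Perceptron": make_pipeline(
            StandardScaler(),
            MLPClassifier(
                hidden_layer_sizes=(64, 32),
                max_iter=1000,
                random_state=random_state,
            ),
        ),
    }

    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "f1": f1_score(y_test, y_pred),
        }
    return results


if __name__ == "__main__":
    for name, scores in compare_classifiers().items():
        print(f"{name:22s} accuracy={scores['accuracy']:.3f} f1={scores['f1']:.3f}")
```

Scores from this sketch depend on the stand-in dataset and split, so they should not be expected to match the figures reported in the abstract.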
1. Chen, M., Mao, S., & Liu, Y. Big Data: A Survey. Mobile Networks and Applications. 2014. № 19(2). Pp. 171–209. DOI: https://doi.org/10.1007/s11036-013-0489-0
2. Deisenroth, M. P., Faisal, A. A., & Ong, C. S. Mathematics for Machine Learning. Cambridge University Press. 2020. 398 p. DOI: https://doi.org/10.1017/9781108679930
3. Bottou, L., Curtis, F. E., & Nocedal, J. Optimization Methods for Large-Scale Machine Learning. SIAM Review. 2018. № 60(2). Pp. 223–311. DOI: https://doi.org/10.1137/16M1080173
4. Isakov, R. Zh., & Abdykalykov, A. A. Possibilities of applying artificial intelligence in disease diagnostics in Kyrgyzstan. Bulletin of the Kyrgyz-Russian Slavic University. 2022. № 22(5). Pp. 124–130.
5. Murphy, K. P. Probabilistic Machine Learning: An Introduction. MIT Press. 2022. 864 p.
6. Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. CatBoost: unbiased boosting with categorical features. Advances in Neural Information Processing Systems. 2018. Vol. 31.
7. Goodfellow, I., Bengio, Y., & Courville, A. Deep Learning. MIT Press. 2016. 800 p.
8. Dua, D., & Graff, C. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. 2019. [Electronic resource]. URL: http://archive.ics.uci.edu/ml



