APPLICATION OF MACHINE LEARNING METHODS AND MATHEMATICAL MODELS FOR BIG DATA ANALYSIS
Abstract and keywords
Abstract:
With information volumes growing rapidly, big data analysis is becoming increasingly important. This article examines how machine learning (ML) methods and fundamental mathematical models combine to form a basis for effective knowledge extraction from large datasets. The aim of the work is to develop and comparatively evaluate a set of ML methods, supported by a mathematical apparatus, for classification and clustering tasks. In an experiment on a dataset from the UCI Machine Learning Repository, a comparative analysis of algorithms was conducted, including Logistic Regression, Support Vector Machine (SVM), Random Forest, and Multilayer Perceptron. The results show that neural networks (Accuracy: 0.92, F1-score: 0.89) and ensemble methods outperform classical algorithms on heterogeneous data. Mathematical models from optimization, linear algebra, and probability theory are emphasized as the foundation that ensures the correctness and efficiency of ML algorithms. The article concludes that an integrated approach, combining the computational power of ML with the rigor of mathematical models, is advisable.

Keywords:
big data, machine learning, mathematical models, classification, clustering, neural networks, optimization
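
The abstract reports results as Accuracy and F1-score. As a minimal sketch of how these two metrics are computed from a classifier's predictions, the following pure-Python example may help. It assumes a binary task with the positive class labelled 1. The function names and the toy labels are hypothetical illustrations, not the article's code or data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for one positive class (assumed label: 1)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif t == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all samples that were classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int],
             positive: int = 1) -> float:
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels (hypothetical): 4 positive and 6 negative samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

print(accuracy(y_true, y_pred))  # 0.8
print(f1_score(y_true, y_pred))  # 0.75
```

Accuracy counts every correct prediction equally, while F1 ignores true negatives and depends only on how the positive class is handled. That is one reason the two values can differ on imbalanced data.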
References

1. Chen, M., Mao, S., & Liu, Y. Big Data: A Survey. Mobile Networks and Applications. 2014. No. 19(2). Pp. 171–209. DOI: https://doi.org/10.1007/s11036-013-0489-0

2. Deisenroth, M. P., Faisal, A. A., & Ong, C. S. Mathematics for Machine Learning. Cambridge University Press. 2020. 398 p. DOI: https://doi.org/10.1017/9781108679930

3. Bottou, L., Curtis, F. E., & Nocedal, J. Optimization Methods for Large-Scale Machine Learning. SIAM Review. 2018. No. 60(2). Pp. 223–311. DOI: https://doi.org/10.1137/16M1080173

4. Isakov, R. Zh., Abdykalykov, A. A. Possibilities of applying artificial intelligence in disease diagnostics in Kyrgyzstan. Bulletin of the Kyrgyz-Russian Slavic University. 2022. No. 22(5). Pp. 124–130.

5. Murphy, K. P. Probabilistic Machine Learning: An Introduction. MIT Press. 2022. 864 p.

6. Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. CatBoost: unbiased boosting with categorical features. Advances in Neural Information Processing Systems. 2018. Vol. 31.

7. Goodfellow, I., Bengio, Y., & Courville, A. Deep Learning. MIT Press. 2016. 800 p.

8. Dua, D., & Graff, C. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. 2019. [Electronic resource]. URL: http://archive.ics.uci.edu/ml
