Software implementation of a deterministic approach to analyzing large files
Abstract and keywords
Abstract:
This article proposes an original deterministic approach to analyzing text files that addresses the sharp drop in overall data processing speed once file size exceeds 500 MB. Two variants of the approach are considered: direct analysis of a file mapped into the virtual address space of the process, using pointer arithmetic as with ordinary memory contents; and copying successive blocks of the memory-mapped file into an additional RAM buffer, then analyzing the data in that buffer. Each variant was implemented in software, and its execution time was measured on the same data set. The results clearly indicate the advantage of combining memory-mapped files with additional buffering when analyzing text files larger than 500 MB.
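
The two variants described in the abstract can be illustrated with a minimal C++ sketch. It rests on several assumptions that are not taken from the article: it uses POSIX mmap (the article's implementation may instead rely on the Win32 file-mapping API), the names MappedFile, count_lines_direct and count_lines_buffered are hypothetical, counting newline characters merely stands in for the real analysis, and the block size is an arbitrary parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: read-only mapping of a whole file (POSIX assumed).
// On failure or for an empty file, data stays nullptr and size stays 0.
struct MappedFile {
    const char* data = nullptr;
    std::size_t size = 0;

    explicit MappedFile(const char* path) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat st{};
        if (fstat(fd, &st) == 0 && st.st_size > 0) {
            std::size_t len = static_cast<std::size_t>(st.st_size);
            void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data = static_cast<const char*>(p);
                size = len;
            }
        }
        close(fd);  // the mapping remains valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data) munmap(const_cast<char*>(data), size);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;
};

// Variant 1: analyze the mapped view in place with pointer arithmetic.
// Counting '\n' is a placeholder for the actual analysis.
std::size_t count_lines_direct(const MappedFile& f) {
    std::size_t lines = 0;
    const char* end = f.data + f.size;
    for (const char* p = f.data; p != end; ++p) {
        if (*p == '\n') ++lines;
    }
    return lines;
}

// Variant 2: copy the view block by block into a RAM buffer,
// then analyze the data that is already in the buffer.
std::size_t count_lines_buffered(const MappedFile& f, std::size_t block_size) {
    assert(block_size > 0);
    std::vector<char> buf(block_size);
    std::size_t lines = 0;
    for (std::size_t off = 0; off < f.size; off += block_size) {
        std::size_t n = std::min(block_size, f.size - off);
        std::memcpy(buf.data(), f.data + off, n);
        lines += static_cast<std::size_t>(
            std::count(buf.data(), buf.data() + n, '\n'));
    }
    return lines;
}
```

Both functions return the same count for the same file; only the memory-access pattern differs, which is the property the timing comparison in the article examines. Mapping a file larger than 500 MB in a single view assumes a 64-bit process.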

Keywords:
log file, memory-mapped files, pointer, execution time, pointer arithmetic, text file analysis, buffering, data read cycle, prototype development, data caching, process address space.