from 01.01.2014 to 01.01.2019
Bolashaq Academy (Department of Foreign Languages and Intercultural Communication, Senior Lecturer)
Karaganda, Kazakhstan
UDK 81 Linguistics. Language studies. Languages
UDK 31 Statistics. Demography. Sociology
GRNTI 14.85 Technical teaching aids and educational equipment
GRNTI 16.31 Applied linguistics
OKSO 02.06.01 Computer and information sciences
OKSO 02.04.02 Fundamental informatics and information technologies
BBK 811 Applied linguistics
TBK 5004 Invention. Rationalization
TBK 841 General and applied linguistics
TBK 8410 General issues. Linguistics
TBK 8419 Applied linguistics
This article describes the authors' approach to analyzing the frequency of biological terms in the printed textbooks of the full biology course for Kazakhstani secondary schools, an approach that reduced the processing time tenfold. The authors analyze the advantages and disadvantages of applications for text analysis and voice processing and offer their own algorithm for the fast counting of keywords in traditional printed texts. The research was carried out within the framework of a grant funding project of the Department of Science and Education of the Republic of Kazakhstan on the theme "Creation of Trilingual Dictionary of Biological Terms with a Linguacultural Component."
frequency, frequency analysis, biological terms, counting, algorithm, applications, text mining, printed texts.
Introduction
Text mining, which is aimed at processing textual data, remains one of the core methods in numerous philological and corpus linguistics projects [1]. Text frequency analysis deals with the words or word clusters used in documents; with its help, one can identify similarities or differences between documents as well as their relations to other variables of interest in a data mining project [1]. Text mining is used in such areas as linguistics, marketing, computer science, and social studies, wherever researchers use the frequency of lexical units to better understand internet users' keyword searches [2]. Plenty of digital resources deal with text frequency analysis and, as a result, provide new ways of creating, processing, and analyzing such data by computer. However, few methods suit the text mining of printed sources, so this issue should be raised and solved in order to simplify the research workflow and reduce time consumption.
Research Issues and Objectives
Frequency analysis became imperative for the authors of this article and the participants of a grant funding project on the creation of a dictionary of biological terms with a linguacultural component, designed for Kazakhstani secondary school students studying biology in English under the program "The Trinity of Languages" [3]. One of the project stages involved a frequency analysis of the biological terms in the Kazakhstani textbooks of the entire school biology course, the results of which would later be used to create a substantial vocabulary database.
Having found only six biology textbooks in digital format, for the 5th, 7th, and 8th grades, the researchers faced challenges with the remaining books, which existed only as printed hard copies. The calculation of term frequency therefore turned into quite labor- and time-consuming work. Although software such as Acrobat Reader has a text recognition function, scanning a single textbook took about an hour and a half.
Moreover, Kazakh texts were poorly recognized, as were Russian and English pages scanned in merely adequate quality. The researchers tried to count the words manually by looking through the books, but this often led to mistakes caused by lapses of attention. Thus, the need for a convenient and user-friendly method of performing a qualitative analysis of any printed text laid the foundation for the present study.
Research Methods and Variables
Quantitative and qualitative empirical research methods were applied in this study to design a new and convenient way of mining printed texts, which was applied to the 7th, 8th, 9th, 10th, and 11th grade textbooks of the natural science course for Kazakhstani secondary schools. Ten free tools for text frequency analysis, both online and offline, were examined and compared with respect to their capabilities for counting the frequency of words and word combinations.
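To make the notion of counting the frequency of words and word combinations concrete, the following is a minimal illustrative sketch in Python. It is not the authors' algorithm and not the method of any of the examined tools. It assumes the text is already available in digital form, and the function name `count_terms`, the sample sentence, and the term list are hypothetical.

```python
import re
from collections import Counter


def count_terms(text, terms):
    """Count case-insensitive whole-word occurrences of each term in text.

    Hypothetical helper for illustration only; terms may be single words
    or multi-word combinations such as "cell membrane".
    """
    # Lowercase and collapse whitespace so that multi-word terms
    # still match when they are split across a line break.
    normalized = re.sub(r"\s+", " ", text.lower())
    counts = Counter()
    for term in terms:
        # Lookarounds keep "cell" from matching inside "cellular";
        # \w is Unicode-aware, so Cyrillic terms are handled as well.
        pattern = r"(?<!\w)" + re.escape(term.lower()) + r"(?!\w)"
        counts[term] = len(re.findall(pattern, normalized))
    return counts


# Invented sample text, not taken from any textbook.
sample = ("The cell membrane surrounds the cell. Each cell\n"
          "membrane is selectively permeable.")
result = count_terms(sample, ["cell", "cell membrane", "nucleus"])
for term, frequency in result.items():
    print(term, frequency)
```

In this sketch the single word "cell" is also counted inside the combination "cell membrane"; whether such overlaps should be counted separately is a design decision that depends on the purpose of the term list.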
1. www.statsoft.com, n.d. Text Mining (Big Data, Unstructured Data) [WWW Document]. Support Vector Machines (SVM). URL: http://www.statsoft.com/Textbook/Text-Mining#overview (accessed 5.17.18).
2. Kobayashi V.B., Berkers H.A., Mol S.T. 2017. Text Mining in Organizational Research [WWW Document]. Philosophy of the Social Sciences. URL: http://journals.sagepub.com/doi/full/10.1177/1094428117722619 (accessed 5.17.18).
3. Oficial'nyy sayt Parlamenta Respubliki Kazahstan, n.d. [WWW Document]. URL: http://www.parlam.kz/ru/presidend-speech/5 (accessed 5.16.18).
4. Semanticheskiy analiz teksta onlayn, seo-analiz teksta / Advego, n.d. [WWW Document]. Advego. URL: https://advego.com/text/seo/ (accessed 5.17.18).
5. Semanticheskiy analiz teksta onlayn, n.d. [WWW Document]. Semanticheskiy analiz teksta onlayn / istio.com - beloe SEO. URL: https://istio.com/rus/text/analyz/ (accessed 5.17.18).
6. wordTabulator, n.d. [WWW Document]. SourceForge. URL: http://wordtabulator.sourceforge.net/ (accessed 5.17.18).
7. Simagin, A., n.d. Semanticheskiy analiz teksta [WWW Document]. «Majento» - Prodvizhenie Web-proektov. URL: http://www.majento.ru/index.php?page=seo-analize/text-semantic/index (accessed 5.17.18).
8. SEO, n.d. [WWW Document]. 1Y.ru. URL: http://1y.ru/text.php (accessed 5.17.18).
9. Analiz teksta po zakonu Cipfa, n.d. [WWW Document]. PR-CY. URL: http://pr-cy.ru/zypfa/text (accessed 5.17.18).
10. Birzha kopiraytinga, proverka teksta na unikal'nost', n.d. [WWW Document]. Text.ru. URL: https://text.ru/ (accessed 5.17.18).
11. Semanticheskiy analiz teksta onlayn, n.d. [WWW Document]. URL: https://itop.media/tools.php?i=semantics (accessed 5.17.18).
12. TextAnalyzer: Universal'nyy analizator tekstov, n.d. [WWW Document]. Text Analyzer - Universal'nyy analizator teksta. URL: https://www.textanalyzer.ru/ (accessed 5.17.18).
13. Wladm, 2018. Lit Frequency Meter [WWW Document]. Software Informer. URL: http://litfrequencymeter.software.informer.com/5.2/ (accessed 5.17.18).
14. Speech to Text Online Notepad. Free, n.d. [WWW Document]. Speechnotes. URL: https://speechnotes.co/ (accessed 5.17.18).
15. ListNote Speech-to-Text Notes - Apps on Google Play, n.d. [WWW Document]. Google. URL: https://play.google.com/store/apps/details?id=com.khymaera.android.listnotefree&hl=en (accessed 5.17.18).
16. Speech to Text Translator TTS - Apps on Google Play, n.d. [WWW Document]. Google. URL: https://play.google.com/store/apps/details?id=com.fsm.speech2text&hl=en_US (accessed 5.17.18).
17. Voice Text - Apps on Google Play, n.d. [WWW Document]. Google. URL: https://play.google.com/store/apps/details?id=com.matthew.rice.voice.text&hl=en (accessed 5.17.18).