The use of knowledge resources of an ontological type makes it possible to considerably improve the quality of solutions to natural language processing problems. A number of researchers use Wikipedia as a basis for building such resources. This paper describes the method of formalizing Wikipedia structures and the linguistic ontology used in the system, developed by the authors, for forming a linguistic ontology of a specified subject field from Wikipedia. Wikipedia pages and the links connecting them are used to build a weighted ontology graph: the graph nodes correspond to concepts, and the edges to fuzzy semantic relations between them. Links are assigned different weights depending on which information unit of a page they occur in. Given this relation graph, the degree of semantic proximity of two arbitrary concepts can be estimated numerically, and different measures of semantic proximity can be used for this purpose. Recursive measures have considerable computational complexity while yielding only an insignificant quality improvement on test problems compared with non-recursive local measures such as the Dice measure, which is unacceptable for a sufficiently large ontology. For these reasons, the weighted Dice measure is chosen as the basic measure for the system under development.
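As a minimal sketch of the kind of measure described above, the following Python function computes a weighted generalization of the Dice similarity between two concepts, each represented as a mapping from neighbouring concepts to link weights. The function name and the dictionary representation are illustrative assumptions, not the authors' implementation; with all weights equal to 1 the formula reduces to the classic Dice coefficient 2|A∩B|/(|A|+|B|).

```python
def weighted_dice(links_a, links_b):
    """Weighted Dice similarity between two concepts.

    links_a, links_b: dicts mapping a neighbouring concept to the
    weight of the link leading to it (weights may reflect which
    information unit of the page the link occurs in).
    Returns a value in [0, 1].
    """
    common = links_a.keys() & links_b.keys()
    # Sum the weights of shared neighbours from both sides,
    # mirroring the factor 2 in the classic Dice coefficient.
    shared = sum(links_a[c] + links_b[c] for c in common)
    total = sum(links_a.values()) + sum(links_b.values())
    return shared / total if total else 0.0
```

For example, two concepts with unit-weight neighbour sets {x, y} and {x, z} share one of four links in total, giving a similarity of 0.5.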
linguistic ontology, lexical ontology, automated formation of ontology, ontology learning, Wikipedia, fuzzy semantic relations, semantic proximity