GRNTI 50.07 Theoretical foundations of computer engineering
BBK 3297 Computer engineering
Text processing and text mining are becoming increasingly feasible thanks to advances in computer technology and in artificial intelligence (machine learning). This article describes approaches to analyzing natural-language texts using methods of morphological, syntactic, and semantic analysis. Morphological and syntactic analysis is carried out with the Pullenti system, which makes it possible not only to normalize words but also to extract named entities, their attributes, and the relationships between them. As a result, a semantic network of related named entities is built, covering people, positions, geographical names, business associations, documents, education, dates, and so on. Word2vec technology is used to identify semantic patterns in the text based on the joint occurrence of terms. The possibility of combining the described technologies is considered.
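The abstract states that word2vec identifies semantic patterns from the joint occurrence of terms. As a rough, library-free illustration of the co-occurrence statistic that such models are trained on, here is a minimal sketch; the toy corpus, window size, and function name are hypothetical and not taken from the paper:

```python
from collections import Counter


def cooccurrence(sentences, window=2):
    """Count how often two terms appear within `window` tokens of each other."""
    pairs = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Look only ahead, so each unordered pair is counted once.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                pairs[tuple(sorted((word, tokens[j])))] += 1
    return pairs


# Toy corpus of already-tokenized sentences (hypothetical example).
corpus = [
    ["ivanov", "holds", "position", "director"],
    ["director", "ivanov", "signed", "document"],
]

counts = cooccurrence(corpus, window=2)
print(counts[("director", "ivanov")])  # → 1
```

Word2vec itself does not build this matrix explicitly: the skip-gram model (see the references) learns dense vectors by predicting context words within such a window, so terms with similar co-occurrence profiles end up with similar vectors.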
Keywords: intelligent text analysis, natural language, neural networks
1. Word2Vec: how to work with vector representations of words [Electronic resource]. URL: https://neurohive.io/ru/osnovy-data-science/word2vecvektornye-predstavlenija-slov-dlja-mashinnogoobuchenija/ (accessed 08/04/2019).
2. Word2Vec Tutorial - The Skip-Gram Model [Electronic resource]. URL: http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/ (accessed 08/04/2019).
3. Ali Ghodsi, Lec 13: Word2Vec Skip-Gram [Electronic resource]. URL: https://www.youtube.com/watch?v=GMCwS7tS5ZM (accessed 08/04/2019).
4. models.word2vec - Word2vec embeddings [Electronic resource]. URL: https://radimrehurek.com/gensim/models/word2vec.html#gensim.models.word2vec.Word2Vec (accessed 08/04/2019).
5. Zolotarev O.V., Sharnin M.M., Klimenko S.V., Kuznetsov K.I. System PullEnti - extracting information from natural language texts and automated building of information systems // Proceedings of the International Conference "Situation centers and class 4i information and analytical systems for monitoring and security tasks" (SCVRT2015-16), Pushchino, TsarGrad, November 21-24, 2015-2016. Pushchino, pp. 28-35.
6. Deep Contextualized Word Representations / Matthew Peters, Mark Neumann, Mohit Iyyer et al. // Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. — Association for Computational Linguistics, 2018. — Pp. 2227–2237.
7. Zolotarev O.V., Sharnin M.M., Klimenko S.V., Matskevich A.G. Research of methods of automatic formation of associative-hierarchical portrait of the subject area // Bulletin of the Russian New University. Series "Complex systems: models, analysis and management". — 2018. — № 1. — Pp. 91–96.
8. Distributed Representations of Words and Phrases and their Compositionality / Tomas Mikolov, Ilya Sutskever, Kai Chen et al. // NIPS / Ed. by Christopher J. C. Burges, Léon Bottou, Zoubin Ghahramani, Kilian Q. Weinberger. — 2013. — Pp. 3111–3119.
9. Enriching Word Vectors with Subword Information / Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov // Transactions of the Association for Computational Linguistics. — 2017. — Vol. 5. — Pp. 135–146.