GRNTI 50.07 Theoretical foundations of computer engineering
BBK 3297 Computer engineering
Text processing and text mining are becoming increasingly feasible thanks to advances in computer technology and in artificial intelligence (machine learning). This article describes approaches to analyzing natural-language texts using methods of morphological, syntactic, and semantic analysis. Morphological and syntactic analysis is carried out with the Pullenti system, which makes it possible not only to normalize words but also to extract named entities, their attributes, and the relationships between them. As a result, a semantic network of related named entities is built, covering people, positions, geographical names, business associations, documents, education, dates, and so on. Word2vec technology is used to identify semantic patterns in the text based on the joint occurrence of terms. The possibility of combining the described technologies is considered.
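The abstract states that word2vec identifies semantic patterns from the joint occurrence of terms. As a rough, library-free illustration of the co-occurrence statistic that such models are trained on, here is a minimal sketch; the toy corpus, window size, and function name are hypothetical and not taken from the paper:

```python
from collections import Counter


def cooccurrence(sentences, window=2):
    """Count how often two terms appear within `window` tokens of each other."""
    pairs = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Look only ahead, so each unordered pair is counted once.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                pairs[tuple(sorted((word, tokens[j])))] += 1
    return pairs


# Toy corpus of already-tokenized sentences (hypothetical example).
corpus = [
    ["ivanov", "holds", "position", "director"],
    ["director", "ivanov", "signed", "document"],
]

counts = cooccurrence(corpus, window=2)
print(counts[("director", "ivanov")])  # → 1
```

Word2vec itself does not build this matrix explicitly: the skip-gram model (see the references) learns dense vectors by predicting context words within such a window, so terms with similar co-occurrence profiles end up with similar vectors.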
Keywords: intelligent text analysis, natural language, neural networks
1. Word2Vec: how to work with vector representations of words [Electronic resource]. URL: https://neurohive.io/ru/osnovy-data-science/word2vecvektornye-predstavlenija-slov-dlja-mashinnogoobuchenija/ (accessed 08/04/2019).
2. Word2Vec Tutorial - The Skip-Gram Model [Electronic resource]. URL: http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/ (accessed 08/04/2019).
3. Ali Ghodsi, Lec 13: Word2Vec Skip-Gram [Electronic resource]. URL: https://www.youtube.com/watch?v=GMCwS7tS5ZM (accessed 08/04/2019).
4. models.word2vec - Word2vec embeddings [Electronic resource]. URL: https://radimrehurek.com/gensim/models/word2vec.html#gensim.models.word2vec.Word2Vec (accessed 08/04/2019).
5. Zolotarev O.V., Sharnin M.M., Klimenko S.V., Kuznetsov K.I. System PullEnti - extracting information from natural language texts and automated building of information systems // Proceedings of the International Conference "Situation centers and class 4i information and analytical systems for monitoring and security tasks" (SCVRT2015-16), Pushchino, TsarGrad, November 21-24, 2015-2016. Pushchino, pp. 28-35.
6. Deep Contextualized Word Representations / Matthew Peters, Mark Neumann, Mohit Iyyer et al. // Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. — Association for Computational Linguistics, 2018. — Pp. 2227–2237.
7. Zolotarev O.V., Sharnin M.M., Klimenko S.V., Matskevich A.G. Research of methods of automatic formation of associative-hierarchical portrait of the subject area // Bulletin of the Russian New University. Series "Complex systems: models, analysis and management". — 2018. — № 1. — Pp. 91–96.
8. Distributed Representations of Words and Phrases and their Compositionality / Tomas Mikolov, Ilya Sutskever, Kai Chen et al. // NIPS / Ed. by Christopher J. C. Burges, Léon Bottou, Zoubin Ghahramani, Kilian Q. Weinberger. — 2013. — Pp. 3111–3119.
9. Enriching Word Vectors with Subword Information / Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov // Transactions of the Association for Computational Linguistics. — 2017. — Vol. 5. — Pp. 135–146.