A novel word embedding based stemming approach for microblog retrieval during disasters

Basu, Moumita; Roy, Anurag; Ghosh, Kripabandhu; Bandyopadhyay, Somprakash; Ghosh, Saptarshi

Please use this identifier to cite or link to this item: https://ir.iimcal.ac.in:8443/jspui/handle/123456789/1685

Full metadata record

DC Field	Value	Language
dc.contributor.author	Basu, Moumita
dc.contributor.author	Roy, Anurag
dc.contributor.author	Ghosh, Kripabandhu
dc.contributor.author	Bandyopadhyay, Somprakash
dc.contributor.author	Ghosh, Saptarshi
dc.date.accessioned	2021-08-26T06:23:44Z	-
dc.date.available	2021-08-26T06:23:44Z	-
dc.date.issued	2017
dc.identifier.uri	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85017930397&doi=10.1007%2f978-3-319-56608-5_53&partnerID=40&md5=bd2df865f16c09422df202c7fb94b8d7
dc.identifier.uri	https://ir.iimcal.ac.in:8443/jspui/handle/123456789/1685	-
dc.description	Basu, Moumita, Indian Institute of Engineering Science and Technology, Shibpur, India, Indian Institute of Management, Calcutta, India; Roy, Anurag, Indian Institute of Engineering Science and Technology, Shibpur, India; Ghosh, Kripabandhu, Indian Institute of Technology, Kanpur, India; Bandyopadhyay, Somprakash, Indian Institute of Management, Calcutta, India; Ghosh, Saptarshi, Indian Institute of Engineering Science and Technology, Shibpur, India, Indian Institute of Technology, Kharagpur, India
dc.description	ISSN/ISBN - 0302-9743
dc.description	pp. 589-597
dc.description	DOI - 10.1007/978-3-319-56608-5_53
dc.description.abstract	IR methods are increasingly being applied over microblogs to extract real-time information, such as during disaster events. In such sites, most of the user-generated content is written informally – the same word is often spelled differently by different users, and words are shortened arbitrarily due to the length limitations on microblogs. Stemming is a common step for improving retrieval performance by unifying different morphological variants of a word. In this study, we show that rule-based stemming meant for formal text often cannot capture the arbitrary variations of words in microblogs. We propose a context-specific stemming algorithm, based on word embeddings, which can capture many more variations of words than conventional stemmers can detect. Experiments on a large set of English microblogs posted during a recent disaster event show that the proposed stemming gives considerably better retrieval performance than Porter stemming. © Springer International Publishing AG 2017.
dc.publisher	SCOPUS
dc.publisher	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
dc.publisher	Springer Verlag
dc.relation.ispartofseries	10193 LNCS
dc.subject	Disasters
dc.subject	Microblog retrieval
dc.subject	Stemming
dc.subject	Word embedding
dc.subject	Word2vec
dc.title	A novel word embedding based stemming approach for microblog retrieval during disasters
dc.type	Conference Paper
Appears in Collections:	Management Information Systems
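The abstract describes stemming driven by word embeddings rather than by suffix-stripping rules. The following is a minimal, hypothetical sketch of that general idea, not the authors' algorithm. It assumes pre-trained vectors (for example, from word2vec trained on the tweet collection) are already available as a plain dict. It links two words when they share a prefix of at least `min_prefix` characters and their cosine similarity is at least `sim_threshold`, then maps each resulting cluster to one representative token. The toy vectors, the threshold values, and all function names are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def common_prefix_len(a, b):
    """Number of leading characters shared by strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def embedding_stem_map(vectors, sim_threshold=0.8, min_prefix=3):
    """Hypothetical embedding-based stemmer: returns {word: representative}.

    Two words are linked when they share a prefix of >= min_prefix characters
    AND their embedding cosine similarity is >= sim_threshold (both values are
    illustrative assumptions). Linked words are merged into clusters with
    union-find; each cluster is represented by its shortest member
    (ties broken alphabetically).
    """
    words = sorted(vectors)
    parent = {w: w for w in words}

    def find(w):
        # Follow parent pointers to the cluster root, halving the path as we go.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Compare every pair once; link only surface-similar AND context-similar words.
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if (common_prefix_len(a, b) >= min_prefix
                    and cosine(vectors[a], vectors[b]) >= sim_threshold):
                union(a, b)

    # Group words by cluster root, then pick one representative per cluster.
    clusters = {}
    for w in words:
        clusters.setdefault(find(w), []).append(w)
    stem_of = {}
    for members in clusters.values():
        rep = min(members, key=lambda w: (len(w), w))
        for w in members:
            stem_of[w] = rep
    return stem_of

# Made-up 3-dimensional vectors standing in for real word2vec embeddings.
toy_vectors = {
    "earthquake": [0.90, 0.10, 0.00],
    "earthquak": [0.91, 0.09, 0.00],
    "earthqk": [0.88, 0.12, 0.01],
    "earth": [0.10, 0.90, 0.20],
    "medicine": [0.00, 0.20, 0.90],
    "medicines": [0.02, 0.21, 0.88],
    "meds": [0.01, 0.19, 0.91],
}

if __name__ == "__main__":
    stems = embedding_stem_map(toy_vectors)
    print(stems["earthquake"], stems["meds"], stems["earth"])
```

With these toy vectors, the misspelled "earthqk" and the truncated "earthquak" share one index key with "earthquake", while "earth" stays separate because its (made-up) vector points in a different direction despite the shared prefix. The representative is simply the shortest cluster member; it only has to be applied consistently to documents and queries, and need not be a dictionary word.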

Files in This Item:

There are no files associated with this item.
