Please use this identifier to cite or link to this item: https://ir.iimcal.ac.in:8443/jspui/handle/123456789/1685
Title: A novel word embedding based stemming approach for microblog retrieval during disasters
Authors: Basu, Moumita
Roy, Anurag
Ghosh, Kripabandhu
Bandyopadhyay, Somprakash
Ghosh, Saptarshi
Keywords: Disasters
Microblog retrieval
Stemming
Word embedding
Word2vec
Issue Date: 2017
Publisher: SCOPUS
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Springer Verlag
Series/Report no.: 10193 LNCS
Abstract: Information retrieval (IR) methods are increasingly being applied over microblogs to extract real-time information, such as during disaster events. On such sites, most of the user-generated content is written informally: the same word is often spelled differently by different users, and words are shortened arbitrarily due to the length limitations on microblogs. Stemming is a common step for improving retrieval performance by unifying different morphological variants of a word. In this study, we show that rule-based stemming meant for formal text often cannot capture the arbitrary variations of words in microblogs. We propose a context-specific stemming algorithm, based on word embeddings, which can capture many more variations of words than conventional stemmers can detect. Experiments on a large set of English microblogs posted during a recent disaster event show that the proposed stemming gives considerably better retrieval performance than Porter stemming. © Springer International Publishing AG 2017.
Description: Basu, Moumita, Indian Institute of Engineering Science and Technology, Shibpur, India, Indian Institute of Management, Calcutta, India; Roy, Anurag, Indian Institute of Engineering Science and Technology, Shibpur, India; Ghosh, Kripabandhu, Indian Institute of Technology, Kanpur, India; Bandyopadhyay, Somprakash, Indian Institute of Management, Calcutta, India; Ghosh, Saptarshi, Indian Institute of Engineering Science and Technology, Shibpur, India, Indian Institute of Technology, Kharagpur, India
ISSN/ISBN - 0302-9743
pp.589-597
DOI - 10.1007/978-3-319-56608-5_53
URI: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85017930397&doi=10.1007%2f978-3-319-56608-5_53&partnerID=40&md5=bd2df865f16c09422df202c7fb94b8d7
https://ir.iimcal.ac.in:8443/jspui/handle/123456789/1685
Appears in Collections: Management Information Systems
