Please use this identifier to cite or link to this item: https://ir.iimcal.ac.in:8443/jspui/handle/123456789/1670
Title: Utilizing online social media for disaster relief: Practical challenges in retrieval
Authors: Basu, Moumita
Keywords: Disaster Relief
Microblogs
Online social media
Issue Date: 2017
Publisher: SCOPUS
SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
Association for Computing Machinery, Inc
Abstract: In recent years, several disaster events (e.g., earthquakes in Nepal-India and Italy, terror attacks in Paris and Brussels) have proven the crucial role of Online Social Media (OSM) in providing actionable situational information. However, in such media, the crucial information is typically obscured by a large volume of insignificant information (e.g., personal opinions, prayers for victims). Moreover, when time is critical, the rapid speed and huge volume of microblogs make it infeasible for human subjects to go through all the tweets posted. Hence, automated IR methods are needed to extract the relevant information from the deluge of posts. Though several methodologies have been developed for tasks like classification, summarization, etc. of social media data posted during disasters [5], several research challenges still need to be addressed before social media data (e.g., microblogs) can be used effectively to aid disaster relief operations.
Research challenges: We have identified the following challenges in developing IR systems for OSM text posted during disasters.
(i) Dealing with noisy vocabulary of OSM content: Microblogs often contain multiple spellings of the same word, including both English words (like 'epicentre' and 'epicenter') and non-English words (like 'gurudwara' and 'gurdwara'). Moreover, on microblogging sites like Twitter, words are often shortened arbitrarily owing to the strict restriction on the length of microblogs (e.g., 'operation' shortened to 'oper' or 'ops'). Hence, IR methodologies need to be able to handle such arbitrary variations in spelling.
(ii) Need for improved models for retrieval and ranking: The context and vocabulary of the microblogs are time-variant. For example, in the initial phase of the 2015 Nepal-India earthquake, terms like 'send' and 'airlifted' were used to indicate availability of resources (e.g. "Haryana Govt. airlifted 20,000 food packets"). However, in the later phases, terms like 'distribute' and 'reach' were used to describe the availability of resources (e.g. "ISKON Kathmandu distributes food"). Thus, retrieval schemes need to adapt dynamically to the fast-changing context and vocabulary for effective retrieval from OSM.
(iii) Need for better evaluation measures: From our discussions with NGOs that participate in relief work, we understood that a binary notion of relevance of OSM posts is not apt in disaster scenarios. For example, consider the two tweets "Urgent : Blood shortagein Nepal. For Blood donation call Dr. Manita at mobil-number at Nepal Red Cross" and "Lack of blood in blood bank in Nepal. @BBCWorld @ibnlive". Both tweets convey the need for a resource (here, blood); however, the first one is much more important and actionable than the second. Developing benchmark collections containing such graded relevance is also a challenge.
(iv) Need for retrieval from multiple sources: When a disaster strikes, responding authorities need to identify actionable information from multiple sources, such as crowdsourced data from Twitter and Facebook, closed-group communication data like WhatsApp chats among relief workers, and so on, for a comprehensive analysis of the situation. Further, data from different sources can be in different languages, e.g., English and the local languages of the region where the disaster has occurred. Hence, developing a common IR framework to extract data from such heterogeneous data sources is a challenge that is yet to be addressed.
Work done: The main objective of this PhD work is to address the aforementioned practical IR problems. As initial efforts towards understanding the challenges, we have developed two datasets for retrieval and summarisation of microblogs related to disasters, and made them available to the research community [3, 4]. We have also developed a context-specific stemming algorithm for noisy microblogs, which enabled significantly better retrieval of microblogs (in English) than the well-known Porter stemmer [2]. In another ongoing work, we have proposed word-embedding-based techniques for identifying tweets that report resource needs or availabilities [1]; our proposed methodologies outperform prior pattern-matching-based techniques. We will continue to improve our proposed techniques in the future. © 2017 ACM.
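The point in challenge (iii), that binary relevance cannot separate an actionable tweet from a merely informative one, can be illustrated with a graded-relevance measure. The sketch below uses nDCG, a standard IR measure chosen here as an assumption; the abstract does not say which measure the author proposes. The grade scale (2 = actionable, 1 = informative, 0 = irrelevant) and the function names are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    # Position 1 is discounted by log2(2) = 1, position 2 by log2(3), and so on.
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg(gains):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Hypothetical ranking: the informative-only tweet is placed above the
# actionable one, followed by an irrelevant post.
binary_labels = [1, 1, 0]  # both blood-related tweets count as "relevant"
graded_labels = [1, 2, 0]  # assumed scale: 2 = actionable, 1 = informative

binary_score = ndcg(binary_labels)
graded_score = ndcg(graded_labels)

print(f"binary nDCG: {binary_score:.3f}")
print(f"graded nDCG: {graded_score:.3f}")
```

Under binary labels the ranking looks perfect, because swapping the two blood-related tweets changes nothing. Under graded labels the same ranking is penalised for placing the actionable tweet second.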
Description: Basu, Moumita, Department of CST, Indian Institute of Engineering Science and Technology Shibpur, India, Social Informatics Research Group, Indian Institute of Management Calcutta, India
pp.1385-
DOI - 10.1145/3077136.3084160
URI: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85029384850&doi=10.1145%2f3077136.3084160&partnerID=40&md5=ae546cce1eb67ea96070bc749fedc382
Appears in Collections:Management Information Systems
