
CrisisTracker: Real-time Social Media Curation

CrisisTracker is a web platform that extracts situation awareness reports from public tweets during humanitarian disasters. It combines automated processing with crowdsourcing to quickly detect new events and bring together related evidence.

Written by Jakob Rogstadius

=== OPEN QUESTIONS ===

The proposed open-source system (see below) has been used in practice for conflict monitoring, but several research, design and development challenges remain. If you want to contribute, please post your thoughts below regarding the following questions.

I (Jakob) would also love to brainstorm about these challenges in a Skype call (jakob.rogstadius), in particular if you have decision making or analyst experience related to conflict monitoring or intervention.

What information actually leads to action in the domain of conflict monitoring and prevention?

Neither technology nor information automatically leads to action and there are several cases where atrocities during civil wars have been well known, but no action was taken to prevent them (e.g. Rwanda). I can also imagine cases where a summary of raw reports with limited context or explanation can actually trigger new violence. What information should a system like this provide to be meaningful and to lead to positive change?

Can open-access information management systems improve the safety of regular citizens or help them contribute to peace?

Conflict monitoring and atrocity prevention are traditionally approached from a top-down perspective. The role of information management systems targeted at expert analysts is well established, but can such systems also be used to empower bottom-up efforts? Rather than discussing what information should be hidden from the public, is there any information that can help improve individual safety, or promote mindsets that lead to conflict reduction and long-term stability in conflict zones?

What decision making processes should be supported?

I am primarily a software engineer and I need to know more about the specific decisions that are made by decision makers in peacekeeping and conflict monitoring situations. What decisions need to be taken, when, and what information is required to make those decisions? This knowledge is extremely helpful to make design trade-offs and to prioritize different features in the system.

How can a crowd assist decision makers with meaningful analysis?

Evaluation of the system has shown that both volunteers who curate content and decision makers who wish to consume the information would prefer that volunteers work with more complex reasoning tasks, rather than just data annotation. What decisions are frequent and important enough that it would be efficient to offload the required data analysis to a skilled or semi-skilled crowd? How can volunteers sufficiently share their evidence, reasoning and conclusions with decision makers for their work to be trusted?

What quantitative indicators are needed?

Good quantitative indicators (things that can be measured numerically) are required to provide meaningful time series, and to rank content by 'importance'. However, when the raw data is social media content, it is very difficult to extract traditional quantitative indicators such as the number of affected unemployed women in rural areas. Other quantitative metrics such as the number of messages or the number of unique people discussing the event are readily available, but are far less meaningful. It's clear that some form of quantifier needs to be extracted, but what low-hanging fruit should we aim for to still provide helpful time series? A rough estimate of the number of people affected (1s, 10s, 100s, 1000s) per event? Number of new events of type X per day? Number of people discussing any event of type X per day?
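The three candidate indicators above can be sketched in a few lines. This is a minimal illustration only, assuming each event or message is a plain dictionary with hypothetical `day`, `type` and `author` keys; none of these names come from the CrisisTracker code base.

```python
from collections import Counter, defaultdict


def magnitude_bucket(estimate):
    """Map a rough estimate of people affected to an order-of-magnitude label."""
    if estimate < 1:
        return "0"
    # Number of digits minus one gives the power of ten; cap at the 1000s bucket.
    power = min(len(str(int(estimate))) - 1, 3)
    return f"{10 ** power}s"


def events_per_day(events):
    """Count new events per (day, event type). Field names are assumptions."""
    return Counter((e["day"], e["type"]) for e in events)


def people_discussing_per_day(messages):
    """Count unique authors per (day, event type). Field names are assumptions."""
    authors = defaultdict(set)
    for m in messages:
        authors[(m["day"], m["type"])].add(m["author"])
    return {key: len(names) for key, names in authors.items()}
```

The bucket labels deliberately discard precision: a story can be tagged "100s" by a curator without anyone having to defend an exact figure.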

What are the ethical implications of a system like this, in particular in conflict situations?

I believe sources are sufficiently protected, but do others agree with me? What if the system collects information that mostly benefits one side in the conflict? Are there any (new) risks that this tool introduces into decision making processes, or does this tool simply require the same skepticism as any other source?


=== CONCEPT DESCRIPTION - CRISIS TRACKER ===

During conflicts in recent years, online social media (mainly Twitter, Facebook and YouTube) has emerged as a means for conflict affected local populations to communicate their experiences to the world. With increasing technology adoption and free access to posted messages, online social media can now be used to leverage the reporting capacity of thousands or millions of people on the ground for large-scale real-time distributed sensing.

The Twitter microblogging service saw 500 million tweets being posted daily in October 2012, by over 200 million active users. Unlike for instance Facebook and SMS, the vast majority of these tweets are shared publicly and can be accessed in real time through an application programming interface (API). The challenge, however, is sense-making. With so much content being generated, maintaining an overview and history, and detecting patterns and actionable information, require specialized information management tools.

CrisisTracker is an open-source online web platform developed primarily by me during my PhD studies, which adds structure to millions of reports already available on Twitter. This additional layer of structure helps reduce information overload, making it much easier to use social media as a rich source for real-time situational awareness.

CrisisTracker infers structure by making use of the repetition that occurs when multiple people independently report impactful events, in two ways. First, the greater the number of people that talk about an event, the more likely that event is to be of interest to a system user. This is not a perfect indicator, but with far more information being collected than what can be consumed, having such a metric is critical. Second, the CrisisTracker platform uses an automated real-time clustering algorithm to group together tweets that are textually very similar. A cluster of messages (a “story”) typically refers to a single well-defined event, such as an attack on a protected object, artillery shelling of a location, a bombing, etc. Although individual tweets are both extremely brief (up to 140 characters) and difficult to verify independently, stories in CrisisTracker capture the event from multiple viewpoints and provide a real-time index of published evidence in the form of images, video and news articles.
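To make the two ideas concrete, here is a minimal sketch of single-pass clustering with stories ranked by the number of distinct reporters. It is an assumption-laden stand-in, not the platform's actual algorithm: it uses Jaccard similarity over word sets, a hypothetical threshold of 0.5, and a brute-force comparison against every earlier tweet, which would not scale to a real Twitter stream. The sample tweets and the place name are invented.

```python
import re


def tokens(text):
    """Lowercased word set with URLs stripped."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return set(re.findall(r"[a-z0-9']+", text))


def jaccard(a, b):
    """Share of words two tweets have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(stream, threshold=0.5):
    """Assign each (author, text) pair to the most similar story, or start a new one."""
    stories = []
    for author, text in stream:
        words = tokens(text)
        best, best_sim = None, 0.0
        for story in stories:
            sim = max(jaccard(words, member) for member in story["token_sets"])
            if sim > best_sim:
                best, best_sim = story, sim
        if best is not None and best_sim >= threshold:
            best["tweets"].append(text)
            best["authors"].add(author)
            best["token_sets"].append(words)
        else:
            stories.append({"tweets": [text], "authors": {author}, "token_sets": [words]})
    # Rank stories by how many distinct people reported them.
    return sorted(stories, key=lambda s: len(s["authors"]), reverse=True)


sample = [
    ("a", "Explosion reported near the central market in Eastbridge"),
    ("b", "RT explosion reported near central market in Eastbridge http://example.com/x"),
    ("c", "Power outage across the northern suburbs tonight"),
    ("d", "Huge explosion near the central market, Eastbridge"),
]
ranked = cluster(sample)
```

With this sample, the three explosion reports end up in one story and the power outage in another, and the explosion story ranks first because three different authors contributed to it.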

After reports have been clustered, the platform uses crowdsourcing techniques to extract structured metadata (type of event, geographic location and named entities) from the stories, which improves the quality of search and filtering in the system.

How does your idea gather AND verify information? How does your idea keep those who use it safe?

CrisisTracker currently uses only publicly available information posted on the Twitter microblogging service. Thus, as the information producers publish their reports knowing that the content will be accessible to anyone, they already need to take necessary precautions to not reveal information which they themselves consider sensitive. Unlike for instance Facebook, it is also easy to use Twitter anonymously.

CrisisTracker cannot automatically verify information. However, as the system clusters information into stories, it becomes possible to compare different tweets that talk about the same event. Disaster response experts who used the system have described how this makes it much easier to compare different versions of a story to make a more nuanced assessment of a situation, and to compare the available evidence for or against each claim.

The idea is not to replace existing conflict monitoring techniques with CrisisTracker, but rather to use the system as a complement. For instance, by providing cheap real-time country-wide monitoring of activities, the system can enable more accurate and earlier allocation of scarce resources such as trained observers and high-resolution satellites. The system also offers visibility into areas where no organizational presence can be maintained on the ground.

How might your idea be designed to scale and spread to help as many people as possible?

As CrisisTracker taps into information that is already being produced by affected populations, the data collection itself is inherently scalable. Detection and prioritization of stories (tweet clusters) are fully automated, but if the system is not powered by a crowd of volunteer curators, search and filtering are limited to keywords and time. If a pool of 10-100 human curators can be maintained, each working around 30 minutes per day, then the platform also allows search and filtering by geographic location, event type and named entities.
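The difference between the automated and the curated case can be shown with a small data structure. The sketch below is illustrative only: the `Story` class, its field names and the `search` function are hypothetical and do not mirror the platform's schema. Stories without curator-supplied metadata remain reachable by keyword, but drop out of any filter on event type, entity or location.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class Story:
    summary: str
    size: int                                        # unique reporters, from clustering
    event_type: Optional[str] = None                 # curator-supplied
    location: Optional[Tuple[float, float]] = None   # (lat, lon), curator-supplied
    entities: Set[str] = field(default_factory=set)  # curator-supplied


def search(stories, keyword=None, event_type=None, entity=None, bbox=None):
    """Filter stories; bbox is (south, west, north, east). Largest stories first."""
    results = []
    for s in stories:
        if keyword and keyword.lower() not in s.summary.lower():
            continue
        if event_type and s.event_type != event_type:
            continue
        if entity and entity not in s.entities:
            continue
        if bbox:
            if s.location is None:
                continue  # uncurated stories cannot be filtered by place
            lat, lon = s.location
            south, west, north, east = bbox
            if not (south <= lat <= north and west <= lon <= east):
                continue
        results.append(s)
    return sorted(results, key=lambda s: s.size, reverse=True)
```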

Although the system is capable of directing human curators to work on the most important stories, curator availability is an issue during prolonged crises such as conflicts. I am therefore currently working on extending the platform with supervised machine learning algorithms (using the AIDR tool that I've helped develop at QCRI), so that the system can generalize the human curation behavior into event-specific rules. Such rules can then be applied instantly for each newly detected story to classify information at much greater scale. I also hope to integrate existing algorithms for automatic location extraction, to further reduce the dependence on human curators.
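As a rough picture of what generalizing curation behavior could look like, the sketch below trains a tiny multinomial naive Bayes classifier on curator-labeled stories and applies it to new text. This is a swapped-in textbook technique chosen for brevity; it is an assumption, not a description of how AIDR works, and the training sentences and labels are invented.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over whitespace tokens."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.label_counts[label] += 1
        self.vocab.update(words)

    def predict(self, text):
        words = text.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayes()
# Hypothetical curator decisions used as training examples.
model.train("shelling reported in the northern district", "shelling")
model.train("heavy shelling hits residential area", "shelling")
model.train("protesters gather in the main square", "protest")
model.train("large protest march through the city", "protest")
```

Each new curator decision is simply another `train` call, so the model can keep learning while volunteers work, and `predict` can label stories that no curator has had time to open.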

Once topic extraction and geo-location are both in place, the next step is to transform the data to work with structured events rather than clusters of textually similar messages. This will make it possible for example to graph the number of clashes between protesters and security forces in different areas over time, the number of people killed, or to alert when previously unseen types of events are detected in new locations. Current funding is however exhausted and I now rely on my current employer to let me work on projects that I can then integrate into the platform. With independent funding, progress towards this goal would be more direct.
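Once stories are represented as structured events, both the time series and the alerting described above reduce to simple bookkeeping. The following is a minimal sketch, assuming each event is a dictionary with hypothetical `day`, `area` and `type` keys; the function names are illustrative.

```python
from collections import Counter


def daily_counts(events, event_type):
    """Count events of one type per (area, day), ready for plotting."""
    return Counter((e["area"], e["day"]) for e in events if e["type"] == event_type)


def novelty_alerts(events, known=None):
    """Return events whose (type, area) pair has not been seen before.

    `known` optionally seeds the set of pairs already observed, so that a
    freshly started system does not alert on every first occurrence.
    """
    seen = set(known or ())
    alerts = []
    for e in events:
        key = (e["type"], e["area"])
        if key not in seen:
            seen.add(key)
            alerts.append(e)
    return alerts
```

Seeding `known` from historical data matters in practice: without it, the first event of every type in every area would trigger an alert.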

How could you begin prototyping this idea in a simple way to begin testing and refining it? Who would use your idea and/or who is using it now? Is your idea technically easy medium or hard to implement?

An early version of the system has been deployed since April 2012 to track the Syrian civil war, and is now in daily use by Syria Tracker to complement their network of eyewitnesses and their monitoring of mainstream media. According to Syria Tracker, this is the first system that successfully gives them a sense of overview of the social media space. Qualitative and quantitative evaluation has revealed that the system is capable of directing users’ attention to impactful events within 30 minutes of the first tweet being posted, whereas mainstream media often takes several hours. This is a median time, with instantaneous events such as bombings being detected more quickly and armed clashes and political events gathering momentum more slowly.

Due to limited availability of human curators, Syria Tracker has however only been using the fully automated features in the system. This is why learning classifiers and increased automation are such important extensions.

An incident commander who tested the Syria deployment of the system stated that “I feel very confidently that those reports will come out ahead of CNN and BBC and that they will have the central nuggets of who, what, when, where, why. For an incident commander, it is the difference between learning something in 2-3 hours versus learning it in 6-8.” Furthermore, a GIS expert said that “you can see over a period of time where people are moving, how that relates to conflict areas. Water shortage, or food, you can almost anticipate where needs are going to be based on what you are seeing.”

For more information about this evaluation, please see http://hci.uma.pt/~jakob/files/Rogstadius_2013_CrisisTracker_Crowdsourced_Social_Media_Curation_for_Disaster_Awareness.pdf

Try the system at http://ufn.virtues.fi/crisistracker/

How is your idea adapted for conditions in hard-to-access areas, such as lack of internet and mobile access? Can users adopt it without much behavior change?

The system is only applicable when affected populations and the global community are actively using Twitter to discuss the conflict, as has been the case in particular in recent conflicts in North Africa and the Middle East. If Twitter has established a market presence, no further change of behavior is required in the monitored community. Conversely, any necessary marketing required to make the system work is already being handled indirectly by a leading major corporation in the social media space.

The technology behind CrisisTracker could likely be used with content from other online social networks or even SMS data, but I am not currently aware of any comparable source of openly available reports.

Evaluation results

16 evaluations so far

1. How scalable would this idea be across regions and cultures?

Looks like it’d be easy to spread across multiple regions and cultures - 50%

This idea could scale but it might need further iteration to make it widely relevant - 50%

Seems that this idea would best be suited for a single region/population - 0%

2. Would a lot of resources be required to create a pilot for this idea? (think time, capacity, money, etc)

This idea looks easy to pilot with minimal resources being invested - 31.3%

Feels like this idea could take a moderate amount of resources to pilot - 50%

Seems like piloting this idea would take a lot of resources - 18.8%

3. How suitable is this idea for various challenges on the ground such as lack of internet or mobile access?

Yep, it feels like it could work easily beyond internet or mobile access - 6.3%

Not so sure – it looks like it would require online or mobile connectivity - 25%

This idea definitely seems to rely on internet or mobile access - 68.8%

4. Could this idea put users or others at risk?

Nope, it looks like everyone would be safe - 6.3%

There are some potential concerns, but these could be addressed with further iteration - 87.5%

I can imagine some people being put at risk with this idea - 6.3%

5. Overall, how do you feel about this concept?

This idea rocked my world - 50%

I liked it but preferred others - 25%

It didn't get me overly excited - 25%


60 comments

Sidd (Team)

The concept sounds good. The idea is great but needs some more thought. It seems to have some predispositions that may not be entirely true. Are you presuming that people during time of violence will have access to a desktop or even a technology such as a smart phone in these areas of violence? Who is going to end up using it?

There is too much information. I would recommend funneling down the idea into something as robust as Twitter. Twitter is simple. It allows you to Tweet using a phone. In your concept, however, I fail to see what a real user would actually do. Do you have real use cases and users who have tested your system?

Do you think this tracker can be adapted for any crisis such as Human Trafficking?

Great work though! This is an amazing start.
