Do Machines Replicate Humans? Toward a Unified Understanding of Radicalizing Content on the Open Social Web

Research output: Contribution to journal › Article

Abstract

The advent of the Internet inadvertently augmented the functioning and success of violent extremist organizations. Terrorist organizations like the Islamic State in Iraq and Syria (ISIS) use the Internet to project their message to a global audience. Most research and practice on web-based terrorist propaganda relies on human coders to classify content, raising serious concerns about coder burnout, mental stress, and the reliability of the coded data. More recently, technology platforms and researchers have started to examine online content using automated classification procedures. However, questions remain about the robustness of automated procedures, given insufficient research comparing and contextualizing the differences between human and machine coding. This article compares the output of three text analytics packages with that of human coders on a sample of one hundred nonindexed web pages associated with ISIS. We find that prevalent topics (e.g., holy war) are accurately detected by all three packages, whereas nuanced concepts (e.g., lone wolf attacks) are generally missed. Our findings suggest that the naïve approaches of standard applications do not approximate human understanding, and therefore consumption, of radicalizing content. Before radicalizing content can be automatically detected, we need a closer approximation to human understanding.
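To make the comparison concrete, here is a minimal, hypothetical sketch of one automated technique named in the keywords (latent Dirichlet allocation), written with scikit-learn. The sample documents, topic count, and preprocessing choices are illustrative assumptions, not the authors' actual pipeline.

# Hypothetical sketch: topic extraction with latent Dirichlet allocation.
# Documents, n_components, and preprocessing are assumptions for
# illustration only; this is not the pipeline used in the article.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [  # invented stand-ins for scraped page text
    "the call to jihad and holy war against the enemy",
    "instructions for a lone wolf attack on soft targets",
    "recruitment messaging aimed at a global audience",
]

# LDA takes bag-of-words counts as input.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Fit a small topic model; n_components is a tuning choice.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Inspect the top terms per topic, the usual step before a human
# judges whether machine topics match hand-coded themes.
terms = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [terms[j] for j in weights.argsort()[-5:][::-1]]
    print(f"topic {i}: {', '.join(top)}")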

Original language: English (US)
Journal: Policy and Internet
DOI: 10.1002/poi3.223
State: Accepted/In press - Jan 1 2019

Keywords

  • LIWC
  • counterterrorism
  • latent Dirichlet allocation
  • n-grams
  • social media
  • violent extremist organizations
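The "n-grams" keyword refers to counting contiguous token sequences, which lets multiword concepts such as "lone wolf" survive as single features. A minimal sketch, over an invented snippet and assuming simple whitespace tokenization:

# Hypothetical sketch of bigram counting; the text is invented and
# the tokenizer is deliberately naive, for illustration only.
from collections import Counter

text = "a lone wolf attack differs from a coordinated attack"
tokens = text.lower().split()

# Slide a window of width 2 across the token stream.
bigrams = Counter(zip(tokens, tokens[1:]))

for (w1, w2), count in bigrams.most_common(3):
    print(f"{w1} {w2}: {count}")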

ASJC Scopus subject areas

  • Health (social science)
  • Public Administration
  • Health Policy
  • Computer Science Applications

Cite this

@article{e858d58ebebd4b67b6dbfc82ad35e296,
title = "Do Machines Replicate Humans? Toward a Unified Understanding of Radicalizing Content on the Open Social Web",
keywords = "LIWC, counterterrorism, latent Dirichlet allocation, n-grams, social media, violent extremist organizations",
author = "Margeret Hall and Michael Logan and Ligon, {Gina S.} and Derrick, {Douglas C.}",
year = "2019",
month = "1",
day = "1",
doi = "10.1002/poi3.223",
language = "English (US)",
journal = "Policy and Internet",
issn = "1944-2866",
publisher = "John Wiley and Sons Ltd",

}
