Automatic Detection and Forecasting of Violent Extremist Cyber-Recruitment

Author:
Scanlon, Jacob, Systems Engineering - School of Engineering and Applied Science, University of Virginia
Advisors:
Gerber, Matthew, Department of Systems and Information Engineering, University of Virginia
Brown, Donald, Department of Systems and Information Engineering, University of Virginia
Abstract:

Growing use of the Internet as a major means of communication has led to the formation of cyber-communities, which have become increasingly appealing to violent extremists due to the unregulated nature of Internet communication. Online communities enable violent extremists to increase recruitment by allowing them to build personal relationships with a worldwide audience capable of accessing uncensored content. This research presents methods for identifying and forecasting the recruitment activities of violent groups within extremist social media websites. Specifically, these methods employ supervised learning and natural language processing techniques to automatically (1) identify forum posts intended to recruit new violent extremist members and (2) forecast recruitment efforts by tracking changes in an online community's discussion over time. We used data from the western jihadist website Ansar AlJihad Network, which was compiled by the University of Arizona's Dark Web Project. Multiple judges manually annotated a sample of these data, marking 192 randomly sampled posts as recruiting (Yes) or non-recruiting (No). We observed significant agreement between the judges' labels; the confidence interval of Cohen's kappa was (0.5, 0.9) at p=0.01. We used naive Bayes models, logistic regression, classification trees, boosting, and support vector machines (SVM) to classify the forum posts in a 10-fold cross-validation experimental setup. Evaluation with receiver operating characteristic (ROC) curves shows that our SVM classifier achieves an area under the curve (AUC) of 89%, a significant improvement over the 63% AUC achieved by our simplest naive Bayes model (Tukey's test at p=0.05). The forecasting task uses time series regression analysis to model the daily count of extremist recruitment posts.
Evaluation with mean absolute scaled error (MASE) shows that employing latent topics as predictors can reduce forecast error compared to a naive (random-walk) model and the baseline time series model. To our knowledge, these are the first results reported on these tasks, and our analysis indicates that automatic detection and forecasting of online terrorist recruitment are feasible tasks. This research could ultimately help identify the impact of violent organizations, like terrorist groups, within the social network of an online community. Important areas of future work include classifying non-English posts and measuring how recruitment posts and current events change membership numbers over time.
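
As an illustration of the forecast evaluation mentioned above, the following is a minimal sketch of how MASE is commonly computed: the mean absolute error of the forecast divided by the in-sample mean absolute error of a one-step random-walk forecast. The function name `mase` and the daily counts are hypothetical and are not taken from the thesis; the thesis may define or apply the measure differently.

```python
def mase(train, actual, forecast):
    """Mean absolute scaled error (a common textbook form, assumed here).

    train    -- in-sample series used to scale the error
    actual   -- observed out-of-sample values
    forecast -- predicted values aligned with `actual`
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(actual) == 0:
        raise ValueError("need at least one forecast to evaluate")
    if len(train) < 2:
        raise ValueError("need at least two training points to scale the error")

    # In-sample MAE of the naive (random-walk) forecast: y_hat[t] = y[t-1].
    naive_mae = sum(
        abs(train[t] - train[t - 1]) for t in range(1, len(train))
    ) / (len(train) - 1)
    if naive_mae == 0:
        raise ValueError("training series is constant; MASE is undefined")

    # Out-of-sample MAE of the forecast being evaluated.
    forecast_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return forecast_mae / naive_mae


# Hypothetical daily counts of recruitment posts (illustrative values only).
train_counts = [2, 4, 3, 5]
actual_counts = [4, 6]
model_forecast = [5, 5]

score = mase(train_counts, actual_counts, model_forecast)
print(f"MASE = {score:.3f}")
```

A value below 1 means the forecast's average error is smaller than the average in-sample error of the random-walk forecast, which is the sense in which a model can be said to reduce error relative to the naive model.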

Degree:
MS (Master of Science)
Keywords:
text mining, machine learning, natural language processing, darkweb, recruitment, forecasting, extremist, cyber, classification, terrorism
Language:
English
Rights:
All rights reserved (no additional license for public reuse)
Issued Date:
2014/04/28