An approach for temporal analysis of email data based on segmentation

Parvathi Chundi, Mahadevan Subramaniam, Dileep K. Vasireddy

Research output: Contribution to journal (Article)

8 Scopus citations

Abstract

Many kinds of information are hidden in email data, such as the content being exchanged, the time of each exchange, and the user IDs of the participants. Analyzing email data can reveal valuable information about the social networks of one or more users, the topics being discussed, and so on. In this paper, we describe a novel approach for temporally analyzing the communication patterns embedded in email data based on time series segmentation. The approach computes egocentric communication patterns of a single user as well as sociocentric communication patterns involving multiple users. Time series segmentation is used to uncover patterns that may span multiple time points and to study how these patterns change over time. To find egocentric patterns, the email communication of a user is represented as an item-set time series; an optimal segmentation of this series is constructed, and patterns are extracted from it. To find sociocentric patterns, the email data is represented as an item-setgroup time series, and patterns involving multiple users are extracted from an optimal segmentation of that series. We apply the proposed approach to the Enron email data set with promising results.
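The egocentric step (item-set time series, then optimal segmentation) can be illustrated with a small sketch. The abstract does not define the segment difference measure, so this example assumes one: each time point holds the set of a user's correspondents, a segment is summarized by the items present in at least half of its time points, and a segment's cost is the total symmetric difference between that summary and each time point. Under that assumption, a standard dynamic program finds the minimum-cost split into k segments. The names `representative`, `segment_cost`, and `optimal_segmentation` are hypothetical, and this is not claimed to be the authors' algorithm.

```python
from collections import Counter
from typing import FrozenSet, List, Sequence, Tuple

ItemSet = FrozenSet[str]


def representative(segment: Sequence[ItemSet]) -> ItemSet:
    """ASSUMED segment summary: items present in at least half of the time points.
    This majority set minimizes the summed symmetric difference to the segment."""
    counts = Counter(item for itemset in segment for item in itemset)
    half = len(segment) / 2
    return frozenset(item for item, c in counts.items() if c >= half)


def segment_cost(segment: Sequence[ItemSet]) -> int:
    """ASSUMED difference measure: sum of |summary XOR itemset| over the segment."""
    rep = representative(segment)
    return sum(len(rep ^ itemset) for itemset in segment)


def optimal_segmentation(
    series: Sequence[ItemSet], k: int
) -> Tuple[int, List[Tuple[int, int]]]:
    """Split series into k contiguous segments with minimum total cost.
    Returns (cost, [(start, end), ...]) using half-open index ranges."""
    n = len(series)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(series)")
    # cost[i][j] is the cost of the segment series[i:j].
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            cost[i][j] = segment_cost(series[i:j])
    inf = float("inf")
    # best[m][j]: minimum cost of splitting series[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                candidate = best[m - 1][i] + cost[i][j]
                if candidate < best[m][j]:
                    best[m][j] = candidate
                    back[m][j] = i
    # Walk the back-pointers to recover segment boundaries.
    bounds: List[Tuple[int, int]] = []
    j = n
    for m in range(k, 0, -1):
        i = back[m][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return best[k][n], bounds


# Toy item-set time series: one user's correspondents per week (invented data).
weeks = [frozenset(s) for s in ({"a", "b"}, {"a", "b"}, {"a", "b", "c"},
                                {"d"}, {"d", "e"}, {"d"})]
total, segments = optimal_segmentation(weeks, 2)
print(total, segments)  # 2 [(0, 3), (3, 6)]
print([sorted(representative(weeks[i:j])) for i, j in segments])  # [['a', 'b'], ['d']]
```

Each segment's summary set plays the role of an extracted pattern here. The triple loop costs O(k·n²) after the O(n²) segment costs are precomputed, which is fine for a sketch but would need a cheaper cost update for long series.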

Original language: English (US)
Pages (from-to): 1253-1270
Number of pages: 18
Journal: Data and Knowledge Engineering
Volume: 68
Issue number: 11
Publication status: Published - Nov 1 2009

Keywords

  • Clique pattern
  • Egocentric patterns
  • Item-set time series
  • Optimal segmentation
  • Segment difference
  • Segmentation difference

ASJC Scopus subject areas

  • Information Systems and Management
