Using term extraction patterns to discover coherent relationships from open source intelligence

William L. Sousan, Qiuming Zhu, Robin Gandhi, William Mahoney, Anup Sharma

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Unstructured open source information, especially the social, political, economic and cultural events described within web-based text/news articles, often contains possible motives for cyber security and trust issues. Automated processing of numerous open source intelligence sources requires the discovery of key domain terms, their conceptual hierarchies and the coherent relationships among them. A syntactic analysis of the word sequences in unstructured text documents allows for the extraction of subject-predicate-object triples, which form the basis for Term Extraction Patterns (TEPs). In our research, we use TEPs to discover domain-specific multi-word entities which, in turn, can be arranged in a taxonomy based on their semiotic inter-relationships. We explore the use of this method within the cyber security domain and analyze a collection of related news articles gathered from various public web sources. In this paper, we describe our initial results of term extraction and the semantic coherence derived from the TEP analyses. Our work extends beyond current methods, and our contribution is a novel methodology to extract semantics from unstructured text in domain-specific open source information and its application to predict cyber attack outbreaks.
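
The idea of pulling subject-predicate-object triples out of a word sequence can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the parser, the tag set, or the pattern rules, so everything below is an assumption made for illustration. The tokens are hand-tagged with hypothetical coarse tags (ADJ, NOUN, VERB, DET), runs of adjectives and nouns are merged into multi-word terms, and a triple is emitted wherever a verb sits directly between two such terms.

```python
from typing import List, Tuple

Token = Tuple[str, str]        # (word, coarse part-of-speech tag), hand-assigned here
Triple = Tuple[str, str, str]  # (subject term, predicate, object term)

# Assumption: a multi-word term is a run of adjectives and nouns with at least one noun.
TERM_TAGS = {"ADJ", "NOUN"}


def chunk(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Merge ADJ/NOUN runs into TERM chunks, keep verbs, drop every other token."""
    chunks: List[Tuple[str, str]] = []
    current: List[Token] = []

    def flush() -> None:
        # Emit the pending run only if it contains a noun, then reset it.
        if any(tag == "NOUN" for _, tag in current):
            chunks.append(("TERM", " ".join(word for word, _ in current)))
        current.clear()

    for word, tag in tokens:
        if tag in TERM_TAGS:
            current.append((word, tag))
            continue
        flush()
        if tag == "VERB":
            chunks.append(("VERB", word))
    flush()
    return chunks


def extract_triples(tokens: List[Token]) -> List[Triple]:
    """Return (subject, predicate, object) for every TERM VERB TERM sequence."""
    chunks = chunk(tokens)
    triples: List[Triple] = []
    for i in range(1, len(chunks) - 1):
        kind, text = chunks[i]
        if kind == "VERB" and chunks[i - 1][0] == "TERM" and chunks[i + 1][0] == "TERM":
            triples.append((chunks[i - 1][1], text, chunks[i + 1][1]))
    return triples


# Hypothetical example sentence, tagged by hand for the sketch.
SENTENCE: List[Token] = [
    ("Political", "ADJ"), ("hackers", "NOUN"), ("attacked", "VERB"),
    ("the", "DET"), ("government", "NOUN"), ("web", "NOUN"), ("servers", "NOUN"),
]

print(extract_triples(SENTENCE))
# [('Political hackers', 'attacked', 'government web servers')]
```

In a real pipeline the tags would come from a part-of-speech tagger or syntactic parser, and the subject and object terms collected across many articles would be the candidates for the taxonomy the abstract mentions.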

Original language: English (US)
Title of host publication: Proceedings - SocialCom 2010
Subtitle of host publication: 2nd IEEE International Conference on Social Computing, PASSAT 2010: 2nd IEEE International Conference on Privacy, Security, Risk and Trust
Pages: 967-972
Number of pages: 6
DOI: https://doi.org/10.1109/SocialCom.2010.143
State: Published - Nov 29 2010
Event: 2nd IEEE International Conference on Social Computing, SocialCom 2010, 2nd IEEE International Conference on Privacy, Security, Risk and Trust, PASSAT 2010 - Minneapolis, MN, United States
Duration: Aug 20 2010 to Aug 22 2010

Publication series

Name: Proceedings - SocialCom 2010: 2nd IEEE International Conference on Social Computing, PASSAT 2010: 2nd IEEE International Conference on Privacy, Security, Risk and Trust

Conference

Conference: 2nd IEEE International Conference on Social Computing, SocialCom 2010, 2nd IEEE International Conference on Privacy, Security, Risk and Trust, PASSAT 2010
Country: United States
City: Minneapolis, MN
Period: 8/20/10 to 8/22/10

Fingerprint

Semantics
Semiotics
Taxonomies
Syntactics
Economics
Processing

Keywords

  • Conceptualization
  • Open source intelligence
  • Semantic relevance
  • Term extraction
  • Term extraction patterns

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems

Cite this

Sousan, W. L., Zhu, Q., Gandhi, R., Mahoney, W., & Sharma, A. (2010). Using term extraction patterns to discover coherent relationships from open source intelligence. In Proceedings - SocialCom 2010: 2nd IEEE International Conference on Social Computing, PASSAT 2010: 2nd IEEE International Conference on Privacy, Security, Risk and Trust (pp. 967-972). [5591400] (Proceedings - SocialCom 2010: 2nd IEEE International Conference on Social Computing, PASSAT 2010: 2nd IEEE International Conference on Privacy, Security, Risk and Trust). https://doi.org/10.1109/SocialCom.2010.143

Using term extraction patterns to discover coherent relationships from open source intelligence. / Sousan, William L.; Zhu, Qiuming; Gandhi, Robin; Mahoney, William; Sharma, Anup.

Proceedings - SocialCom 2010: 2nd IEEE International Conference on Social Computing, PASSAT 2010: 2nd IEEE International Conference on Privacy, Security, Risk and Trust. 2010. p. 967-972 5591400 (Proceedings - SocialCom 2010: 2nd IEEE International Conference on Social Computing, PASSAT 2010: 2nd IEEE International Conference on Privacy, Security, Risk and Trust).

Sousan, WL, Zhu, Q, Gandhi, R, Mahoney, W & Sharma, A 2010, Using term extraction patterns to discover coherent relationships from open source intelligence. in Proceedings - SocialCom 2010: 2nd IEEE International Conference on Social Computing, PASSAT 2010: 2nd IEEE International Conference on Privacy, Security, Risk and Trust., 5591400, Proceedings - SocialCom 2010: 2nd IEEE International Conference on Social Computing, PASSAT 2010: 2nd IEEE International Conference on Privacy, Security, Risk and Trust, pp. 967-972, 2nd IEEE International Conference on Social Computing, SocialCom 2010, 2nd IEEE International Conference on Privacy, Security, Risk and Trust, PASSAT 2010, Minneapolis, MN, United States, 8/20/10. https://doi.org/10.1109/SocialCom.2010.143
Sousan WL, Zhu Q, Gandhi R, Mahoney W, Sharma A. Using term extraction patterns to discover coherent relationships from open source intelligence. In Proceedings - SocialCom 2010: 2nd IEEE International Conference on Social Computing, PASSAT 2010: 2nd IEEE International Conference on Privacy, Security, Risk and Trust. 2010. p. 967-972. 5591400. (Proceedings - SocialCom 2010: 2nd IEEE International Conference on Social Computing, PASSAT 2010: 2nd IEEE International Conference on Privacy, Security, Risk and Trust). https://doi.org/10.1109/SocialCom.2010.143
Sousan, William L. ; Zhu, Qiuming ; Gandhi, Robin ; Mahoney, William ; Sharma, Anup. / Using term extraction patterns to discover coherent relationships from open source intelligence. Proceedings - SocialCom 2010: 2nd IEEE International Conference on Social Computing, PASSAT 2010: 2nd IEEE International Conference on Privacy, Security, Risk and Trust. 2010. pp. 967-972 (Proceedings - SocialCom 2010: 2nd IEEE International Conference on Social Computing, PASSAT 2010: 2nd IEEE International Conference on Privacy, Security, Risk and Trust).
@inproceedings{9a53502c10334a76a352348b59aef29f,
title = "Using term extraction patterns to discover coherent relationships from open source intelligence",
keywords = "Conceptualization, Open source intelligence, Semantic relevance, Term extraction, Term extraction patterns",
author = "Sousan, {William L.} and Qiuming Zhu and Robin Gandhi and William Mahoney and Anup Sharma",
year = "2010",
month = "11",
day = "29",
doi = "10.1109/SocialCom.2010.143",
language = "English (US)",
isbn = "9780769542119",
series = "Proceedings - SocialCom 2010: 2nd IEEE International Conference on Social Computing, PASSAT 2010: 2nd IEEE International Conference on Privacy, Security, Risk and Trust",
pages = "967--972",
booktitle = "Proceedings - SocialCom 2010",

}

TY - GEN

T1 - Using term extraction patterns to discover coherent relationships from open source intelligence

AU - Sousan, William L.

AU - Zhu, Qiuming

AU - Gandhi, Robin

AU - Mahoney, William

AU - Sharma, Anup

PY - 2010/11/29

Y1 - 2010/11/29

KW - Conceptualization

KW - Open source intelligence

KW - Semantic relevance

KW - Term extraction

KW - Term extraction patterns

UR - http://www.scopus.com/inward/record.url?scp=78649297645&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78649297645&partnerID=8YFLogxK

U2 - 10.1109/SocialCom.2010.143

DO - 10.1109/SocialCom.2010.143

M3 - Conference contribution

SN - 9780769542119

T3 - Proceedings - SocialCom 2010: 2nd IEEE International Conference on Social Computing, PASSAT 2010: 2nd IEEE International Conference on Privacy, Security, Risk and Trust

SP - 967

EP - 972

BT - Proceedings - SocialCom 2010

ER -