Gray-box techniques for adversarial text generation

Prithviraj Dasgupta, Joseph Collins, Anna Buhman

Research output: Contribution to journal › Conference article

Abstract

We consider the problem of adversarial text generation in the context of cyber-security tasks such as email spam filtering and text classification for sentiment analysis on social media sites. In adversarial text generation, an adversary perturbs valid text data to generate adversarial text that ends up misclassified by a machine classifier. Many existing techniques for perturbing text data use gradient-based, or white-box, methods: the adversary observes the gradient of the classifier's loss function for a given input sample and uses this information to strategically select the portions of the text to perturb. In contrast, black-box methods, in which the adversary cannot access the gradient of the loss function and must instead probe the classifier with different input samples to generate successful adversarial samples, have been used less often for generating adversarial text. In this paper, we integrate black-box methods, in which the adversary has a limited budget on the number of probes to the classifier, with white-box, gradient-based methods, and evaluate the effectiveness of the adversarially generated text in misleading a deep network classifier model.
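The gray-box loop the abstract outlines can be sketched as follows. The toy logistic bag-of-words classifier, its weights, and the character-level substitution table are illustrative assumptions, not the authors' actual model or perturbation method; only the overall shape (a white-box step that ranks tokens by gradient magnitude, then a black-box step that probes the classifier under a query budget) follows the abstract.

```python
import math

# Toy per-word weights of a logistic spam classifier (assumed for illustration).
WEIGHTS = {"free": 2.0, "winner": 1.5, "cash": 1.8, "meeting": -1.0,
           "report": -0.8, "f r e e": 0.1, "w1nner": 0.2, "ca$h": 0.3}

def score(tokens):
    """P(spam) under the toy bag-of-words logistic model."""
    z = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))

def gradient(tokens):
    """White-box step: sensitivity of the loss to each token's presence.
    For a logistic model this is proportional to the word's weight."""
    p = score(tokens)
    return {t: p * WEIGHTS.get(t, 0.0) for t in tokens}

# Hypothetical perturbation table (character-level obfuscations).
SUBSTITUTES = {"free": ["f r e e"], "winner": ["w1nner"], "cash": ["ca$h"]}

def gray_box_attack(tokens, budget=10, threshold=0.5):
    """Perturb the highest-gradient tokens first, probing the classifier
    at most `budget` times (the black-box query limit)."""
    tokens = list(tokens)
    ranked = sorted(gradient(tokens), key=gradient(tokens).get, reverse=True)
    probes = 0
    for tok in ranked:
        for sub in SUBSTITUTES.get(tok, []):
            if probes >= budget:
                return tokens, probes
            candidate = [sub if t == tok else t for t in tokens]
            probes += 1                      # one black-box probe
            if score(candidate) < score(tokens):
                tokens = candidate           # keep the swap that lowers P(spam)
            if score(tokens) < threshold:    # classifier now says "not spam"
                return tokens, probes
    return tokens, probes

spam = ["free", "cash", "winner", "meeting"]
adversarial, probes_used = gray_box_attack(spam)
print(adversarial, probes_used)
```

In this sketch the gradient ranking keeps the probe count low: the adversary spends its budget on the most influential words first rather than probing every possible substitution, which is the benefit of combining the white-box and black-box views.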

Original language: English (US)
Pages (from-to): 17-23
Number of pages: 7
Journal: CEUR Workshop Proceedings
Volume: 2269
State: Published - Jan 1 2018
Event: 2018 AAAI Symposium on Adversary-Aware Learning Techniques and Trends in Cybersecurity, ALEC 2018 - Arlington, United States
Duration: Oct 18 2018 - Oct 20 2018

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Gray-box techniques for adversarial text generation. / Dasgupta, Prithviraj; Collins, Joseph; Buhman, Anna.

In: CEUR Workshop Proceedings, Vol. 2269, 01.01.2018, p. 17-23.

Research output: Contribution to journal › Conference article

Dasgupta, P, Collins, J & Buhman, A 2018, 'Gray-box techniques for adversarial text generation', CEUR Workshop Proceedings, vol. 2269, pp. 17-23.
Dasgupta, Prithviraj ; Collins, Joseph ; Buhman, Anna. / Gray-box techniques for adversarial text generation. In: CEUR Workshop Proceedings. 2018 ; Vol. 2269. pp. 17-23.
@article{95e19fba9acf492eaabd23e21ea20212,
title = "Gray-box techniques for adversarial text generation",
abstract = "We consider the problem of adversarial text generation in the context of cyber-security tasks such as email spam filtering and text classification for sentiment analysis on social media sites. In adversarial text generation, an adversary perturbs valid text data to generate adversarial text that ends up misclassified by a machine classifier. Many existing techniques for perturbing text data use gradient-based, or white-box, methods: the adversary observes the gradient of the classifier's loss function for a given input sample and uses this information to strategically select the portions of the text to perturb. In contrast, black-box methods, in which the adversary cannot access the gradient of the loss function and must instead probe the classifier with different input samples to generate successful adversarial samples, have been used less often for generating adversarial text. In this paper, we integrate black-box methods, in which the adversary has a limited budget on the number of probes to the classifier, with white-box, gradient-based methods, and evaluate the effectiveness of the adversarially generated text in misleading a deep network classifier model.",
author = "Prithviraj Dasgupta and Joseph Collins and Anna Buhman",
year = "2018",
month = jan,
day = "1",
language = "English (US)",
volume = "2269",
pages = "17--23",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "CEUR-WS",
}

TY - JOUR

T1 - Gray-box techniques for adversarial text generation

AU - Dasgupta, Prithviraj

AU - Collins, Joseph

AU - Buhman, Anna

PY - 2018/1/1

Y1 - 2018/1/1

N2 - We consider the problem of adversarial text generation in the context of cyber-security tasks such as email spam filtering and text classification for sentiment analysis on social media sites. In adversarial text generation, an adversary perturbs valid text data to generate adversarial text that ends up misclassified by a machine classifier. Many existing techniques for perturbing text data use gradient-based, or white-box, methods: the adversary observes the gradient of the classifier's loss function for a given input sample and uses this information to strategically select the portions of the text to perturb. In contrast, black-box methods, in which the adversary cannot access the gradient of the loss function and must instead probe the classifier with different input samples to generate successful adversarial samples, have been used less often for generating adversarial text. In this paper, we integrate black-box methods, in which the adversary has a limited budget on the number of probes to the classifier, with white-box, gradient-based methods, and evaluate the effectiveness of the adversarially generated text in misleading a deep network classifier model.

AB - We consider the problem of adversarial text generation in the context of cyber-security tasks such as email spam filtering and text classification for sentiment analysis on social media sites. In adversarial text generation, an adversary perturbs valid text data to generate adversarial text that ends up misclassified by a machine classifier. Many existing techniques for perturbing text data use gradient-based, or white-box, methods: the adversary observes the gradient of the classifier's loss function for a given input sample and uses this information to strategically select the portions of the text to perturb. In contrast, black-box methods, in which the adversary cannot access the gradient of the loss function and must instead probe the classifier with different input samples to generate successful adversarial samples, have been used less often for generating adversarial text. In this paper, we integrate black-box methods, in which the adversary has a limited budget on the number of probes to the classifier, with white-box, gradient-based methods, and evaluate the effectiveness of the adversarially generated text in misleading a deep network classifier model.

UR - http://www.scopus.com/inward/record.url?scp=85058650829&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058650829&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85058650829

VL - 2269

SP - 17

EP - 23

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

SN - 1613-0073

ER -