Benchmarking machine learning methods for comprehensive chemical fingerprinting and pattern recognition

Stephen E. Reichenbach, Claudia A. Zini, Karine P. Nicolli, Juliane E. Welke, Chiara Cordero, Qingping Tao

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Machine learning (ML) has been used previously to recognize particular patterns of constituent compounds. Here, ML is used with comprehensive chemical fingerprints that capture the distribution of all constituent compounds to flexibly perform various pattern recognition tasks. Such pattern recognition requires a sequence of chemical analysis, data analysis, and pattern analysis. Chemical analysis with comprehensive multidimensional chromatography is a maturing approach for highly effective separations of complex samples and so provides a solid foundation for undertaking comprehensive chemical fingerprinting. Data analysis with smart templates employs marker peaks and chemical logic for chromatographic alignment and peak-regions to delineate chromatographic windows in which analytes are quantified and matched consistently across chromatograms to create chemical profiles that serve as complete fingerprints. Pattern analysis uses ML techniques with the resulting fingerprints to recognize sample characteristics, e.g., for classification. Our experiments evaluated the effectiveness of seventeen different ML techniques for various classification problems with chemical fingerprints from a rich data set from 126 wine samples of different varieties, geographic regions, vintages, and wineries. Results of these experiments showed an accuracy range from 58% to 88% for different ML methods on the most difficult classification problems and 96% to 100% for different ML methods on the least difficult classification problems. Averaged over 14 classification problems, accuracy for the different methods ranged from 80% to 90%, with some relatively simple ML techniques among the top-performing methods.
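The benchmarking step described in the abstract (several ML techniques compared by classification accuracy on the same fingerprints) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the fingerprints are synthetic Gaussian vectors rather than the wine data, the two classifiers are generic simple methods rather than the seventeen techniques the authors evaluated (which the abstract does not name), leave-one-out validation is an assumed protocol, and all function names are hypothetical.

```python
import random
from collections import Counter


def make_fingerprints(n_per_class=20, n_features=30, seed=0):
    """Synthetic stand-in for chemical fingerprints (NOT the wine data).

    Each row mimics one sample, with one response value per peak-region.
    """
    rng = random.Random(seed)
    data = []
    for label, shift in (("A", 0.0), ("B", 2.0)):
        for _ in range(n_per_class):
            # Assumption: only the first five features differ between classes.
            row = [rng.gauss(shift if j < 5 else 0.0, 1.0)
                   for j in range(n_features)]
            data.append((row, label))
    return data


def sq_dist(u, v):
    """Squared Euclidean distance between two fingerprints."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_centroid(train, x):
    """Predict the class whose mean fingerprint is closest to x."""
    sums, counts = {}, Counter()
    for row, label in train:
        counts[label] += 1
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    centroids = {lab: [s / counts[lab] for s in acc]
                 for lab, acc in sums.items()}
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))


def one_nearest_neighbor(train, x):
    """Predict the class of the single closest training fingerprint."""
    return min(train, key=lambda item: sq_dist(item[0], x))[1]


def loo_accuracy(classify, data):
    """Leave-one-out accuracy: hold out each sample once, train on the rest."""
    hits = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += classify(train, x) == label
    return hits / len(data)


# Hypothetical method table; a real benchmark would list many more techniques.
METHODS = {
    "nearest centroid": nearest_centroid,
    "1-nearest neighbor": one_nearest_neighbor,
}


def benchmark(data):
    """Return {method name: leave-one-out accuracy} for every method."""
    return {name: loo_accuracy(fn, data) for name, fn in METHODS.items()}


if __name__ == "__main__":
    results = benchmark(make_fingerprints())
    for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} {acc:.1%}")
```

Running every method through the same `loo_accuracy` function on the same data keeps the comparison like-for-like. Averaging such accuracies over several classification problems would give a per-method summary of the kind the abstract reports.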

Original language: English (US)
Pages (from-to): 158-167
Number of pages: 10
Journal: Journal of Chromatography A
Volume: 1595
DOI: 10.1016/j.chroma.2019.02.027
State: Published - Jun 21 2019

Fingerprint

Benchmarking
Pattern recognition
Learning systems
Dermatoglyphics
Wine
Chromatography
Machine Learning
Chemical analysis
Sequence Analysis
Experiments

Keywords

  • Classification
  • Comprehensive two-dimensional gas chromatography
  • Data mining
  • GCxGC
  • Machine learning

ASJC Scopus subject areas

  • Analytical Chemistry
  • Biochemistry
  • Organic Chemistry

Cite this

Benchmarking machine learning methods for comprehensive chemical fingerprinting and pattern recognition. / Reichenbach, Stephen E.; Zini, Claudia A.; Nicolli, Karine P.; Welke, Juliane E.; Cordero, Chiara; Tao, Qingping.

In: Journal of Chromatography A, Vol. 1595, 21.06.2019, p. 158-167.

Reichenbach, Stephen E. ; Zini, Claudia A. ; Nicolli, Karine P. ; Welke, Juliane E. ; Cordero, Chiara ; Tao, Qingping. / Benchmarking machine learning methods for comprehensive chemical fingerprinting and pattern recognition. In: Journal of Chromatography A. 2019 ; Vol. 1595. pp. 158-167.
@article{2d78004842494f4ba20e93e76432bca7,
title = "Benchmarking machine learning methods for comprehensive chemical fingerprinting and pattern recognition",
abstract = "Machine learning (ML) has been used previously to recognize particular patterns of constituent compounds. Here, ML is used with comprehensive chemical fingerprints that capture the distribution of all constituent compounds to flexibly perform various pattern recognition tasks. Such pattern recognition requires a sequence of chemical analysis, data analysis, and pattern analysis. Chemical analysis with comprehensive multidimensional chromatography is a maturing approach for highly effective separations of complex samples and so provides a solid foundation for undertaking comprehensive chemical fingerprinting. Data analysis with smart templates employs marker peaks and chemical logic for chromatographic alignment and peak-regions to delineate chromatographic windows in which analytes are quantified and matched consistently across chromatograms to create chemical profiles that serve as complete fingerprints. Pattern analysis uses ML techniques with the resulting fingerprints to recognize sample characteristics, e.g., for classification. Our experiments evaluated the effectiveness of seventeen different ML techniques for various classification problems with chemical fingerprints from a rich data set from 126 wine samples of different varieties, geographic regions, vintages, and wineries. Results of these experiments showed an accuracy range from 58{\%} to 88{\%} for different ML methods on the most difficult classification problems and 96{\%} to 100{\%} for different ML methods on the least difficult classification problems. Averaged over 14 classification problems, accuracy for the different methods ranged from 80{\%} to 90{\%}, with some relatively simple ML techniques among the top-performing methods.",
keywords = "Classification, Comprehensive two-dimensional gas chromatography, Data mining, GCxGC, Machine learning",
author = "Reichenbach, {Stephen E.} and Zini, {Claudia A.} and Nicolli, {Karine P.} and Welke, {Juliane E.} and Cordero, Chiara and Tao, Qingping",
year = "2019",
month = "6",
day = "21",
doi = "10.1016/j.chroma.2019.02.027",
language = "English (US)",
volume = "1595",
pages = "158--167",
journal = "Journal of Chromatography A",
issn = "0021-9673"
}

TY - JOUR

T1 - Benchmarking machine learning methods for comprehensive chemical fingerprinting and pattern recognition

AU - Reichenbach, Stephen E.

AU - Zini, Claudia A.

AU - Nicolli, Karine P.

AU - Welke, Juliane E.

AU - Cordero, Chiara

AU - Tao, Qingping

PY - 2019/6/21

Y1 - 2019/6/21

N2 - Machine learning (ML) has been used previously to recognize particular patterns of constituent compounds. Here, ML is used with comprehensive chemical fingerprints that capture the distribution of all constituent compounds to flexibly perform various pattern recognition tasks. Such pattern recognition requires a sequence of chemical analysis, data analysis, and pattern analysis. Chemical analysis with comprehensive multidimensional chromatography is a maturing approach for highly effective separations of complex samples and so provides a solid foundation for undertaking comprehensive chemical fingerprinting. Data analysis with smart templates employs marker peaks and chemical logic for chromatographic alignment and peak-regions to delineate chromatographic windows in which analytes are quantified and matched consistently across chromatograms to create chemical profiles that serve as complete fingerprints. Pattern analysis uses ML techniques with the resulting fingerprints to recognize sample characteristics, e.g., for classification. Our experiments evaluated the effectiveness of seventeen different ML techniques for various classification problems with chemical fingerprints from a rich data set from 126 wine samples of different varieties, geographic regions, vintages, and wineries. Results of these experiments showed an accuracy range from 58% to 88% for different ML methods on the most difficult classification problems and 96% to 100% for different ML methods on the least difficult classification problems. Averaged over 14 classification problems, accuracy for the different methods ranged from 80% to 90%, with some relatively simple ML techniques among the top-performing methods.

AB - Machine learning (ML) has been used previously to recognize particular patterns of constituent compounds. Here, ML is used with comprehensive chemical fingerprints that capture the distribution of all constituent compounds to flexibly perform various pattern recognition tasks. Such pattern recognition requires a sequence of chemical analysis, data analysis, and pattern analysis. Chemical analysis with comprehensive multidimensional chromatography is a maturing approach for highly effective separations of complex samples and so provides a solid foundation for undertaking comprehensive chemical fingerprinting. Data analysis with smart templates employs marker peaks and chemical logic for chromatographic alignment and peak-regions to delineate chromatographic windows in which analytes are quantified and matched consistently across chromatograms to create chemical profiles that serve as complete fingerprints. Pattern analysis uses ML techniques with the resulting fingerprints to recognize sample characteristics, e.g., for classification. Our experiments evaluated the effectiveness of seventeen different ML techniques for various classification problems with chemical fingerprints from a rich data set from 126 wine samples of different varieties, geographic regions, vintages, and wineries. Results of these experiments showed an accuracy range from 58% to 88% for different ML methods on the most difficult classification problems and 96% to 100% for different ML methods on the least difficult classification problems. Averaged over 14 classification problems, accuracy for the different methods ranged from 80% to 90%, with some relatively simple ML techniques among the top-performing methods.

KW - Classification

KW - Comprehensive two-dimensional gas chromatography

KW - Data mining

KW - GCxGC

KW - Machine learning

UR - http://www.scopus.com/inward/record.url?scp=85062234529&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062234529&partnerID=8YFLogxK

U2 - 10.1016/j.chroma.2019.02.027

DO - 10.1016/j.chroma.2019.02.027

M3 - Article

C2 - 30833025

AN - SCOPUS:85062234529

VL - 1595

SP - 158

EP - 167

JO - Journal of Chromatography A

JF - Journal of Chromatography A

SN - 0021-9673

ER -