A dissimilarity function for geospatial polygons

Deepti Joshi, Leen-Kiat Soh, Ashok K Samal, Jing Zhang

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

Similarity plays an important role in many data mining tasks and information retrieval processes. Most of the supervised, semi-supervised, and unsupervised learning algorithms depend on using a dissimilarity function that measures the pair-wise similarity between the objects within the dataset. However, traditionally most of the similarity functions fail to adequately treat all the spatial attributes of the geospatial polygons due to the incomplete quantitative representation of structural and topological information contained within the polygonal datasets. In this paper, we propose a new dissimilarity function known as the polygonal dissimilarity function (PDF) that comprehensively integrates both the spatial and the non-spatial attributes of a polygon to specifically consider the density, distribution, and topological relationships that exist within the polygonal datasets. We represent a polygon as a set of intrinsic spatial attributes, extrinsic spatial attributes, and non-spatial attributes. Using this representation of the polygons, PDF is defined as a weighted function of the distance between two polygons in the different attribute spaces. In order to evaluate our dissimilarity function, we compare and contrast it with other distance functions proposed in the literature that work with both spatial and non-spatial attributes. In addition, we specifically investigate the effectiveness of our dissimilarity function in a clustering application using a partitional clustering technique (e.g., k-medoids) using two characteristically different sets of data: (a) irregular geometric shapes determined by natural processes, i.e., watersheds and (b) semi-regular geometric shapes determined by human experts, i.e., counties.
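
The abstract describes PDF only as "a weighted function of the distance between two polygons in the different attribute spaces" and does not give the formula. The Python sketch below is therefore only an illustration of that general idea, not the authors' definition. The `PolygonRecord` class, the use of Euclidean distance in each attribute space, the linear combination, and the constraint that the weights sum to 1 are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PolygonRecord:
    """Hypothetical container: one numeric vector per attribute space."""
    intrinsic: Sequence[float]    # assumed examples: normalized area, perimeter
    extrinsic: Sequence[float]    # assumed examples: centroid position, neighbor count
    non_spatial: Sequence[float]  # assumed examples: thematic values such as population


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def weighted_dissimilarity(
    p: PolygonRecord,
    q: PolygonRecord,
    weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Combine per-space distances linearly (an assumed form, not the paper's)."""
    if len(weights) != 3:
        raise ValueError("expected one weight per attribute space")
    if min(weights) < 0 or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    w_int, w_ext, w_ns = weights
    return (
        w_int * euclidean(p.intrinsic, q.intrinsic)
        + w_ext * euclidean(p.extrinsic, q.extrinsic)
        + w_ns * euclidean(p.non_spatial, q.non_spatial)
    )


# Toy values chosen only so the arithmetic is easy to follow.
a = PolygonRecord(intrinsic=(0.0, 0.0), extrinsic=(0.0, 0.0), non_spatial=(0.0,))
b = PolygonRecord(intrinsic=(3.0, 4.0), extrinsic=(0.0, 0.0), non_spatial=(1.0,))
example = weighted_dissimilarity(a, b, weights=(0.5, 0.25, 0.25))
# 0.5 * 5.0 + 0.25 * 0.0 + 0.25 * 1.0 = 2.75
```

A function of this shape is symmetric and returns 0 for identical records, so its pairwise values could be passed to a partitional method such as k-medoids, which needs only dissimilarities between objects rather than coordinates.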

Original language: English (US)
Pages (from-to): 153-188
Number of pages: 36
Journal: Knowledge and Information Systems
Volume: 41
Issue number: 1
DOI: 10.1007/s10115-013-0666-2
State: Published - Oct 1 2014

Fingerprint

Unsupervised learning
Supervised learning
Watersheds
Information retrieval
Learning algorithms
Data mining

Keywords

  • Dissimilarity function
  • Polygonal clustering
  • Polygons
  • Regionalization
  • Spatial data mining

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Hardware and Architecture
  • Artificial Intelligence

Cite this

A dissimilarity function for geospatial polygons. / Joshi, Deepti; Soh, Leen-Kiat; Samal, Ashok K; Zhang, Jing.

In: Knowledge and Information Systems, Vol. 41, No. 1, 01.10.2014, p. 153-188.


@article{da287f9daa894936841ffb21014acc1f,
title = "A dissimilarity function for geospatial polygons",
abstract = "Similarity plays an important role in many data mining tasks and information retrieval processes. Most of the supervised, semi-supervised, and unsupervised learning algorithms depend on using a dissimilarity function that measures the pair-wise similarity between the objects within the dataset. However, traditionally most of the similarity functions fail to adequately treat all the spatial attributes of the geospatial polygons due to the incomplete quantitative representation of structural and topological information contained within the polygonal datasets. In this paper, we propose a new dissimilarity function known as the polygonal dissimilarity function (PDF) that comprehensively integrates both the spatial and the non-spatial attributes of a polygon to specifically consider the density, distribution, and topological relationships that exist within the polygonal datasets. We represent a polygon as a set of intrinsic spatial attributes, extrinsic spatial attributes, and non-spatial attributes. Using this representation of the polygons, PDF is defined as a weighted function of the distance between two polygons in the different attribute spaces. In order to evaluate our dissimilarity function, we compare and contrast it with other distance functions proposed in the literature that work with both spatial and non-spatial attributes. In addition, we specifically investigate the effectiveness of our dissimilarity function in a clustering application using a partitional clustering technique (e.g., $k$-medoids) using two characteristically different sets of data: (a) irregular geometric shapes determined by natural processes, i.e., watersheds and (b) semi-regular geometric shapes determined by human experts, i.e., counties.",
keywords = "Dissimilarity function, Polygonal clustering, Polygons, Regionalization, Spatial data mining",
author = "Joshi, Deepti and Soh, Leen-Kiat and Samal, Ashok K and Zhang, Jing",
year = "2014",
month = "10",
day = "1",
doi = "10.1007/s10115-013-0666-2",
language = "English (US)",
volume = "41",
pages = "153--188",
journal = "Knowledge and Information Systems",
issn = "0219-1377",
publisher = "Springer London",
number = "1",

}

TY - JOUR

T1 - A dissimilarity function for geospatial polygons

AU - Joshi, Deepti

AU - Soh, Leen-Kiat

AU - Samal, Ashok K

AU - Zhang, Jing

PY - 2014/10/1

Y1 - 2014/10/1

AB - Similarity plays an important role in many data mining tasks and information retrieval processes. Most of the supervised, semi-supervised, and unsupervised learning algorithms depend on using a dissimilarity function that measures the pair-wise similarity between the objects within the dataset. However, traditionally most of the similarity functions fail to adequately treat all the spatial attributes of the geospatial polygons due to the incomplete quantitative representation of structural and topological information contained within the polygonal datasets. In this paper, we propose a new dissimilarity function known as the polygonal dissimilarity function (PDF) that comprehensively integrates both the spatial and the non-spatial attributes of a polygon to specifically consider the density, distribution, and topological relationships that exist within the polygonal datasets. We represent a polygon as a set of intrinsic spatial attributes, extrinsic spatial attributes, and non-spatial attributes. Using this representation of the polygons, PDF is defined as a weighted function of the distance between two polygons in the different attribute spaces. In order to evaluate our dissimilarity function, we compare and contrast it with other distance functions proposed in the literature that work with both spatial and non-spatial attributes. In addition, we specifically investigate the effectiveness of our dissimilarity function in a clustering application using a partitional clustering technique (e.g., k-medoids) using two characteristically different sets of data: (a) irregular geometric shapes determined by natural processes, i.e., watersheds and (b) semi-regular geometric shapes determined by human experts, i.e., counties.

KW - Dissimilarity function

KW - Polygonal clustering

KW - Polygons

KW - Regionalization

KW - Spatial data mining

UR - http://www.scopus.com/inward/record.url?scp=84908134709&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84908134709&partnerID=8YFLogxK

U2 - 10.1007/s10115-013-0666-2

DO - 10.1007/s10115-013-0666-2

M3 - Article

AN - SCOPUS:84908134709

VL - 41

SP - 153

EP - 188

JO - Knowledge and Information Systems

JF - Knowledge and Information Systems

SN - 0219-1377

IS - 1

ER -