Data Structures for Analyzing Geometric Data

Ali Mehrabi

Promotor: prof.dr. M.T. de Berg (TU/e)
Copromotor: dr. K.A. Buchin (TU/e)
Technische Universiteit Eindhoven
Date: 31 August, 2017, 16.00

Summary

In applications involving large data sets, it is often important that the data are stored in such a way that certain queries can be answered quickly. Hence, the art of designing efficient data structures has always been one of the core areas within algorithms research. In many applications the data is of a geometric nature. A typical example is geographic information science, where the data are usually objects in 2- or 3-dimensional space. Thus also within computational geometry, the branch of algorithms research dealing with spatial data, data structures play a prominent role. Traditional geometric data structures have two main characteristics: they store constant-complexity objects, and their corresponding query procedures are to report or count all the objects who lie inside a given query range. However, recent advances in technology and the ever increasing availability of data in need of analysis, make the traditional geometric data structures insufficient. In particular, the data to be stored are often no longer constant-complexity objects, and the queries to be answered are no longer simple reporting or counting queries. The goal of this thesis is to study various data-structuring problems arising from this general observation. In particular, we develop novel data structures for storing entire trajectories and we develop novel data structures for performing more complicated analysis tasks (namely clustering) on point sets, as discussed next.

Trajectory analysis is a relatively young but important line of research within several research communities. Ecologists study the trajectories of a flock of birds during an immigration season to better learn their behavior. Traffic analysts rely on the trajectory data collected by GPS devices to plan better navigational strategies. Inspired by such real-life applications, in Chapters 2 and 3 we design several efficient data structures for trajectory data. More precisely, in the first part of Chapter 2 we present schemes to preprocess a given trajectory into a data structure for fast straight-path queries: given a directed query segment st, report all subtrajectories starting near s and going in a more or less straight line to a point near t. In the second part of Chapter 2 we develop an efficient data structure for storing a set S of trajectories that can quickly answer several standard similarity queries on S with respect to the Frechet distance, for query trajectories consisting of a fixed number of edges. In Chapter 3, we present a data structure for storing a set of objects that can report all intersecting pairs of objects inside a query range. This can be applied to find potential locations where two moving objects may have met, for the case of discrete trajectories.

Chapter 4 of the thesis deals with designing efficient data structures for extracting more information about the points inside a given a query range, instead of just counting or reporting the points. Here we focus on clustering, one of the most important analysis tasks. In particular, we present data structures that can return a (near-)optimal clustering of the points inside a given query range, for a wide range of clustering criteria.

Despite the technological advances in data acquisition, there are often still inaccuracies and uncertainties in the measured locations. As a third contribution we therefore study a geometric data-analysis problem on imprecise data. (An imprecise point is a point of which we do not know the exact location, but only a region in which it is located.)

In Chapter 5 we study separability problems in an imprecise setting, which form (besides clustering) another important class of geometric data-analysis problems. We present algorithms that can decide if a give set of red imprecise points might be separable (or is always separable) from a given set of blue imprecise points.