This thesis focuses on data abstraction and pattern identification methods, particularly for large 1D time-series and 2D spatio-temporal time-series data that exhibit spatio-temporal discontinuity. Based on the dimensionality and characteristics of the data, it proposes a variety of efficient, data-adaptive and user-controlled data abstraction methods that transform the raw data into a symbol sequence. The resulting symbol sequence can act as input to different sequence analysis methods from the data mining and machine learning communities to identify interesting patterns of user behavior.
For very long-duration 1D time-series, locally adaptive and user-controlled data approximation methods were presented to simplify the data while retaining its perceptually important features. The simplified data were converted into a symbol sequence, and sketch-based pattern identification was then used to find patterns in the symbolic data through regular-expression-based pattern matching. The method was applied to financial time-series, where patterns such as head-and-shoulders, double-top and triple-top were identified interactively from hand-drawn sketches. Because it smooths the data while retaining perceptually important points, the data approximation step also enables visualization of inherent patterns in the time-series.
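As an illustration only, the symbolization and regular-expression matching described above can be sketched as follows. This is a minimal sketch, not the method developed in the thesis: the three-letter alphabet (U, D, F), the helper names `symbolize` and `find_pattern`, the `flat_tol` parameter and the `DOUBLE_TOP` expression are all assumptions introduced here, and the input is assumed to be an already simplified series.

```python
import re


def symbolize(values, flat_tol=0.0):
    """Encode each step between consecutive points as U (up), D (down) or F (flat).

    The alphabet and the tolerance are illustrative assumptions.
    """
    symbols = []
    for prev, curr in zip(values, values[1:]):
        delta = curr - prev
        if delta > flat_tol:
            symbols.append("U")
        elif delta < -flat_tol:
            symbols.append("D")
        else:
            symbols.append("F")
    return "".join(symbols)


def find_pattern(symbols, pattern):
    """Return (start, end) index pairs of every regex match in the symbol string."""
    return [(m.start(), m.end()) for m in re.finditer(pattern, symbols)]


# Hypothetical shape-only expression: rise, fall, rise, fall.
# It does not check that the two peaks have similar heights.
DOUBLE_TOP = r"U+D+U+D+"

# Made-up, already simplified series used purely as an example.
simplified = [1.0, 2.0, 3.0, 2.0, 1.5, 2.5, 3.0, 2.0, 1.0]
sequence = symbolize(simplified)
matches = find_pattern(sequence, DOUBLE_TOP)
```

In an interactive setting, a hand-drawn sketch could be passed through the same `symbolize` step and its symbol string turned into the search expression, so that the sketch and the data are compared in the same symbolic space.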
Very long-duration 2D spatio-temporal eye-tracking data sets that exhibit spatio-temporal discontinuity were transformed into symbolic data using scalable clustering and hierarchical cluster merging processes, each of which can be parallelized. Each symbol in the resulting sequence represents a region of interest in the eye gaze data. The identified regions of interest can also be displayed in a Space-Time Cube (STC) that captures both temporal and contextual information. Through interactive filtering, zooming and geometric transformation, the STC representation, together with linked views, enables interactive data exploration. The symbol sequences are then analyzed further with different sequence analysis methods to identify temporal patterns in the data set. Data collected from air traffic control officers were used as application examples to demonstrate the results.
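The mapping from gaze samples to a symbol sequence can be illustrated with a deliberately simplified sketch. It replaces the clustering and hierarchical cluster merging used in the thesis with plain grid binning, so it shows only the idea of "one symbol per region of interest". The function names, the `cell_size` parameter and the sample coordinates are assumptions made for this example.

```python
from itertools import groupby
import string


def gaze_to_symbols(points, cell_size):
    """Map (x, y) gaze samples to symbols, one symbol per occupied grid cell.

    Grid binning is a stand-in for a real clustering step.
    """
    labels = {}
    sequence = []
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        if cell not in labels:
            # Assumes at most 26 distinct regions in this toy example.
            labels[cell] = string.ascii_uppercase[len(labels)]
        sequence.append(labels[cell])
    return "".join(sequence), labels


def collapse_repeats(sequence):
    """Merge consecutive identical symbols so each visit to a region appears once."""
    return "".join(symbol for symbol, _ in groupby(sequence))


# Made-up gaze samples in screen coordinates, in temporal order.
points = [(10, 12), (14, 18), (210, 40), (205, 44), (12, 15), (400, 300)]
raw_sequence, regions = gaze_to_symbols(points, cell_size=100)
visit_sequence = collapse_repeats(raw_sequence)
```

The collapsed sequence records the order in which regions were visited, which is the kind of input a sequence analysis method would then consume.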