SCIENTIA SINICA Informationis, Volume 47, Issue 11: 1523-1537(2017) https://doi.org/10.1360/N112017-00008

A trajectory feature extraction approach based on spatial coding technique

More info
  • ReceivedJan 8, 2017
  • AcceptedJun 26, 2017
  • PublishedNov 9, 2017


GPS data often have position deviations in precision and are apt to be affected by noise; hence, it is essential to extract features from trajectories before performing large-scale data mining. A GeoHash-based spatial coding technique called GeoHashTree was used to index spatiotemporal trajectory points in order to improve the efficiency of nearest-neighbor search. The GeoHashTree was applied in trajectory clustering and an improved density-based clustering algorithm was proposed to reduce the time complexity of nearest-neighbor search from $O(n^2)$ to $O(n\log{n})$. After extracting trajectory points with changing angles, the proposed clustering approach was employed to achieve deep-level feature extraction on trajectory points with changing angles, which aims to accurately identify feature points. Extensive experiments are conducted on real GPS data and the results demonstrate that the proposed trajectory-clustering algorithm based on the GeoHashTree spatial index structure can improve time performance by an average of 90.89% as well as guarantee the accuracy of clustering compared with the traditional clustering method. The visualization results show that the trajectory feature extraction approach can effectively find trajectory points with changing angles and discover a varying types of feature points from large-scale data sets. In addition, the proposed approach does not depend on road network data and can dynamically update with new incoming trajectory data as road networks change in real time.

Funded by








[1] Qiao S, Han N, Zhu W. TraPlan: An Effective Three-in-One Trajectory-Prediction Model in Transportation Networks. IEEE Trans Intell Transport Syst, 2015, 16: 1188-1198 CrossRef Google Scholar

[2] Zheng Y. Trajectory data mining: an overview. ACM Trans Intel Syst Technol, 2015, 6: 1--41. Google Scholar

[3] Bao J, Zheng Y, Wilkie D. Recommendations in location-based social networks: a survey. Geoinformatica, 2015, 19: 525-565 CrossRef Google Scholar

[4] Zheng K, Zheng Y, Yuan N J. Online Discovery of Gathering Patterns over Trajectories. IEEE Trans Knowl Data Eng, 2014, 26: 1974-1988 CrossRef Google Scholar

[5] Li X, Zhang J S, Ma J, et al. Feature extraction algorithm in consideration of the trend changing of track. J Comput Aid Des Comput Graph, 2016, 28: 1341--1349. Google Scholar

[6] Yuan N J, Zheng Y, Xie X. Discovering Urban Functional Zones Using Latent Activity Trajectories. IEEE Trans Knowl Data Eng, 2015, 27: 712-725 CrossRef Google Scholar

[7] Qiao S, Shen D, Wang X. A Self-Adaptive Parameter Selection Trajectory Prediction Approach via Hidden Markov Models. IEEE Trans Intell Transport Syst, 2015, 16: 284-296 CrossRef Google Scholar

[8] Yuan J, Zheng Y, Xie X, et al. An interactive-voting based map matching algorithm. In: Proceedings of the 11th International Conference on Mobile Data Management, Kansas City, 2010. 43--52. Google Scholar

[9] Shan Z, Wu H, Sun W, et al. COBWEB: a robust map update system using GPS trajectories. In: Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Osaka, 2015. 927--937. Google Scholar

[10] Guo L, Li H W, Zhang Z J, et al. Geometry matching method for transportation road network data based on projection. Geometry Inf Sci Wuhan Univ, 2013, 38: 1113--1117. Google Scholar

[11] Zhao D B, Liu X M, Guo L. Real time map matching algorithm of floating car in support of spatial grid index. J Comput Aid Des Comput Graph, 2014, 26: 1550--1556. Google Scholar

[12] Feng W, Han C. A novel approach for trajectory feature representation and anomalous trajectory detection. In: Proceedings of the 18th International Conference on Information Fusion, Washington, 2015. 1093--1099. Google Scholar

[13] Zabalza J, Ren J, Zheng J. Novel Two-Dimensional Singular Spectrum Analysis for Effective Feature Extraction and Data Classification in Hyperspectral Imaging. IEEE Trans Geosci Remote Sens, 2015, 53: 4418-4433 CrossRef ADS Google Scholar

[14] Sauer F, Yu H, Ma K L. Trajectory-Based Flow Feature Tracking in Joint Particle/Volume Datasets.. IEEE Trans Visual Comput Graphics, 2014, 20: 2565-2574 CrossRef PubMed Google Scholar

[15] Chen C, Zhang D, Li N. B-Planner: Planning Bidirectional Night Bus Routes Using Large-Scale Taxi GPS Traces. IEEE Trans Intell Transport Syst, 2014, 15: 1451-1465 CrossRef Google Scholar

[16] Zhang D, Li N, Zhou Z H, et al. iBAT: detecting anomalous taxi trajectories from GPS traces. In: Proceedings of the 13th International Conference on Ubiquitous Computing, Beijing, 2011. 99--108. Google Scholar

[17] Chen C, Zhang D, Ma X, et al. Crowddeliver: planning city-wide package delivery paths leveraging the crowd of taxis. IEEE Trans Intel Transp Syst, 2017, 18: 1478--1496. Google Scholar

[18] Jin A, Cheng C Q, Song S H, et al. Regional query of area data based on Geohash. Geogr Geo-Inform Sci, 2013, 29: 31--35. Google Scholar

[19] Qiao S, Tang C, Jin H. PutMode: prediction of uncertain trajectories in moving objects databases. Appl Intell, 2010, 33: 370-386 CrossRef Google Scholar

[20] Jing Yuan , Yu Zheng , Xing Xie . T-Drive: Enhancing Driving Directions with Taxi Drivers Intelligence. IEEE Trans Knowl Data Eng, 2013, 25: 220-232 CrossRef Google Scholar

  • Figure 1

    (Color online) Working mechanism of trajectory feature extraction based on spatial coding technique

  • Figure 2

    Example of the data structure of a GeoHashTree

  • Figure 3

    Time cost comparison of distinct algorithms

  • Figure 4

    Number of clusters comparison of algorithms

  • Figure 5

    Clustering results of DBScan* algorithm under different cluster radiuses

  • Figure 6

    Clustering results of DBScan* under different minimum number of points

  • Figure 7

    Time cost of feature points extraction under different number of trajectories

  • Figure 8

    Relationship between feature extraction results and the size of extraction windows

  • Figure 9

    (Color online) Visualization of clustering results

  • Table 1   Example of latitude spatial coding method
    Area ($^\circ$) 0-area ($^\circ$) 1-area ($^\circ$) Code
    $-90\sim~90$ $-90\sim~0$ $0\sim~90$ 1
    $0\sim~90$ $0\sim~45$ $45\sim~90$ 0
    $0\sim~45$ $0\sim~22.5$ $22.5\sim~45$ 1
    $22.5\sim~45$ $22.5\sim~37.5$ $37.5\sim~45$ 1
    $33.75\sim~45$ $33.75\sim~39.375$ $39.375\sim~45$ 1
    $39.375\sim~45$ $39.375\sim~42.1875$ $42.1875\sim~45$ 0
    $39.375\sim~42.1875$ $39.375\sim~40.7812$ $40.7812\sim~42.1875$ 0
    $39.375\sim~40.7812$ $39.375\sim~40.0781$ $40.0781\sim~40.7812$ 0

    Algorithm 1 GeoHash(lat, lng, $n$)

    for each $i$=0 to $n$/2


    if lat $>~\overline{{\rm~la}}$ then

    $\alpha[i]\leftarrow~1$, $b\leftarrow~\overline{\rm~la}$;


    $\alpha[i]\leftarrow~0$, $t\leftarrow~\overline{\rm~la}$;

    end if


    if lng $>~\overline{\rm~ln}$ then

    $\beta[i]\leftarrow~1$, $l\leftarrow~\overline{\rm~ln}$;


    $\beta[i]\leftarrow~0$, $r\leftarrow~\overline{\rm~ln}$;

    end if

    end for


    return $c$.

  • Table 2   Description of trajectory datasets
    Parameter Value
    Time interval 2008/02/02 $\sim$ 2012/02/08
    No. of taxies 10357
    No. of trajectories $>$25000
    No. of trajectory points $>$15000000
    Length of trajectories $>$9000000 km

    Algorithm 2 searchNeighbors($p$, $e$)

    create a GeoHashTree;

    $c\leftarrow$ encode($p$);

    for each $i\in$ GeoHashTree($c$)


    end for

    for each $n\in$ getNeighbors($c$)

    for each $i\in$ GeoHashTree($n$)


    end for

    end for

    return $D$.


    Algorithm 3 基于DBScan的改进轨迹聚类算法 —— DBScan*($D$, $e$, $\theta$)


    for each unvisited point $p$ in $D$

    mark $p$ as Visited;


    if sizeof($N$) $<~\theta$ then

    mark $p$ as Noise;


    $C\leftarrow$ next cluster;


    end if

    end for

  • Table 3   Time cost of GeoHashTree generation
    No. of trajectory points 2500 5000 10000 20000 100000 250000 500000
    Time (ms) 15 31 63 98 266 679 2310
  • Table 4   Feature extraction accuracy under different number of trajectories
    No. of trajectories 1000 2000 3000 4000 5000 6000 7000 8000
    Accuracy (%) 81.3 83.4 84.1 84.8 80.9 76.3 75.8 76.1

    Algorithm 4 基于角度变化的轨迹特征点提取算法

    for each $t$ in $T$

    $P\leftarrow$ $t$.getPoints(); //$P$表示轨迹点集合

    for $i\in[0,|P|)$

    $\tau_i\leftarrow$ get$\_\tau_i(p_i$);

    if calAngle$(p_{\{i-\tau_i\}},~p_i,~p_{\{i+\tau_i\}})~>~\delta$ then


    end if

    end for

    end for

    $C\leftarrow$ DBScan*$(D,~e,~\theta)$; //$e$表示邻域半径, $\theta$表示最小邻域点数

    for each $c$ in $C$


    end for

    return $W$.

Copyright 2020 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有

京ICP备18024590号-1       京公网安备11010102003388号