
SCIENTIA SINICA Informationis, Volume 47, Issue 8: 1051-1065(2017) https://doi.org/10.1360/N112016-00310

Semantic analysis of spatial temporal trajectory in LBSNs

  • Received Apr 7, 2017
  • Accepted May 10, 2017
  • Published Aug 16, 2017

Abstract

Spatial temporal data carry multidimensional features. Deep learning has attracted wide attention because it can form high-level abstractions of complex data. In this paper, we define trajectory data according to their characteristics and build a spatial temporal semantic trajectory model on top of Word2vec. By training position vectors in the model network, we explore the semantics of different users' trajectories over different time periods. In experiments, Top-$K$ neighbor prediction and cluster analysis show that the position vectors, which the trajectory model learns without supervision, have both meaningful semantics and a clear structure. The results also indicate that language models based on word vectors can be applied to trajectory mining.
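Under a Word2vec-style model, a location presumably plays the role of a word and a user's time-ordered check-in sequence plays the role of a sentence. The sketch below shows only that corpus-building step. The record layout `(user_id, venue_id, local_time)`, the function name `build_sentences`, the venue IDs, and the workday/weekend filters are all hypothetical illustrations, not the paper's exact preprocessing.

```python
from collections import defaultdict
from datetime import datetime


def build_sentences(checkins, period="24h"):
    """Group check-ins by user, sort each user's visits by time, and emit one
    sequence of venue IDs ("sentence") per user. The period filters are
    illustrative assumptions, not the paper's exact partition."""
    keep = {
        "24h": lambda t: True,
        "workdays": lambda t: t.weekday() < 5,   # Monday..Friday
        "weekends": lambda t: t.weekday() >= 5,  # Saturday, Sunday
    }[period]
    by_user = defaultdict(list)
    for user_id, venue_id, local_time in checkins:
        if keep(local_time):
            by_user[user_id].append((local_time, venue_id))
    sentences = []
    for user_id in sorted(by_user):
        visits = sorted(by_user[user_id])  # chronological order
        sentences.append([venue_id for _, venue_id in visits])
    return sentences


# Hypothetical toy data (venue IDs are made up).
checkins = [
    (32, "v_home", datetime(2012, 4, 3, 8, 0)),   # Tuesday
    (32, "v_cafe", datetime(2012, 4, 3, 12, 30)),
    (32, "v_park", datetime(2012, 4, 7, 15, 0)),  # Saturday
    (7, "v_gym", datetime(2012, 4, 8, 9, 0)),     # Sunday
]
print(build_sentences(checkins, "workdays"))  # [['v_home', 'v_cafe']]
```

Each resulting list of venue IDs can then be fed to a Word2vec trainer exactly as a tokenized sentence would be.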


Funded by

National Key Basic Research Program of China (973 Program) (2015CB352502)

National Natural Science Foundation of China (61272092, 61572289)

Natural Science Foundation of Shandong Province (ZR2015FM002, ZR2016FB14)



  • Figure 1

    (Color online) Loc2Vec model network structure diagram

  • Figure 4

    (Color online) Statistics of error distances in abnormal data

  • Figure 5

    Semantic analysis framework for spatial temporal trajectories

  • Figure 6

    (Color online) Location vector $K$-means clustering distribution ($K$=10). (a) Absolute number; (b) relative proportion

  • Figure 7

    (Color online) Location vector $K$-means clustering geographic map. (a) All points; (b) top-30 points
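Figures 6 and 7 come from $K$-means clustering of the learned location vectors. The paper's $K$-means settings are not given here, so the following is a minimal Lloyd's-algorithm sketch in pure Python that assumes squared Euclidean distance and random initial centers; `kmeans` and `sq_dist` are hypothetical names.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=100, seed=0):
    """Lloyd's K-means on a list of equal-length vectors.
    Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # k random points as initial centers
    labels = None
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

With real data, `vectors` would hold the $m$-dimensional position vectors and `k=10` would match Figure 6.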

  • Figure 8

    (Color online) Location vector hierarchical clustering diagrams. (a) Hierarchical clustering dendrogram; (b) clustering geographic map
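Figure 8 uses hierarchical clustering, but the linkage criterion and distance are not stated here. This bottom-up sketch assumes average linkage over cosine distance (one plausible choice for embedding vectors, not necessarily the paper's). It records each merge so that a dendrogram could be drawn; `agglomerative` and `cosine_distance` are hypothetical names.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def agglomerative(vectors, n_clusters, dist=cosine_distance):
    """Average-linkage agglomerative clustering.

    Returns (clusters, merges): clusters are lists of indices into vectors;
    merges lists the (cluster_a, cluster_b) pairs in the order they were joined.
    """
    clusters = [[i] for i in range(len(vectors))]
    merges = []

    def linkage(a, b):
        # Average linkage: mean pairwise distance between members of a and b.
        return sum(dist(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))]
        x, y = min(pairs, key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]))
        merges.append((clusters[x], clusters[y]))
        clusters[x] = clusters[x] + clusters[y]  # new list; merges keeps the old ones
        del clusters[y]  # y > x, so index x is unaffected
    return clusters, merges
```

The loop is cubic in the number of locations, which is fine for a sketch but too slow for a full city-scale vocabulary.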

  • Algorithm 1   Loc2Vec model training algorithm

    Require: Position sample $({\rm Context}_{i-c}^{i+c} L_{t_i}, L_{t_i})$, vector dimension $m$, learning rate $\alpha$;
    $e = 0$;
    ${\rm SL} = \sum_{k=i-c,\,k\neq i}^{i+c} \widetilde{L_{t_k}} \in \mathbb{R}^m$;
    for $k = 2,3,\ldots,l$ do
        $g = \alpha[1-h_k-\sigma({\rm SL}^{\rm T}\theta_{k-1})]$;
        $e = e + g\cdot\theta_{k-1}$;
        $\theta_{k-1} = \theta_{k-1} + g\cdot{\rm SL}$;
    end for
    for $L_{t_k} \in {\rm Context}_{i-c}^{i+c} L_{t_i}$ do
        $\widetilde{L_{t_k}} = \widetilde{L_{t_k}} + e$;
    end for
    Output: Updated position vectors $\widetilde{L_{t_k}}$ of the context locations
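The steps of Algorithm 1 are the CBOW update with hierarchical softmax. Below is a pure-Python sketch of one training step. It assumes vectors are stored in dicts keyed by location ID or inner-node ID, and that the Huffman path and code bits are supplied by the caller; `loc2vec_update` and its argument names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def loc2vec_update(loc_vecs, theta, context, path_nodes, codes, alpha):
    """One CBOW + hierarchical-softmax step for a single sample.

    loc_vecs   : dict, location id -> position vector (list of m floats)
    theta      : dict, inner-node id -> vector theta (list of m floats)
    context    : location ids in the window around the target L_{t_i}
    path_nodes : inner nodes p_1 .. p_{l-1} on the target's Huffman path
    codes      : Huffman code bits h_2 .. h_l, aligned with path_nodes
    """
    m = len(next(iter(loc_vecs.values())))
    # SL: element-wise sum of the context position vectors.
    sl = [sum(loc_vecs[c][d] for c in context) for d in range(m)]
    e = [0.0] * m
    for node, h in zip(path_nodes, codes):
        th = theta[node]
        g = alpha * (1 - h - sigmoid(sum(a * b for a, b in zip(sl, th))))
        # Accumulate e with the old theta first, then update theta.
        for d in range(m):
            e[d] += g * th[d]
        for d in range(m):
            th[d] += g * sl[d]
    # Add the accumulated update e to every context position vector.
    for c in context:
        vec = loc_vecs[c]
        for d in range(m):
            vec[d] += e[d]
    return e
```

The order inside the loop matters: $e$ must use $\theta_{k-1}$ before it is updated, exactly as in Algorithm 1. The target location's own vector is not changed by this step, only the context vectors and the inner-node vectors.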

  • Table 1   Parameters of the Huffman tree in the output layer
    Symbol | Definition
    $p$ | The path from the root node to the leaf node $L_{t_i}$
    $l$ | The number of nodes on path $p$
    $p_1, p_2,\ldots, p_l$ | The $l$ nodes on path $p$, where $p_1$ is the root node and $p_l$ is the leaf node
    $h_2, h_3,\ldots,h_l \in \{0, 1\}$ | The Huffman code of leaf node $L_{t_i}$ along path $p$, where $h_k$ is the code bit of node $p_k$; the root $p_1$ is not encoded
    $\theta_1, \theta_2,\ldots, \theta_{l-1} \in \mathbb{R}^m$ | The vectors of the non-leaf nodes on path $p$, where $\theta_k$ is the vector of the $k$th non-leaf node
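The symbols in Table 1 presuppose a Huffman tree built from location frequencies, as in word2vec's hierarchical softmax. The sketch below builds such a tree with `heapq` and returns, for each location, its inner-node path ($\theta$ indices for $p_1,\ldots,p_{l-1}$) and code bits $h_2,\ldots,h_l$. Which child gets bit 0 and which gets 1 is a convention chosen here, not taken from the paper; `build_huffman` is a hypothetical name, and at least two locations are required.

```python
import heapq
import itertools


def build_huffman(counts):
    """counts: dict location -> frequency (at least two locations).
    Returns dict location -> (inner_nodes, codes): inner_nodes are the ids of
    the non-leaf nodes from the root down; codes are the bits h_2 .. h_l."""
    tie = itertools.count()  # tie-breaker so heapq never compares node tuples
    heap = [(c, next(tie), ("leaf", loc)) for loc, c in counts.items()]
    heapq.heapify(heap)
    children = {}  # inner node -> (child coded 0, child coded 1)
    next_id = 0
    while len(heap) > 1:
        c1, _, n1 = heapq.heappop(heap)  # two least frequent nodes
        c2, _, n2 = heapq.heappop(heap)
        node = ("inner", next_id)
        next_id += 1
        children[node] = (n1, n2)
        heapq.heappush(heap, (c1 + c2, next(tie), node))
    root = heap[0][2]
    paths = {}
    stack = [(root, [], [])]
    while stack:  # walk down from the root, collecting inner nodes and bits
        node, inner, codes = stack.pop()
        if node[0] == "leaf":
            paths[node[1]] = (inner, codes)
            continue
        zero, one = children[node]
        stack.append((zero, inner + [node[1]], codes + [0]))
        stack.append((one, inner + [node[1]], codes + [1]))
    return paths
```

Frequent locations end up near the root and so get short codes, which is what makes hierarchical softmax cheap on average.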
  • Table 2   Sample of users' check-in data
    Attribute name | CHS name | Example
    User ID (anonymized) | 匿名用户代码 | 32
    Venue ID* | 签到街道代码 | 44af9feef964a5202b351fe3
    Venue category ID* | 街道类型代码 | 4bf58dd8d48988d1c1941735
    Venue category name* | 街道类别名称 | Mexican Restaurant
    Latitude, Longitude | 经纬度坐标 | 40.747738169430534, $-$73.98519814526952
    Time zone offset in minutes | UTC时差 | $-$240
    UTC time | 世界标准时间 | Tue Apr 03 18:15:33 +0000 2012
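Table 2 stores each check-in time as a UTC string plus an offset in minutes, while splitting trajectories into workdays and weekends (the periods in Table 6) needs local time. A standard-library sketch, assuming the exact string format shown in the Example column and treating Saturday and Sunday as the weekend; `local_time` and `period_of` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def local_time(utc_str, offset_minutes):
    """Convert the 'UTC time' and 'Time zone offset in minutes' fields
    of a check-in record to a timezone-aware local datetime."""
    utc = datetime.strptime(utc_str, "%a %b %d %H:%M:%S %z %Y")
    return utc.astimezone(timezone(timedelta(minutes=offset_minutes)))


def period_of(t):
    """Hypothetical workday/weekend bucketing on the local date."""
    return "weekends" if t.weekday() >= 5 else "workdays"


t = local_time("Tue Apr 03 18:15:33 +0000 2012", -240)
print(t.isoformat(), period_of(t))  # 2012-04-03T14:15:33-04:00 workdays
```

Converting first matters near midnight: a check-in at 02:00 UTC on a Saturday is still Friday evening in New York (offset $-$240), so it belongs to the workday trajectory.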
  • Table 3   Check-in location types (New York)
    Category | CHS name | Count | Examples
    Food | 餐饮类 | 67 | Restaurant; Joint
    Shop & service | 商贸服务类 | 61 | Shop; Store; Service
    Outdoors & recreation | 户外休闲类 | 26 | Beach; Garden
    Travel & transport | 旅游交通类 | 22 | Airport; Travel lounge
    Public service | 公共服务类 | 21 | Bank; Temple
    Education | 教育类 | 20 | School; College
    Arts & entertainment | 艺术娱乐类 | 20 | Museum; Venue
    Industry | 产业制造类 | 7 | Factory; Facility
    Athletic & sport | 体育运动类 | 4 | Gym; Stadium
    Community | 社区类 | 3 | Home; Neighborhood
  • Table 4   Samples of abnormal trajectory data
    Type | VID | LAT (latitude) | LON (longitude) | Offset (m)
    Same GPS | 4b992b04f964a520726635e3; 5089d4bce4b0f6951cdeb4f0 | 40.683120 | $-$73.975979 | 0
    Same ID | 41390580f964a520dc1a1fe3 | 40.7420160433638; 40.7419894750989 | $-$74.005163366761; $-$74.0051174588942 | 5.876
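Table 4 distinguishes two anomalies: different venue IDs sharing identical coordinates ("Same GPS") and one venue ID reported at different coordinates ("Same ID"). The sketch below flags both and measures the Same ID offset with the haversine formula on a spherical Earth of radius 6371 km. The paper does not state its distance formula; haversine is an assumption, and for the Same ID pair above it gives about 4.9 m rather than the listed 5.876 m. `find_anomalies` and `haversine_m` are hypothetical names.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (assumed)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_anomalies(records):
    """records: iterable of (venue_id, lat, lon).
    Returns (same_gps, same_id): groups of IDs sharing one coordinate, and
    (venue_id, offset_m) for every pair of coordinates reported for one ID."""
    by_coord = defaultdict(set)
    by_vid = defaultdict(set)
    for vid, lat, lon in records:
        by_coord[(lat, lon)].add(vid)
        by_vid[vid].add((lat, lon))
    same_gps = [sorted(v) for v in by_coord.values() if len(v) > 1]
    same_id = []
    for vid, coords in by_vid.items():
        coords = sorted(coords)
        for i in range(len(coords)):
            for j in range(i + 1, len(coords)):
                same_id.append((vid, haversine_m(*coords[i], *coords[j])))
    return same_gps, same_id


# The rows of Table 4.
records = [
    ("4b992b04f964a520726635e3", 40.683120, -73.975979),
    ("5089d4bce4b0f6951cdeb4f0", 40.683120, -73.975979),
    ("41390580f964a520dc1a1fe3", 40.7420160433638, -74.005163366761),
    ("41390580f964a520dc1a1fe3", 40.7419894750989, -74.0051174588942),
]
same_gps, same_id = find_anomalies(records)
```

Exact float equality is a deliberately strict test for "Same GPS"; real cleaning would likely round coordinates or use a small distance threshold instead.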
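  • Table 5   Main parameters
    Parameter | CHS name | Description | Corresponding parameter | Default
    size | 向量维数 | Vector dimension | $m$ | 100
    window | 训练窗口大小 | Training window size | $2c$ | 5
    sample | 高频降采样阈值 | Down-sampling threshold for high-frequency words | - | $10^{-3}$
    negative | 负例采样数 | Number of negative samples | - | 5
    threads | 程序线程数 | Number of program threads | - | 12
    min-count | 低频词截断阈值 | Cut-off threshold for low-frequency words | - | 5
    alpha | 初始学习速率 | Initial learning rate | $\alpha$ | 0.05
    iter | 迭代次数 | Number of iterations | - | 5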
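Two rows of Table 5 act on the corpus before training: min-count drops rare locations, and sample randomly discards very frequent ones. The sketch below reproduces the keep probability used by the original word2vec C tool, $(\sqrt{f/s}+1)\,s/f$ with $f$ the token's relative frequency and $s$ the sample threshold, capped at 1. Whether Loc2Vec applies exactly this rule is an assumption, and `build_vocab`, `keep_probability`, and `subsample` are hypothetical names.

```python
import math
import random
from collections import Counter


def build_vocab(sentences, min_count=5):
    """Count locations and drop those seen fewer than min_count times."""
    counts = Counter(loc for s in sentences for loc in s)
    return {loc: c for loc, c in counts.items() if c >= min_count}


def keep_probability(count, total, sample=1e-3):
    """Probability of keeping one occurrence of a token with this count."""
    f = count / total
    return min(1.0, (math.sqrt(f / sample) + 1) * sample / f)


def subsample(sentence, vocab, total, sample=1e-3, rng=random):
    """Drop out-of-vocabulary tokens and randomly thin frequent ones."""
    return [
        loc for loc in sentence
        if loc in vocab and rng.random() < keep_probability(vocab[loc], total, sample)
    ]
```

For example, a location making up 1% of all check-ins is kept with probability about 0.42 at the default threshold $10^{-3}$, while any location rarer than $10^{-3}$ is always kept.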
  • Table 6   Top-$K$ ($K$=10) neighbor prediction of target location (Japanese Restaurant)
    Type | Distance | VID | Category | Venue category name | GPS (lat, lon) | GEO distance (m)
    Single | 0.674 | 4b546885f964a52031ba27e3 | Food | Food & drink shop | 35.654, 139.544 | 24.828
    Single | 0.638 | 4bea5deb6295c9b6c05b8608 | Education | College academic building | 35.656, 139.544 | 230.291
    Single | 0.631 | 4b6f7d59f964a520a7f22ce3 | Education | University | 35.657, 139.541 | 400.026
    24 h | 0.682 | 4ec09a4fbe7b04923ccd270d | Outdoors & Recreation | Sculpture garden | 35.657, 139.544 | 424.540
    24 h | 0.678 | 4b698217f964a5202ea52be3 | Food | Chinese restaurant | 35.653, 139.544 | 63.646
    Workdays | 0.680 | 4cca4775177c370483661534 | Education | College academic building | 35.658, 139.544 | 468.477
    Workdays | 0.645 | 4ffee5dde4b04497d91e7c4c | Community | Home (private) | 35.669, 139.553 | 1806.987
    Weekends | 0.685 | 4b5a88f1f964a52036ca28e3 | Shop & Service | Video store | 35.652, 139.546 | 143.953
    Weekends | 0.612 | 4cb598229c7ba35db53c8b06 | Outdoors & Recreation | Bar | 35.651, 139.543 | 337.812
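Table 6 ranks the neighbors of a target location in vector space. Assuming the ranking score is cosine similarity between position vectors (the table header says only "Distance"), a Top-$K$ query can be sketched as follows; `top_k` and the toy vectors are hypothetical.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_k(target, vectors, k=10):
    """Return [(location, score)] for the k locations most cosine-similar
    to the target's vector, best first. vectors: dict location -> vector."""
    scores = ((cosine(vectors[target], v), loc) for loc, v in vectors.items() if loc != target)
    return [(loc, s) for s, loc in heapq.nlargest(k, scores)]


toy = {"t": [1.0, 0.0], "a": [0.9, 0.1], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print([loc for loc, _ in top_k("t", toy, k=2)])  # ['a', 'c']
```

Each neighbor returned this way could then be paired with its geographic distance to the target (the last column of Table 6) to check whether semantic closeness coincides with spatial closeness.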
  • Table 7   Neighbor prediction sample analysis
    Type | Average distance (m) | SD (m)
    Single | 251.205 | 150.886
    24 h | 274.516 | 121.610
    Workdays | 462.714 | 489.566
    Weekends | 698.784 | 608.596
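Table 7 summarizes the geographic distances of predicted neighbors for each trajectory type. The sketch below shows that aggregation, assuming one `(type, distance)` pair per predicted neighbor. Whether the paper reports the sample or the population standard deviation is not stated; this sketch uses the sample version (`statistics.stdev`), and `summarize` is a hypothetical name.

```python
import statistics
from collections import defaultdict


def summarize(rows):
    """rows: iterable of (type, geo_distance_m), one per predicted neighbor.
    Returns dict type -> (mean, sample standard deviation)."""
    groups = defaultdict(list)
    for kind, dist in rows:
        groups[kind].append(dist)
    return {
        kind: (statistics.mean(d), statistics.stdev(d) if len(d) > 1 else 0.0)
        for kind, d in groups.items()
    }
```

Switching to `statistics.pstdev` would give the population standard deviation instead, which is slightly smaller for small groups.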

Copyright 2019 Science China Press Co., Ltd. All rights reserved.
