MAP-Vis: a spatio-temporal big data visualization method based on multi-dimensional aggregation
Vol. 24, No. 5, 2019, pp. 816-826
Received: 2018-06-14
Revised: 2018-10-11
Published in print: 2019-05-16
DOI: 10.11834/jig.180378

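Objective
Visual analysis is an important research tool for big data mining because it helps analysts quickly and intuitively understand the valuable information that big data contain. However, because big data are massive, spatio-temporal, and high-dimensional, their visualization suffers from high memory consumption, high rendering latency, and poor visual quality. To address these problems, taking massive spatio-temporal point data as an example, we adopt a preprocessing-based visualization scheme and design and implement a highly scalable distributed visual analysis framework.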
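Method
Drawing on the tile pyramid model, we propose a multi-dimensional aggregation pyramid (MAP) model that extends the 2D spatial hierarchical aggregation of the tile pyramid to the time, space, and attribute dimensions and supports hierarchical aggregation in all three simultaneously. We then use a Spark cluster as the parallel preprocessing tool and the HBase distributed database to persist the MAP model data, implementing an open-source distributed visualization framework (MAP-Vis).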
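Result
Using the New York taxi dataset as an example, the experiments show that the framework supports interactive visualization with multi-scale, multi-dimensional linkage across time, space, and attributes, and that its preprocessing and storage capabilities are highly scalable.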
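Conclusion
With the support of distributed processing, the system achieves sub-second query response and good interactive visualization, showing that MAP-Vis is an effective solution for interactive big data visualization.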
Objective
As data collection methods mature and diversify, data sources such as personal smart devices, floating car GPS, the internet of things, and social media are becoming increasingly abundant, and the amount of data has been growing explosively. Big data carry spatio-temporal information and high-dimensional features. Spatio-temporal features are attribute fields that hold spatial positions and time tags; high-dimensional features mean that the target data often contain other valuable attributes. Visual analysis is an important method for big data research because it helps researchers quickly and intuitively analyze and understand the value in the data. However, because of their massive volume, spatio-temporal correlation, and high dimensionality, big data pose many challenges to current visualization implementations, including large memory consumption, high rendering delay, and poor visual effects.
Method
In this study, we propose a generic multi-dimension aggregation pyramid (MAP) model on the basis of the well-known 2D tile pyramid model. The MAP model supports hierarchical aggregation over time, space, and attributes simultaneously and transforms the aggregated results into discrete key-value pairs for scalable storage and efficient retrieval. We then use a high-performance Spark cluster as the parallel preprocessing platform and distributed HBase as the final store for the generated MAP data. Finally, on top of the generated MAP datasets, we design and implement an open-source distributed visualization framework (MAP-Vis).
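The idea of aggregating points over space, time, and attributes into discrete key-value pairs can be sketched as follows. This is a minimal illustration and not the authors' implementation: the key layout (`zoom/x/y|level:time|dist:bin`), the function names, the choice of zoom levels and time levels, and the 5 km distance bin are all assumptions made for the example. Only the Web Mercator tile formula is standard.

```python
import math
from collections import Counter
from datetime import datetime


def tile_xy(lon, lat, zoom):
    """Standard Web Mercator (slippy-map) tile index for a point at a zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that points on the map edge stay inside the tile grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# Hypothetical time hierarchy: coarser levels simply truncate the timestamp.
TIME_FORMATS = {"month": "%Y-%m", "day": "%Y-%m-%d", "hour": "%Y-%m-%dT%H"}


def time_bin(ts, level):
    return ts.strftime(TIME_FORMATS[level])


def distance_bin(km, width_km=5):
    """Hypothetical attribute binning: trip distance in fixed-width bins."""
    return int(km // width_km)


def build_map(records, zooms=(10, 12), time_levels=("month", "day", "hour")):
    """Count records per (tile, time bin, attribute bin) cell.

    records: iterable of (lon, lat, datetime, distance_km) tuples.
    Returns a Counter mapping a string key to the number of points in the cell.
    """
    cells = Counter()
    for lon, lat, ts, km in records:
        for zoom in zooms:
            x, y = tile_xy(lon, lat, zoom)
            for level in time_levels:
                key = f"{zoom}/{x}/{y}|{level}:{time_bin(ts, level)}|dist:{distance_bin(km)}"
                cells[key] += 1
    return cells
```

Each record contributes one count to every combination of spatial level and time level, so a client can later fetch a precomputed cell by key instead of scanning raw points. In a distributed setting the per-record loop would map naturally onto a parallel count-by-key step, with the resulting pairs written to a key-value store.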
Result
The experiments use the open New York taxi data, which cover 30 months from January 2014 to June 2016. A single record contains trip-related information, including the location and time of the taxi origin and destination, the trip duration, and the trip distance. The visualization interface is implemented on the MAP-Vis framework, which uses HTML, CSS, and JavaScript. Leaflet and OpenStreetMap are used for road network display; the timeline and attribute histogram sections use the d3 library to support user interaction. Three efficiency metrics are collected to evaluate the performance of the MAP model and the MAP-Vis system in terms of model validation, storage scalability, and system scalability. In the model validation experiment, as the size of the raw data increases, the response time curve remains flat and does not show a significant linear increase; the values fluctuate slightly between 0.7 s and 1 s. This result indicates that the MAP model scales well with the size of spatio-temporal datasets, guarantees a sub-second response, and achieves a smooth interactive visualization experience. In the storage scalability experiment, as the number of storage nodes increases, the overall response time decreases dramatically from 3.2 s to 0.9 s, and the parallel efficiency is improved by approximately 2.4 times. This finding can be attributed to distributed storage: with more storage nodes, queries are less likely to hit a single region, and the access queue time is reduced. Therefore, by increasing the number of HBase storage regions, the proposed framework enhances query efficiency, fully exploits the parallelism of distributed clusters, and significantly improves the interactive visual experience. In the system scalability experiment, the number of worker nodes in the Spark cluster is varied to measure how the preprocessing time changes (excluding the time needed to import the results into the HBase database). Increasing the number of nodes reduces the preprocessing time from 360 min to 160 min, and the efficiency is improved by approximately 1.3 times. Therefore, with more computation nodes, the Spark cluster uses worker nodes and executor processes to share preprocessing tasks, thereby significantly improving preprocessing efficiency.
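One plausible reading of the flat response-time curve in the model validation experiment is that a query over pre-aggregated cells touches a number of keys that depends on the viewport and zoom level, not on the number of raw records. The sketch below illustrates that reasoning only. The in-memory `dict` standing in for the key-value store, the key layout, and the function names are assumptions for the example and do not describe the MAP-Vis code.

```python
import math


def tile_xy(lon, lat, zoom):
    """Standard Web Mercator (slippy-map) tile index for a point at a zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_in_view(west, south, east, north, zoom):
    """All tiles that intersect a bounding box at one zoom level."""
    x0, y0 = tile_xy(west, north, zoom)   # top-left tile (y grows southwards)
    x1, y1 = tile_xy(east, south, zoom)   # bottom-right tile
    return [(zoom, x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]


def query(store, west, south, east, north, zoom, time_key):
    """Sum pre-aggregated counts for a viewport and one time bin.

    store: hypothetical key-value mapping from "zoom/x/y|time_key" to a count.
    Returns (total_count, number_of_key_lookups).
    """
    total = 0
    lookups = 0
    for z, x, y in tiles_in_view(west, south, east, north, zoom):
        lookups += 1
        total += store.get(f"{z}/{x}/{y}|{time_key}", 0)
    return total, lookups
```

With this layout, a store built from a thousand times more raw records returns larger counts but requires exactly the same number of lookups for the same viewport, which is consistent with a response time that does not grow with the raw data size.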
Conclusion
Given their large size, spatio-temporal properties, high dimensionality, and other characteristics, spatio-temporal big data face various visualization challenges, such as large memory consumption, high rendering delay, and poor visual effects. To address these problems, we first propose a spatio-temporal big data organization model, namely the MAP, which integrates the tile pyramid model and the key-value method. The MAP model covers the time dimension, the space dimension, and attribute information, and aggregates all three hierarchically, thereby supporting fast and efficient visualization of spatio-temporal big data. On the basis of the MAP model, an open-source visualization framework, MAP-Vis, is implemented on a Linux cluster. The MAP-Vis system uses Spark as a preprocessing tool and HBase as a distributed storage platform. Experiments validate the efficiency of the proposed MAP model, and the underlying distributed platforms provide high scalability for visualization and processing. With the cluster, MAP-Vis realizes sub-second data queries and achieves good interactive visualization. Future work can be conducted in the following directions. 1) The framework has strong support for point data, but other visual elements, including lines, polygons, and images, should also be supported so that the framework is compatible with as many data types as possible. 2) A simple visual display cannot fully reveal the patterns and value of big data. Hence, data analysis modules could be added to make the MAP-Vis framework functionally complete.