MAP-Vis: a visualization scheme for spatio-temporal point big data based on the MAP model

Chong Xie; Xuefeng Guan; Weixuan Zhou; Huayi Wu

doi:10.11834/jig.180378

Computer Graphics

MAP-Vis: a spatio-temporal big data visualization method based on multi-dimensional aggregation
2019, Vol. 24, No. 5, pp. 816-826
Received: 2018-06-14
Revised: 2018-10-11
Published in print: 2019-05-16

Chong Xie, Xuefeng Guan, Weixuan Zhou, Huayi Wu. MAP-Vis: a spatio-temporal big data visualization method based on multi-dimensional aggregation[J]. Journal of Image and Graphics, 2019, 24(5): 816-826. DOI: 10.11834/jig.180378.

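Abstract

Objective

Visual analysis is an important research tool for big data mining because it helps researchers quickly and intuitively understand and analyze the valuable information contained in big data. However, because big data are massive, spatio-temporal, and high-dimensional, their visualization suffers from large memory consumption, high rendering latency, and poor visual quality. To address these problems, this study takes massive spatio-temporal point data as an example, adopts a preprocessing-based visualization approach, and designs and implements a highly scalable distributed visual analysis framework.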

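Method

Drawing on the tile pyramid model, we propose a multi-dimensional aggregation pyramid (MAP) model that extends the 2D spatial hierarchical aggregation of the tile pyramid to the time, space, and attribute dimensions, so that hierarchical aggregation is supported across all three simultaneously. A Spark cluster then serves as the parallel preprocessing tool, and the HBase distributed database persistently stores the MAP model data, yielding an open-source distributed visualization framework (MAP-Vis).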

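Result

Using the New York taxi data set as an example, the experiments show that the framework supports interactive visualization with linked, multi-scale, multi-dimensional views over time, space, and attributes, and that its preprocessing and storage capabilities are highly scalable.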

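Conclusion

Backed by distributed processing, the system achieves sub-second query response and good interactive visualization, which demonstrates that MAP-Vis is an effective solution for interactive big data visualization.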

Abstract

Objective

As data collection methods mature and diversify, data sources such as personal smart devices, floating car GPS, the internet of things, and social media are becoming increasingly abundant, and the amount of data has been growing explosively. Big data carry spatio-temporal information and high-dimensional features. Spatio-temporal features are attribute fields that carry spatial positions and time tags; high-dimensional features mean that the target data often contain other valuable attributes. Visual analysis is an important method for big data research because it helps researchers quickly and intuitively analyze and understand the value in the data. However, because of its massive volume, spatio-temporal correlation, and high dimensionality, big data visualization poses many challenges to current implementations, including large memory consumption, high rendering delay, and poor visual effects.

Method

In this study, we propose a generic multi-dimension aggregation pyramid (MAP) model on the basis of the well-known 2D tile pyramid model. The MAP model supports hierarchical aggregation over time, space, and attributes simultaneously and transforms the aggregated results into discrete key-value pairs for scalable storage and efficient retrieval. We then use a high-performance Spark cluster as the parallel preprocessing platform and distributed HBase as the final storage for the generated MAP data. Finally, on top of the generated MAP data sets, we design and implement an open-source distributed visualization framework (MAP-Vis).
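The idea of turning multi-level aggregates into discrete key-value pairs can be illustrated with a minimal single-machine sketch. It is not the MAP-Vis implementation: the key layout `(zoom, x, y, time_level, time_bin, attr_bin)`, the two time levels, and the function names are assumptions made for illustration, and the spatial index is the standard Web Mercator tile formula rather than anything specified in the abstract. In the paper this kind of aggregation runs on Spark and the results are stored in HBase.

```python
import math
from collections import Counter
from datetime import datetime

def tile_xy(lon, lat, zoom):
    """Standard Web Mercator (slippy-map) tile indices for a lon/lat pair."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the map edge stay inside the tile grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

# Hypothetical temporal hierarchy: each level is a strftime pattern.
TIME_LEVELS = {"day": "%Y-%m-%d", "month": "%Y-%m"}

def map_keys(lon, lat, ts, attr_bin, zooms):
    """Yield one key per (spatial level, temporal level) for a single point."""
    for z in zooms:
        x, y = tile_xy(lon, lat, z)
        for level, fmt in TIME_LEVELS.items():
            yield (z, x, y, level, ts.strftime(fmt), attr_bin)

def aggregate(points, zooms=range(0, 3)):
    """Count points per key; the result is a flat key-value mapping."""
    counts = Counter()
    for lon, lat, ts, attr_bin in points:
        counts.update(map_keys(lon, lat, ts, attr_bin, zooms))
    return counts

points = [
    (-73.98, 40.75, datetime(2015, 1, 1, 8, 30), "short"),
    (-73.97, 40.76, datetime(2015, 1, 1, 9, 0), "short"),
    (-73.95, 40.78, datetime(2015, 1, 20, 18, 0), "short"),
]
counts = aggregate(points)
```

Each point contributes one count to every combination of zoom level and time level, so coarser levels are already summed at preprocessing time and a viewer only has to look up keys.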

Result

The experiments use the open New York taxi data, which cover 30 months from January 2014 to June 2016. A single record contains trip-related information, including the location and time of the taxi origin and destination, trip duration, and distance. The visualization interface is implemented on the MAP-Vis framework, which uses HTML, CSS, and JavaScript. Leaflet and OpenStreetMap are used for road network display; the timeline and attribute histogram sections use the d3 library to support user interaction. Three efficiency metrics are collected to evaluate the performance of the MAP model and the MAP-Vis system in terms of model validation, storage scalability, and system scalability. In the model validation experiment, as the size of the raw data increases, the response time curve remains flat and does not show a significant linear increase; the values fluctuate slightly between 0.7 s and 1 s. This result indicates that the MAP model scales well with the size of spatio-temporal data sets, guarantees a sub-second response, and achieves a smooth interactive visualization experience. In the storage scalability experiment, as the number of nodes in the cluster increases, the overall response time decreases dramatically from 3.2 s to 0.9 s, and the parallel efficiency is improved by approximately 2.4 times. This finding can be attributed to distributed storage: with more storage nodes, requests are less likely to be directed at a single region, and the time spent in the access queue is reduced. Therefore, by increasing the number of HBase storage regions, the proposed framework enhances query efficiency, fully exploits the parallelism of distributed clusters, and significantly improves the visual interactive experience. In the system scalability experiment, the number of worker nodes in the Spark cluster is varied to measure how the pre-processing time changes (excluding the time needed to import the data into HBase). Increasing the number of nodes reduces the pre-processing time from 360 min to 160 min, and the efficiency is improved by approximately 1.3 times. Therefore, with more computation nodes, the Spark cluster shares the pre-processing tasks among more worker nodes and executor processes, thereby significantly improving the pre-processing efficiency.
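One plausible reading of the flat response-time curve is that a query only touches the pre-aggregated keys covering the current viewport and time selection, so the amount of work depends on what is on screen rather than on the size of the raw data. The sketch below illustrates that reasoning under stated assumptions: the key layout `(zoom, x, y, time_bin)` and the function `viewport_keys` are hypothetical, and the tile arithmetic is the standard Web Mercator scheme, not a detail taken from the paper.

```python
import math

def tile_xy(lon, lat, zoom):
    """Standard Web Mercator (slippy-map) tile indices for a lon/lat pair."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def viewport_keys(west, south, east, north, zoom, time_bins):
    """List the keys a client would look up for one viewport and time selection."""
    x0, y0 = tile_xy(west, north, zoom)   # top-left tile (tile y grows southward)
    x1, y1 = tile_xy(east, south, zoom)   # bottom-right tile
    return [
        (zoom, x, y, t)
        for x in range(x0, x1 + 1)
        for y in range(y0, y1 + 1)
        for t in time_bins
    ]

# A bounding box around New York City at a coarse zoom level, two monthly bins.
keys = viewport_keys(-74.3, 40.4, -73.6, 41.0, zoom=2,
                     time_bins=["2015-01", "2015-02"])
```

The number of keys is the tile count of the viewport multiplied by the number of selected time bins. It does not change when more raw records are loaded, because additional records only increase the values stored under existing keys.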

Conclusion

Given its large size, space-time properties, high dimensionality, and other characteristics, the visualization of spatio-temporal big data faces challenges such as large memory consumption, high rendering delay, and poor visual effects. To solve this problem, we first propose a spatio-temporal big data organization model, namely, the MAP, which integrates the tile pyramid model and the key-value matching method. The MAP model covers the time and space dimensions together with attribute information and aggregates all three hierarchically, step by step, which makes it suited to fast, efficient visualization of spatio-temporal big data. On the basis of the MAP model, an open-source visualization framework, MAP-Vis, is implemented on a Linux cluster. The MAP-Vis system uses Spark as the pre-processing tool and HBase as the distributed storage platform. Experiments validate the efficiency of the proposed MAP model, and the underlying distributed platforms provide high scalability for visualization and processing. Running on the cluster, MAP-Vis realizes sub-second data queries and achieves good interactive visualization. Future work can be conducted in the following aspects. 1) The framework supports point data well, but it should be made compatible with other data types as far as possible, including line features, polygon features, and images. 2) Visual display alone cannot fully reveal the patterns and value in big data. Hence, data analysis modules could be added to make the MAP-Vis framework more complete.
