Integration method for remote sensing algorithm programs under Docker containerization

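Zhang Jie1,2, Zheng Ke1, Tang Ping1, Zhang Zheng1, Li Hongyi1 (1. Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100101; 2. University of Chinese Academy of Sciences, Beijing 100049)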

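Abstract
Objective In recent years, with the rapid development of remote sensing technology in China, remote sensing data have taken on the characteristics of big data and their timeliness has increased. In this new setting, remote sensing algorithms are written in many programming languages, their runtime and deployment environments have diverse requirements, and the programs are difficult to integrate and deploy. To address these problems, we propose a system integration architecture for the rapid packaging of remote sensing algorithm programs and their Docker containerization. Method The architecture mainly comprises: 1) automated packaging of remote sensing algorithm programs into images; 2) distribution and management of images, so that algorithm program images can be shared; 3) a containerized orchestration service for the production workflow of remote sensing information products, which chains related algorithm program images to produce a specific remote sensing information product; 4) container scheduling and execution, which invokes the images to run the production of a specific remote sensing product in containers. Under this architecture, we take the production of NDVI and NDWI information products from Landsat5 data as a containerized production example and compare it with a physical machine and a KVM (kernel-based virtual machine) virtual machine in terms of run time, memory usage, and deployment efficiency. Result Product production in the Docker container environment shows almost no difference from the physical machine environment in run time and memory usage, and outperforms the KVM virtual machine. The Docker container environment and the KVM virtual machine environment save a large amount of deployment time and improve deployment efficiency compared with the physical machine environment. Conclusion The containerized system integration approach can effectively solve the difficulty of integrating and deploying remote sensing algorithm programs, facilitates the reuse of algorithm programs and the sharing of workflows, improves system integration efficiency, and offers a strong capability for real-time, rapid processing of remote sensing data.
Keywords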
Integration of remote sensing algorithm program using Docker container technology

Zhang Jie1,2, Zheng Ke1, Tang Ping1, Zhang Zheng1, Li Hongyi1(1.Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100101, China;2.University of Chinese Academy of Sciences, Beijing 100049, China)

Abstract
Objective Remote sensing data exhibit clear big-data characteristics owing to continuous improvements in their spatial, temporal, spectral, and radiometric resolution. Because remote sensing image processing algorithms are diverse and complex, their seamless integration and deployment has become a major challenge in the era of remote sensing big data and cloud computing. Virtualization technology provides a feasible solution to these problems. Docker is a recent open-source container virtualization technology. Compared with traditional virtualization technologies such as KVM (kernel-based virtual machine), Docker virtualizes at the operating-system level and is lightweight and resource efficient. In this paper, we propose a system framework for the rapid integration of remote sensing algorithms based on Docker containers. Method The framework consists of an automated image encapsulation mechanism for remote sensing algorithms, unified image distribution management, a containerized orchestration service for producing remote sensing information products, and a container scheduling scheme based on daemon processes. We use Dockerfiles to automatically package the base image, program dependencies, and remote sensing algorithm programs, building a new image layer by layer. A Docker image is a read-only template that contains the file system structure and the contents needed to start a container; it is the basis for starting a container. A container is made up of a number of read-only layers and one readable and writable layer. The image is a static view of the container, and the container is the running state of the image. We upload the image to a repository such as Docker Hub via the "docker push" command. Users on other machines download the corresponding image to their local machine via the "docker pull" command.
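The layer-by-layer packaging described above can be illustrated with a minimal sketch that renders a Dockerfile from a base image, a dependency list, and an algorithm program. The function name `render_dockerfile`, the base image tag, the target directory `/opt/algorithm`, and the program name `ndvi.py` are illustrative assumptions, not details taken from the paper, and the sketch assumes a Python program whose dependencies are installed with pip.

```python
import json
from typing import Sequence


def render_dockerfile(base_image: str, packages: Sequence[str],
                      program: str, command: Sequence[str]) -> str:
    """Return Dockerfile text; each instruction adds one image layer.

    All argument values are supplied by the caller; nothing here is
    taken from the paper's actual packaging tool (hypothetical helper).
    """
    lines = [f"FROM {base_image}"]  # base image layer
    if packages:
        # Dependency layer (assumes pip-installable Python packages).
        lines.append("RUN pip install --no-cache-dir " + " ".join(packages))
    lines.append("WORKDIR /opt/algorithm")  # hypothetical install directory
    lines.append(f"COPY {program} ./")      # algorithm program layer
    # Exec-form ENTRYPOINT is written as a JSON array.
    lines.append("ENTRYPOINT " + json.dumps(list(command)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile("python:3.11-slim", ["numpy"],
                            "ndvi.py", ["python", "ndvi.py"]))
```

The rendered text could then be written to a file named Dockerfile, built with "docker build", and shared with "docker push" and "docker pull" as described above.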
Image orchestration refers to connecting the associated containers in series, in a logical sequence determined by the production workflow of the remote sensing information product. The orchestration is represented by a compose file, which consists of three parts: version, services, and networks. The compose file is used to reproduce and share the processing workflow. We design a container operation scheme based on the Java platform. First, the user submits an order task through the front-end interactive interface, and the back end parses the command. The back end then creates a specific compose.yaml file from a template file. Finally, the images of the remote sensing algorithm programs are run as containers. We use the production of NDVI and NDWI information products from Landsat5 data as an example of the prototype system and of containerized production. In this experiment, remote sensing data are stored in a distributed manner using the distributed file system GlusterFS. The system consists of three server hosts and one client host. The server hosts store the data in a distributed manner. The client host reads the data on the server hosts by mounting the file system. The client host is also the environment in which the Docker containers are integrated and run. Landsat5 DN value data are used as input, and the final output is the binarized product data. The algorithm programs include radiometric correction programs written in C++, inversion programs for typical ground-feature indices written in Python, and binarization programs written in MATLAB. We run computational and deployment performance experiments measuring run time, memory usage, and deployment efficiency in the Docker virtualization environment, the KVM virtual machine environment, and the physical machine environment. Deployment efficiency refers to the configuration complexity required to deploy the same set of applications in the different environments.
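As a rough illustration of the orchestration step, the sketch below renders a compose file with the three parts named above (version, services, networks) and chains the stages in sequence. The paper's scheduler is built on the Java platform; Python is used here only for brevity. The function `render_compose`, the service names, the image names under `example/`, and the network name `rsnet` are all hypothetical. Short-form `depends_on` only controls start order, so how the actual system waits for one stage to finish before the next begins is not shown.

```python
from typing import List, Tuple


def render_compose(stages: List[Tuple[str, str]],
                   network: str = "rsnet", version: str = "3") -> str:
    """Render compose YAML text that chains (service, image) stages in order.

    Hypothetical helper: each stage after the first declares a dependency
    on the stage before it, mirroring the serial product workflow.
    """
    lines = [f'version: "{version}"', "services:"]
    previous = None
    for name, image in stages:
        lines.append(f"  {name}:")
        lines.append(f"    image: {image}")
        lines.append("    networks:")
        lines.append(f"      - {network}")
        if previous is not None:
            # Short-form depends_on: start order only, not completion.
            lines.append("    depends_on:")
            lines.append(f"      - {previous}")
        previous = name
    lines.append("networks:")
    lines.append(f"  {network}: {{}}")  # network with default settings
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Stage and image names are placeholders, not the paper's images.
    print(render_compose([
        ("radiometric", "example/radiometric-correction:latest"),
        ("index", "example/ndvi-index:latest"),
        ("binarize", "example/binarization:latest"),
    ]))
```

Generating the file from a small list of stages keeps the workflow itself shareable: the same stage list reproduces the same compose file on another host.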
Run time is the time taken to run the same application repeatedly in each of the three environments. Memory usage refers to the amount of memory used by the same application running in parallel in each of the three environments. Result Applications running in the Docker container environment and in the physical machine environment show almost no difference in system load metrics such as run time and memory footprint, and both perform better than the KVM virtual machine environment. However, when installing and deploying a production environment on a new machine, the Docker container environment and the KVM virtual machine environment reduce the amount of configuration and ease program migration and reuse compared with the physical machine environment. Conclusion The containerized system integration method can effectively solve the difficulty of integrating and deploying remote sensing algorithm programs. Running different remote sensing algorithm programs in Docker containers provides resource isolation and eliminates dependency conflicts among programs. Deploying applications with Docker containers enables one-click deployment and operation, which facilitates the reuse of remote sensing algorithm programs and sharing at the software level. Our containerized framework is promising for improving the efficiency and seamlessness of system integration compared with physical and virtual machine environments.
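The run-time metric defined above, the time taken to run the same application repeatedly, could be measured with a small harness such as the following sketch. It is not the authors' benchmark: the function name `time_repeated_runs` is hypothetical, and the command in the usage example is a trivial placeholder rather than a remote sensing program. The same harness would be run unchanged in each environment so the timings are comparable.

```python
import statistics
import subprocess
import sys
import time
from typing import List, Sequence


def time_repeated_runs(command: Sequence[str], repeats: int) -> List[float]:
    """Run `command` `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the program exits with a non-zero status.
        subprocess.run(list(command), check=True)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder workload: a Python interpreter that exits immediately.
    runs = time_repeated_runs([sys.executable, "-c", "pass"], repeats=5)
    print(f"mean run time: {statistics.mean(runs):.4f} s over {len(runs)} runs")
```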
Keywords
