Advances in edge-cloud collaboration and evolution for large-small models

Wang Yongwei1, Shen Tao1, Zhang Shengyu1, Wu Fan2, Zhao Zhou1, Cai Haibin3, Lv Chengfei4, Ma Lizhuang2, Yang Chenglei5, Wu Fei1 (1. Zhejiang University; 2. Shanghai Jiao Tong University; 3. East China Normal University; 4. Alibaba Group; 5. Shandong University)

Abstract
Generative foundation models are driving profound transformations in artificial intelligence, exhibiting general-purpose capabilities in tasks such as natural language processing, multimodal understanding, and content synthesis. Large models deployed on the cloud side provide general intelligent services but face key challenges such as high latency and insufficient personalization, whereas small models deployed on the device side capture personalized scenario data but suffer from limited generalization. Large-small model collaboration aims to combine the general capabilities of large models with the specialized capabilities of small models, learning and evolving through collaborative interaction to empower downstream vertical industry scenarios. Taking large language models and large multimodal models as representatives, this paper reviews the mainstream architectures, typical pre-training techniques, and adaptation fine-tuning methods of generative foundation models; introduces the development history and recent research progress of key model miniaturization techniques, including model pruning, model quantization, and knowledge distillation, in the context of large models; and, based on the differences in collaboration purposes and mechanisms among models, proposes a collaborative-evolution taxonomy of collaborative training, collaborative inference, and collaborative planning between large and small models, summarizing a series of representative new techniques and ideas such as bidirectional distillation between cloud and edge models, modular design, and generative agents. Overall, this paper examines the international and domestic state of development of large-small model collaborative evolution from three perspectives, namely generative foundation models, model miniaturization techniques, and large-small model collaboration modes, compares their strengths and gaps, and analyzes the development trends of foundation-model empowerment in terms of application prospects, model architecture design, vertical-domain model fusion, personalization, and security and trustworthiness challenges.
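As a concrete point of reference for the model miniaturization and cloud-edge bidirectional distillation techniques summarized above, the following minimal PyTorch sketch shows a standard temperature-scaled knowledge distillation loss; the function name, hyperparameter defaults, and the convention that teacher and student roles are simply swapped in the reverse direction are illustrative assumptions rather than the specific formulations reviewed in this paper.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    # Soft-label term: KL divergence between temperature-softened
    # teacher and student distributions (scaled by T^2, as is standard).
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd_term = F.kl_div(log_student, soft_targets,
                       reduction="batchmean") * temperature ** 2
    # Hard-label term: ordinary cross-entropy on ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    # In cloud-to-edge distillation the cloud model acts as the teacher;
    # in the reverse (edge-to-cloud) direction the roles are swapped.
    return alpha * kd_term + (1.0 - alpha) * ce_term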
Keywords
Advances in edge-cloud collaboration and evolution for large-small models

Wang Yongwei1, Shen Tao1, Zhang Shengyu1, Wu Fan2, Zhao Zhou1, Cai Haibin3, Lv Chengfei4, Ma Lizhuang2, Yang Chenglei5, Wu Fei1 (1. Zhejiang University; 2. Shanghai Jiao Tong University; 3. East China Normal University; 4. Alibaba Group; 5. Shandong University)

Abstract
Generative foundation models are driving significant transformations in the field of artificial intelligence, demonstrating general intelligence across diverse research fields, including natural language processing, multimodal content understanding, and image and multimodal content synthesis. Generative foundation models often consist of billions or even hundreds of billions of parameters and are therefore typically deployed on the cloud side to provide powerful, general-purpose intelligent services. In practice, however, such services face crucial challenges, including the high latency induced by communication between the cloud and local devices, and insufficient personalization, because servers often cannot access local data owing to privacy concerns. In contrast, low-complexity lightweight models are deployed on the edge side to capture personalized and dynamic scenario data, yet they may suffer from poor generalization. Large and lightweight (or large-small) model collaboration aims to integrate the general intelligence of large foundation models with the personalized intelligence of small lightweight models, empowering downstream vertical domain-specific applications through the interaction and collaboration of both types of models. Large-small model collaboration has recently attracted increasing attention, has become a focus of research and development in both academia and industry, and has been predicted to be an important technology trend. We therefore conduct a thorough investigation of this area, highlighting recent progress and offering potential inspiration for related research.
In this study, we first overview representative large language models and large multimodal models, focusing on their mainstream Transformer-based architectures, including encoder-only, decoder-only, and encoder-decoder models; the corresponding pre-training technologies, such as next sentence prediction, sequence-to-sequence modeling, and contrastive learning; and parameter-efficient fine-tuning methods, with low-rank adaptation and prompt tuning as representatives. We then review the development history and latest advances of model compression techniques in the era of foundation models, including model pruning, model quantization, and knowledge distillation. Based on the differences in collaboration purposes and mechanisms among models, we propose a new classification method and taxonomy for large-small model collaboration, namely collaborative training, collaborative inference, and collaborative planning. More specifically, we summarize recent representative methods, which include dual-directional knowledge distillation between large models on the cloud side and small models deployed on the edge side, modular designs that split the functional modules of intelligent models between the cloud and the edge, and generative agents that collaborate to complete complex tasks in an autonomous and intelligent manner. In collaborative training, a main challenge is how to deal with the heterogeneity in data distributions and model architectures between the cloud side and the client side; data privacy may also be a concern, particularly in privacy-sensitive cases. Despite much progress in collaborative inference, it remains challenging to automatically partition a complicated task and complete it in a collective way, and the communication cost between computing facilities can be another concern.
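To make the parameter-efficient fine-tuning methods mentioned above more concrete, the sketch below applies low-rank adaptation to a single linear layer in PyTorch; the class name, rank, and scaling defaults are illustrative assumptions and not a reference implementation from the surveyed literature.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Wraps a frozen pretrained linear layer with a trainable low-rank
    # update, so only the small matrices A and B are optimized during
    # adaptation while the original weights stay fixed.
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze pretrained weights
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # the update starts as a no-op
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

# Example: adapt a 768-dimensional projection with rank-8 updates.
# layer = LoRALinear(nn.Linear(768, 768), r=8)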
Collaborative planning is a newer paradigm that has gained attention with the rapid progress of large language model-centric agents (LLM agents). This paradigm often involves multiple LLM agents that compete or cooperate to complete a challenging task. It typically leverages emergent capabilities of large language models, such as in-context learning and chain-of-thought reasoning, to automatically divide a complicated task into several subtasks; by completing and assembling the subtasks, the global task is accomplished in a collaborative manner. This scheme finds diverse applications, such as game development and social simulation, yet it may inherit drawbacks of large language models, including hallucination and adversarial vulnerability, so more robust and reliable collaborative planning schemes remain to be investigated.
In summary, this work surveys large-small model collaboration techniques from the perspectives of generative foundation models, model compression, and heterogeneous model collaboration via large language model-centric agents. It also compares the advantages and disadvantages of international and domestic technology developments in this research realm. We conclude that even though the gaps between domestic and leading international studies in this area are narrowing, particularly for newly emerging LLM agents, original and major breakthroughs are still lacking. Notable strengths of domestic progress lie in industrial applications, owing to rich data resources from industry, so the development of domain-specific large language models is relatively advanced. In addition, this study envisions the applications of large-small model collaboration and discusses key challenges and promising directions, including: 1) the design of efficient model architectures, such as how to develop architectures that achieve low-complexity inference while retaining the long-sequence modeling ability of Transformers, and how to further improve the scalability of mixture-of-experts architectures; 2) current model compression methods were mainly designed for vision models, so techniques tailored to large language models and large multimodal models are needed to preserve their emergent abilities during compression; 3) existing personalization methods focus mainly on discriminative models, and due attention needs to be paid to efficient personalization of generative foundation models; and 4) generative intelligence is threatened by fraudulent content, such as fake imagery, deepfake videos, and fake news, as well as by various attacks, such as adversarial, jailbreak, and Byzantine attacks, which raises security and trustworthiness issues in practical applications. This study therefore also advocates deeper investigation of these emerging security threats and the development of effective defenses against them, so that large-small model collaboration can empower vertical domains more safely.
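The collaborative planning paradigm described above can be illustrated by a deliberately simplified decompose-solve-assemble loop; plan_and_execute, planner, and workers below are hypothetical stand-ins for real LLM-agent calls and are shown only to convey the overall pattern, not an actual agent framework.

from typing import Callable, List

def plan_and_execute(task: str,
                     planner: Callable[[str], List[str]],
                     workers: List[Callable[[str], str]]) -> str:
    # A planner agent (e.g., an LLM prompted with chain-of-thought
    # instructions) decomposes the global task into subtasks.
    subtasks = planner(task)
    # Worker agents solve the subtasks; here they are assigned round-robin.
    partial_results = []
    for i, subtask in enumerate(subtasks):
        worker = workers[i % len(workers)]
        partial_results.append(worker(subtask))
    # Naive assembly of partial results into a global answer; a real
    # system would let another agent verify and merge them.
    return "\n".join(partial_results)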
Keywords
