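Yan Hao1, Liu Yuliang1, Jin Lianwen2, Bai Xiang1 (1. School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074; 2. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640)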
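Since the release of ChatGPT, generative artificial intelligence (AI) technology has repeatedly broken through bottlenecks. It has attracted large-scale capital investment, triggered transformations in many fields, and drawn close attention from governments. This paper first analyzes the development trends, current applications, and prospects of large models. It then briefly introduces the related technologies from three aspects: 1) it outlines the techniques for constructing large models, including the construction pipeline, the state of research, and optimization techniques; 2) it summarizes three mainstream categories of image-text multimodal techniques for large models; and 3) it introduces three categories of large-model evaluation benchmarks, classified by evaluation method. Parameter optimization and dataset construction are the core issues for the adoption and technical iteration of large-model products. Multimodal capability is one of the important development directions of large models. Establishing evaluation benchmarks is the key method for comparing and constraining large models. In addition, this paper discusses the challenges facing existing technologies and their possible future directions. Current large-model products already have strong comprehension and creative capabilities and have shown broad application prospects in fields such as education, medicine, and finance. However, they also suffer from difficult training and deployment, insufficient domain knowledge, and safety risks. Therefore, improving parameter optimization, high-quality dataset construction, multimodal techniques, and related technologies, and establishing unified, comprehensive, and convenient evaluation benchmarks, will be key for large models to overcome their current limitations.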
The development, application, and future of LLM similar to ChatGPT
Yan Hao1, Liu Yuliang1, Jin Lianwen2, Bai Xiang1(1.School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China;2.School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, China)
Generative artificial intelligence (AI) technology has achieved remarkable breakthroughs since the release of ChatGPT several months ago, particularly in its scope, degree of automation, and level of intelligence. The rising popularity of generative AI attracts capital inflows and promotes innovation in various fields. Moreover, governments worldwide pay considerable attention to generative AI but hold different attitudes toward it. The US government maintains a relatively relaxed attitude to stay ahead in the global technological arena. European countries are more conservative and are concerned about data privacy in large language models (LLMs). The Chinese government attaches great importance to AI and LLMs but also emphasizes regulatory issues. Given the growing influence of ChatGPT and its competitors and the rapid development of generative AI technology, an in-depth analysis of them has become necessary.
This paper first analyzes the development, application, and prospects of generative AI in depth. Various types of LLMs have emerged as remarkable technological products that demonstrate versatile capabilities across multiple domains, such as education, medicine, finance, law, programming, and paper writing. These models are usually fine-tuned from general-purpose LLMs to give them additional domain-specific knowledge and better adaptability to a specific domain. Over the past few months, LLMs (e.g., GPT-4) have improved rapidly in professional knowledge, reasoning, coding, credibility, security, transferability, and multimodality.
The technical contributions of generative AI are then briefly introduced from four aspects. 1) We review related work on LLMs, such as GPT-4, PaLM 2, and ERNIE Bot, and their construction pipeline, which involves training a base model and an assistant model. The base model stores a large amount of linguistic knowledge, whereas the assistant model acquires stronger comprehension and generation capabilities through a series of fine-tuning stages. 2) We outline a series of public LLMs built on LLaMA, a family of comparatively compact foundation language models that serves as a basis for lightweight, memory-efficient LLMs. These include Alpaca, Vicuna, Koala, and Baize. We also outline the key technologies for building LLMs with low memory and computation requirements: low-rank adaptation (LoRA), Self-Instruct, and Automatic Prompt Engineer (APE). 3) We summarize three types of mainstream image-text multimodal techniques: training additional adaptation layers to align visual modules with language models, multimodal instruction fine-tuning, and using an LLM as the center of understanding. 4) We introduce three types of LLM evaluation benchmarks, classified by evaluation method: manual evaluation, automatic evaluation, and LLM-based evaluation.
Parameter optimization and fine-tuning dataset construction are crucial to the popularization and innovation of generative AI products. They can significantly reduce the training cost and computational resource consumption of LLMs while enhancing their diversity and generalization ability. Multimodal capability is the future trend of generative AI because multimodal models can integrate information from multiple perceptual dimensions, which is consistent with human cognition. Evaluation benchmarks are the key method for comparing and constraining generative AI models. They can efficiently measure and optimize the performance and generalization ability of LLMs and reveal their strengths and limitations. In conclusion, two steps will be key to the further development of generative AI: improving parameter optimization, high-quality dataset construction, multimodal techniques, and other technologies, and establishing a unified, comprehensive, and convenient evaluation benchmark. Furthermore, this paper discusses the current challenges and possible future directions of the related technologies.
Existing generative AI products show considerable creativity, understanding, and intelligence. They have broad application prospects in various fields, such as empowering content creation, innovating interactive experiences, creating "digital life," serving as smart home and family assistants, and enabling autonomous driving and intelligent in-car interaction. However, LLMs still have limitations, such as a lack of high-quality training data, susceptibility to hallucination, factual errors in their output, lack of interpretability, high training and deployment costs, and security and privacy issues. Therefore, potential research directions can be divided into three aspects. 1) The data aspect focuses on the input and output of LLMs, including the construction of general instruction-tuning datasets and domain-specific knowledge datasets. 2) The technical aspect improves the internal structure and function of LLMs, including their training, multimodality, principle innovation, and structural pruning. 3) The application aspect enhances the practical effect and application value of LLMs, including security enhancement, evaluation system development, and engineering implementation of LLM applications. The advancement of generative AI has brought remarkable benefits to economic development. However, it also brings new opportunities and challenges for various stakeholders, especially industry and the general public. On the one hand, industry needs to foster a large pool of researchers who can conduct systematic, cutting-edge research on generative AI technologies, which are constantly improving. On the other hand, the general public needs to acquire and apply prompt engineering skills so that they can use existing LLMs effectively and efficiently.
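The abstract names low-rank adaptation (LoRA) as a key technique for reducing the memory and compute cost of fine-tuning LLMs. The following is a minimal, dependency-free Python sketch of the general LoRA idea. It is not code from any model or library surveyed here. A frozen weight matrix W is augmented with a trainable low-rank update (alpha / r) B A, so only r (d_in + d_out) parameters would be trained instead of d_in d_out. The class and function names, the Gaussian initialization scale, and the example sizes (a 4096 x 4096 layer with rank 8) are illustrative assumptions.

```python
import random

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product m @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def matmul(p: Matrix, q: Matrix) -> Matrix:
    """Dense matrix product p @ q."""
    cols = list(zip(*q))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in p]


class LoRALinear:
    """Hypothetical layer: frozen weight W plus trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0) -> None:
        self.weight = weight  # frozen pretrained weight, shape d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A (r x d_in) gets small random values (scale 0.01 is an assumption).
        # B (d_out x r) starts at zero, so the layer initially equals the frozen one.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: Vector) -> Vector:
        # h = W x + (alpha / r) * B (A x); during training only A and B would be updated.
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [h + self.scale * d for h, d in zip(base, delta)]

    def merged_weight(self) -> Matrix:
        # After training, W + (alpha / r) * B A can be folded into a single matrix,
        # so inference costs the same as the original layer.
        ba = matmul(self.b, self.a)
        return [[w + self.scale * u for w, u in zip(wr, ur)]
                for wr, ur in zip(self.weight, ba)]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in A and B: r * (d_in + d_out)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    full = 4096 * 4096
    lora = lora_param_count(4096, 4096, rank=8)
    print(full, lora, f"{lora / full:.2%}")
```

Running the script prints `16777216 65536 0.39%`. With these assumed sizes, the rank-8 update trains about 0.39% of the parameters of the full matrix. Zero-initializing B is a common choice because it makes the adapted layer start out identical to the pretrained one.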
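The abstract describes one multimodal category as training additional adaptation layers to align a visual module with a language model. Its simplest form can be illustrated as follows. A trainable linear projection maps features from a frozen vision encoder into the language model's token-embedding space. The projected "visual tokens" are then prepended to the text embeddings. The sketch below is a hypothetical, dependency-free illustration under those assumptions. `VisualProjector`, `build_lm_input`, the initialization scale, and all dimensions are invented for the example. Real systems may use deeper adapters or query-based modules instead of a single linear layer.

```python
import random

Matrix = list[list[float]]


class VisualProjector:
    """Hypothetical trainable linear adapter: vision features (d_vision) -> LM embeddings (d_text)."""

    def __init__(self, d_vision: int, d_text: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Only these adapter weights would be trained; the vision encoder and LM stay frozen.
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(d_vision)] for _ in range(d_text)]
        self.bias = [0.0] * d_text

    def __call__(self, patch_features: Matrix) -> Matrix:
        # Project each patch feature vector into one "visual token" embedding.
        return [
            [sum(w * f for w, f in zip(row, feat)) + b for row, b in zip(self.weight, self.bias)]
            for feat in patch_features
        ]


def build_lm_input(visual_tokens: Matrix, text_embeddings: Matrix) -> Matrix:
    """Prepend the projected visual tokens to the text token embeddings."""
    if visual_tokens and text_embeddings and len(visual_tokens[0]) != len(text_embeddings[0]):
        raise ValueError("visual and text embeddings must share the LM hidden size")
    return visual_tokens + text_embeddings


if __name__ == "__main__":
    projector = VisualProjector(d_vision=6, d_text=4)
    patches = [[0.1 * (i + j) for j in range(6)] for i in range(3)]  # stand-in for frozen encoder output
    text = [[0.0] * 4, [1.0] * 4]  # stand-in for two text-token embeddings
    sequence = build_lm_input(projector(patches), text)
    print(len(sequence), len(sequence[0]))
```

Here three image patches and two text tokens yield a five-element input sequence whose vectors all have the language model's hidden size. The language model would then process this sequence as if every element were an ordinary token embedding.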