Current Issue Cover
二维码位流长度最小化算法

袁泰凌1,2, 徐昆1,2(1.清华大学计算机科学与技术系, 北京 100084;2.普适计算教育部重点实验室, 北京 100084)

摘 要
目的 快速响应矩阵码(quick response code,QR code)简称二维码,是一种由深色和浅色模块组成的正方形符号。给定输入数据,不同编码算法可能输出不同的位流。位流长度决定了二维码的版本,进而决定了二维码每条边上的模块数量。减小二维码的版本能够在不减小模块大小的前提下节省面积,或者在不改变面积的前提下增大模块大小。为了减小二维码面积、提高二维码识读率,本文提出了位流长度最小化算法。方法 首先,根据二维码位流可以分段切换编码模式的特点,归纳了6种编码状态;然后,根据二维码位流编码标准推导了状态转移关系,从而将位流长度最小化问题转换成动态规划问题;最后,通过求解动态规划问题,计算出最短位流。针对统一资源定位符(uniform resource locator,URL)类型数据,利用其部分字段对大小写不敏感、部分字段可以转义的性质,提出了统一资源定位符的最短位流计算算法,进一步缩短位流。结果 本文构建了一个测试集,包含603个编码了非URL数据的二维码,以及1 679个编码了URL数据的二维码。实验结果表明,本文算法与二维码标准相比,对于非URL测试集,位流长度减小的二维码占比9.1%,版本减小的二维码占比1.2%;对于URL测试集,位流长度减小的二维码占比98.4%,版本减小的二维码占比31.7%。结论 二维码位流长度最小化算法输出的位流长度最短,输出的二维码版本最小,能在兼容标准二维码解码器且不影响纠错能力的前提下提升二维码的数据容量。同时,本文算法运行速度快,易于使用,没有需要调节的参数。
关键词
Minimization of bit stream length of QR codes

Yuan Tailing1,2, Xu Kun1,2(1.Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;2.Key Laboratory of Pervasive Computing, Ministry of Education, Beijing 100084, China)

Abstract
Objective Quick response code (QR code) is a kind of widely used 2D barcode nowadays. A QR code is a square symbol consisting of dark and light modules. There is a great need to accommodate more data in the target area or encode the same input data in a smaller area. The input data is first encoded into a bit stream. Different encoding algorithms may output different bit streams via assigned input data. The length of bit stream determines the version of a QR code, and the version determines the amount of modules per side. A QR code with a smaller version takes a smaller area without the size of modules changing, or has a larger module size without changing the area. The bit stream consists of one or multiple segments, and the encoding mode of each segment is chosen from three modes separately, i.e., numeric mode, alphanumeric mode and byte mode. The numeric mode can only encode digits, and every 3 digits are encoded into 10 bits. The alphanumeric mode can encode digits, upper case letters, and 9 kinds of punctuations, and every 2 characters are encoded into 11 bits. The byte mode can encode any kind of binary data, but each byte is encoded into 8 bits. Compared to using a single mode to encode the overall input data, different modes interchange may result in a shorter bit stream. But different modes switching leads to redundancy on bit stream length. Method The key to minimizing the version of QR code is to minimize the length of bit stream. The minimization should balance the redundancy of data encoding and the mode switching efficiency. The QR code specification gives "optimization of bit stream length" in annex. However, the illustration has proposed that the optimization method may not output the minimum bit stream. The demonstration has mentioned algorithms as below. The first algorithm is called "minimization algorithm", which converts the minimization of bit stream length to a dynamic programming problem. The algorithm can output the bit stream with minimum length by resolving the dynamic programming problem. Time cost is bounded in a linear function to the length of input data. The second algorithm is called "URL minimization algorithm", which is optimized for cases that the input data is a uniform resource locator (URL) further. The customized optimization for URL makes use of two properties:1) the scheme field and the host field of a URL are case-insensitive, and 2) the scheme-specific part of a URL can be escaped. The properties have been guaranteed via request for comments (RFC) 1 738 and RFC 1 035. A lower case letter can only be encoded in byte mode. By converting a lower case letter in a case-insensitive field to an upper case letter, the letter can be encoded in either byte mode or alphanumeric mode. The property provides more options in dynamic programming. In addition, each character in the scheme-specific part of a URL may be converted to an escape sequence. The escape sequence contains 3 alphanumeric mode characters. Hence, it can be encoded in alphanumeric mode or a combination of alphanumeric mode and numeric mode. This kind of conversion also provides more choices in dynamic programming. The URL minimization algorithm takes advantage of such two kinds of conversions to calculate the bit stream of a URL with minimum length. Result A QR code data set (also called a test set) is constructed to verify the efficiency of the proposed algorithms. The test set is collected from 6 web image search engines, i.e., Baidu, Sogou, 360, Google, Bing, and Yahoo. QR codes with different bit streams or different error correction levels are regarded as different QR codes. The test set contains 2 282 distinct QR codes, where 603 QR codes encode non-URL data and the other 1 679 QR codes encode URL data. Four algorithms are compared on the test set, i.e., 1) the optimization method given in the QR code specification, 2) encoding input data in a single segment, 3) the minimization algorithm, and 4) the URL minimization algorithm. The bit stream lengths of initial QR codes are offered for comparison as well. The demonstrations show that for the non-URL test set, average bit stream length reduces 0.4% (compared to the optimization in the QR code specification), QR codes with reduced bit stream lengths account for 9.1%, and QR codes with reduced versions account for 1.2%; for the URL test set, average bit stream length reduces 13.9%, QR codes with reduced bit stream lengths account for 98.4%, and QR codes with reduced versions account for 31.7%. An ablation study on the two components of URL minimization has been implemented, i.e., 1) utilizing case-insensitive fields and 2) converting characters to escaping sequences. The calculating results have shown that each component has an effect and combining both components achieves the best performance, i.e., the minimum bit stream length and the minimum version. In the context of URL test set, average bit stream length reduces only 0.5% using the minimization algorithm, while the value reduces 9.1% using the URL minimization algorithm; QR codes with reduced bit stream lengths account for 10.4% using the minimization algorithm, but the value is 98.4% for the URL minimization algorithm; QR codes with reduced versions account for 1.3% using the minimization algorithm, while the value is 31.7% for the URL minimization algorithm. The average time cost to encode a message is 2.45 microsecond. Conclusion The proposed algorithms minimize the length of bit stream. Data capacity of QR codes has been increased without changing QR code format or revising the error correction capability. The URL minimization algorithm is qualified under the huge amounts of QR codes based on URL encoded data. The proposed algorithms are friendly-used, i.e., there are no hyper-parameters to be tuned, and users input data and error correction level only. The illustrated algorithms have been verified in a realistic running speed, and it is proved that the time costs of both algorithms are bounded in a linear function to the length of input data.
Keywords

订阅号|日报