Pan Jianshan1, Lin Li2,3, Wu Jiewei2, Liu Yixiang2, Chen Xiaohua1, Lin Qiyou1, Huang Jianye3, Tang Xiaoying2 (1. Shenzhen Public Credit Center; 2. Department of Electronic and Electrical Engineering, Southern University of Science and Technology; 3. Department of Electrical and Electronic Engineering, The University of Hong Kong)
Objective Federated learning allows multiple institutions to collaboratively train powerful deep models without compromising data privacy or security. Most existing federated paradigms suffer performance degradation when handling heterogeneous data distributions across centers, and federated learning under weak supervision, especially when sites adopt different forms of sparse annotation, remains largely unexplored. To address this problem, we propose pFedWSD, a unified weakly supervised personalized federated learning framework based on site-distribution-similarity-aware knowledge distillation, to accommodate cross-center differences in both data distribution and annotation. Method pFedWSD trains a personalized model for each site via cyclic knowledge distillation in two stages: dynamic cyclic common-knowledge accumulation and personalization. In the first stage, each site's model is dynamically ranked by performance in an uncertainty-aware manner in every training round, and common knowledge is accumulated through cyclic knowledge distillation. In the second stage, inter-site similarity is measured from the statistics of the batch normalization layers and used to aggregate a teacher model for each site, on which knowledge distillation is performed. For weak supervision, a training objective combining a gated conditional random field loss and a tree energy loss is introduced to produce more accurate pseudo-label supervision signals. Result On optic cup/disc segmentation and retinal foveal avascular zone segmentation, pFedWSD outperforms a variety of centralized and personalized federated methods in both Dice coefficient and HD95, achieving Dice coefficients of 90.38% and 93.12% on the two tasks, improvements of 1.67% and 6.56% over the previous state-of-the-art methods FedAP and FedALA, respectively, with performance approaching that of fully supervised centralized training. Conclusion The proposed weakly supervised personalized federated learning framework effectively unifies different forms of sparsely annotated data and trains personalized models for differently distributed site data, significantly improving segmentation performance at every site.
pFedWSD: Unified weakly supervised personalized federated image segmentation via similarity-aware distillation
(1. Shenzhen Public Credit Center; 2. Department of Electronic and Electrical Engineering, Southern University of Science and Technology; 3. Department of Electrical and Electronic Engineering, The University of Hong Kong)
Objective Federated learning (FL) allows multiple healthcare institutions to collaboratively train a powerful deep learning model without compromising data privacy and security (i.e., without centralizing data). However, it is extremely challenging for a single model to accommodate the diverse data distributions of different sites, and the performance of existing FL approaches typically degrades when large distribution gaps exist across sites. Additionally, previous works pay little attention to FL under weak supervision, especially when different sites provide annotations in different sparse forms (e.g., point-, bounding-box-, scribble-, and block-wise annotations). Weakly supervised FL is more clinically practical yet more challenging. To address this issue, we propose a unified weakly supervised personalized FL framework for medical image segmentation, named pFedWSD, based on similarity-aware knowledge distillation across multiple sites. We aim to accommodate the domain gaps and annotation drifts across sites and enhance the segmentation performance for each site. Method The proposed pFedWSD trains a personalized model for each site via cyclic knowledge distillation, which consists of two stages: uncertainty-aware dynamic and cyclic common knowledge accumulation, and similarity-aware personalization. In the first stage, during each training round, the performance of each site's model is dynamically ranked in an uncertainty-aware manner, and common knowledge is accumulated in the form of cyclic knowledge distillation. In the second stage, the similarity between each pair of sites is measured based on statistics from the batch normalization layers and aggregated to obtain a teacher model for each site, on which knowledge distillation is performed. For weakly supervised learning, a combination of a partial cross-entropy loss, a gated conditional random field (CRF) loss, and a tree energy loss is employed.
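The similarity-aware aggregation in the second stage can be illustrated with a small sketch. The snippet below is our own simplified illustration, not the paper's implementation: it measures pairwise site similarity from batch-normalization running statistics, converts the similarities into softmax weights, and aggregates per-site parameters into a teacher for a target site. The function names, the L2 distance choice, and the softmax temperature are all assumptions made for illustration.

```python
import numpy as np

def site_similarity(stats_a, stats_b):
    """Negative L2 distance between concatenated BN running means/variances.

    A smaller statistical distance suggests more similar data distributions.
    """
    a = np.concatenate([stats_a["mean"], stats_a["var"]])
    b = np.concatenate([stats_b["mean"], stats_b["var"]])
    return -np.linalg.norm(a - b)

def aggregate_teacher(site_params, bn_stats, target, temperature=1.0):
    """Build a teacher for `target` as a similarity-weighted average of all sites.

    site_params: list of flattened parameter vectors (one per site)
    bn_stats:    list of dicts holding BN "mean"/"var" arrays (one per site)
    """
    sims = np.array([site_similarity(bn_stats[target], s) for s in bn_stats])
    w = np.exp((sims - sims.max()) / temperature)   # numerically stable softmax
    w /= w.sum()
    teacher = sum(w_i * p for w_i, p in zip(w, site_params))
    return teacher, w
```

A site is always most similar to itself, so its own parameters receive the largest weight; the temperature would control how sharply the aggregation focuses on statistically similar sites.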
Specifically, the partial cross-entropy loss supervises the annotated regions, providing reliable guidance. The tree energy loss establishes pairwise affinities based on the preserved low-level and high-level semantic spatial structures of the same object; in conjunction with the model's predictions, it generates soft pseudo-labels for the unlabeled regions. Through continuous online training and refinement, the model's predictions and the derived pseudo-annotations gradually improve over time. Furthermore, the gated CRF loss serves as a regularization term, effectively curbing the excessive expansion or contraction of the target regions' pseudo-labels that may arise from solely employing the tree energy loss. This design unifies diverse sparsely annotated data for training and enables real-time generation of additional pseudo proposals, attaining strong segmentation performance without requiring supplementary supervised data, iterative offline optimization, or time-intensive post-processing. To the best of our knowledge, pFedWSD is a pioneering weakly supervised personalized FL approach for medical image segmentation under heterogeneous annotation settings across multiple clients. Result We construct two datasets (from multiple publicly available datasets), each with five subsets serving as five different sites, respectively for optic cup/disc (ODOC) segmentation and retinal foveal avascular zone (FAZ) segmentation. Quantitative and qualitative experimental results show that pFedWSD outperforms representative state-of-the-art (SOTA) centralized and personalized FL methods in terms of Dice coefficients and HD95 statistics. The proposed pFedWSD achieves an average Dice coefficient of 90.38% on the ODOC segmentation task, a remarkable improvement of 1.67% over the previous best-performing method.
Moreover, it exhibits a marginal difference of only 0.58% compared to local training under full supervision and a gap of merely 1.23% compared to centralized training under full supervision. On the FAZ segmentation task, the proposed method achieves an impressive average Dice coefficient of 93.12%, a substantial improvement of 6.56% over the previous SOTA method, with only a 0.5% difference compared to local training under full supervision and a 0.86% difference compared to centralized training under full supervision. Conclusion The proposed weakly supervised personalized FL framework pFedWSD effectively unifies different forms of sparsely labeled data and trains personalized models that adapt well to different data distributions, achieving superior segmentation performance. pFedWSD demonstrates its effectiveness by achieving the best performance on both the ODOC and FAZ segmentation tasks across datasets from multiple centers, with overall performance closely approaching that of local or centralized training with fully supervised labels. Extensive ablation experiments demonstrate the importance and efficacy of each stage in pFedWSD and of each component in the weakly supervised composite objective. Moreover, through site-ablation experiments, we analyze the contribution of each site to the federation, providing valuable guidance for medical institutions regarding appropriate data volumes and sparse annotation forms when participating in federated learning. Future research directions include further reducing the communication and computation overhead, as well as integrating universal large-model training paradigms such as prompt learning, to concurrently improve the proposed framework's generalization performance and its adaptive personalization capacity towards diverse data distributions.
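As a concrete illustration of the weak-supervision objective described above, the sketch below implements only its partial cross-entropy term, which restricts the loss to the sparsely annotated pixels; the tree energy and gated CRF regularizers are omitted for brevity. The function and argument names are our own assumptions for illustration, not the paper's code.

```python
import numpy as np

def partial_cross_entropy(probs, labels, ignore_index=-1):
    """Cross-entropy computed only over annotated pixels.

    probs:  (H, W, C) softmax probabilities predicted by the model
    labels: (H, W) integer class map; unlabeled pixels (the vast majority
            under point/scribble/box/block annotation) carry `ignore_index`
    """
    mask = labels != ignore_index            # annotated pixels only
    if not mask.any():                       # fully unlabeled image
        return 0.0
    p = probs[mask, labels[mask]]            # prob of the annotated class
    return float(-np.log(np.clip(p, 1e-12, None)).mean())
```

Unlabeled pixels contribute nothing to this term; in the full objective they would instead be supervised by the tree-energy pseudo-labels and regularized by the gated CRF term.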