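Article abstract
SU Chen, ZOU Yunfeng, ZHU Yunan, SHAN Chao. Privacy-preserving clustering method of power customer data based on differential privacy[J]. Power Demand Side Management, 2023, 25(2): 101-106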
Privacy-preserving clustering method of power customer data based on differential privacy
Received: 2022-11-20  Revised: 2023-01-14
DOI:10.3969/j.issn.1009-1831.2023.02.016
Keywords: power customer data; privacy preservation; noise matrix; k nearest neighbor; differential privacy
Funding: Science and Technology Project of State Grid Corporation of China (5700-202018268A-0-0-00)
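Author  Affiliation
SU Chen  School of Cyber Science and Engineering, Southeast University, Nanjing 211189
ZOU Yunfeng  Marketing Service Center, State Grid Jiangsu Electric Power Co., Ltd., Nanjing 210036
ZHU Yunan  Marketing Service Center, State Grid Jiangsu Electric Power Co., Ltd., Nanjing 210036
SHAN Chao  Marketing Service Center, State Grid Jiangsu Electric Power Co., Ltd., Nanjing 210036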
Abstract:
      Clustering plays an important role in big data analysis of power grid customers. Since the promulgation of China's Data Security Law, balancing data privacy against clustering quality in power customer data clustering has become a pressing problem. Existing k-means clustering methods based on differential privacy struggle to achieve both. To address this, a distance perturbation method is proposed: data distances are extracted and noise satisfying the differential privacy constraint is added to the distance values, producing a noise matrix that protects the privacy of the data distances. On this basis, a kq-means clustering method based on the noise matrix is designed. It introduces the concept of k nearest neighbors and a cluster partitioning strategy that assigns each data record to the expected interval of its several nearest center points, which reduces the clustering error caused by the accumulation of differential privacy noise over multiple iterations. The method thereby supports clustering of power grid customer data while protecting customer data privacy.
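The distance perturbation idea can be illustrated with a minimal sketch. The abstract does not state which noise mechanism, sensitivity bound, or privacy budget the authors use, so the Laplace mechanism, the `sensitivity` and `epsilon` parameters, and the function names `noisy_distance_matrix` and `nearest_centers` are all assumptions made for illustration. The second helper only picks the several nearest center points from the noise matrix; the paper's actual partitioning strategy ("expected interval") is not specified in the abstract and is not reproduced here.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_distance_matrix(points, epsilon, sensitivity, rng):
    """Build a symmetric matrix of Euclidean distances with Laplace noise.

    Assumptions (not taken from the abstract): Laplace mechanism with
    scale = sensitivity / epsilon, and one noise draw per unordered pair.
    """
    n = len(points)
    scale = sensitivity / epsilon
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            distance = math.dist(points[i], points[j])
            noisy = distance + laplace_noise(scale, rng)
            matrix[i][j] = matrix[j][i] = noisy
    return matrix


def nearest_centers(matrix, record, centers, q):
    """Hypothetical helper: indices of the q centers closest to `record`,
    ranked by noisy distance only (raw coordinates are never consulted)."""
    return sorted(centers, key=lambda c: matrix[record][c])[:q]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the demo is repeatable
    records = [(0.0, 0.0), (1.0, 0.5), (8.0, 9.0), (9.0, 8.5)]
    noise_matrix = noisy_distance_matrix(records, epsilon=1.0,
                                         sensitivity=1.0, rng=rng)
    # Treat records 0 and 2 as current center points and rank them for record 1.
    print(nearest_centers(noise_matrix, record=1, centers=[0, 2], q=2))
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy but noisier distances. Python's `random` module is used here only to keep the sketch self-contained; it is not a cryptographically secure noise source.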