Data Partition Based on Attribute Similarity to Preserve Data Privacy

Author(s):  
Dewei Peng ◽  
Yong Hu ◽  
Hongxia Wang ◽  
Feng Geng
Keyword(s):  
2020 ◽  
Author(s):  
Yiu-ming Cheung ◽  
Feng Yu

In the cross-silo federated learning setting, data can be partitioned by feature, which is known as vertical federated learning or feature-wise federated learning (Yang et al. 2019). This partition applies to multiple datasets that share the same sample ID space but have different feature spaces. An image dataset can likewise be partitioned by label. To improve the model performance of the isolated parties based on feature-wise (or label-wise) results, the most effective method is to federate the parties' model results. However, it is non-trivial to let the participating parties share model results without violating their data privacy. In this paper, we propose a Federated-PCA machine learning approach within the framework of principal component analysis (PCA). PCA reduces the dimensionality of each party's sample data and extracts the principal component features, which improves the efficiency of subsequent training. This process does not reveal any party's original data. The federated system helps the parties build a common profit strategy, and under this federated mechanism all parties have the same identity and status. Through multiple sets of comparative experiments, we compare the federated results of the isolated parties with the result of the unseparated party. The results of the two settings are close, and the proposed method effectively improves the trained model performance of most participating parties.
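
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of per-party dimensionality reduction on vertically partitioned data, not the authors' Federated-PCA procedure. It assumes two hypothetical parties whose rows are already aligned by sample ID, uses synthetic data, and picks the component counts arbitrarily. It makes no privacy guarantee: whether shared projections leak information depends on the threat model.

```python
import numpy as np

def local_pca(X, k):
    """Center one party's feature block and project it onto its top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# Synthetic stand-ins for two parties that hold different features
# for the same 200 sample IDs (rows are aligned by ID).
rng = np.random.default_rng(0)
X_a = rng.normal(size=(200, 10))  # hypothetical party A: 10 features
X_b = rng.normal(size=(200, 6))   # hypothetical party B: 6 features

# Each party reduces its own block locally; raw features stay with the party.
Z_a = local_pca(X_a, k=3)
Z_b = local_pca(X_b, k=2)

# Only the reduced representations are joined for downstream training.
Z = np.hstack([Z_a, Z_b])
print(Z.shape)
```

The joined matrix `Z` has one row per shared sample ID and one column per retained component, so a downstream model trains on 5 columns here instead of the original 16.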


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Danfeng Zhao ◽  
Wei Zhao ◽  
Le Sun ◽  
Dongmei Huang

Business data is a current and future research frontier, with big data characteristics such as high volume, high velocity, and high privacy. Most corporations view their business data as a valuable asset and invest in developing and making optimal use of it. Unfortunately, current data management technology lags behind the requirements of the business big data era. Based on prior business process knowledge, a lifecycle of business data is modeled to achieve a consistent description of data and processes. On this basis, a business data partition method based on user interest is proposed, which aims to minimize the number of interferential tuples. Then, to balance data privacy against data transmission cost, the strategy is to execute SQL queries over encrypted business data, split query computation between the server and the client, and optimize the queries with a syntax tree. Finally, an instance is provided to verify the usefulness and availability of the proposed method.
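
One common way to split a range query between an untrusted server and a client is bucketization, sketched below. This is an illustrative assumption, not the method of the paper: the function names, the bucket boundaries, and the sample values are all hypothetical, and "interferential tuples" is read here as rows the server returns that do not actually match the query. No real encryption is performed; the `payload` field stands in for a ciphertext.

```python
from bisect import bisect_right

def bucket_id(value, boundaries):
    """Return the index of the bucket holding value; boundaries are sorted cut points."""
    return bisect_right(boundaries, value)

def build_table(values, boundaries):
    """Client side: tag each row with a coarse bucket id before upload.
    'payload' stands in for a ciphertext that only the client can read."""
    return [{"bucket": bucket_id(v, boundaries), "payload": v} for v in values]

def client_rewrite(lo, hi, boundaries):
    """Client side: translate the range [lo, hi] into the bucket ids that overlap it."""
    return set(range(bucket_id(lo, boundaries), bucket_id(hi, boundaries) + 1))

def server_fetch(table, bucket_ids):
    """Server side: select rows by bucket id only; it never sees values or the range."""
    return [row for row in table if row["bucket"] in bucket_ids]

def client_filter(rows, lo, hi):
    """Client side: after decryption, keep only the rows that truly match."""
    return [row["payload"] for row in rows if lo <= row["payload"] <= hi]

def interferential_count(values, boundaries, lo, hi):
    """Number of rows transferred to the client that do not match the query."""
    table = build_table(values, boundaries)
    fetched = server_fetch(table, client_rewrite(lo, hi, boundaries))
    return len(fetched) - len(client_filter(fetched, lo, hi))

values = [5, 12, 18, 23, 31, 37, 44, 58]
coarse = [20, 40]              # three wide buckets
fine = [10, 20, 30, 40, 50]    # six narrow buckets
print(interferential_count(values, coarse, 10, 25))
print(interferential_count(values, fine, 10, 25))
```

Finer buckets reduce the number of non-matching rows sent to the client, but they also reveal more about the value distribution to the server, which is the privacy versus transmission cost trade-off the abstract refers to.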


Author(s):  
P. Sudheer ◽  
T. Lakshmi Surekha

Cloud computing is a revolutionary computing paradigm that enables flexible, on-demand, and low-cost usage of computing resources. However, the data is outsourced to cloud servers, which raises various privacy concerns. Various schemes based on attribute-based encryption have been proposed to secure cloud storage, with a focus on data content privacy. AnonyControl is a semi-anonymous privilege control scheme that addresses not only data privacy but also user identity privacy. AnonyControl decentralizes the central authority to limit identity leakage and thus achieves semi-anonymity. A further variant, Anonymity-F, fully prevents identity leakage and achieves full anonymity.
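
Attribute-based schemes typically express a privilege as a tree of threshold gates over attributes. The sketch below only checks, in plaintext, whether a set of attributes satisfies such a tree; it performs no encryption and is not the AnonyControl construction. The policy, the authority prefixes (`hospital:`, `gov:`, `univ:`), and the attribute names are hypothetical and chosen to illustrate attributes issued by separate authorities.

```python
def satisfies(node, attrs):
    """Return True if the attribute set satisfies the access tree.

    A leaf is an attribute string. An inner node is a pair (k, children)
    that is satisfied when at least k of its children are satisfied,
    so k = 1 acts as OR and k = len(children) acts as AND.
    """
    if isinstance(node, str):
        return node in attrs
    k, children = node
    return sum(satisfies(child, attrs) for child in children) >= k

# Hypothetical policy: any 2 of (doctor, auditor-or-inspector, researcher).
# Each prefix names a different issuing authority, so no single authority
# sees the user's full attribute set.
policy = (2, [
    "hospital:doctor",
    (1, ["gov:auditor", "gov:inspector"]),
    "univ:researcher",
])

print(satisfies(policy, {"hospital:doctor", "gov:auditor"}))
print(satisfies(policy, {"univ:researcher"}))
```

In an actual attribute-based encryption scheme this check is enforced cryptographically, for example through secret sharing over the tree, rather than evaluated on plaintext attributes.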

