Privattacks modules
privattacks.attacks
- class privattacks.attacks.Attack(data: Data)
Bases: object
- posterior_vulnerability(atk, qids, sensitive=[], distribution=False, combinations: list[int] | None = None, save_file=None, zip_save=False, n_processes=1, return_results=True, verbose=False)
Posterior vulnerability.
- Parameters:
atk (str) – Either ‘ai’ for attribute inference attack, ‘reid’ for re-identification or ‘all’ for both attacks.
qids (list[str]) – List of quasi-identifiers.
sensitive (str or Sequence[str], optional) – A single or a list of sensitive attributes for attribute inference attack. Default is [].
distribution (bool, optional) – Whether to return the distribution of posterior vulnerability per record. Default is False.
combinations (list[int], optional) – Sizes of the QID subsets to attack. Instead of running the attack only on the full list given in ‘qids’, the attack is run on every subset of ‘qids’ whose size appears in this list. Default is None.
zip_save (bool, optional) – Save the results in a zip file instead of a CSV file. Default is False.
save_file (str, optional) – File name to save the results. They will be saved in CSV format. Works only when ‘combinations’ is given.
n_processes (int, optional) – Number of processes used to run the method in parallel with the multiprocessing package. Default is 1. Works only when ‘combinations’ is given.
return_results (bool, optional) – Whether to return the results or not. Default is True. Works only when ‘combinations’ is given.
verbose (bool, optional) – Show the progress. Default is False. Works only when ‘combinations’ is given.
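The ‘combinations’ parameter runs the attack once per subset of ‘qids’ whose size is listed. A minimal sketch of that enumeration using the standard itertools.combinations, assuming subsets keep the order of ‘qids’; qid_subsets is a hypothetical helper, not part of privattacks.

```python
from itertools import combinations


def qid_subsets(qids: list[str], sizes: list[int]) -> list[tuple[str, ...]]:
    """Hypothetical helper: every subset of `qids` whose size is in `sizes`."""
    subsets = []
    for n in sizes:
        # itertools.combinations yields each size-n subset once, preserving the order of qids
        subsets.extend(combinations(qids, n))
    return subsets


qids = ["age", "sex", "zipcode"]
print(qid_subsets(qids, [1, 2]))
# [('age',), ('sex',), ('zipcode',), ('age', 'sex'), ('age', 'zipcode'), ('sex', 'zipcode')]
```

With three QIDs and sizes [1, 2] there are six subsets, hence six attacks; the count grows with the binomial coefficients, which is where ‘n_processes’ can help.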
- Returns:
- if atk == ‘reid’:
float or (float, list): If distribution is False, returns the posterior vulnerability. If distribution is True, returns a pair (<posterior vulnerability>, <distribution>).
Example of output when distribution is False:
0.75
Example of output when distribution is True:
(0.75, [0.5, 0.5, 1.0, 1.0, 0.75])
- if atk == ‘ai’:
dict[str, float] or (dict[str, float], dict[str, list]): If distribution is False, returns a dictionary containing the posterior vulnerability for each sensitive attribute. If distribution is True, returns a pair (<posterior vulnerability>, <distribution for each sensitive attribute>).
Example of output when distribution is False:
{'disease': 0.3455, 'income': 0.7}
Example of output when distribution is True:
({'disease': 0.3455, 'income': 0.7}, {'disease': [0.1, 0.1, 0.3, 0.4, 0.8275], 'income': [0.6, 0.7, 0.7, 0.7, 0.8]})
- if atk == ‘all’:
dict: Dictionary with keys ‘reid’ and ‘ai’ and their respective posterior vulnerabilities.
- if combinations:
vulnerabilities: Pandas DataFrame with the posterior vulnerabilities for every combination of n QIDs, where n ranges over the sizes provided in the parameter ‘combinations’.
- Return type:
float, tuple, dict or pandas.DataFrame, depending on ‘atk’, ‘distribution’ and ‘combinations’ (see above).
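The examples above are consistent with the reported vulnerability being the mean of the per-record distribution, e.g. (0.5 + 0.5 + 1.0 + 1.0 + 0.75) / 5 = 0.75. A minimal sketch of that relationship, assuming the common Bayes-vulnerability definitions rather than anything confirmed by the privattacks source: for re-identification a record's vulnerability is 1 / (size of its QID equivalence class), and for attribute inference it is the frequency of the most common sensitive value in that class. reid_posterior, ai_posterior and qid_keys are hypothetical helpers that work on plain tuples instead of a Data object.

```python
from collections import Counter, defaultdict


def qid_keys(records, qid_idx):
    # Project each record onto its quasi-identifier values.
    return [tuple(r[i] for i in qid_idx) for r in records]


def reid_posterior(records, qid_idx, distribution=False):
    """Hypothetical sketch: per-record vulnerability = 1 / size of the record's QID class."""
    keys = qid_keys(records, qid_idx)
    class_size = Counter(keys)
    dist = [1 / class_size[k] for k in keys]
    vuln = sum(dist) / len(dist)  # mean over all records
    return (vuln, dist) if distribution else vuln


def ai_posterior(records, qid_idx, sens_idx, distribution=False):
    """Hypothetical sketch: the adversary guesses the most frequent sensitive value
    in the record's QID class; per-record vulnerability = its count / class size."""
    keys = qid_keys(records, qid_idx)
    counts = defaultdict(Counter)
    for k, r in zip(keys, records):
        counts[k][r[sens_idx]] += 1
    dist = [max(counts[k].values()) / sum(counts[k].values()) for k in keys]
    vuln = sum(dist) / len(dist)
    return (vuln, dist) if distribution else vuln


# (zip, age, disease): QIDs are columns 0 and 1, the sensitive attribute is column 2
records = [("z1", 30, "flu"), ("z1", 30, "cold"), ("z2", 40, "flu"), ("z2", 40, "flu"), ("z3", 50, "cold")]
print(reid_posterior(records, [0, 1], distribution=True))   # (0.6, [0.5, 0.5, 0.5, 0.5, 1.0])
print(ai_posterior(records, [0, 1], 2, distribution=True))  # (0.8, [0.5, 0.5, 1.0, 1.0, 1.0])
```

The return shapes mirror the float / (float, list) forms described above; the library itself takes column names and a Data object instead of positional indexes.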
- prior_vulnerability(atk, sensitive=[])
Prior vulnerability.
- Parameters:
atk (str) – Either ‘ai’ for attribute inference attack, ‘reid’ for re-identification or ‘all’ for both attacks.
sensitive (str or Sequence[str], optional) – A single or a list of sensitive attributes for attribute inference attack. Default is [].
- Returns:
- if atk == ‘reid’:
float: Prior vulnerability.
- if atk == ‘ai’:
dict[str, float]: Dictionary containing the prior vulnerability for each sensitive attribute (keys are sensitive attribute names and values are prior vulnerabilities).
- if atk == ‘all’:
dict: Dictionary with keys ‘reid’ and ‘ai’ and their respective prior vulnerabilities.
- Return type:
float or dict, depending on ‘atk’ (see above).
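The prior vulnerability measures what an adversary can do before seeing any QIDs. A minimal sketch, assuming the usual Bayes-vulnerability definitions (not confirmed by this page): for re-identification every record looks alike, so a guess succeeds with probability 1 / n_rows; for attribute inference the best guess is the most frequent sensitive value in the whole column. reid_prior and ai_prior are hypothetical helpers.

```python
from collections import Counter


def reid_prior(n_rows):
    """Hypothetical sketch: a uniform guess among n_rows records."""
    return 1 / n_rows


def ai_prior(values):
    """Hypothetical sketch: frequency of the most common sensitive value."""
    return max(Counter(values).values()) / len(values)


sensitive = {
    "disease": ["flu", "cold", "flu", "flu", "cold"],
    "income": ["low", "high", "low", "low", "low"],
}
print(reid_prior(5))                                             # 0.2
print({name: ai_prior(col) for name, col in sensitive.items()})  # {'disease': 0.6, 'income': 0.8}
```

The dictionary comprehension mirrors the dict[str, float] shape returned for atk == ‘ai’.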
privattacks.data
- class privattacks.data.Data(file_name=None, cols=None, cols_to_ignore=None, sep_csv=',', encoding='utf-8', dataframe=None, matrix=None, domains=None, na_values=-1)
Bases: object
A class for handling datasets. The supported file formats are ‘csv’, ‘rdata’ and ‘sas7bdat’.
- Parameters:
file_name (str, optional) – Dataset file path.
cols (list, optional) – Dataset columns. If not given when file_name is provided, all columns in the file are read.
cols_to_ignore (list, optional) – Columns to skip when converting values to integers from 0 to domain_size-1. Use it only for columns that already contain integer values.
sep_csv (str, optional) – CSV delimiter, default is “,”.
encoding (str, optional) – Encoding to use when reading/writing (e.g., ‘utf-8’, ‘latin1’). Default is ‘utf-8’.
dataframe (pandas.DataFrame, optional) – Pandas dataframe containing the dataset.
matrix (numpy.ndarray, optional) – Numpy 2d matrix containing the dataset.
domains (dict[str, list], optional) – Domain of columns. If not given, the domains will be taken from data. Keys are column names and values are lists.
na_values (int, optional) – Value to fill missing data (NaN) with, default is -1.
- dataset
Numpy matrix of integers.
- Type:
numpy.ndarray
- n_rows
Number of rows (records) in the dataset.
- Type:
int
- n_cols
Number of columns (attributes) in the dataset.
- Type:
int
- cols
List of column names in the dataset, in the same order as the columns of the dataset matrix.
- Type:
list
- domains
Column domains. Keys are column names and values are lists. To generate the numpy matrix each original value will be converted to its index in the domain’s list.
- Type:
dict[str, list]
- df2np(dataframe: DataFrame) → ndarray
Converts a pandas dataframe to a numpy.ndarray of integers in “standard” type: for every column c, each original value from the domain of c is replaced by its index in the domain list, so the values of c become integers from 0 to size(c)-1.
- Parameters:
dataframe (pandas.DataFrame) – Dataset.
- Returns:
Dataset in standard type.
- Return type:
numpy.ndarray
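The “standard” encoding replaces every value by its index in the column's domain list. A minimal sketch with pandas and NumPy, assuming that when no domain is supplied it is the column's values in order of first appearance (the library may order domains differently) and that missing values are filled with na_value before encoding; to_standard is a hypothetical helper, not df2np itself.

```python
import numpy as np
import pandas as pd


def to_standard(df, domains=None, na_value=-1):
    """Hypothetical sketch: map each value to its index in the column's domain list."""
    df = df.fillna(na_value)
    if domains is None:
        # Assumption: domain = distinct values in order of first appearance
        domains = {col: list(pd.unique(df[col])) for col in df.columns}
    columns = []
    for col in df.columns:
        index_of = {value: i for i, value in enumerate(domains[col])}
        columns.append(df[col].map(index_of).to_numpy(dtype=np.int64))
    return np.column_stack(columns), domains


df = pd.DataFrame({"sex": ["F", "M", "F"], "disease": ["flu", "flu", "cold"]})
matrix, domains = to_standard(df)
print(matrix.tolist())  # [[0, 0], [1, 0], [0, 1]]
print(domains)          # {'sex': ['F', 'M'], 'disease': ['flu', 'cold']}
```

Keeping the domains alongside the matrix is what makes the encoding reversible: index i in column c decodes to domains[c][i].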
privattacks.util
- privattacks.util.create_histogram(ind_posteriors, bin_size=1) → dict
Generate a histogram of the individual posterior vulnerabilities of the records.
- Parameters:
ind_posteriors – Individual posterior vulnerabilities for all records in the dataset.
bin_size – Histogram bin width, in hundredths of the [0, 1] range. For instance, if bin_size=5, then bin 0 = [0, 0.05), bin 1 = [0.05, 0.1), …, bin 19 = [0.95, 1]. Default is 1.
- Returns:
A dictionary containing the histogram. Keys are strings (e.g., ‘[0, 0.05)’, ‘[0.95,1]’) and values are the counts of the respective bins.
- Return type:
dict
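The binning rule above can be sketched as follows, assuming bin_size divides 100, that every bin appears in the result (including empty ones), and that the last bin is closed so a vulnerability of exactly 1.0 is counted. histogram is a hypothetical stand-in, and its key formatting may differ from the library's.

```python
def histogram(ind_posteriors, bin_size=1):
    """Hypothetical sketch: count values in bins of width bin_size/100 over [0, 1]."""
    n_bins = 100 // bin_size
    hist = {}
    for k in range(n_bins):
        lo, hi = k * bin_size / 100, (k + 1) * bin_size / 100
        closing = "]" if k == n_bins - 1 else ")"  # last bin includes 1.0
        hist[f"[{lo:g}, {hi:g}{closing}"] = 0
    keys = list(hist)
    for v in ind_posteriors:
        # round() guards against float error such as 0.35 * 100 -> 34.999...
        k = min(int(round(v * 100, 9) // bin_size), n_bins - 1)
        hist[keys[k]] += 1
    return hist


h = histogram([0.5, 0.5, 0.5, 0.5, 1.0], bin_size=5)
print({key: c for key, c in h.items() if c})  # {'[0.5, 0.55)': 4, '[0.95, 1]': 1}
```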