Most existing data utility measures for confidentialised unit record files share certain shortcomings: they are limited to continuous variables, to univariate utility assessment, and/or to local information loss measurement. This seminar will present a new user-centered global data utility measure and several integrated local univariate and bivariate data utility measures, all based on a benchmarking approach. Information loss and data utility in the model are calculated using various statistical tests and association measures, such as the two-sample Kolmogorov-Smirnov test, the Chi-Square test (Cramér's V), the ANOVA F test (Eta Squared), the Kruskal-Wallis H test (Epsilon Squared), the Spearman coefficient (rho) and the Pearson correlation coefficient (r). The next important steps in global data utility assessment should be to develop an R package or programme code that measures global data utility automatically, and to establish the relationship between the univariate, bivariate and multivariate data utility of confidentialised data.
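As a rough illustration of the benchmarking idea, the sketch below computes two of the statistics named above, once on an original file and once on its confidentialised counterpart. It is not the measure presented in the seminar: the function names, the toy data and the use of an absolute difference in Cramér's V as an "information loss" figure are all assumptions made for the example.

```python
from bisect import bisect_right
from collections import Counter
from math import sqrt


def ks_statistic(original, confidentialised):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the largest vertical gap between the two empirical CDFs,
    evaluated at every observed value.
    """
    xs, ys = sorted(original), sorted(confidentialised)
    d = 0.0
    for point in sorted(set(xs) | set(ys)):
        fx = bisect_right(xs, point) / len(xs)
        fy = bisect_right(ys, point) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def cramers_v(a, b):
    """Cramér's V for two equally long categorical sequences."""
    n = len(a)
    joint, rows, cols = Counter(zip(a, b)), Counter(a), Counter(b)
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            chi2 += (joint.get((r, c), 0) - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy data (hypothetical): one categorical pair before and after perturbation.
sex = ["f", "f", "m", "m", "f", "m"]
employed = ["yes", "yes", "no", "no", "yes", "no"]
employed_conf = ["yes", "no", "no", "no", "yes", "yes"]

# Assumed loss figure: how much of the bivariate association was lost.
association_loss = abs(cramers_v(sex, employed) - cramers_v(sex, employed_conf))

# Univariate check on a continuous variable: D = 0 means identical distributions.
income = [31.0, 42.5, 55.0, 68.0]
income_conf = [30.0, 45.0, 55.0, 70.0]
distribution_gap = ks_statistic(income, income_conf)
```

In this sketch a value of 0 for either figure would mean the confidentialised file reproduces the original statistic exactly, and larger values indicate more loss. How such local figures might be combined into a global utility score is left open here.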
Location
Speakers
- Sebastian Kocar
Event Series
Contact
- CSRM Comms, 02 6125 1301