Working Paper

ENGLISH ONLY

UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE (UNECE)

CONFERENCE OF EUROPEAN STATISTICIANS

EUROPEAN COMMISSION

STATISTICAL OFFICE OF THE EUROPEAN UNION (EUROSTAT)

Joint UNECE/Eurostat work session on statistical data confidentiality (Ottawa, Canada, 28-30 October 2013) Topic (v): Confidentiality issues and case studies

Experiences of implementing Bifrost Prepared by Lars-Erik Almberg, Karin Andersson and Lei Sun, Statistics Sweden

Experiences of implementing Bifrost

Lars-Erik Almberg*, Karin Andersson** and Lei Sun***

* Process Department, Statistics Sweden, Örebro, S-701 89, Sweden, [email protected]
** Process Department, Statistics Sweden, Stockholm, P.O. 24 300, Sweden, [email protected]
*** Process Department, Statistics Sweden, Stockholm, P.O. 24 300, Sweden, [email protected]

Abstract: In recent years Statistics Sweden has put considerable effort into developing standard tools for different parts of the statistics production process. For statistical disclosure control, Statistics Sweden has developed SAS2Argus, a collection of SAS macros that facilitates the use of the program τ-ARGUS via SAS. The entire system, consisting of SAS2Argus, τ-ARGUS and Xpress, is called Bifrost. Swedish legislation guarantees secrecy for individuals and businesses. To be able to publish tables in which information about businesses could be disclosed, Statistics Sweden has to ask the businesses for consent. A letter is sent to each business, in which the business states whether or not it gives consent. This paper discusses our experiences of implementing statistical disclosure control with Bifrost. The experiences, both positive and negative, are presented, and a solution to the problem of integrating consent control into the process of disclosure control is described.

1 Introduction

Statistics Sweden has developed Bifrost¹ as a standard tool for disclosure control. It is a great advantage if all surveys use the same system for disclosure control. First of all, quality can be improved: instead of every survey developing its own solution, we can now ensure that disclosure control is carried out with all quality aspects considered. Another advantage of a standard tool is that the system will be easier to support in the future. (See Jansson et al. 2010 for further reading about the background of the development of Bifrost.)

2 Bifrost

Bifrost is a system that facilitates the use of the program τ-ARGUS (τ-ARGUS User's Manual 2011) via SAS. The system consists of a collection of macros written in SAS (SAS2Argus), the software τ-ARGUS itself, and Xpress, an LP solver that is needed to get the full functionality of τ-ARGUS (Kraftling 2011).

¹ In Nordic mythology, Bifrost was the bridge to the place where the gods lived.

Bifrost allows users to access the full functionality of τ-ARGUS, which in addition to the main rules of risk assessment (thresholds, p% and dominance rules, using sample weights, etc.) also includes optimal protection methods (secondary cell suppression with different cost functions and controlled rounding). With the so-called shadow variables it is possible to deal with more complex situations by copying the protection pattern. (See Hundepool et al. 2010 for further reading on methods.) It is also possible to supplement τ-ARGUS with other modules and thereby expand the system with more methods for risk assessment and protection.

Fig 1. The Bifrost system

In addition to the design aspect of the disclosure problem, i.e. selection and parameterization of methods, implementing Bifrost in an existing production environment will require some effort to adjust the input and output with respect to both content and format of the specific data. The result of this effort is illustrated as an adaption layer in Figure 1 above. The layer can be made more or less advanced and automated according to the requirements of the user. The flexibility of SAS also makes it possible to adapt Bifrost to a variety of systems and environments, such as SAS, SQL, .NET, Excel, etc.

The main SAS macro in Bifrost is SAS2Argus. When the main macro is executed, a sequence of utility macros is called that includes syntax checking, generation of the necessary input files, initiation of a batch run of τ-ARGUS, and processes that generate the output files. The SAS2Argus macro is a so-called named-style macro, where the parameters are named and values are assigned as arguments. Below is an example of how the SAS2Argus macro can be invoked. Note that not all possible parameters are described in the example.

    %sas2argus(Jobname = table1,
               InData = microdataset,
               Explanatory = Year Size,
               Response = Input,
               SafetyRule = P(20,1),
               Suppress = MOD(1,10),
               Out = inter(1),
               RunArgus = 1,
               SAS = 2,
               Debug = 1
               );

The Jobname parameter gives the prefix for the names of all files that τ-ARGUS creates, so that it is easy to see which files belong together. The parameter InData specifies the name of the microdata set. If aggregated data is used, the parameter InTable has to be used instead of InData. RunArgus controls whether τ-ARGUS is run and whether new text files are created. The parameter SAS controls the import from τ-ARGUS to SAS. With the Debug parameter it is possible to get more information in the SAS log. The other parameters work in the same way as in τ-ARGUS. (For a more detailed explanation of all possible parameters see Appendix 1.)
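The SafetyRule = P(20,1) argument in the call above requests a p% rule with p = 20. As a rough illustration of what such a rule tests, the Python sketch below flags a cell as unsafe when the contributions remaining after removing the two largest are smaller than p% of the largest contribution. This is the textbook single-intruder form of the rule (see Hundepool et al. 2010); reading the second argument of P(20,1) as a coalition of one intruder is an assumption that should be checked against the τ-ARGUS manual. The function name and the cell data are hypothetical, and nothing here is taken from τ-ARGUS or SAS2Argus.

```python
def p_percent_unsafe(contributions, p=20.0):
    """Return True if a cell fails the p% rule (single-intruder form).

    Assumes non-negative contributions. The cell is unsafe when the total
    minus the two largest contributions is below p% of the largest one.
    """
    xs = sorted(contributions, reverse=True)
    if not xs:
        return False  # empty cell: nothing to disclose
    largest = xs[0]
    remainder = sum(xs[2:])  # total minus the two largest contributors
    return remainder < (p / 100.0) * largest


# Hypothetical cells of a Year x Size table, each with its contributions.
cells = {
    "2012/small": [50.0, 30.0, 15.0, 5.0],
    "2012/large": [900.0, 80.0, 20.0],
    "2013/large": [400.0, 100.0],
}

unsafe = {name: p_percent_unsafe(values) for name, values in cells.items()}
print(unsafe)
```

In this form a cell with only one or two positive contributors always fails, because nothing remains once the two largest are removed.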

3 Consent

Data collected by the statistical authorities are protected by the official secrecy laws. According to chapter 24, paragraph 8 of the Secrecy Act (2009:400), confidentiality has to be applied in those specific activities of an authority that relate to the preparation of statistical data about an individual's personal or financial circumstances. The confidentiality is absolute, which means that data cannot be disclosed. The use of data for statistical or research purposes is possible as an exception to the main rule.

It is, however, possible to publish sensitive aggregated data if the businesses give their consent. A letter is sent to each business in which Statistics Sweden asks for consent to publish tables where information could be disclosed. Another approach is to first check which businesses cause suppressions in the tables and send a request for consent only to them. The problem with this approach is that the businesses causing suppressions can change over time.

Below is an example of how consent control is done for a table called table1. First, table1 is checked to see which cells are sensitive.

    %sas2argus(Jobname = table1,
               InData = microdataset,
               Explanatory = Year Size,
               Response = Input,
               SafetyRule = P(20,1),
               Out = inter(1),
               RunArgus = 1,
               SAS = 2,
               Debug = 1
               );
    %save_table(table1,1);

The result from the first SAS2Argus run, with primary suppressions, is saved, and the macro consent_control checks whether some of the unsafe cells can be set to safe. Consent_control checks whether the businesses whose information could be disclosed from an unsafe cell have given their consent. If that is the case, the cell should be set to safe. When every unsafe cell has been checked, consent_control creates an apriori file listing every unsafe cell that is to be set to safe.

The consent file "medgiv" consists of only two columns: one is the identity column for the businesses and the other holds the status of consent (0 or 1). Id specifies the name of the identifying variable of the companies. If aggregated data is used, the parameter Tabledata specifies the name of the data set with consent for top1 and top2, and the parameters Microdata and Consentdata are left out.

    %consent_control(Microdata = microdataset,
                     Consentdata = medgiv,
                     Id = knr,
                     Jobname = table1,
                     SafetyRule = P,
                     Safety_var = 20,
                     Var1 = Year,
                     Var2 = Size,
                     Response = Input
                     );

The macro arb_file rewrites the arb file so that τ-ARGUS will include the apriori file and then do secondary suppression to protect the unsafe cells that are not covered by consent. When SAS2Argus is run for the second time, the parameter RunArgus is set to 2, which means that no new text files are created.

    %arb_file(table1,'<SUPPRESS> MOD(1,10)');
    %sas2argus(Jobname = table1,
               InData = microdataset,
               Explanatory = Year Size,
               Response = Input,
               SafetyRule = P(20,1),
               Out = inter(1),
               RunArgus = 2,
               SAS = 2,
               Debug = 1
               );
    %save_table(table1,2);

The result from the second SAS2Argus run is saved, and there are now three different versions of table1: table1 without disclosure control, table1 with only primary suppressions, and table1 with primary and secondary suppressions after checking for consents.

In the present version it is only possible to use the p%-rule and the (n,k)-dominance rule when working with consent control. With the (n,k)-dominance rule, n can be at most 2, but it is possible to have a combination of two rules, (1,k) and (2,k). The minimum frequency rule, with a minimum of 3, is of course automatically applied when the p%-rule or the (2,k)-dominance rule is used. When the unsafe cells are checked for consent, only the two largest contributors in each cell are checked.
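The consent check described in this section can be sketched as follows. This is a minimal Python illustration and not the consent_control macro itself: it assumes the single-intruder p% rule, looks only at the two largest contributors of each unsafe cell (as stated above), and assumes that a cell may be set to safe only when both of them have consent status 1. The cell data, the business identifiers and the function names are hypothetical.

```python
def p_percent_unsafe(values, p):
    """Single-intruder p% rule: unsafe if the remainder is below p% of the largest."""
    xs = sorted(values, reverse=True)
    return bool(xs) and sum(xs[2:]) < (p / 100.0) * xs[0]


def cells_set_safe_by_consent(cells, consent, p=20.0):
    """Return the names of unsafe cells whose two largest contributors consented.

    cells:   {cell name: {business id: contribution}}
    consent: {business id: 0 or 1}; a missing id is treated as no consent.
    """
    safe_by_consent = []
    for name, contributions in cells.items():
        if not p_percent_unsafe(contributions.values(), p):
            continue  # cell is already safe, so no consent is needed
        # Business ids ordered by contribution, largest first; keep the top two.
        top2 = sorted(contributions, key=contributions.get, reverse=True)[:2]
        if all(consent.get(business, 0) == 1 for business in top2):
            safe_by_consent.append(name)
    return safe_by_consent


# Hypothetical microdata aggregated per cell, and a two-column consent file.
cells = {
    "2012/small": {"k01": 50.0, "k02": 30.0, "k03": 15.0, "k04": 5.0},
    "2012/large": {"k05": 900.0, "k06": 80.0, "k07": 20.0},
    "2013/large": {"k05": 400.0, "k08": 100.0},
}
medgiv = {"k05": 1, "k06": 1, "k07": 0, "k08": 0}

apriori = cells_set_safe_by_consent(cells, medgiv)
print(apriori)
```

The returned list plays the role of the apriori file in this sketch: cells in it would be set to safe, while the remaining unsafe cells would still have to be protected by secondary suppression.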

4 Experiences

At Statistics Sweden, Bifrost has so far been implemented in about 10 surveys. Our experience is that implementing statistical disclosure control with Bifrost is time consuming. It is also difficult to predict how much time an implementation will take, since this depends on the number and the complexity of the tables.

For the last two years, Statistics Sweden has reserved money in a central pot in order to promote the implementation of Bifrost. This means that the surveys do not have to pay for the implementation itself. The work in the adaption layer before and after the implementation of Bifrost, however, has to be paid for by the surveys. For example, as every table has to be created in Bifrost, the micro data must be in order and well defined before the implementation can start.

Bifrost with consent control is now implemented in eight surveys, and our experiences of that work are positive. Most of the surveys have been able to reduce the number of primary suppressions considerably, and some have even been able to eliminate all primary suppressions in their tables. Collecting consents from the businesses is, however, a lot of work and thus expensive for the surveys. The consents are also time-limited, which increases the workload even more.

At Statistics Sweden consent is collected from businesses; at present this is the only way to do it. When working with groups of businesses and consent control, every business in the same group has to have the same consent status. That is not always the case: some businesses in a group may give their consent while others do not, but the group has to have either 0 or 1 as its consent status. Groups of companies also change over time, which can be a problem if requests for consent have been sent only to the businesses that cause suppressions. This is especially a problem for surveys producing short-term statistics, where the production schedule is tight. Because of these problems we usually do not work with consent control for groups of businesses.

Four persons work part time with implementing Bifrost at Statistics Sweden. This group meets regularly to discuss problems and solutions. One problem that has been identified is that τ-ARGUS always creates tables with totals for both the rows and the columns, whereas the tables to be protected are sometimes constructed with totals for only part of the table. The tables may, for example, have row totals only. If it is reasonable to assume that the missing column totals cannot be found anywhere else, the disclosure control can be done for each row separately, which leads to fewer secondary suppressions.

Another problem is how to protect linked tables with very complex links. This problem can be solved if there are enough consents to reduce the number of primary suppressions.
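The requirement noted earlier in this section, that a group of businesses must carry a single consent status of 0 or 1, can be illustrated with a small Python sketch. The paper does not say how a group status would be derived when the members disagree; the sketch assumes one conservative convention, namely that a group counts as consenting only if every member has consented. The function name, the identifiers and the data are hypothetical.

```python
def group_consent(holding, consent):
    """Collapse business-level consent into one status (0 or 1) per group.

    holding: {business id: group id}
    consent: {business id: 0 or 1}; a missing id is treated as no consent.
    Assumption: a group has status 1 only if all of its members have status 1.
    """
    all_agreed = {}
    for business, group in holding.items():
        agreed = consent.get(business, 0) == 1
        all_agreed[group] = all_agreed.get(group, True) and agreed
    return {group: int(ok) for group, ok in all_agreed.items()}


# Hypothetical corporate groups and individual consent answers.
holding = {"k01": "g1", "k02": "g1", "k03": "g2"}
consent = {"k01": 1, "k02": 0, "k03": 1}

status = group_consent(holding, consent)
print(status)
```

Under this convention a single refusal, or a member that has joined the group without having been asked, sets the whole group to 0, which is one way to see why changing groups make consent control hard to maintain.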

References

Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Lenz, R., Naylor, J., Schulte Nordholt, E., Seri, G., and De Wolf, P.-P. (2010). Handbook on Statistical Disclosure Control, version 1.2. ESSnet SDC.

Jansson, I., Bernström, F. and Carlson, M. (2010). The Process of Practicing Statistical Disclosure Control in Tabular Data at Statistics Sweden. Paper presented at Q2010 European Conference on Quality in Official Statistics, Helsinki, Finland, May 2010.

Kraftling, A. (2011). SAS2Argus user manual. Unpublished manuscript. Statistics Sweden.

τ-ARGUS User's Manual (2011). ESSnet project, Statistics Netherlands.

Appendix 1

Parameter    Description

General parameters (system parameters)

JOBNAME      A name that is used as a prefix for all text files that are created for τ-ARGUS and used by τ-ARGUS in a "job" (execution). The default, unless specified, is SAS2ARGUS. The prefix makes it easy to see which files belong together.

RUNARGUS     An option that controls what is executed:
             0. Only text files are created by the macro. τ-ARGUS is not executed.
             1. Text files are created and τ-ARGUS is executed (default).
             2. No text files are created; τ-ARGUS is executed on already created text files.
             This makes it possible to produce the text files first, edit them manually and finally run τ-ARGUS on the manually edited files, for example to handle exceptional situations not supported by the macro.

DEBUG        An option for getting more information into the SAS log:
             0. No additional information is written to the SAS log.
             1. Information is written to the SAS log, and the log of τ-ARGUS is also included in the SAS log (default).
             Facilitates debugging and documentation, as all available information from the execution is found in the SAS log.

HELP         Describes the macro and its parameters in the SAS log:
             0. No information in the log (default).
             1. The macro is described in the log and the macro stops (no execution).
             The macro is documented in the script code as well, but this is an easy way to get access to a brief description.

SAS          An option that controls imports from τ-ARGUS to SAS:
             0. No import from τ-ARGUS to SAS (default).
             1. Imports the results report in HTML format from τ-ARGUS and shows it in the SAS internal browser.
             2. Also imports the output from τ-ARGUS in the so-called intermediate format to the SAS WORK library. (This is the only format suitable for import into a SAS data set or table.)

Parameters that define the input data to τ-ARGUS (one of these must be selected, either InData or InTable)

InData       Specifies the name of a SAS data set of micro data. Note that this can also be an SQL table. All data sources that SAS supports with access methods can be used. Either InData or InTable must be specified.

InTable      Specifies the name of a SAS data set of already aggregated data. Note that this can also be an SQL table. All data sources that SAS supports with access methods can be used. However, the data must often be prepared in some way, and aggregated data must at least contain the frequency of each cell to be useful as input to risk assessment and suppression.

Risk assessment and secondary suppression

SafetyRule   Specifies the method of risk assessment to be used. The argument(s) are not validated in advance by the macro; consult the τ-ARGUS manual for valid arguments. Must be defined, unless a cell status variable (variable name: Status) is available and the risk assessment has thus already been made.

Suppress     Specifies the method of suppression to be used. The argument(s) are not validated in advance by the macro; consult the τ-ARGUS manual for valid arguments. If this argument is omitted, only a risk assessment is done.

Variables and their roles - generic for both micro data and aggregated data

Explanatory  Specifies the name(s) of the so-called explanatory variables, or dimensional variables, that span the table. Must be defined. Two options have been implemented along with the argument: if a variable is hierarchical, either a description of the hierarchy or the name of the text file in which that description can be found can be added in brackets after the variable name. Note that variable names are separated by spaces if there is more than one in the list.

Response     Specifies the name of the response variable. Must be defined.

Shadow       The name of any shadow variable. A company's turnover could be such a "help variable". If not specified, τ-ARGUS uses the Response variable.

Cost         Identification of a potential cost variable. If not specified, τ-ARGUS uses the Response variable.

Lambda       Transformation parameter used in a "simplified Box-Cox function" as an exponent of the cost (Cost). Default = 1.

Variables and their roles - specific for aggregated data

Frequency    The name of the variable that holds the frequency. Must be defined for aggregated data; otherwise peculiar results can be produced, since τ-ARGUS then tries to compute the frequency.

LowerLevel   The name of the variable indicating the lowest level of the "protection intervals".

UpperLevel   The name of the variable indicating the highest level of the "protection intervals".

MaxScore     The names of the variables that hold the single highest contributors in each cell. Used in magnitude tables when the dominance rule is applied to pre-aggregated tables. The largest contributors can be computed with PROC MEANS. There is a utility macro that can do this: Calculate_TopN.sas.

Status       The name of any variable that indicates the cell status. The status value is typically one of:
             S = Safe
             U = Unsafe
             P = Protected

TotCode      A constant that specifies which value represents the total in an aggregated table. The default character is 'T'.
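The MaxScore description above notes that the largest contributors per cell can be computed with PROC MEANS or with the Calculate_TopN.sas utility macro. As a language-neutral illustration of that computation, and not a port of the macro (whose interface is not shown in this paper), the Python sketch below derives the frequency, the total and the N largest contributions for each cell from micro data. The record layout and all names are hypothetical.

```python
import heapq
from collections import defaultdict


def top_n_per_cell(records, n=2):
    """Summarise micro data per cell: frequency, total and n largest values.

    records: iterable of (cell key, contribution) pairs.
    """
    by_cell = defaultdict(list)
    for cell, value in records:
        by_cell[cell].append(value)

    summary = {}
    for cell, values in by_cell.items():
        top = heapq.nlargest(n, values)
        top += [0.0] * (n - len(top))  # pad cells with fewer than n contributors
        summary[cell] = {"freq": len(values), "total": sum(values), "top": top}
    return summary


# Hypothetical micro data: (Year, Size) cell key and the response value.
records = [
    (("2012", "large"), 900.0),
    (("2012", "large"), 80.0),
    (("2012", "large"), 20.0),
    (("2013", "large"), 400.0),
]

table = top_n_per_cell(records)
print(table[("2012", "large")])
```

A pre-aggregated table of this shape carries what the Frequency and MaxScore parameters expect: one frequency and the top contributions for every cell.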

Variables and their roles - specific for micro data

Weight       The name of the variable that contains the weight.

Holding      The name of the variable that contains information about the corporate group. Used when observations belonging to the same corporate group should be grouped together in the input file.

Request      The name of the variable that indicates whether or not the respondent has requested protection of its data. Inverse of consent.

Selection of output from τ-ARGUS

Out          Possible choices:
             TABLE()   => VarName delimiter (,) Primary (x) Secondary (-)
             PIVOT(0)  => VarName No Status
             PIVOT(1)  => VarName Status
             CODE(0)   => NoName delimiter (,) Primary (-) Secondary (x) No Status
             CODE(1)   => NoName delimiter (,) Primary (Part) Secondary (x) No Status
             CODE(2)   => NoName delimiter (,) Primary (-) Status (1,5,11,14)
             CODE(3)   => NoName delimiter (,) Primary (Part) Status (1.11)
             SBS()     => NoName delimiter (,) Exp, 0, Exp, 0 .. zero (deleted) Status (V, D, A)
             INTER(0)  => NoName delimiter (;) status only (S, M, U)
             INTER(1)  => NoName delimiter (;) Status (S, M, U)

If the parameter SAS=1, PIVOT and INTER are imported (if chosen). The easiest way to understand all the different output options may be to experiment. The intermediate format may be considered the most informative and useful.

Comment: VarName means that variable names are included in the file. NoName means that no variable names are included in the text file. Characters used as delimiters are listed in parentheses after Delimiter. Characters that replace primary suppressed values are given in parentheses after Primary. Characters that replace secondary suppressed values are given in parentheses after Secondary. Status / No Status indicates whether or not the cell status is reported in the output.