The coupling techniques I hope to apply to my problem come from the privacy field, so I was looking forward to this talk.
I asked a question about how to measure the noise of an overall system given known bounds on the noise of its subcomponents, and she pointed me to sec. 3.5 here. From that section, I found the following could be related to my problem:
- Repeated use of differentially private algorithms on the same database. This allows both the repeated use of the same mechanism multiple times, as well as the modular construction of differentially private algorithms from arbitrary private building blocks.
- Repeated use of differentially private algorithms on different databases that may nevertheless contain information relating to the same individual. This allows us to reason about the cumulative privacy loss of a single individual whose data might be spread across multiple data sets, each of which may be used independently in a differentially private way. Since new databases are created all the time, and the adversary may actually influence the makeup of these new databases, this is a fundamentally different problem than repeatedly querying a single, fixed, database.
The above link includes the context of my question. To elaborate on its relation to the repeated use of certain algorithms: I need to analyze the bound on the compositions happening in a procedure that tests the fit between a likelihood and a posterior sampling method. This procedure is called simulation-based calibration (paper), a standard test for new sampling algorithms (at least in the open-source Bayesian community around Stan). “Repetition” comes in because I wish to extend this testing mechanism into an algorithm that finds a well-calibrated prior.
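As a reference point, here is a minimal sketch of one SBC iteration for a conjugate normal-normal model. Everything here (the model, the function names) is illustrative and not the paper's code: draw a ground truth from the prior, simulate data, and rank the ground truth among posterior draws; across many iterations these ranks should be uniform if the sampler is calibrated.

```python
import numpy as np

def sbc_rank(prior_draw, posterior_sampler, simulator, L=100):
    """One SBC iteration: draw theta' from the prior, simulate data,
    and rank theta' among L posterior draws. Over many iterations the
    ranks should be uniform on {0, ..., L} for a calibrated sampler."""
    theta_true = prior_draw()                    # theta' ~ p(theta)
    y = simulator(theta_true)                    # y ~ p(y | theta')
    posterior_draws = posterior_sampler(y, L)    # theta_1..L ~ p(theta | y)
    return np.sum(posterior_draws < theta_true)  # rank statistic

# Illustrative conjugate model: prior N(0, 1), likelihood y | theta ~ N(theta, 1).
rng = np.random.default_rng(0)
prior_draw = lambda: rng.normal(0.0, 1.0)
simulator = lambda theta: rng.normal(theta, 1.0)
# Exact posterior for a single observation: N(y / 2, 1 / 2).
posterior_sampler = lambda y, L: rng.normal(y / 2.0, np.sqrt(0.5), size=L)

ranks = [sbc_rank(prior_draw, posterior_sampler, simulator) for _ in range(1000)]
# A histogram of `ranks` should look uniform; systematic shapes indicate miscalibration.
```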
Rachel mentioned it is usual in privacy settings to break the whole system down into building blocks. My two building blocks are the likelihood and the posterior sampling method, and proposition 3.6 from this paper is one example of a known building block for the latter. The proposition suggests that the difference in predictive distributions is bounded by the difference in posteriors (i.e., the effect of a single likelihood term).
So, to write this in terms of operators, I am analyzing the effect of $h^n$ where $h(X, U) = g(f(X, U), X)$ or $g(f(X, U), X, U)$. Here $U$ is a uniform random variable representing the randomness at the data-generation step or in the computation (if a stochastic estimator is used), and $X$ is a prior distribution. One difficulty in formulating this problem is that $g$ is adaptive, which is natural for sampling algorithms whose step size or penalty coefficients change over iterations; see the sketch below.
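To make the operator picture concrete, here is a rough sketch of iterating $h$ under stated assumptions: $X$ is reduced to prior hyperparameters $(\mu, \sigma)$, $U$ is a pair of uniforms driving simulation via inverse CDFs, and $g$'s adaptivity is mimicked by a decaying step size passed in as an extra argument $t$. All function bodies are placeholders, not the actual autocalib update.

```python
import numpy as np
from scipy.stats import norm

def f(X, U):
    """Simulation step: X = (mu, sigma) parameterizes the prior, U = (u1, u2)
    supplies all randomness. Returns a summary of the simulated fit."""
    theta = norm.ppf(U[0], loc=X[0], scale=X[1])  # theta ~ prior via inverse CDF
    y = norm.ppf(U[1], loc=theta, scale=1.0)      # y ~ likelihood via inverse CDF
    return y - theta                              # illustrative discrepancy

def g(z, X, t):
    """Adaptive update of the prior hyperparameters; the step size shrinks
    with the iteration index t, mimicking adaptive samplers."""
    step = 1.0 / (1.0 + t)
    return (X[0] - step * z, X[1])  # shift the prior mean against the discrepancy

def h_n(X0, n, rng):
    """Iterate h(X, U) = g(f(X, U), X) n times; the object of the composition analysis."""
    X = X0
    for t in range(n):
        U = rng.uniform(size=2)
        X = g(f(X, U), X, t)
    return X

rng = np.random.default_rng(2)
print(h_n((0.0, 1.0), 10, rng))
```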
The following are slides from this talk that I found interesting.
- The goal of data simulation is to return a synthetic database that shares the same properties as the original database.
2. Formulation of an algorithm in terms of its effect on the outcome
The following quantifies how the output of the algorithm is transformed with respect to a unit change in the input data. This abstraction could benefit my attempt to quantify the effect of one application of a particular algorithm (an adaptive posterior sampler) on the data and its future inference; to be precise, $\frac{\partial y}{\partial \theta}$ in the joint outcome-parameter space, except that $y$ is actually the outcome of the previous algorithm. Given the textbook's comment above on the repeated use of an algorithm on the same data, I am hopeful a considerable amount of work exists along this line that I can learn from. Aside from that, I noticed a new addition, $\delta$, compared with previous talks. I wonder what the motivation behind this change is, and what interpretation lies behind the different roles played by the multiplicative ($e^{\epsilon}$) and additive ($\delta$) effects of a one-record change in the data. The following definition is from here, with a slight update to include $\delta$.
An algorithm $M: T^{n} \rightarrow R$ is $(\epsilon, \delta)$-differentially private if $\forall$ neighboring $x, x^{\prime} \in T^{n}$ and $\forall S \subseteq R$
$$
P[M(x) \in S] \leq e^{\epsilon} P\left[M\left(x^{\prime}\right) \in S\right]+\delta
$$
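The classic concrete instance of this definition (with $\delta = 0$) is the Laplace mechanism; a minimal sketch, assuming a toy bit-valued database and a sensitivity-1 counting query:

```python
import numpy as np

def laplace_mechanism(x, query, sensitivity, epsilon, rng):
    """Release query(x) with Laplace noise of scale sensitivity/epsilon.
    This satisfies (epsilon, 0)-differential privacy for a query whose value
    changes by at most `sensitivity` between neighboring databases."""
    return query(x) + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
x = np.array([0, 1, 1, 0, 1])    # toy database of bits
count = lambda db: db.sum()      # counting query, sensitivity 1
print(laplace_mechanism(x, count, sensitivity=1.0, epsilon=0.5, rng=rng))
```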
From the explanation here, post-processing means no adversary can break the privacy guarantee given by the definition: if $M(x)$ is $\epsilon$-differentially private and $A$ is any function, then $A(M(x))$ is $\epsilon$-differentially private.
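A one-line reason (for deterministic $A$; the randomized case follows by averaging over $A$'s coin flips): for any $S \subseteq R$,
$$
P[A(M(x)) \in S]=P\left[M(x) \in A^{-1}(S)\right] \leq e^{\epsilon} P\left[M\left(x^{\prime}\right) \in A^{-1}(S)\right]=e^{\epsilon} P\left[A\left(M\left(x^{\prime}\right)\right) \in S\right].
$$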
The intuition for composition is that the epsilons add up: if $M_{i}(x)$ is $\epsilon_{i}$-differentially private for $i=1, \ldots, k$, then the composition $M(x) \equiv\left(M_{1}(x), \ldots, M_{k}(x)\right)$ is $\sum_{i=1}^{k} \epsilon_{i}$-differentially private.
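In code, basic composition is just bookkeeping; a sketch with three sensitivity-1 queries on a toy bit database (illustrative, self-contained):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0, 1, 1, 0, 1])   # toy database of bits

def laplace_release(query, epsilon, sensitivity=1.0):
    """Release one sensitivity-1 query with Laplace noise (epsilon-DP)."""
    return query(x) + rng.laplace(scale=sensitivity / epsilon)

# Basic composition: k private releases on the same database spend
# epsilon_1 + ... + epsilon_k in total.
epsilons = [0.1, 0.2, 0.3]
queries = [lambda db: db.sum(),                  # each query has sensitivity 1
           lambda db: db[: len(db) // 2].sum(),
           lambda db: (1 - db).sum()]
releases = [laplace_release(q, eps) for q, eps in zip(queries, epsilons)]
total_epsilon = sum(epsilons)  # the tuple of releases is 0.6-DP
```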
3. Questions
Building on this, the following two questions need to be addressed before trying to import developments from the DP field. I will call my algorithm, which updates parameters based on the computed posterior samples by iterating SBC, autocalib. All references without citations are from this textbook.
1. How do the adversarial roles of the database update algorithm and of post-processing differ?
For now, there is no adversary in SBC, as autocalib searches for the parameter region on which the inference algorithm performs best given the likelihood model. Performance is measured by how well the procedure's recovered values ($\theta$) match the starting ground truths ($\theta’$). However, an update toward agnostic learning is possible in the two following ways. The iterative construction mechanism (sec. 5.2) and the following definition 5.4 (which I don’t understand) seem relevant.
In ML, the goal is to find a function $f : X \to \{0, 1\}$ from a class of functions $Q$ that best labels a collection of labeled examples $(x_1, y_1), \ldots, (x_m, y_m) \in X \times \{0, 1\}$. Each example $x_i$ has a true label $y_i$, and a function $f$ correctly labels $x_i$ if $f(x_i) = y_i$. An agnostic learning algorithm for a class $Q$ is an algorithm that can find the function in $Q$ that labels all of the data points approximately as well as the best function in $Q$, even if no function in $Q$ can perfectly label them. Equivalently, an agnostic learning algorithm is one that maximizes the number of positive examples labeled 1 minus the number of negative examples labeled 1. A distinguisher as defined above is just an agnostic learning algorithm: just imagine that $x$ contains all of the “positive” examples, and that $y$ contains all of the “negative” examples. Agnostic learning does not require that any function perfectly label every example. For classes of linear queries $Q$, a distinguisher is simply an optimization algorithm. Because for linear queries $f$, $f(x) - f(y) = f(x - y)$, a distinguisher simply seeks to find $\arg\max_{f \in Q} |f(x - y)|$.
Agnostic learning
- Database update algorithm
Let $U : D \times Q \times R \to D$ be an update rule and let $T : R \to R$ be a function. We say $U$ is a $T(\alpha)$-database update algorithm for query class $Q$ if for every database $x \in \mathbb{N}^{|X|}$, every $(U, x, Q, \alpha, L)$-database update sequence satisfies $L \leq T(\alpha)$.
Def. of Database update algorithm
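The textbook's canonical instance of such an update rule is the multiplicative weights update (sec. 5.2 there). A rough sketch under my own simplifications (sign-based correction, fixed learning rate), treating the synthetic database as a distribution over a tiny data universe:

```python
import numpy as np

def mw_update(x_syn, f, answer, eta=0.5):
    """One multiplicative-weights database update step. x_syn is a synthetic
    database represented as a distribution over the data universe, f is a
    linear query (a vector in [0,1]^|X|), and `answer` is the (noisy) true
    answer the synthetic data should be corrected toward."""
    direction = 1.0 if answer > f @ x_syn else -1.0  # which way to correct
    x_new = x_syn * np.exp(direction * eta * f)      # reweight record types
    return x_new / x_new.sum()                       # renormalize to a distribution

# Toy universe of 4 record types; start from the uniform synthetic database.
x_syn = np.full(4, 0.25)
f = np.array([1.0, 1.0, 0.0, 0.0])       # linear query: mass on the first two types
x_syn = mw_update(x_syn, f, answer=0.9)  # the distinguishing query moves mass toward them
print(x_syn)
```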
One candidate for agnostic learning is the autocalib algorithm. Compared to the current autocalib, which targets a single inference algorithm and returns the best updated distribution of $\theta’$ at the end of each iteration, suppose autocalib's decision is made for a group of algorithms $G_1, \ldots, G_H$. Of course, we wouldn't apply all inference algorithms at every step, due to the heavy computation. Instead, autocalib chooses the inference algorithm expected to return the worst-calibrated result, $\underset{G}{\arg\max} |C(\lambda_0)|$. From this, autocalib achieves its goal of finding the parameter space well-calibrated for a group of inference algorithms, as in the sketch below.
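A hypothetical sketch of that group-wise step; `miscalibration` and `update` are placeholders for SBC-derived quantities, and none of this is the current autocalib code:

```python
def autocalib_step(lam, algorithms, miscalibration, update):
    """One iteration of the proposed group-wise autocalib: evaluate the
    miscalibration C(lam) of each inference algorithm G_h, pick the worst
    (argmax over G of |C(lam)|), and update the hyperparameters lam against it."""
    scores = [abs(miscalibration(G, lam)) for G in algorithms]
    worst = algorithms[scores.index(max(scores))]  # argmax_G |C(lam)|
    return update(lam, worst)                      # move lam to fix the worst case
```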
- Post-processing
The output of an inference algorithm (usually in the form of posterior samples) is not used only once; different utility functions are expected downstream. Without knowing them, the posterior distribution $p(\theta)$ is the canonical calibration target, rather than its post-processed version $p(U(\theta))$. However, with information on a finite set or family of utility functions $U_1, \ldots, U_K$, calibration of the algorithm can be performed on outputs in decision space. In other words, the precision and coverage of the algorithm's output, previously compared in $\theta$ space, are updated to the space of $U(\theta)$. Adversarial decisions are then made by choosing the worst-performing utility function. In the slide above, this is denoted $A(M(x))$ (no adversary can break the privacy guarantee), but I don't have enough understanding to differentiate this from the database update case above.
With a deeper understanding of whether both the database update algorithm and post-processing in DP are adversarial and, if so, how their roles differ (and how their effects can be differentiated under an iterative structure, where the input for the next iteration is the output of both post-processing and the database update algorithm), the schemes above will be updated.
2. What approaches are taken to make composition “automatic”?
What I am looking for are structures that automate composition, such as the additive structure of Gaussian noise in concentrated DP discussed in the following talk:
Adaptive composition from here also seems relevant. However, the geometry of posterior space might be more diverse than that of database settings.
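For reference, the additive structure I have in mind: under zero-concentrated DP, the Gaussian mechanism with sensitivity $\Delta$ and noise scale $\sigma$ satisfies $\rho$-zCDP with $\rho = \Delta^2 / (2\sigma^2)$, the $\rho$'s add under composition, and $\rho$-zCDP implies $(\rho + 2\sqrt{\rho \ln(1/\delta)}, \delta)$-DP. A small sketch of that bookkeeping:

```python
import numpy as np

def gaussian_rho(sensitivity, sigma):
    """zCDP parameter of the Gaussian mechanism: rho = Delta^2 / (2 sigma^2)."""
    return sensitivity**2 / (2 * sigma**2)

def zcdp_to_dp(rho, delta):
    """Convert rho-zCDP to (epsilon, delta)-DP: eps = rho + 2*sqrt(rho*ln(1/delta))."""
    return rho + 2 * np.sqrt(rho * np.log(1 / delta))

# "Automatic" composition: rho simply adds across k Gaussian releases.
rhos = [gaussian_rho(1.0, sigma) for sigma in (4.0, 4.0, 2.0)]
total_rho = sum(rhos)
print(zcdp_to_dp(total_rho, delta=1e-6))
```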