Commit 4153d32

OpenDP v0.5.0 (#486)
* Update to OpenDP v0.5
* approx bounds
* Pandas breaking change

Co-authored-by: Joshua <[email protected]>
1 parent 3db2ade commit 4153d32

24 files changed, 250 insertions(+), 823 deletions(-)

.gitignore

+4 -1
@@ -112,6 +112,7 @@ poetry.lock
 # mlflow
 mlruns
 result.json
+run.py
 
 # VS Code
 .vscode/
@@ -135,4 +136,6 @@ PUMS_pid.csv
 PUMS_large.csv
 PUMS_dup.csv
 PUMS_null.csv
-*.db
+*.db
+
+

sql/HISTORY.md

+7
@@ -1,3 +1,10 @@
+# SmartNoise SQL v0.2.5 Release Notes
+
+* Update to use OpenDP v0.5, support for Mac Silicon
+* Switch to use discrete Laplace for all integer sums and counts
+* Enable discrete Gaussian option
+* `get_privacy_cost` now allows lists of queries
+
 # SmartNoise SQL v0.2.4 Release Notes
 
 * Support for BigQuery (thanks, @oskarkocol!)
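
The v0.2.5 note about using discrete Laplace noise for integer sums and counts can be illustrated with a minimal, standard-library sketch. It is not the OpenDP or SmartNoise implementation: the function names (`sample_geometric`, `sample_discrete_laplace`, `noisy_count`) are hypothetical, and it relies on floating-point arithmetic and Python's `random` module, which a production mechanism should avoid. It only shows the idea that integer noise with probability proportional to `exp(-|k| / scale)` can be drawn as the difference of two geometric variables.

```python
import math
import random


def sample_geometric(scale: float, rng: random.Random) -> int:
    """Failures before the first success, with failure ratio q = exp(-1/scale)."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
    # Inverse-CDF sampling: P(G >= k) = exp(-k / scale)
    return math.floor(-scale * math.log(u))


def sample_discrete_laplace(scale: float, rng: random.Random) -> int:
    """Integer noise with P(k) proportional to exp(-|k| / scale)."""
    # The difference of two i.i.d. geometric variables is discrete Laplace.
    return sample_geometric(scale, rng) - sample_geometric(scale, rng)


def noisy_count(true_count: int, epsilon: float, sensitivity: int = 1, seed=None) -> int:
    """Illustrative only: add discrete Laplace noise with scale sensitivity/epsilon."""
    rng = random.Random(seed)  # NOT cryptographically secure; for illustration
    return true_count + sample_discrete_laplace(sensitivity / epsilon, rng)
```

Because the noise is integer-valued, a noisy count stays an integer, for example `noisy_count(100, epsilon=1.0)`.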

sql/README.md

-17
@@ -138,23 +138,6 @@ epsilon_many, _ = reader.odometer.spent
 print(f'{epsilon_many} < {epsilon_single * 100}')
 ```
 
-## Accuracy
-
-The `get_simple_accuracy` method returns the column-wise accuracies for a given alpha for a given query.
-
-```python
-privacy = Privacy(epsilon=1.0, delta=10e-6)
-
-reader = from_connection(conn, metadata=metadata, privacy=privacy)
-
-query = 'SELECT COUNT(*) AS n, SUM(age) AS age FROM PUMS.PUMS'
-
-acc95 = reader.get_simple_accuracy(query, alpha=0.05)
-print(f'n will be +/- {acc95[0]} in 95% of executions. Age will be +/- {acc95[1]}')
-```
-
-This method only returns simple accuracies, where the noise scale for each column is fixed and does not vary per row. Statistics like AVG and VARIANCE are computed from a quotient of noisy sum and noisy count, so the accuracy can vary widely per row. In these cases, a per-row accuracy can be obtained with `execute_with_accuracy`.
-
 ## Histograms
 
 SQL `group by` queries represent histograms binned by grouping key. Queries over a grouping key with unbounded or non-public dimensions expose privacy risk. For example:

sql/VERSION

+1 -1
@@ -1 +1 @@
-0.2.4
+0.2.5

sql/docs/source/advanced.rst

+1 -3
@@ -16,12 +16,10 @@ You can override the default mechanisms used for differentially private summary
 privacy = Privacy(epsilon=1.0)
 print(f"We default to using {privacy.mechanisms.map[Stat.count]} for counts.")
 print("Switching to use gaussian")
-privacy.mechanisms.map[Stat.count] = Mechanism.gaussian
+privacy.mechanisms.map[Stat.count] = Mechanism.discrete_gaussian
 
 The list of statistics that can be mapped is in the ``Stat`` enumeration, and the mechanisms available are listed in the ``Mechanism`` enumeration. The ``AVG`` summary statistic is computed from a sum and a count, each of which can be overridden.
 
-For integer sums, you can specify ``Stat.sum_int``, and specify ``Stat.sum_large_int`` separately for large integer sums. A "large" integer sum is any sum that is greater than the value set in ``mechanisms.large``, which defaults to 1000. This is primarily useful when using geometric mechanism for integer sums, since the geometric mechanism slows down for large integer sums.
-
 pre_aggregated
 --------------
 
sql/pyproject.toml

+2 -2
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "smartnoise-sql"
-version = "0.2.4"
+version = "0.2.5"
 description = "Differentially Private SQL Queries"
 authors = ["SmartNoise Team <[email protected]>"]
 license = "MIT"
@@ -11,7 +11,7 @@ readme = "README.md"
 
 [tool.poetry.dependencies]
 python = ">3.6,<3.11"
-opendp = "^0.4.0"
+opendp = "^0.5.0"
 antlr4-python3-runtime = "4.9.3"
 pandasql = "^0.7.3"
 PyYAML = "^5.4.1"

sql/setup.py

+3 -3
Large diffs are not rendered by default.

sql/snsql/sql/_mechanisms/__init__.py

+3 -4
@@ -1,7 +1,6 @@
-from .gaussian import Gaussian
 from .laplace import Laplace
-from .geometric import Geometric
-from .analytic_gaussian import AnalyticGaussian
+from .discrete_laplace import DiscreteLaplace
+from .discrete_gaussian import DiscreteGaussian
 from .base import Mechanism, Unbounded
 
-__all__ = ["Gaussian", "Laplace", "Geometric", "Mechanism", "Unbounded", "AnalyticGaussian"]
+__all__ = ["Laplace", "DiscreteLaplace", "DiscreteGaussian", "Mechanism", "Unbounded",]
