
Commit 541e4dc

revise
1 parent dfd5c13 commit 541e4dc

File tree: 3 files changed (+46, −42 lines)


ADictML_English.pdf (−528 bytes, binary file not shown)

ADictML_Glossary_English.tex

Lines changed: 35 additions & 31 deletions
@@ -1391,8 +1391,8 @@
 \\
 Gaussian \glspl{rv} are widely used \glspl{probmodel} in the statistical analysis of
 \gls{ml} methods. Their significance arises partly from the \gls{clt}, which is a mathematically
-precise formulation of the following rule-of-thumb: The average of a large number of
-independent \glspl{rv} (not necessarily Gaussian themselves) tends towards a Gaussian \gls{rv} \cite{ross2013first}.
+precise formulation of the following rule-of-thumb: The average of many independent \glspl{rv}
+(not necessarily Gaussian themselves) tends towards a Gaussian \gls{rv} \cite{ross2013first}.
 \\
 Compared to other \glspl{probdist}, the \gls{mvndist} is also distinct in that—in a mathematically
 precise sense—represents maximum \gls{uncertainty}. Among all vector-valued \glspl{rv} with
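
For reference, one standard formulation of the \gls{clt} behind this rule-of-thumb (stated with generic symbols $x^{(1)}, \ldots, x^{(m)}$, $\mu$, and $\sigma$ rather than the glossary's own macros) is: for \gls{iid} \glspl{rv} with finite mean $\mu$ and variance $\sigma^{2}$,
$$\frac{1}{\sqrt{m}} \sum_{i=1}^{m} \frac{x^{(i)} - \mu}{\sigma} \;\xrightarrow{d}\; \mathcal{N}(0,1) \qquad \text{as } m \to \infty,$$
so the sample average is approximately Gaussian for large $m$.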
@@ -2318,12 +2318,12 @@
 \newglossaryentry{algorithm}
 {name={algorithm}, plural={algorithms},
 description={An\index{algorithm} algorithm is a precise, step-by-step specification for
-how to produce an output from a given input within a finite number of computational steps \cite{Cormen:2022aa}.
-For example, an algorithm for training a \gls{linmodel} explicitly describes how to
+producing an output from a given input within a finite number of computational steps \cite{Cormen:2022aa}.
+For example, an algorithm to train a \gls{linmodel} explicitly describes how to
 transform a given \gls{trainset} into \gls{modelparams} through a sequence of \glspl{gradstep}.
 To study algorithms rigorously, we can represent (or approximate) them by different mathematical structures \cite{Sipser2013}.
 One approach is to represent an algorithm as a collection of possible executions. Each individual
-execution is a sequence of the following form: $${\rm input}, s_1, s_2, \ldots, s_T, {\rm output}.$$ This sequence
+execution is then a sequence of the form: $${\rm input}, s_1, s_2, \ldots, s_T, {\rm output}.$$ This sequence
 starts from an input and progresses via intermediate steps until an output is delivered. Crucially, an algorithm
 encompasses more than just a mapping from input to output; it also includes intermediate computational
 steps $s_1, \ldots, s_T$.
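
To illustrate the training example mentioned in this entry, the following is a minimal Python sketch (not part of the glossary source) of an algorithm that maps a training set to model parameters through a finite sequence of gradient steps. The squared error loss, learning rate, and function names are illustrative assumptions.

import numpy as np

def train_linear_model(X, y, lr=0.01, num_steps=100):
    """Toy training algorithm: a finite sequence of gradient steps.

    Each iteration plays the role of one intermediate step s_t in the
    execution  input, s_1, ..., s_T, output  described in the entry.
    """
    m, n = X.shape
    w = np.zeros(n)                          # input: training set (X, y); start from zero parameters
    for _ in range(num_steps):
        residual = X @ w - y                 # prediction errors on the training set
        grad = (2.0 / m) * (X.T @ residual)  # gradient of the average squared error
        w = w - lr * grad                    # one gradient step
    return w                                 # output: learned model parameters

# usage: w_hat = train_linear_model(np.random.randn(50, 3), np.random.randn(50))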
@@ -2480,7 +2480,7 @@
 \gls{hypothesis} (or trained \gls{model}) $\learnthypothesis \in \hypospace$. We evaluate the quality of a trained \gls{model}
 by computing the average \gls{loss} on a \gls{testset}. But how can we assess
 whether the resulting \gls{testset} performance is sufficiently good? How can we
-determine if the trained \gls{model} performs close to optimal and there is little point
+determine if the trained \gls{model} performs close to optimal such that there is little point
 in investing more resources (for \gls{data} collection or computation) to improve it?
 To this end, it is useful to have a reference (or baseline) level against which
 we can compare the performance of the trained \gls{model}. Such a reference value
@@ -2499,8 +2499,8 @@
 However, computing the \gls{bayesestimator} and \gls{bayesrisk} presents two
 main challenges:
 \begin{enumerate}[label=\arabic*)]
-\item The \gls{probdist} $p(\featurevec,\truelabel)$ is unknown and needs to be estimated.
-\item Even if $p(\featurevec,\truelabel)$ is known, it can be computationally too expensive to compute the \gls{bayesrisk} exactly \cite{cooper1990computational}.
+\item The \gls{probdist} $p(\featurevec,\truelabel)$ is unknown and must be estimated from observed \gls{data}.
+\item Even if $p(\featurevec,\truelabel)$ were known, computing the \gls{bayesrisk} exactly may be computationally infeasible \cite{cooper1990computational}.
 \end{enumerate}
 A widely used \gls{probmodel} is the \gls{mvndist} $\pair{\featurevec}{\truelabel} \sim \mathcal{N}({\bm \mu},{\bm \Sigma})$
 for \glspl{datapoint} characterized by numeric \glspl{feature} and \glspl{label}.
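
For reference, under the additional assumptions of a scalar \gls{label} and the squared error loss (neither is stated in this hunk), the jointly Gaussian model admits a closed-form \gls{bayesestimator} and \gls{bayesrisk}. Writing ${\bm \mu} = ({\bm \mu}_{x}, \mu_{y})$ and partitioning ${\bm \Sigma}$ into blocks ${\bm \Sigma}_{xx}$, ${\bm \Sigma}_{xy} = {\bm \Sigma}_{yx}^{T}$, and $\sigma_{y}^{2}$ (notation introduced here only for illustration),
$$\hypothesis^{*}(\featurevec) = \mu_{y} + {\bm \Sigma}_{yx}{\bm \Sigma}_{xx}^{-1}\big(\featurevec - {\bm \mu}_{x}\big), \qquad \expect\big\{ \big(\truelabel - \hypothesis^{*}(\featurevec)\big)^{2} \big\} = \sigma_{y}^{2} - {\bm \Sigma}_{yx}{\bm \Sigma}_{xx}^{-1}{\bm \Sigma}_{xy}.$$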
@@ -2996,11 +2996,12 @@
 
 \newglossaryentry{bootstrap}
 {name={bootstrap},
-description={For\index{bootstrap} the analysis of \gls{ml} methods, it is often useful to interpret
-a given set of \glspl{datapoint} $\dataset = \big\{ \datapoint^{(1)}, \ldots, \datapoint^{(\samplesize)}\big\}$
-as \glspl{realization} of \gls{iid} \glspl{rv} with a common \gls{probdist} $p(\datapoint)$. In general, we
-do not know $p(\datapoint)$ exactly, but we need to estimate it. The bootstrap uses the
-\gls{histogram} of $\dataset$ as an estimator for the underlying \gls{probdist} $p(\datapoint)$.
+description={
+For\index{bootstrap} the analysis of \gls{ml} methods, it is often useful to interpret
+a given set of \glspl{datapoint}, $\dataset = \big\{ \datapoint^{(1)}, \ldots, \datapoint^{(\samplesize)} \big\}$,
+as \glspl{realization} of \gls{iid} \glspl{rv} drawn from a common \gls{probdist} $p(\datapoint)$.
+In practice, the \gls{probdist} $p(\datapoint)$ is unknown and must be estimated from $\dataset$.
+The bootstrap approach uses the \gls{histogram} of $\dataset$ as an estimator for $p(\datapoint)$.
 \\
 See also: \gls{iid}, \gls{rv}, \gls{probdist}, \gls{histogram}.},
 first={bootstrap},
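
As a minimal sketch of the bootstrap idea in this entry: sampling with replacement from $\dataset$ amounts to drawing \gls{iid} samples from its empirical (histogram-based) estimate of $p(\datapoint)$. The Python function name and arguments below are placeholders, not part of the glossary.

import numpy as np

def bootstrap_samples(dataset, num_copies, rng=None):
    """Draw bootstrap copies of a dataset (a NumPy array of datapoints).

    Sampling with replacement from `dataset` is equivalent to sampling i.i.d.
    from its empirical distribution, i.e., the histogram mentioned in the entry.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = len(dataset)
    return [dataset[rng.integers(0, m, size=m)] for _ in range(num_copies)]

# usage: copies = bootstrap_samples(np.random.randn(100, 2), num_copies=10)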
@@ -3724,7 +3725,7 @@
 For example, weak learners are shallow \glspl{decisiontree} which are combined to
 obtain a deep \gls{decisiontree}. Boosting can be understood as a \gls{generalization}
 of \gls{gdmethods} for \gls{erm} using parametric \glspl{model} and \gls{smooth} \glspl{lossfunc}
-\cite{Friedman2001}. Just like \gls{gd} iteratively updates \gls{modelparams} to reduce the \gls{emprisk},
+\cite{Friedman2001}. Just as \gls{gd} iteratively updates \gls{modelparams} to reduce the \gls{emprisk},
 boosting iteratively combines (e.g., by summation) \gls{hypothesis} \glspl{map} to reduce the \gls{emprisk}.
 A widely-used instance of the generic boosting idea is referred to as \gls{gradient} boosting, which
 uses \glspl{gradient} of the \gls{lossfunc} for combining the weak learners \cite{Friedman2001}.
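
A minimal sketch of the gradient boosting idea for a one-dimensional feature and the squared error loss, where fitting each weak learner to the residuals corresponds to fitting it to the negative gradient of the loss. The helper names, the stump weak learner, and the step size are illustrative assumptions, not the full method of \cite{Friedman2001}.

import numpy as np

def fit_stump(x, residual):
    """Weak learner: a depth-1 regression stump on a one-dimensional feature x."""
    best = (np.inf, np.median(x), residual.mean(), residual.mean())
    for threshold in np.unique(x):
        left, right = residual[x <= threshold], residual[x > threshold]
        if len(left) == 0 or len(right) == 0:
            continue
        error = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if error < best[0]:
            best = (error, threshold, left.mean(), right.mean())
    _, t, left_value, right_value = best
    return lambda z: np.where(z <= t, left_value, right_value)

def gradient_boosting(x, y, num_rounds=20, lr=0.1):
    """Toy gradient boosting for squared error loss: a sum of scaled weak learners."""
    prediction = np.zeros(len(y))
    ensemble = []
    for _ in range(num_rounds):
        residual = y - prediction                # negative gradient of the squared error loss
        stump = fit_stump(x, residual)           # fit a weak learner to the residuals
        ensemble.append(stump)
        prediction = prediction + lr * stump(x)  # combine weak learners by summation
    return lambda z: lr * sum(stump(z) for stump in ensemble)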
@@ -4784,12 +4785,12 @@
 \newglossaryentry{ai}
 {name={artificial intelligence (AI)},
 description={AI\index{artificial intelligence (AI)} refers to systems that behave rationally in the sense of
-maximizing a long-term \gls{reward}. The \gls{ml}-based approach to AI is to train a \gls{model} for
-predicting optimal actions. These \glspl{prediction} are computed from observations about the state of the
+maximizing a long-term \gls{reward}. The \gls{ml}-based approach to AI is to train a \gls{model} to
+predict optimal actions. These \glspl{prediction} are computed from observations about the state of the
 environment. The choice of \gls{lossfunc} sets AI applications apart from more basic \gls{ml} applications.
-AI systems rarely have access to a labeled \gls{trainset} that allows the average \gls{loss} to be measured for any possible choice of \gls{modelparams}.
-Instead, AI systems use observed \gls{reward} signals to obtain a (point-wise) estimate for the
-\gls{loss} incurred by the current choice of \gls{modelparams}.
+AI systems rarely have access to a labeled \gls{trainset} that allows the average \gls{loss} to be
+measured for any possible choice of \gls{modelparams}. Instead, AI systems use observed \gls{reward}
+signals to estimate the \gls{loss} incurred by the current choice of \gls{modelparams}.
 \\
 See also: \gls{reward}, \gls{ml}, \gls{model}, \gls{lossfunc}, \gls{trainset}, \gls{loss}, \gls{modelparams}.},
 first={AI},
@@ -5327,9 +5328,9 @@
 \item The internal structure of the \gls{model} remains hidden—which is useful for protecting intellectual property or trade secrets.
 \end{itemize}
 However, APIs are not without \gls{risk}. Techniques such as \gls{modelinversion} can potentially reconstruct a
-\gls{model} from its \glspl{prediction} on carefully selected \glspl{featurevec}.
+\gls{model} from its \glspl{prediction} using carefully selected \glspl{featurevec}.
 \\
-See also: \gls{ml}, \gls{model}, \gls{featurevec}, \gls{datapoint}, \gls{prediction}, \gls{feature}, \gls{modelinversion}.},
+See also: \gls{ml}, \glspl{prediction}.},
 first={application programming interface (API)},
 text={API}
 }
@@ -5339,7 +5340,7 @@
 description={A\index{model inversion} \gls{model} inversion is a form of \gls{privattack} on an \gls{ml} system.
 An adversary seeks to infer \glspl{sensattr} of individual \glspl{datapoint} by exploiting partial access
 to a trained \gls{model} $\learnthypothesis \in \hypospace$. This access typically consists of
-querying the \gls{model} for \glspl{prediction} $\learnthypothesis(\featurevec)$ on carefully chosen inputs.
+querying the \gls{model} for \glspl{prediction} $\learnthypothesis(\featurevec)$ using carefully chosen inputs.
 Basic \gls{model} inversion techniques have been demonstrated in the context of facial image
 \gls{classification}, where images are reconstructed using the (\gls{gradient} of) \gls{model} outputs
 combined with auxiliary information such as a person’s name \cite{Fredrikson2015}.
@@ -5446,14 +5447,14 @@
 {name={bagging (or bootstrap aggregation)},
 description={Bagging\index{bagging (or bootstrap aggregation)} (or bootstrap aggregation)
 is a generic technique to improve (the \gls{robustness} of) a given \gls{ml} method.
-The idea is to use the \gls{bootstrap} to generate perturbed copies of a given \gls{dataset}
-and then to learn a separate \gls{hypothesis} for each copy. We then predict the
-\gls{label} of a \gls{datapoint} by combining or aggregating the individual \glspl{prediction}
+The idea is to use the \gls{bootstrap} to generate perturbed copies of a given \gls{dataset},
+and learn a separate \gls{hypothesis} for each copy. We then predict the \gls{label} of a \gls{datapoint}
+by combining or aggregating the individual \glspl{prediction}
 of each separate \gls{hypothesis}. For \gls{hypothesis} \glspl{map} delivering numeric \gls{label}
 values, this aggregation could be implemented by computing the average of individual
 \glspl{prediction}.
 \\
-See also: \gls{robustness}, \gls{ml}, \gls{bootstrap}, \gls{dataset}, \gls{hypothesis}, \gls{label}, \gls{datapoint}, \gls{prediction}, \gls{map}.},
+See also: \gls{robustness}, \gls{bootstrap}.},
 first={bagging (or bootstrap aggregation)},
 text={bagging}
 }
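
A minimal Python sketch of bagging as described in this entry, assuming numeric label values and averaging as the aggregation rule; the `fit` callback and the other names are placeholders rather than part of the glossary.

import numpy as np

def bagged_predictor(X, y, fit, num_copies=10, rng=None):
    """Toy bagging: one hypothesis per bootstrap copy, predictions averaged.

    `fit(X, y)` must return a callable hypothesis h with h(X_new) -> predictions.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = len(y)
    hypotheses = []
    for _ in range(num_copies):
        idx = rng.integers(0, m, size=m)        # bootstrap copy of the dataset
        hypotheses.append(fit(X[idx], y[idx]))  # separate hypothesis for this copy
    # aggregate the numeric predictions by averaging them
    return lambda X_new: np.mean([h(X_new) for h in hypotheses], axis=0)

# Example base method: a least-squares fit returning a hypothesis map.
# fit = lambda A, b: (lambda Z: Z @ np.linalg.lstsq(A, b, rcond=None)[0])
# predictor = bagged_predictor(np.random.randn(100, 3), np.random.randn(100), fit)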
@@ -5596,12 +5597,15 @@
 
 \newglossaryentry{bayesestimator}
 {name={Bayes estimator},
-description={Consider\index{Bayes estimator} a \gls{probmodel} with a joint \gls{probdist}
-$p(\featurevec,\truelabel)$ for the \glspl{feature} $\featurevec$ and \gls{label} $\truelabel$
+description={
+Consider\index{Bayes estimator} a \gls{probmodel} with a joint \gls{probdist}
+$p(\featurevec,\truelabel)$ over the \glspl{feature} $\featurevec$ and the \gls{label} $\truelabel$
 of a \gls{datapoint}. For a given \gls{lossfunc} $\lossfunc{\cdot}{\cdot}$, we refer to a \gls{hypothesis}
-$\hypothesis$ as a Bayes estimator if its \gls{risk} $\expect\{\lossfunc{\pair{\featurevec}{\truelabel}}{\hypothesis}\}$ is the
-\gls{minimum} \cite{LC}. Note that the property of a \gls{hypothesis} being a Bayes estimator depends on
-the underlying \gls{probdist} and the choice for the \gls{lossfunc} $\lossfunc{\cdot}{\cdot}$.
+$\hypothesis$ as a Bayes estimator if its \gls{risk}
+$\expect\left\{\lossfunc{\pair{\featurevec}{\truelabel}}{\hypothesis}\right\}$
+is the \gls{minimum} achievable \gls{risk}~\cite{LC}.
+Note that whether a \gls{hypothesis} qualifies as a Bayes estimator depends on the underlying
+\gls{probdist} and the choice of \gls{lossfunc} $\lossfunc{\cdot}{\cdot}$.
 \\
 See also: \gls{probmodel}, \gls{hypothesis}, \gls{risk}.},
 first={Bayes estimator},
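
For reference, in the special case of the squared error loss (an assumption not made in the entry itself), a Bayes estimator is the conditional mean of the \gls{label} given the \glspl{feature},
$$\hypothesis^{*}(\featurevec) = \expect\{\truelabel \mid \featurevec\},$$
which minimizes the \gls{risk} $\expect\big\{\big(\truelabel - \hypothesis(\featurevec)\big)^{2}\big\}$ over all \glspl{hypothesis} $\hypothesis$.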
