1391 | 1391 | \\
1392 | 1392 | Gaussian \glspl{rv} are widely used \glspl{probmodel} in the statistical analysis of
1393 | 1393 | \gls{ml} methods. Their significance arises partly from the \gls{clt}, which is a mathematically
1394 | | - precise formulation of the following rule-of-thumb: The average of a large number of
1395 | | - independent \glspl{rv} (not necessarily Gaussian themselves) tends towards a Gaussian \gls{rv} \cite{ross2013first}.
| 1394 | + precise formulation of the following rule-of-thumb: The average of many independent \glspl{rv}
| 1395 | + (not necessarily Gaussian themselves) tends towards a Gaussian \gls{rv} \cite{ross2013first}.
1396 | 1396 | \\
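A quick numerical check can make this rule of thumb concrete. The sketch below is an added illustration, not part of the text: the uniform distribution, sample size, and number of repetitions are arbitrary choices. It averages many independent, clearly non-Gaussian \glspl{rv} and compares the spread of the averages with the value the \gls{clt} predicts.

```python
import numpy as np

# Minimal central-limit-theorem illustration (assumed sample sizes, not from the text):
# average n i.i.d. uniform random variables (clearly non-Gaussian) and inspect the result.
rng = np.random.default_rng(seed=0)
n, repetitions = 100, 10_000

# Each row holds n independent uniform samples; averaging along axis=1 gives one draw
# of the sample mean. Across repetitions, these means are approximately Gaussian.
averages = rng.uniform(low=0.0, high=1.0, size=(repetitions, n)).mean(axis=1)

# The CLT predicts mean 0.5 and standard deviation sqrt(1/12)/sqrt(n) for this example.
print(averages.mean(), averages.std(), np.sqrt(1.0 / 12.0) / np.sqrt(n))
```

Plotting a histogram of `averages` would show the familiar bell shape even though each individual sample is uniform.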
1397 | 1397 | Compared to other \glspl{probdist}, the \gls{mvndist} is also distinct in that it—in a mathematically
1398 | 1398 | precise sense—represents maximum \gls{uncertainty}. Among all vector-valued \glspl{rv} with
|
|
2318 | 2318 | \newglossaryentry{algorithm}
2319 | 2319 | {name={algorithm}, plural={algorithms},
2320 | 2320 | description={An\index{algorithm} algorithm is a precise, step-by-step specification for
2321 | | - how to produce an output from a given input within a finite number of computational steps \cite{Cormen:2022aa}.
2322 | | - For example, an algorithm for training a \gls{linmodel} explicitly describes how to
| 2321 | + producing an output from a given input within a finite number of computational steps \cite{Cormen:2022aa}.
| 2322 | + For example, an algorithm to train a \gls{linmodel} explicitly describes how to
2323 | 2323 | transform a given \gls{trainset} into \gls{modelparams} through a sequence of \glspl{gradstep}.
2324 | 2324 | To study algorithms rigorously, we can represent (or approximate) them by different mathematical structures \cite{Sipser2013}.
2325 | 2325 | One approach is to represent an algorithm as a collection of possible executions. Each individual
2326 | | - execution is a sequence of the following form: $${\rm input}, s_1, s_2, \ldots, s_T, {\rm output}.$$ This sequence
| 2326 | + execution is then a sequence of the form: $${\rm input}, s_1, s_2, \ldots, s_T, {\rm output}.$$ This sequence
2327 | 2327 | starts from an input and progresses via intermediate steps until an output is delivered. Crucially, an algorithm
2328 | 2328 | encompasses more than just a mapping from input to output; it also includes intermediate computational
2329 | 2329 | steps $s_1, \ldots, s_T$.
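The execution-trace view can be made concrete with a small added sketch (the \gls{trainset}, step size, and number of \glspl{gradstep} below are made-up values): \gls{gd} training of a \gls{linmodel} recorded as a sequence ${\rm input}, s_1, \ldots, s_T, {\rm output}$, where each intermediate step $s_t$ is the current parameter value.

```python
import numpy as np

# Hypothetical illustration of an algorithm as an execution trace
# (input, s_1, ..., s_T, output) for gradient-descent training of a linear model.
# The training set, step size, and number of steps are arbitrary choices.
X = np.array([[1.0], [2.0], [3.0]])          # features of the training set (the "input")
y = np.array([2.0, 4.0, 6.0])                # labels of the training set
w = np.zeros(1)                              # initial model parameters
step_size, T = 0.05, 20

trace = [("input", (X, y))]
for t in range(1, T + 1):
    residual = X @ w - y                     # prediction errors under squared-error loss
    gradient = 2.0 * X.T @ residual / len(y) # gradient of the empirical risk
    w = w - step_size * gradient             # one gradient step s_t
    trace.append((f"s_{t}", w.copy()))
trace.append(("output", w))                  # learned model parameters (the "output")

print(trace[-1])                             # final parameters, close to w = 2
```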
|
|
2480 | 2480 | \gls{hypothesis} (or trained \gls{model}) $\learnthypothesis \in \hypospace$. We evaluate the quality of a trained \gls{model}
2481 | 2481 | by computing the average \gls{loss} on a \gls{testset}. But how can we assess
2482 | 2482 | whether the resulting \gls{testset} performance is sufficiently good? How can we
2483 | | - determine if the trained \gls{model} performs close to optimal and there is little point
| 2483 | + determine whether the trained \gls{model} already performs close to the optimum, so that there is little point
2484 | 2484 | in investing more resources (for \gls{data} collection or computation) to improve it?
2485 | 2485 | To this end, it is useful to have a reference (or baseline) level against which
2486 | 2486 | we can compare the performance of the trained \gls{model}. Such a reference value
|
|
2499 | 2499 | However, computing the \gls{bayesestimator} and \gls{bayesrisk} presents two
2500 | 2500 | main challenges:
2501 | 2501 | \begin{enumerate}[label=\arabic*)]
2502 | | - \item The \gls{probdist} $p(\featurevec,\truelabel)$ is unknown and needs to be estimated.
2503 | | - \item Even if $p(\featurevec,\truelabel)$ is known, it can be computationally too expensive to compute the \gls{bayesrisk} exactly \cite{cooper1990computational}.
| 2502 | + \item The \gls{probdist} $p(\featurevec,\truelabel)$ is unknown and must be estimated from observed \gls{data}.
| 2503 | + \item Even if $p(\featurevec,\truelabel)$ were known, computing the \gls{bayesrisk} exactly may be computationally infeasible \cite{cooper1990computational}.
2504 | 2504 | \end{enumerate}
2505 | 2505 | A widely used \gls{probmodel} is the \gls{mvndist} $\pair{\featurevec}{\truelabel} \sim \mathcal{N}({\bm \mu},{\bm \Sigma})$
2506 | 2506 | for \glspl{datapoint} characterized by numeric \glspl{feature} and \glspl{label}.
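For this \gls{probmodel}, Gaussian conditioning gives closed forms for the \gls{bayesestimator} and \gls{bayesrisk}. The sketch below is an added illustration: the mean vector and covariance entries are invented values, and squared-error loss is assumed since the surrounding text does not fix a \gls{lossfunc}.

```python
import numpy as np

# Hedged sketch: Bayes estimator and Bayes risk under an assumed multivariate normal
# model for (features x, label y) and squared-error loss. The mean vector and
# covariance entries below are made-up illustrative values.
mu_x, mu_y = np.array([0.0, 1.0]), 0.5
Sigma_xx = np.array([[2.0, 0.3],
                     [0.3, 1.0]])
Sigma_xy = np.array([0.8, 0.4])              # Cov(x, y)
sigma_yy = 1.5                               # Var(y)

# For jointly Gaussian (x, y) and squared-error loss, the Bayes estimator is the
# conditional mean h(x) = mu_y + Sigma_yx Sigma_xx^{-1} (x - mu_x) ...
weights = np.linalg.solve(Sigma_xx, Sigma_xy)
def bayes_estimator(x):
    return mu_y + weights @ (x - mu_x)

# ... and the Bayes risk is the conditional variance
# Var(y | x) = Var(y) - Sigma_yx Sigma_xx^{-1} Sigma_xy.
bayes_risk = sigma_yy - Sigma_xy @ weights

print(bayes_estimator(np.array([1.0, 2.0])), bayes_risk)
```

The value of `bayes_risk` is exactly the kind of baseline the preceding paragraphs ask for: a \gls{testset} loss close to it signals that further improvements to the trained \gls{model} have limited payoff.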
|
|
2996 | 2996 |
2997 | 2997 | \newglossaryentry{bootstrap}
2998 | 2998 | {name={bootstrap},
2999 | | - description={For\index{bootstrap} the analysis of \gls{ml} methods, it is often useful to interpret
3000 | | - a given set of \glspl{datapoint} $\dataset = \big\{ \datapoint^{(1)}, \ldots, \datapoint^{(\samplesize)}\big\}$
3001 | | - as \glspl{realization} of \gls{iid} \glspl{rv} with a common \gls{probdist} $p(\datapoint)$. In general, we
3002 | | - do not know $p(\datapoint)$ exactly, but we need to estimate it. The bootstrap uses the
3003 | | - \gls{histogram} of $\dataset$ as an estimator for the underlying \gls{probdist} $p(\datapoint)$.
| 2999 | + description={
| 3000 | + For\index{bootstrap} the analysis of \gls{ml} methods, it is often useful to interpret
| 3001 | + a given set of \glspl{datapoint}, $\dataset = \big\{ \datapoint^{(1)}, \ldots, \datapoint^{(\samplesize)} \big\}$,
| 3002 | + as \glspl{realization} of \gls{iid} \glspl{rv} drawn from a common \gls{probdist} $p(\datapoint)$.
| 3003 | + In practice, the \gls{probdist} $p(\datapoint)$ is unknown and must be estimated from $\dataset$.
| 3004 | + The bootstrap approach uses the \gls{histogram} of $\dataset$ as an estimator for $p(\datapoint)$.
3004 | 3005 | \\
3005 | 3006 | See also: \gls{iid}, \gls{rv}, \gls{probdist}, \gls{histogram}.},
3006 | 3007 | first={bootstrap},
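In practice, sampling from the \gls{histogram} of $\dataset$ amounts to drawing \glspl{datapoint} from $\dataset$ with replacement. The sketch below is an added illustration of that resampling step; the dataset values, the studied statistic (the sample mean), and the number of bootstrap rounds are arbitrary choices.

```python
import numpy as np

# Hedged sketch of the bootstrap idea described above: sampling from the histogram
# (empirical distribution) of a dataset D amounts to drawing data points from D
# with replacement. Dataset values and the number of bootstrap rounds are made up.
rng = np.random.default_rng(seed=0)
dataset = np.array([2.1, 0.4, 1.7, 3.3, 2.8, 0.9])
num_bootstrap_samples = 1000

# Each bootstrap sample has the same size as the original dataset and is a
# realization of i.i.d. draws from the empirical distribution of `dataset`.
estimates = []
for _ in range(num_bootstrap_samples):
    bootstrap_sample = rng.choice(dataset, size=len(dataset), replace=True)
    estimates.append(bootstrap_sample.mean())  # e.g., study the sample mean

# The spread of the bootstrap estimates approximates the estimator's variability.
print(np.mean(estimates), np.std(estimates))
```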
|
|
3724 | 3725 | For example, weak learners are shallow \glspl{decisiontree} which are combined to
3725 | 3726 | obtain a deep \gls{decisiontree}. Boosting can be understood as a \gls{generalization}
3726 | 3727 | of \gls{gdmethods} for \gls{erm} using parametric \glspl{model} and \gls{smooth} \glspl{lossfunc}
3727 | | - \cite{Friedman2001}. Just like \gls{gd} iteratively updates \gls{modelparams} to reduce the \gls{emprisk},
| 3728 | + \cite{Friedman2001}. Just as \gls{gd} iteratively updates \gls{modelparams} to reduce the \gls{emprisk},
3728 | 3729 | boosting iteratively combines (e.g., by summation) \gls{hypothesis} \glspl{map} to reduce the \gls{emprisk}.
3729 | 3730 | A widely-used instance of the generic boosting idea is referred to as \gls{gradient} boosting, which
3730 | 3731 | uses \glspl{gradient} of the \gls{lossfunc} for combining the weak learners \cite{Friedman2001}.
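A minimal added sketch of that idea for squared-error loss is given below (it is not code from the cited reference): depth-one regression trees serve as weak learners, and each one is fitted to the negative \gls{gradient} of the \gls{lossfunc}, which for squared error is simply the current residual. The data and hyperparameters are invented.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hedged sketch of gradient boosting with squared-error loss: each weak learner
# (a depth-1 regression tree) is fitted to the negative gradient of the loss,
# which for squared error is the current residual. Data, number of rounds, and
# learning rate are made-up illustrative choices.
rng = np.random.default_rng(seed=0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

learning_rate, num_rounds = 0.1, 100
prediction = np.zeros_like(y)          # initial model: predict 0 everywhere
weak_learners = []

for _ in range(num_rounds):
    residual = y - prediction          # negative gradient of squared-error loss
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    weak_learners.append(stump)
    prediction += learning_rate * stump.predict(X)   # add the new weak learner by summation

print("training MSE:", np.mean((y - prediction) ** 2))
```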
|
|
4784 | 4785 | \newglossaryentry{ai}
4785 | 4786 | {name={artificial intelligence (AI)},
4786 | 4787 | description={AI\index{artificial intelligence (AI)} refers to systems that behave rationally in the sense of
4787 | | - maximizing a long-term \gls{reward}. The \gls{ml}-based approach to AI is to train a \gls{model} for
4788 | | - predicting optimal actions. These \glspl{prediction} are computed from observations about the state of the
| 4788 | + maximizing a long-term \gls{reward}. The \gls{ml}-based approach to AI is to train a \gls{model} to
| 4789 | + predict optimal actions. These \glspl{prediction} are computed from observations about the state of the
4789 | 4790 | environment. The choice of \gls{lossfunc} sets AI applications apart from more basic \gls{ml} applications.
4790 | | - AI systems rarely have access to a labeled \gls{trainset} that allows the average \gls{loss} to be measured for any possible choice of \gls{modelparams}.
4791 | | - Instead, AI systems use observed \gls{reward} signals to obtain a (point-wise) estimate for the
4792 | | - \gls{loss} incurred by the current choice of \gls{modelparams}.
| 4791 | + AI systems rarely have access to a labeled \gls{trainset} that allows the average \gls{loss} to be
| 4792 | + measured for any possible choice of \gls{modelparams}. Instead, AI systems use observed \gls{reward}
| 4793 | + signals to estimate the \gls{loss} incurred by the current choice of \gls{modelparams}.
4793 | 4794 | \\
4794 | 4795 | See also: \gls{reward}, \gls{ml}, \gls{model}, \gls{lossfunc}, \gls{trainset}, \gls{loss}, \gls{modelparams}.},
4795 | 4796 | first={AI},
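One way to picture that last point is a bandit-style sketch (an added illustration; the reward probabilities, exploration rate, and update rule are invented and are not claimed to represent any particular AI system): there is no labeled \gls{trainset}, so the observed \gls{reward} of the chosen action is the only point-wise feedback available for updating the \gls{modelparams}.

```python
import numpy as np

# Hedged illustration: no labeled training set exists, so the agent treats the
# observed reward as a point-wise estimate of (negative) loss for the action it
# just took. The reward probabilities and update rule are made up.
rng = np.random.default_rng(seed=0)
true_reward_probability = np.array([0.2, 0.5, 0.8])   # unknown to the agent
value_estimates = np.zeros(3)                         # the agent's "model parameters"
action_counts = np.zeros(3)

for step in range(2000):
    # Epsilon-greedy choice of action based on the current estimates.
    if rng.random() < 0.1:
        action = int(rng.integers(3))
    else:
        action = int(np.argmax(value_estimates))
    reward = float(rng.random() < true_reward_probability[action])  # observed reward

    # Point-wise update: only the chosen action's estimate is corrected, since no
    # loss value is available for the actions that were not taken.
    action_counts[action] += 1
    value_estimates[action] += (reward - value_estimates[action]) / action_counts[action]

print(value_estimates)   # roughly approaches [0.2, 0.5, 0.8]
```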
|
|
5327 | 5328 | \item The internal structure of the \gls{model} remains hidden—which is useful for protecting intellectual property or trade secrets.
5328 | 5329 | \end{itemize}
5329 | 5330 | However, APIs are not without \gls{risk}. Techniques such as \gls{modelinversion} can potentially reconstruct a
5330 | | - \gls{model} from its \glspl{prediction} on carefully selected \glspl{featurevec}.
| 5331 | + \gls{model} from its \glspl{prediction} using carefully selected \glspl{featurevec}.
5331 | 5332 | \\
5332 | | - See also: \gls{ml}, \gls{model}, \gls{featurevec}, \gls{datapoint}, \gls{prediction}, \gls{feature}, \gls{modelinversion}.},
| 5333 | + See also: \gls{ml}, \glspl{prediction}.},
5333 | 5334 | first={application programming interface (API)},
5334 | 5335 | text={API}
5335 | 5336 | }
|
|
5339 | 5340 | description={A\index{model inversion} \gls{model} inversion is a form of \gls{privattack} on an \gls{ml} system.
5340 | 5341 | An adversary seeks to infer \glspl{sensattr} of individual \glspl{datapoint} by exploiting partial access
5341 | 5342 | to a trained \gls{model} $\learnthypothesis \in \hypospace$. This access typically consists of
5342 | | - querying the \gls{model} for \glspl{prediction} $\learnthypothesis(\featurevec)$ on carefully chosen inputs.
| 5343 | + querying the \gls{model} for \glspl{prediction} $\learnthypothesis(\featurevec)$ using carefully chosen inputs.
5343 | 5344 | Basic \gls{model} inversion techniques have been demonstrated in the context of facial image
5344 | 5345 | \gls{classification}, where images are reconstructed using the (\gls{gradient} of) \gls{model} outputs
5345 | 5346 | combined with auxiliary information such as a person’s name \cite{Fredrikson2015}.
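The query-access principle behind such attacks can be illustrated with a deliberately simple added toy example (it is not the facial-image attack of \cite{Fredrikson2015}): if the deployed \gls{model} happens to be linear, its \gls{modelparams} can be recovered exactly from \glspl{prediction} on carefully chosen \glspl{featurevec}, namely the zero vector and the standard basis vectors.

```python
import numpy as np

# Hedged toy illustration of the query-access principle (not Fredrikson et al.'s
# method): a hidden linear model is reconstructed exactly from its predictions on
# carefully selected feature vectors. All values below are made up.
secret_weights, secret_bias = np.array([1.5, -2.0, 0.7]), 0.3   # hidden from the adversary

def api_predict(x):
    """Black-box prediction service; only this function is visible to the adversary."""
    return secret_weights @ x + secret_bias

num_features = 3
recovered_bias = api_predict(np.zeros(num_features))            # query on the zero vector
recovered_weights = np.array(
    [api_predict(np.eye(num_features)[i]) - recovered_bias for i in range(num_features)]
)                                                                # queries on basis vectors

print(recovered_weights, recovered_bias)   # matches the hidden parameters
```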
|
|
5446 | 5447 | {name={bagging (or bootstrap aggregation)},
5447 | 5448 | description={Bagging\index{bagging (or bootstrap aggregation)} (or bootstrap aggregation)
5448 | 5449 | is a generic technique to improve (the \gls{robustness} of) a given \gls{ml} method.
5449 | | - The idea is to use the \gls{bootstrap} to generate perturbed copies of a given \gls{dataset}
5450 | | - and then to learn a separate \gls{hypothesis} for each copy. We then predict the
5451 | | - \gls{label} of a \gls{datapoint} by combining or aggregating the individual \glspl{prediction}
| 5450 | + The idea is to use the \gls{bootstrap} to generate perturbed copies of a given \gls{dataset},
| 5451 | + and learn a separate \gls{hypothesis} for each copy. We then predict the \gls{label} of a \gls{datapoint}
| 5452 | + by combining or aggregating the individual \glspl{prediction}
5452 | 5453 | of each separate \gls{hypothesis}. For \gls{hypothesis} \glspl{map} delivering numeric \gls{label}
5453 | 5454 | values, this aggregation could be implemented by computing the average of individual
5454 | 5455 | \glspl{prediction}.
5455 | 5456 | \\
5456 | | - See also: \gls{robustness}, \gls{ml}, \gls{bootstrap}, \gls{dataset}, \gls{hypothesis}, \gls{label}, \gls{datapoint}, \gls{prediction}, \gls{map}.},
| 5457 | + See also: \gls{robustness}, \gls{bootstrap}.},
5457 | 5458 | first={bagging (or bootstrap aggregation)},
5458 | 5459 | text={bagging}
5459 | 5460 | }
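A minimal added sketch of bagging for numeric \gls{label} values (the \gls{dataset}, base \gls{model}, and number of \gls{bootstrap} copies are invented choices): one \gls{hypothesis} is fitted per bootstrap copy, and \glspl{prediction} are aggregated by averaging.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hedged sketch of bagging for a numeric label: fit one hypothesis per bootstrap
# copy of the dataset and aggregate by averaging the individual predictions.
rng = np.random.default_rng(seed=0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.normal(size=100)

hypotheses = []
for _ in range(25):                                    # number of perturbed copies
    idx = rng.integers(0, len(y), size=len(y))         # bootstrap: sample with replacement
    hypotheses.append(DecisionTreeRegressor(max_depth=3).fit(X[idx], y[idx]))

def bagged_prediction(X_new):
    # Aggregate the individual predictions by averaging them.
    return np.mean([h.predict(X_new) for h in hypotheses], axis=0)

print(bagged_prediction(np.array([[0.5], [1.0]])))
```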
|
|
5596 | 5597 |
5597 | 5598 | \newglossaryentry{bayesestimator}
5598 | 5599 | {name={Bayes estimator},
5599 | | - description={Consider\index{Bayes estimator} a \gls{probmodel} with a joint \gls{probdist}
5600 | | - $p(\featurevec,\truelabel)$ for the \glspl{feature} $\featurevec$ and \gls{label} $\truelabel$
| 5600 | + description={
| 5601 | + Consider\index{Bayes estimator} a \gls{probmodel} with a joint \gls{probdist}
| 5602 | + $p(\featurevec,\truelabel)$ over the \glspl{feature} $\featurevec$ and the \gls{label} $\truelabel$
5601 | 5603 | of a \gls{datapoint}. For a given \gls{lossfunc} $\lossfunc{\cdot}{\cdot}$, we refer to a \gls{hypothesis}
5602 | | - $\hypothesis$ as a Bayes estimator if its \gls{risk} $\expect\{\lossfunc{\pair{\featurevec}{\truelabel}}{\hypothesis}\}$ is the
5603 | | - \gls{minimum} \cite{LC}. Note that the property of a \gls{hypothesis} being a Bayes estimator depends on
5604 | | - the underlying \gls{probdist} and the choice for the \gls{lossfunc} $\lossfunc{\cdot}{\cdot}$.
| 5604 | + $\hypothesis$ as a Bayes estimator if its \gls{risk}
| 5605 | + $\expect\left\{\lossfunc{\pair{\featurevec}{\truelabel}}{\hypothesis}\right\}$
| 5606 | + is the \gls{minimum} achievable \gls{risk}~\cite{LC}.
| 5607 | + Note that whether a \gls{hypothesis} qualifies as a Bayes estimator depends on the underlying
| 5608 | + \gls{probdist} and the choice of \gls{lossfunc} $\lossfunc{\cdot}{\cdot}$.
5605 | 5609 | \\
5606 | 5610 | See also: \gls{probmodel}, \gls{hypothesis}, \gls{risk}.},
5607 | 5611 | first={Bayes estimator},
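Two standard special cases, added here for orientation and not part of the original entry: for the squared-error loss, the Bayes estimator is the conditional mean of the \gls{label} given the \glspl{feature},
$$\hypothesis^{*}(\featurevec) = \expect\{\truelabel \mid \featurevec\},$$
and, for the 0/1 loss, it is any \gls{label} value that maximizes the posterior probability $p(\truelabel \mid \featurevec)$.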
|
|