Binary file added Lecture Notes/Images/Moralized.png
Binary file added Lecture Notes/Images/Screenshot_1.png
Binary file added Lecture Notes/Images/Screenshot_10.png
Binary file added Lecture Notes/Images/Screenshot_11.png
Binary file added Lecture Notes/Images/Screenshot_12.png
Binary file added Lecture Notes/Images/Screenshot_13.png
Binary file added Lecture Notes/Images/Screenshot_14.png
Binary file added Lecture Notes/Images/Screenshot_15.png
Binary file added Lecture Notes/Images/Screenshot_16.png
Binary file added Lecture Notes/Images/Screenshot_17.png
Binary file added Lecture Notes/Images/Screenshot_18.png
Binary file added Lecture Notes/Images/Screenshot_19.png
Binary file added Lecture Notes/Images/Screenshot_2.png
Binary file added Lecture Notes/Images/Screenshot_20.png
Binary file added Lecture Notes/Images/Screenshot_21.png
Binary file added Lecture Notes/Images/Screenshot_3.png
Binary file added Lecture Notes/Images/Screenshot_4.png
Binary file added Lecture Notes/Images/Screenshot_5.png
Binary file added Lecture Notes/Images/Screenshot_6.png
Binary file added Lecture Notes/Images/Screenshot_7.png
Binary file added Lecture Notes/Images/Screenshot_8.png
Binary file added Lecture Notes/Images/Screenshot_9.png
144 changes: 144 additions & 0 deletions Lecture Notes/Images/Week3.tex
@@ -0,0 +1,144 @@
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{amsmath}
\title{CSC412 Notes Week 3}
\author{Jerry Zheng}
\date{April 2021}
\hypersetup{
colorlinks=true,
linkcolor=blue,
filecolor=magenta,
urlcolor=blue,
pdftitle={Sharelatex Example},
bookmarks=true,
pdfpagemode=FullScreen,
}

\begin{document}

\maketitle

\section{Graphical Models}
\subsection{Chain Rule}
The joint distribution of $N$ random variables can always be factored with the chain rule\\

$$P(x_{1:N}) = P(x_1)P(x_2|x_1)P(x_3 | x_1, x_2) \ldots P(x_N | x_{1:N-1})$$\\

This holds for any joint distribution of discrete random variables, even with full dependence between all the variables.\\

More formally, in probability the chain rule for two random variables is\\
$$P(x, y) = P(x | y)P(y)$$\\

\subsection{Conditional Independence}

To represent large joint distributions we can assume conditional independence:
$$X \perp Y | Z \Leftrightarrow P(X, Y | Z) = P(X | Z)P(Y | Z) \Leftrightarrow P(X | Y, Z) = P(X | Z)$$
This is very useful, as now we can represent a large chain of $N$ variables as a product of local conditional distributions:
$$P(x_{1:N}) = P(x_1) \prod_{t=2}^N P(x_t|x_{t-1})$$

This is the (first-order) Markov assumption: ``the future is independent of the past given the present''.
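
As a small illustration (a sketch with made-up conditional tables, not from lecture), the chain factorization can be evaluated directly:

\begin{verbatim}
# A sketch of the first-order Markov factorization P(x_1) prod_t P(x_t | x_{t-1})
# for 5 binary variables with made-up (random) conditional tables.
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2                                   # 5 variables, 2 states each
p_x1 = rng.random(k)
p_x1 /= p_x1.sum()                            # P(x_1)
trans = rng.random((n - 1, k, k))
trans /= trans.sum(axis=-1, keepdims=True)    # each trans[i] is P(next | current)

def joint(x):                                 # x is a tuple of n states
    p = p_x1[x[0]]
    for t in range(1, n):
        p *= trans[t - 1, x[t - 1], x[t]]
    return p

# The full joint table would need 2^5 - 1 = 31 free numbers;
# the chain needs only 1 + 4*2 = 9.
print(joint((0, 1, 1, 0, 1)))
\end{verbatim}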

\subsection{Probabilistic Graphical Models}
If you don't know what a graph is, see \href{https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)}{Wikipedia}.\\

A probabilistic graphical model can be used to represent joint distributions when we assume conditional independence. In this model, nodes are variables and edges show conditional dependence.\\

\subsection{Directed Acyclic Graphical Models}
Using the Markov assumption we can represent complicated graphs much more simply.\\
A fully connected 4-node graph can be represented with half the number of edges!\\
\includegraphics[scale=0.7]{Screenshot_2.png}\\

can now be represented as\\
\includegraphics[scale=0.7]{Screenshot_3.png}\\

It's easy to see that $X_a$ is much easier to evaluate in the second graph than in the first.

$$P(x_a, x_b, x_c, x_d) = P(x_a| x_b, x_c, x_d) P(x_b| x_c, x_d )P( x_c | x_d) P(x_d)$$
can be simplified with the Markov assumption to
$$P(x_a, x_b, x_c, x_d) = P(x_a| x_b) P(x_b| x_c)P( x_c | x_d) P(x_d)$$
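
To make the saving concrete, suppose (just for illustration) that each variable is binary. The fully dependent factorization above needs
$$1 + 2 + 4 + 8 = 15 = 2^4 - 1$$
free parameters, while the chain-structured one needs only
$$1 + 2 + 2 + 2 = 7.$$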

\section{Conditional Independence and Directed-Separation }
Directed separation (d-separation) is a criterion for deciding whether two variables in a DAGM are conditionally independent given a third variable (or set of variables).\\
D-connection implies conditional dependence.\\
D-separation implies conditional independence.\\
\\
This also extends to groups/sets of variables $X$, $Y$, $Z$:

$$X = \{X_1, ...X_n\}$$
$$Y = \{Y_1, ...Y_n\}$$
$$Z = \{Z_1, ...Z_n\}$$
$$X \bot Z | Y$$\\
If every variable in X is d-separated from every variable in Z conditioned on all the variables in Y.\\
To determine d-separation we will use the Bayes ball algorithm\\

For Bayes Ball there are 3 structures that you must know.

\subsection{Bayes Ball - Chain}
\includegraphics[scale=0.7]{Screenshot_7.png}\\

X and Z are conditionally dependent when y is unknown and conditionally independent when y is known.\\
From the chain's graph, we can encode the structure as.
$$P(x, y, z) = P(x)P(y|x)P(z|y)$$
once we condition on y we get.
\begin{align*}
P(x, z | y) &= \frac{P(x)P(y|x)P(z|y)}{P(y)} \\
&= \frac{P(x, y)P(z|y)}{P(y)} \\
&= P(x | y) P(z | y)
\end{align*}
$$\therefore x \bot z | y$$
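
As a quick numerical sanity check (a sketch with random tables, not from lecture), we can build the joint for a small chain and verify the factorization of $P(x, z | y)$:

\begin{verbatim}
# Check numerically that x is independent of z given y in the chain x -> y -> z,
# using random conditional tables for binary variables.
import numpy as np

rng = np.random.default_rng(0)

def cpt(shape):                    # random conditional table, rows sum to 1
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

p_x = cpt((2,))
p_y_given_x = cpt((2, 2))          # indexed [x, y]
p_z_given_y = cpt((2, 2))          # indexed [y, z]

joint = np.einsum('x,xy,yz->xyz', p_x, p_y_given_x, p_z_given_y)

for y in (0, 1):
    p_xz_given_y = joint[:, y, :] / joint[:, y, :].sum()
    p_x_given_y = p_xz_given_y.sum(axis=1)
    p_z_given_y_val = p_xz_given_y.sum(axis=0)
    # P(x, z | y) factors as P(x | y) P(z | y)
    assert np.allclose(p_xz_given_y, np.outer(p_x_given_y, p_z_given_y_val))
\end{verbatim}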

\subsection{Bayes Ball - Fork}
\includegraphics[scale=0.7]{Screenshot_8.png}\\
X and Z are conditionally dependent when y is unknown and conditionally independent when y is known.\\
from the fork's graph we get the equation\\
$$P(x, y, z) = P(y)P(x|y)P(z|y)$$
conditioning on y we get.
\begin{align*}
P(x, z | y) &= \frac{P(x, y, z)}{P(y)} \\
&= \frac{P(y)P(x|y)P(z|y)}{P(y)} \\
&= P(x | y) P(z | y)
\end{align*}
$$\therefore x \bot z | y$$

\subsection{Bayes Ball - Collider}
\includegraphics[scale=0.7]{Screenshot_9.png}\\
X and Z are conditionally independent when y is unknown and conditionally dependent when y is known.\\
From the collider's graph we get the equation\\
$$p(x, y, z) = p(x)p(z)p(y|x, z)$$
Conditioning on y we get
\begin{align*}
P(x, z | y) &= \frac{P(x, y, z)}{P(y)} \\
&= \frac{P(x)P(z)P(y|x, z)}{P(y)}
\end{align*}
which, because of the $P(y|x, z)$ term, does not in general factor into a function of $x$ times a function of $z$.
$$\therefore x \not \perp z | y$$

However, if we do not condition on y, it's easy to see that
$$ P(x, z) = P(x) P(z)$$
$$\therefore x \perp z$$
So we see that conditioning on a common child at the bottom of a collider/v-structure makes its parents become dependent.\\
This important effect is called explaining away, inter-causal reasoning, or Berkson’s paradox.\\
\\
As an example,\\
$X$ is the event of a Toronto Raptors parade, $P(X)=0.01$\\
$Z$ is the event of a car accident, $P(Z)=0.1$\\
$Y$ is the event of a traffic jam downtown\\
Let's say that these are the only 2 sources of traffic. So if we know a traffic jam has occurred, then at least one of the two events has happened. If we then also learn there was a car accident, the accident alone already explains the jam, so our belief in a parade drops back toward its small prior: the accident ``explains away'' the parade.\\
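
Here is a small numerical version of the example (a sketch: the lecture only says these are the only two sources of traffic, so we assume Y is a deterministic OR of X and Z):

\begin{verbatim}
# Explaining away with the parade/accident example.
# Assumption (not stated numerically in lecture): Y = X OR Z deterministically.
p_x = 0.01   # P(parade)
p_z = 0.10   # P(car accident)

def p_y1(x, z):                 # P(traffic jam = 1 | parade, accident)
    return 1.0 if (x or z) else 0.0

# Joint over (x, z, y) from the collider factorization p(x)p(z)p(y|x,z).
joint = {}
for x in (0, 1):
    for z in (0, 1):
        for y in (0, 1):
            px = p_x if x else 1 - p_x
            pz = p_z if z else 1 - p_z
            py = p_y1(x, z) if y else 1 - p_y1(x, z)
            joint[(x, z, y)] = px * pz * py

p_jam = sum(v for (x, z, y), v in joint.items() if y == 1)
p_parade_given_jam = sum(v for (x, z, y), v in joint.items()
                         if x == 1 and y == 1) / p_jam
p_jam_and_accident = sum(v for (x, z, y), v in joint.items()
                         if z == 1 and y == 1)
p_parade_given_jam_and_accident = joint[(1, 1, 1)] / p_jam_and_accident

print(p_parade_given_jam)               # ~0.092: a jam raises belief in a parade
print(p_parade_given_jam_and_accident)  # 0.01: the accident explains the jam away
\end{verbatim}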
\subsection{Boundary Conditions}
\includegraphics[scale=0.4]{Screenshot_11.png}\\
In example 3, if any child of Y is observed then Y is effectively observed, so the information ``bounces back''.\\
This is shown again in examples 1 and 2, where if Y is known then the information goes up the chain.\\
\pagebreak
\subsection{Putting It Together}
Now that we understand conditional independence with Bayes ball on simple graphs, we can apply it to complex graphs.\\
\includegraphics[scale=0.7]{Screenshot_10.png}\\
Say, we want to determine the conditional dependence of 2 and 6 given 5.\\
There are 3 paths from 2 to 6. \\
$2 \to 5 \to 6$ cannot be traversed: $2\perp 6 | 5$ (known chain)\\
$2 \to 4 \to 7 \to 6$ cannot be traversed: $4 \perp 6 | 7$ (unknown collider)\\
$2 \to 1 \to 3 \to 6$ cannot be traversed: $2 \perp 3 | 1$ (unknown fork)\\
so we can say $2 \perp 6 | 5$.\\
This would change if we knew 1 or 6 or didn't know 5.
\end{document}
115 changes: 115 additions & 0 deletions Lecture Notes/Images/Week4.tex
@@ -0,0 +1,115 @@
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{amssymb}
\title{CSC412 Notes Week 4}
\author{Jerry Zheng}
\date{April 2021}
\hypersetup{
colorlinks=true,
linkcolor=blue,
filecolor=magenta,
urlcolor=blue,
pdftitle={Sharelatex Example},
bookmarks=true,
pdfpagemode=FullScreen,
}

\begin{document}

\maketitle
\section{Exact Inference}
Let's say we have a distribution $P(x, y)$.
If we want to perform inference on it we would use $P(y|x) = \frac{P(x,y)}{\sum_{y}p(x,y)}$.
However, there may be a set of variables in our model that isn't part of $x$ or $y$.

$$x = \text{The observed evidence}$$
$$y = \text{The unobserved variable we want to infer} $$
$$r = X \setminus \{x, y\} $$

Where $r$ is the set of random variables that are neither part of the query nor the evidence.

$$p(y | x) = \frac{p(y, x)}{p(x)}$$

Each of the distributions we need can be computed by marginalizing over the other variables.

$$p(y, x) = \sum_{r}p(y, x, r)$$

However, naively marginalizing over all unobserved variables requires a number of computations exponential in the number of random variables, $N$, in our model.
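
As a tiny illustration (a sketch with a made-up joint table, not a specific model from lecture), naive exact inference on three binary variables looks like this:

\begin{verbatim}
# Naive exact inference p(y | x) by marginalizing out the nuisance variable r.
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))              # made-up table p(x, y, r)
joint /= joint.sum()

x_obs = 1
p_y_and_x = joint[x_obs].sum(axis=1)       # p(y, x=1) = sum_r p(x=1, y, r)
p_y_given_x = p_y_and_x / p_y_and_x.sum()  # divide by p(x=1)
print(p_y_given_x)
\end{verbatim}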

\section{Variable Elimination}
To compute this efficiently we will use the Variable Elimination Algorithm.\\
\\
It's an exact inference algorithm, meaning it will calculate exactly $p(y|x)$.\\
\\
It's also general, meaning it can be used on many different kinds of graphical models.\\
\\
Its complexity depends on the conditional independence structure of our model.\\
\\
Intuitively, it is dynamic programming applied to these sums.

\subsection{Chain Example}
$$ A \rightarrow B \rightarrow C \rightarrow D $$

To find P(D), we have the variables

$$y = \{D\}$$
$$x = \{\}$$
$$r = \{A, B, C\} $$

\begin{align*}
P(y) &= \sum_{r} p(y, r) \\
\Rightarrow P(D) &= \sum_{A, B, C}p(A, B, C, D) \\
& = \sum_A \sum_B \sum_C p(A)p(B | A) p(C | B) p(D | C) \\
\end{align*}

This is exponential in the number of variables: $\mathbf O(k^n)$, where $k$ is the number of states per variable. But by reordering the sums in the joint distribution

$$ P(D) = \sum_C p(D | C) \sum_B p(C | B) \sum_A p(A)p(B | A) $$

we can begin to simplify

\begin{align*}
P(D) &= \sum_C p(D | C) \sum_B p(C | B) \sum_A p(A)p(B | A) \\
&= \sum_C p(D | C) \sum_B p(C | B) \tau (B) \\
&= \sum_C p(D | C) \tau (C)
\end{align*}
where $\tau(B) = \sum_A p(A)p(B|A)$ and $\tau(C) = \sum_B p(C|B)\tau(B)$ are the intermediate factors.

So, by using dynamic programming to do the computation in reverse, we do inference over the joint distribution represented by the chain without generating it explicitly!\\
We have reduced the running time to $\mathbf{O(nk^2)}$!
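
As a concrete illustration (a sketch with random tables, not from lecture), variable elimination on this chain is just a sequence of matrix--vector products:

\begin{verbatim}
# Variable elimination on the chain A -> B -> C -> D with binary variables.
import numpy as np

k = 2
rng = np.random.default_rng(0)

def cpt(shape):                           # random conditional table
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

p_A = cpt((k,))
p_B_given_A = cpt((k, k))                 # indexed [A, B]
p_C_given_B = cpt((k, k))
p_D_given_C = cpt((k, k))

# Naive: build the full joint, O(k^4) entries, then sum out A, B, C.
full_joint = np.einsum('a,ab,bc,cd->abcd',
                       p_A, p_B_given_A, p_C_given_B, p_D_given_C)
p_D_naive = full_joint.sum(axis=(0, 1, 2))

# Variable elimination: push the sums inside, O(n k^2) work.
tau_B = p_A @ p_B_given_A                 # tau(B) = sum_A p(A) p(B|A)
tau_C = tau_B @ p_C_given_B               # tau(C) = sum_B tau(B) p(C|B)
p_D_ve = tau_C @ p_D_given_C              # P(D)   = sum_C tau(C) p(D|C)

assert np.allclose(p_D_naive, p_D_ve)
print(p_D_ve)
\end{verbatim}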

\subsection{Bigger Example}

\includegraphics[scale=0.7]{Screenshot_12.png}\\
The joint distribution of our CS student graph is given by

$$P (C, D, I, G, S, L, J, H) = P (C)P (D|C)P (I)P (G|I, D)P (S|I)P (L|G)P (J|L, S)P (H|G, J)$$

with factors

$$\{\psi_C(C),\ \psi_D(C, D),\ \psi_I(I),\ \psi_G(G, I, D),\ \psi_S(S, I),\ \psi_L(L, G),\ \psi_J(J, L, S),\ \psi_H(H, G, J)\} $$
(The textbook uses an undirected graph; this is not too important, as variable elimination works the same either way.)\\
(An explanation of $\psi$ will be given in week 5's notes.)\\
\\
To compute p(J = 1), we could calculate all possible assignments\\
$$p(J) = \sum_{L} \sum_{S} \sum_{ G} \sum_{ H} \sum_{ I} \sum_{D} \sum_{C}p(C, D, I, G, S, L, J, H)$$
But we can do better with variable elimination, where we push sums inside products.\\

\begin{align*}
p(J) &= \sum_{L,S,G,H,I,D,C} p(C, D, I, G, S, L, J, H)\\
&= \sum_{L,S,G,H,I,D,C}\psi_C(C)\psi_D(D, C)\psi_I(I)\psi_G(G, I, D)\psi_S(S, I)\psi_L(L, G) \times \psi_J(J, L, S)\psi_H(H, G, J)\\
&= \sum_{L,S}\psi_J(J, L, S)
\sum_{G}\psi_L(L, G)
\sum_{H}\psi_H(H, G, J)
\sum_{I}\psi_S(S, I)\psi_I(I)
\times \sum_{D}\psi_G(G, I, D)
\sum_{C}\psi_C(C)\psi_D(D, C)
\end{align*}
From here we marginalize out each variable individually, producing a new factor at each step.\\
We do it in the order C, D, I, H, G, S, L to get $P(J)$; the intermediate factors are written out below and shown in the figure.\\
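
One possible bookkeeping of those intermediate factors (a reconstruction; compare with the figure):
\begin{align*}
\tau_1(D) &= \sum_{C}\psi_C(C)\psi_D(D, C) \\
\tau_2(G, I) &= \sum_{D}\psi_G(G, I, D)\,\tau_1(D) \\
\tau_3(G, S) &= \sum_{I}\psi_I(I)\psi_S(S, I)\,\tau_2(G, I) \\
\tau_4(G, J) &= \sum_{H}\psi_H(H, G, J) \\
\tau_5(J, L, S) &= \sum_{G}\psi_L(L, G)\,\tau_3(G, S)\,\tau_4(G, J) \\
\tau_6(J, L) &= \sum_{S}\psi_J(J, L, S)\,\tau_5(J, L, S) \\
p(J) &= \sum_{L}\tau_6(J, L)
\end{align*}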

\includegraphics[scale=0.6]{Screenshot_13.png}

\end{document}
119 changes: 119 additions & 0 deletions Lecture Notes/Images/Week5.tex
@@ -0,0 +1,119 @@
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{amssymb}
\title{CSC412 Notes Week 5}
\author{Jerry Zheng}
\date{April 2021}
\hypersetup{
    colorlinks=true,
    linkcolor=blue,
    filecolor=magenta,
    urlcolor=blue,
    pdftitle={Sharelatex Example},
    bookmarks=true,
    pdfpagemode=FullScreen,
}

\begin{document}

\maketitle
\section{Problems with Directed Graphical Models}
For some problems, the directionality of the edges in our DAGMs really hinders us.
For example, when we process an image we know that a pixel depends on its neighbours.
Let's say pixel 2 depends on pixel 1 and pixel 3 depends on pixel 2.
We can extrapolate this into a Markov mesh.
\includegraphics[scale=0.5]{Screenshot_15.png}
Of course, this model isn't very good because dependencies are directional and only go down and to the right.
Also, if we observe some pixels, then pixels nearby can be arbitrarily conditionally independent!

Alternatively, we can use a Naive Bayes model by introducing a hidden class variable $z$:
$$p(X) = \sum_z p(X, z)$$
$$p(X) = \sum_z p(z) \prod_{x_i \in X} p(x_i | z)$$
\includegraphics[scale=0.4]{Screenshot_16.png}

However, there are issues with this too.
The top-left and bottom-right pixels are dependent on each other, which might not be desired.
Also, if we know what the class is, then all the pixels are conditionally independent.

An alternative to DAGMs is undirected graphical models (UGMs).

\section{Undirected Graphical Models}
In UGMs, we have edges that capture relations between variables rather than defining them as parent and child.

\includegraphics[scale=0.4]{Screenshot_17.png}

\subsection{D-Separation in Undirected Graphical Models}
The following three properties are used to determine if nodes are conditionally independent.
\includegraphics[scale=0.2]{skggm_markov.png}

Global Markov Property:
$X_A \bot X_B | X_C$ iff $X_C$ separates $X_A$ from $X_B$.

Local Markov Property: the set of nodes that renders a node conditionally independent of all the other nodes in the graph.

$$X_j \bot X_{V \setminus \{j,\, \mathrm{neighbour}(j)\}} \,|\, X_{\mathrm{neighbour}(j)}$$

Pairwise Markov Property: the set of nodes that renders two nodes conditionally independent of each other.
$$X_j \bot X_i \,|\, X_{V \setminus \{j, i\}}$$

It's obvious that global Markov implies local Markov, which implies pairwise Markov.

\subsection{Limitations of UGMs and DAGs}
Note that though we can represent new relations between variables, we can't represent others.
\includegraphics[scale=0.4]{Screenshot_18.png}
A DAG can't represent graph 1, where X and Z are conditionally dependent, while a UGM can.
But a UGM cannot represent graph 2, where X and Z are conditionally independent without knowing Y.

\subsection{Moralization}
This was only mentioned in passing during lecture, but a DAG can be converted to a UGM using \href{https://en.wikipedia.org/wiki/Moral_graph}{moralization}.
This is done by adding edges between all pairs of non-adjacent nodes that have a common child, then making all edges in the graph undirected.
\includegraphics[scale=0.2]{moralGraph-DAG.png} becomes
\includegraphics[scale=0.2]{Moralized.png}
\section{Cliques}
A clique in an undirected graph is a subset of its vertices such that every two vertices in the subset are connected by an edge.

A maximal clique is a clique that cannot be extended by including one more adjacent vertex.
A maximum clique is a clique of the largest possible size in a graph.

\includegraphics[scale=0.8]{Screenshot_19.png}
The image shows 2 maximal cliques, with the red clique also being the maximum clique.
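
As a small illustration (a sketch; the graph here is read off the clique potentials of the Hammersley-Clifford example below, not necessarily the graph pictured), maximal cliques can be enumerated with networkx:

\begin{verbatim}
# Enumerate maximal cliques of the 5-node graph whose maximal cliques are
# {1,2,3}, {2,3,4}, {3,5} (edges assumed from the example below).
import networkx as nx

G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (3, 5)])

maximal = list(nx.find_cliques(G))   # maximal cliques, e.g. [[1,2,3],[2,3,4],[3,5]]
maximum = max(maximal, key=len)      # a maximum clique (largest), here size 3

print(maximal)
print(maximum)
\end{verbatim}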

\section{Hammersley-Clifford Theorem}
Since there is no topological ordering for an undirected graph, we can't use the chain rule to represent $p(y)$. So instead, we associate factors with each maximal clique.
We will denote the potential function for clique $c$ by $\psi_c(y_c | \theta_c)$ (recall $\psi$ from last week's notes); this can be any non-negative function.
The joint distribution is then defined to be proportional to the product of clique potentials.

Hammersley-Clifford Theorem: a positive distribution $p(y) > 0$ satisfies the CI properties of an undirected graph $G$ iff $p$ can be represented as a product of factors, one per maximal clique, i.e.,

$$P(y | \theta) = \frac{1}{Z(\theta)} \prod_c \psi_c (y_c | \theta_c)$$

where $Z(\theta)$ is the normalizing constant (partition function), obtained by summing the product of potentials over all possible values of $y$:
$$Z(\theta) = \sum_y \prod_c \psi_c (y_c | \theta_c)$$

Going back to our example graph:
\includegraphics[scale=0.8]{Screenshot_20.png}
$$
p(y | \theta) = \frac{1}{Z(\theta)} \psi_{123}(y_1, y_2, y_3) \psi_{234}(y_2, y_3, y_4) \psi_{35}(y_3, y_5)
$$
$$
Z(\theta) = \sum_y \psi_{123}(y_1, y_2, y_3)\psi_{234}(y_2, y_3, y_4)\psi_{35}(y_3, y_5)
$$
This is useful because we can write the distribution in terms of cliques instead of edges, which reduces the number of factors in variable elimination.
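
For small graphs we can compute $Z(\theta)$ and $p(y | \theta)$ by brute force. A minimal sketch for the five-node example, assuming binary $y_i$ and made-up potential tables:

\begin{verbatim}
# Brute-force the partition function Z and the joint p(y) for the 5-node example,
# with assumed binary variables and random (made-up) clique potentials.
import itertools
import numpy as np

rng = np.random.default_rng(0)
psi_123 = rng.random((2, 2, 2))          # psi_123(y1, y2, y3)
psi_234 = rng.random((2, 2, 2))          # psi_234(y2, y3, y4)
psi_35  = rng.random((2, 2))             # psi_35(y3, y5)

def unnormalized(y):
    y1, y2, y3, y4, y5 = y
    return psi_123[y1, y2, y3] * psi_234[y2, y3, y4] * psi_35[y3, y5]

Z = sum(unnormalized(y) for y in itertools.product((0, 1), repeat=5))

def p(y):
    return unnormalized(y) / Z

# Sanity check: the normalized probabilities sum to one.
assert np.isclose(sum(p(y) for y in itertools.product((0, 1), repeat=5)), 1.0)
print(Z, p((0, 1, 1, 0, 1)))
\end{verbatim}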

\section{Energy Based Models}
UGMs are very useful in physics. Take for example the Gibbs distribution, used for modelling Gibbs free energy:
$$p(y | \theta) = \frac{1}{Z(\theta)} \exp\left(-\sum_c E(y_c | \theta_c)\right)$$
where $E(y_c) > 0$ is the energy associated with the variables in clique $c$. We can convert this to a
UGM by defining
$$\psi_c(y_c | \theta_c) = \exp(-E(y_c | \theta_c))$$
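
A quick sketch (toy energies, not from lecture) showing that the Gibbs form and the product-of-potentials form define the same distribution:

\begin{verbatim}
# psi_c = exp(-E_c): the Gibbs distribution equals the product of potentials.
import numpy as np

E_12 = np.array([[0.0, 1.0], [1.0, 0.5]])   # toy energy for clique {y1, y2}
E_23 = np.array([[0.3, 2.0], [0.2, 0.0]])   # toy energy for clique {y2, y3}

psi_12 = np.exp(-E_12)
psi_23 = np.exp(-E_23)

unnorm = np.einsum('ab,bc->abc', psi_12, psi_23)               # product of potentials
p_potentials = unnorm / unnorm.sum()

unnorm_gibbs = np.exp(-(E_12[:, :, None] + E_23[None, :, :]))  # exp(-sum of energies)
p_gibbs = unnorm_gibbs / unnorm_gibbs.sum()

assert np.allclose(p_potentials, p_gibbs)
\end{verbatim}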

So we can now model the energy state of, say, a protein molecule as a UGM.
\includegraphics[scale=0.6]{Screenshot_21.png}
(not an actual molecule, don't @ me)

But going back to our initial example, it's easy to see that with a UGM we can represent, say, the pixels in an image and have neighbouring pixels be related to each other.
\end{document}
Binary file added Lecture Notes/Images/moralGraph-DAG.png
Binary file added Lecture Notes/Images/skggm_markov.png
Binary file added Lecture Notes/Week_3.pdf
Binary file not shown.
Binary file added Lecture Notes/Week_4.pdf
Binary file not shown.
Binary file added Lecture Notes/Week_5.pdf
Binary file not shown.