paper: Learning to Isolate Muons #48

Merged
merged 3 commits on Feb 5, 2021
Changes from all commits
10 changes: 10 additions & 0 deletions HEPML.bib
@@ -1,5 +1,15 @@
# HEPML Papers

% February 3, 2021
@misc{collado2021learning,
title="{Learning to Isolate Muons}",
author={Julian Collado and Kevin Bauer and Edmund Witkowski and Taylor Faucett and Daniel Whiteson and Pierre Baldi},
year={2021},
eprint={2102.02278},
archivePrefix={arXiv},
primaryClass={physics.data-an}
}

% January 28, 2021
@misc{gonzalez2021tackling,
title="{Tackling the muon identification in water Cherenkov detectors problem for the future Southern Wide-field Gamma-ray Observatory by means of Machine Learning}",
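As a side note, here is a minimal sketch of how the new entry can be cited once the updated HEPML.bib is downloaded (hypothetical standalone file; the bibliography style is an arbitrary choice, and HEPML.bib is assumed to sit in the same directory):

\documentclass{article}
\begin{document}
% Cite the newly added entry by its key from HEPML.bib.
Machine-learned muon isolation was studied in Ref.~\cite{collado2021learning}.
% Any BibTeX style works; unsrt is just an example.
\bibliographystyle{unsrt}
\bibliography{HEPML} % reads references from HEPML.bib
\end{document}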
8 changes: 4 additions & 4 deletions HEPML.tex
@@ -33,7 +33,7 @@
\begin{document}
\maketitle

The purpose of this note is to collect references for modern machine learning as applied to particle physics. A minimal number of categories is chosen in order to be as useful as possible. Note that papers may be referenced in more than one category. The fact that a paper is listed in this document does not endorse or validate its content - that is for the community (and for peer-review) to decide. Furthermore, the classification here is a best attempt and may have flaws - please let us know if (a) we have missed a paper you think should be included, (b) a paper has been misclassified, or (c) a citation for a paper is not correct or if the journal information is now available. In order to be as useful as possible, this document will continue to evolve so please check back\footnote{See \href{https://github.com/iml-wg/HEPML-LivingReview}{https://github.com/iml-wg/HEPML-LivingReview}.} before you write your next paper. You can simply download the .bib file to get all of the latest references.

\begin{itemize}
\item \textbf{Reviews}
@@ -46,12 +46,12 @@
\item \textbf{Classification}
\\\textit{Given a feature space $x\in\mathbb{R}^n$, a binary classifier is a function $f:\mathbb{R}^n\rightarrow [0,1]$, where $0$ corresponds to features that are more characteristic of the zeroth class (e.g. background) and $1$ corresponds to features that are more characteristic of the first class (e.g. signal). Typically, $f$ will be a function specified by some parameters $w$ (e.g. weights and biases of a neural network) that are determined by minimizing a loss of the form $L[f]=\sum_{i}\ell(f(x_i),y_i)$, where $y_i\in\{0,1\}$ are labels. The function $\ell$ is smaller when $f(x_i)$ and $y_i$ are closer. Two common loss functions are the mean squared error $\ell(x,y)=(x-y)^2$ and the binary cross entropy $\ell(x,y)=-y\log(x)-(1-y)\log(1-x)$. Exactly what `more characteristic of' means depends on the loss function used to determine $f$. It is also possible to make a multi-class classifier. A common strategy for the multi-class case is to represent each class as a different basis vector in $\mathbb{R}^{n_\text{classes}}$ and then $f(x)\in[0,1]^{n_\text{classes}}$. In this case, $f(x)$ is usually restricted to have its $n_\text{classes}$ components sum to one and the loss function is typically the cross entropy $\ell(x,y)=-\sum_\text{classes $i$} y_i\log(x_i)$.}
\begin{itemize}
\item \textbf{Parameterized classifiers}~\cite{Baldi:2016fzo,Cranmer:2015bka,Nachman:2021yvi}.
\\\textit{A classifier that is conditioned on model parameters $f(x|\theta)$ is called a parameterized classifier.}
\item \textbf{Representations}
\\\textit{There is no unique way to represent high energy physics data. It is often natural to encode $x$ as an image or another one of the structures listed below.}
\begin{itemize}
\item \textbf{Jet images}~\cite{Pumplin:1991kc,Cogan:2014oua,Almeida:2015jua,deOliveira:2015xxd,ATL-PHYS-PUB-2017-017,Lin:2018cin,Komiske:2018oaa,Barnard:2016qma,Komiske:2016rsd,Kasieczka:2017nvn,Macaluso:2018tck,li2020reconstructing,li2020attention,Lee:2019cad,collado2021learning}
\\\textit{Jets are collimated sprays of particles. They have a complex radiation pattern and, as such, have been a prototypical example for many machine learning studies. See the next item for a specific description of images.}
\item \textbf{Event images}~\cite{Nguyen:2018ugw,ATL-PHYS-PUB-2019-028,Lin:2018cin,Andrews:2018nwy,Chung:2020ysf}
\\\textit{A grayscale image is a regular grid with a scalar value at each grid point. `Color' images have a fixed-length vector at each grid point. Many detectors are analogous to digital cameras and thus images are a natural representation. In other cases, images can be created by discretizing. Convolutional neural networks are natural tools for processing image data. One downside of the image representation is that high energy physics data tend to be sparse, unlike natural images.}
Expand All @@ -61,7 +61,7 @@
\\\textit{Recursive neural networks are natural tools for processing data in a tree structure.}
\item \textbf{Graphs}~\cite{Henrion:DLPS2017,Ju:2020xty,Martinez:2018fwc,Moreno:2019bmu,Qasim:2019otl,Chakraborty:2019imr,Chakraborty:2020yfc,1797439,1801423,1808887,Iiyama:2020wap,1811770,Choma:2020cry,alonsomonsalve2020graph,guo2020boosted,Heintz:2020soy,Verma:2020gnq,Dreyer:2020brq,Qian:2021vnh,Pata:2021oez}
\\\textit{A graph is a collection of nodes and edges. Graph neural networks are natural tools for processing data in a graph structure.}
\item \textbf{Sets (point clouds)}~\cite{Komiske:2018cqr,Qu:2019gqs,Mikuni:2020wpr,Shlomi:2020ufi,Dolan:2020qkr,Fenton:2020woz,Lee:2020qil,collado2021learning}
\\\textit{A point cloud is a (potentially variable-size) set of points in space. Sets are distinguished from sequences in that there is no particular order (i.e. permutation invariance). Sets can also be viewed as graphs without edges and so graph methods that can parse variable-length inputs may also be appropriate for set learning, although there are other methods as well.}
\item \textbf{Physics-inspired basis}~\cite{Datta:2019,Datta:2017rhs,Datta:2017lxt,Komiske:2017aww,Butter:2017cot}
\\\textit{This is a catch-all category for learning using other representations that use some sort of manual or automated physics-preprocessing.}
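A quick numerical check of the (sign-corrected) binary cross entropy from the Classification item above, for a signal event with label $y=1$:

\[
\ell(0.9,1) = -\log(0.9) \approx 0.105, \qquad \ell(0.1,1) = -\log(0.1) \approx 2.303 .
\]

As required, the loss is smaller when the prediction is closer to the label; the multi-class cross entropy $-\sum_i y_i\log(x_i)$ reduces to this form for two classes.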
2 changes: 2 additions & 0 deletions README.md
@@ -61,6 +61,7 @@ The purpose of this note is to collect references for modern machine learning as
* [Reconstructing boosted Higgs jets from event image segmentation](https://arxiv.org/abs/2008.13529)
* [An Attention Based Neural Network for Jet Tagging](https://arxiv.org/abs/2009.00170)
* [Quark-Gluon Jet Discrimination Using Convolutional Neural Networks](https://arxiv.org/abs/2012.02531) [[DOI](https://doi.org/10.3938/jkps.74.219)]
* [Learning to Isolate Muons](https://arxiv.org/abs/2102.02278)

* Event images

@@ -114,6 +115,7 @@ The purpose of this note is to collect references for modern machine learning as
* [Equivariant Energy Flow Networks for Jet Tagging](https://arxiv.org/abs/2012.00964)
* [Permutationless Many-Jet Event Reconstruction with Symmetry Preserving Attention Networks](https://arxiv.org/abs/2010.09206)
* [Zero-Permutation Jet-Parton Assignment using a Self-Attention Network](https://arxiv.org/abs/2012.03542)
* [Learning to Isolate Muons](https://arxiv.org/abs/2102.02278)

* Physics-inspired basis
