Skip to content

Commit

Permalink
chore(manuscript): update v2 diff file
Browse files Browse the repository at this point in the history
  • Loading branch information
cameronraysmith committed Aug 21, 2024
1 parent 29c9662 commit e0c843c
Showing 1 changed file with 174 additions and 1 deletion.
175 changes: 174 additions & 1 deletion reproducibility/manuscript/v2.tex
Original file line number Diff line number Diff line change
Expand Up @@ -270,7 +270,7 @@
\renewcommand*\contentsname{Contents}
{
\hypersetup{linkcolor=}
\setcounter{tocdepth}{3}
\setcounter{tocdepth}{2}
\tableofcontents
}

Expand Down Expand Up @@ -661,6 +661,179 @@ \subsection{Model formulation}\label{sec-methods-model}
&= \beta_g u\left(\tau^{\left(k_{cg}\right)}\right)-\gamma_g s\left(\tau^{\left(k_{cg}\right)}\right). \label{eq-dsdt}
\end{align}
\subsection{Posterior
prediction}\label{sec-methods-posterior-prediction}
To benchmark Pyro -Velocity performance in predicting cell fate, we
generated the posterior samples measurement
\(x_{cg}=\left(u\left(\tau^{(k_{cg})}\right),
s\left(\tau^{(k_{cg})}\right)\right)\) or
\(x_{cg}=\left(u_{cg}, s_{cg} \right)\), \(t_c\), \(\beta_g\),
\(\gamma_g\) from the same single cell RNA-seq using \(N=30\) Monte
Carlo samples from the posterior predictive distribution following
\(p(x_{cg} \vert \theta, \psi) p(\theta, \psi) \approx p(x_{cg} \vert \theta,
\psi) q_{\phi}(\theta, \psi)\). \(x_{cg}\) can be posterior samples of
either denoised gene expression (used for phase portraits and vector
field-based trajectory inference) or raw read counts (used for gene
selection). Then we calculated the posterior samples of RNA velocity as
\(v_{cg}=\frac{ds(\tau^{(k_{cg})})}{d \tau^{(k{cg})}}\) based on
posterior samples measurement of denoised gene expression \(x_{cg}\) and
\(\beta_{g}\), \(\gamma_{g}\).
\subsection{Prioritization of cell fate
markers}\label{sec-methods-fate-markers}
We prioritize the cell fate markers using two metrics: First, the
Pearson correlation between each gene's posterior mean of the denoised
spliced expression and posterior mean of the shared time; Second, the
negative mean absolute errors of each gene's observed spliced, and
unspliced read counts with posterior predictive samples of spliced and
unspliced raw read counts, i.e.
\(\frac{1}{N n_c} \sum_i \sum_c (x_{icg}-\tilde{X}_{cg})\), where \(N\)
is the posterior sample number that is set to \(30\), \(i\) is the
posterior sample index, and \(n_c\) is the cell number. We first select
the top \(300\) genes with the highest negative mean absolute errors and
then rank the \(300\) genes based on the most positively correlated
genes and the least negatively correlated genes. We use the same
strategy for scVelo to rank the markers by the model likelihood to get
the top 300 genes and then prioritize these genes by Pearson's
correlation between scVelo latent time and normalized expression.
\subsection{Single-cell data
preprocessing}\label{sec-methods-preprocessing}
We used scanpy and scVelo to handle the data input and output; thus,
both h5ad and loom files generated by velocyto and kallisto
\citep{Melsted2021-ap} are supported. The fully mature PBMC dataset was
processed with the same procedure proposed in a review paper
\citep{Bergen2021-qz}
(\url{https://scvelo.readthedocs.io/perspectives/Perspectives/}). We
reproduced this procedure using the scVelo package and raw read counts
of the same top three dynamical genes NKG7, IGHM, and GNLY with the best
likelihoods. The pancreas dataset was processed with scVelo using the
following options
\begin{Shaded}
\begin{Highlighting}[]
\ImportTok{import}\NormalTok{ scvelo}
\NormalTok{scvelo.pp.filter\_and\_normalize(}
\NormalTok{ adata}\OperatorTok{=}\NormalTok{adata,}
\NormalTok{ min\_shared\_counts}\OperatorTok{=}\DecValTok{30}\NormalTok{, }
\NormalTok{ n\_top\_genes}\OperatorTok{=}\DecValTok{2000}\NormalTok{,}
\NormalTok{)}
\NormalTok{scvelo.pp.moments(}
\NormalTok{ adata,}
\NormalTok{ n\_pcs}\OperatorTok{=}\DecValTok{30}\NormalTok{,}
\NormalTok{ n\_neighbors}\OperatorTok{=}\DecValTok{30}\NormalTok{,}
\NormalTok{)}
\end{Highlighting}
\end{Shaded}
The same top variable genes with raw spliced and unspliced read counts
were used as input for the Pyro -Velocity model. The original LARRY
dataset of in vitro Hematopoiesis containing \(130,887\) cells was first
filtered to remove cells without LARRY barcoding. \(49,302\) cells were
recovered after this step with at least one LARRY barcode. For
simplicity, we termed this filtered dataset with multiple cell fate
(multi-fate) as the full dataset. Based on this dataset, we created two
datasets with uni-fate progression toward monocyte or neutrophil based
on the lineage LARRY barcodes and time information. Namely, we selected
sets of cells with a single LARRY barcode, spanning three time points
(day 2, 4, 6), and all the cells from the last time point (day 6) belong
to a unique cell type (either monocyte or neutrophil). The two uni-fate
datasets were combined to represent the bi-fate LARRY dataset. The
multi-fate full dataset was processed using the same options as the
pancreas dataset; the rest of the uni-fate and bi-fate datasets were
processed using the following parameters
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{scvelo.pp.filter\_and\_normalize(}
\NormalTok{ adata}\OperatorTok{=}\NormalTok{adata,}
\NormalTok{ n\_top\_genes}\OperatorTok{=}\DecValTok{2000}\NormalTok{,}
\NormalTok{ min\_shared\_counts}\OperatorTok{=}\DecValTok{20}\NormalTok{,}
\NormalTok{)}
\NormalTok{scvelo.pp.moments(adata)}
\end{Highlighting}
\end{Shaded}
\subsection{scVelo model}\label{sec-methods-scvelo}
We benchmarked the dynamical RNA velocity model implemented in scVelo
\texttt{v0.2.4} for the pancreas and the four LARRY datasets using the
same user options
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{scvelo.tl.recover\_dynamics(}
\NormalTok{ data}\OperatorTok{=}\NormalTok{adata, }
\NormalTok{ n\_jobs}\OperatorTok{=}\DecValTok{30}\NormalTok{,}
\NormalTok{)}
\NormalTok{scvelo.tl.velocity(}
\NormalTok{ data}\OperatorTok{=}\NormalTok{adata, }
\NormalTok{ mode}\OperatorTok{=}\StringTok{"dynamical"}\NormalTok{,}
\NormalTok{)}
\end{Highlighting}
\end{Shaded}
Then, we tested a set of user options, including the neighboring cell
numbers and the top variable gene numbers, in the pancreas dataset to
explore the stability of the scVelo dynamical model. For the fully
mature PBMC dataset, we followed the notebook proposed by the original
authors \url{https://scvelo.readthedocs.io/perspectives/Perspectives/},
i.e., we used the stochastic RNA velocity model implemented in scVelo
with the top three likelihood genes. The latent time from scVelo was
computed using their provided function
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{scvelo.tl.latent\_time(}
\NormalTok{ data}\OperatorTok{=}\NormalTok{adata,}
\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\subsection{Trajectory
inference}\label{sec-methods-trajectory-inference}
\subsubsection{Velocity vector field}\label{velocity-vector-field}
The velocity-based vector fields were generated using the scVelo
function call
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{scvelo.tl.velocity\_graph(adata)}
\NormalTok{scvelo.tl.velocity\_embedding(}
\NormalTok{ adata, }
\NormalTok{ vkey}\OperatorTok{=}\StringTok{\textquotesingle{}velocity\textquotesingle{}}\NormalTok{,}
\NormalTok{ basis}\OperatorTok{=}\StringTok{\textquotesingle{}embed\textquotesingle{}}\NormalTok{,}
\NormalTok{)}
\end{Highlighting}
\end{Shaded}
We used the default options for projecting the vector fields from scVelo
models. Unlike scVelo, Pyro -Velocity uses statistics derived from
posterior samples of the denoised spliced gene expression and posterior
samples of the velocity estimation for building the cell state
transition matrix estimates using cosine similarity. Pyro -Velocity uses
the same projection method as scVelo for projecting the transition
matrix into the two-dimensional vector field on the user-provided
embedding space.
\subsubsection{Clonal progression vector
field}\label{clonal-progression-vector-field}
No method exists for projecting clonal cell fates from the LARRY data
and associated subsets thereof into a vector field. Thus, we designed a
strategy for projecting clonal progression vector fields. Given cells
sharing the same LARRY barcode, we considered the vectors connecting the
embedding centroid of cells at consecutive time points and then
leveraged Gaussian smoothing of nearest centroid directions to represent
the clonal progression vector field informed by LARRY barcodes on a
regular grid over the cell manifold.
\subsection{Cell fate uncertainty
estimation}\label{sec-methods-fate-uncertainty}
Expand Down

0 comments on commit e0c843c

Please sign in to comment.