chore(manuscript): update v2 diff file

pinellolab · Aug 21, 2024 · e0c843c · e0c843c
1 parent 29c9662
commit e0c843c
Showing 1 changed file with 174 additions and 1 deletion.
diff --git a/reproducibility/manuscript/v2.tex b/reproducibility/manuscript/v2.tex
@@ -270,7 +270,7 @@
 \renewcommand*\contentsname{Contents}
 {
 \hypersetup{linkcolor=}
-\setcounter{tocdepth}{3}
+\setcounter{tocdepth}{2}
 \tableofcontents
 }
 
@@ -661,6 +661,179 @@ \subsection{Model formulation}\label{sec-methods-model}
   &= \beta_g u\left(\tau^{\left(k_{cg}\right)}\right)-\gamma_g s\left(\tau^{\left(k_{cg}\right)}\right). \label{eq-dsdt}
 \end{align}
 
+\subsection{Posterior
+prediction}\label{sec-methods-posterior-prediction}
+
+To benchmark Pyro -Velocity performance in predicting cell fate, we
+generated the posterior samples measurement
+\(x_{cg}=\left(u\left(\tau^{(k_{cg})}\right),
+s\left(\tau^{(k_{cg})}\right)\right)\) or
+\(x_{cg}=\left(u_{cg}, s_{cg} \right)\), \(t_c\), \(\beta_g\),
+\(\gamma_g\) from the same single cell RNA-seq using \(N=30\) Monte
+Carlo samples from the posterior predictive distribution following
+\(p(x_{cg} \vert \theta, \psi) p(\theta, \psi) \approx p(x_{cg} \vert \theta,
+\psi) q_{\phi}(\theta, \psi)\). \(x_{cg}\) can be posterior samples of
+either denoised gene expression (used for phase portraits and vector
+field-based trajectory inference) or raw read counts (used for gene
+selection). Then we calculated the posterior samples of RNA velocity as
+\(v_{cg}=\frac{ds(\tau^{(k_{cg})})}{d \tau^{(k{cg})}}\) based on
+posterior samples measurement of denoised gene expression \(x_{cg}\) and
+\(\beta_{g}\), \(\gamma_{g}\).
+
+\subsection{Prioritization of cell fate
+markers}\label{sec-methods-fate-markers}
+
+We prioritize the cell fate markers using two metrics: First, the
+Pearson correlation between each gene's posterior mean of the denoised
+spliced expression and posterior mean of the shared time; Second, the
+negative mean absolute errors of each gene's observed spliced, and
+unspliced read counts with posterior predictive samples of spliced and
+unspliced raw read counts, i.e.
+\(\frac{1}{N n_c} \sum_i \sum_c (x_{icg}-\tilde{X}_{cg})\), where \(N\)
+is the posterior sample number that is set to \(30\), \(i\) is the
+posterior sample index, and \(n_c\) is the cell number. We first select
+the top \(300\) genes with the highest negative mean absolute errors and
+then rank the \(300\) genes based on the most positively correlated
+genes and the least negatively correlated genes. We use the same
+strategy for scVelo to rank the markers by the model likelihood to get
+the top 300 genes and then prioritize these genes by Pearson's
+correlation between scVelo latent time and normalized expression.
+
+\subsection{Single-cell data
+preprocessing}\label{sec-methods-preprocessing}
+
+We used scanpy and scVelo to handle the data input and output; thus,
+both h5ad and loom files generated by velocyto and kallisto
+\citep{Melsted2021-ap} are supported. The fully mature PBMC dataset was
+processed with the same procedure proposed in a review paper
+\citep{Bergen2021-qz}
+(\url{https://scvelo.readthedocs.io/perspectives/Perspectives/}). We
+reproduced this procedure using the scVelo package and raw read counts
+of the same top three dynamical genes NKG7, IGHM, and GNLY with the best
+likelihoods. The pancreas dataset was processed with scVelo using the
+following options
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\ImportTok{import}\NormalTok{ scvelo}
+
+\NormalTok{scvelo.pp.filter\_and\_normalize(}
+\NormalTok{  adata}\OperatorTok{=}\NormalTok{adata,}
+\NormalTok{  min\_shared\_counts}\OperatorTok{=}\DecValTok{30}\NormalTok{, }
+\NormalTok{  n\_top\_genes}\OperatorTok{=}\DecValTok{2000}\NormalTok{,}
+\NormalTok{)}
+\NormalTok{scvelo.pp.moments(}
+\NormalTok{  adata,}
+\NormalTok{  n\_pcs}\OperatorTok{=}\DecValTok{30}\NormalTok{,}
+\NormalTok{  n\_neighbors}\OperatorTok{=}\DecValTok{30}\NormalTok{,}
+\NormalTok{)}
+\end{Highlighting}
+\end{Shaded}
+
+The same top variable genes with raw spliced and unspliced read counts
+were used as input for the Pyro -Velocity model. The original LARRY
+dataset of in vitro Hematopoiesis containing \(130,887\) cells was first
+filtered to remove cells without LARRY barcoding. \(49,302\) cells were
+recovered after this step with at least one LARRY barcode. For
+simplicity, we termed this filtered dataset with multiple cell fate
+(multi-fate) as the full dataset. Based on this dataset, we created two
+datasets with uni-fate progression toward monocyte or neutrophil based
+on the lineage LARRY barcodes and time information. Namely, we selected
+sets of cells with a single LARRY barcode, spanning three time points
+(day 2, 4, 6), and all the cells from the last time point (day 6) belong
+to a unique cell type (either monocyte or neutrophil). The two uni-fate
+datasets were combined to represent the bi-fate LARRY dataset. The
+multi-fate full dataset was processed using the same options as the
+pancreas dataset; the rest of the uni-fate and bi-fate datasets were
+processed using the following parameters
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\NormalTok{scvelo.pp.filter\_and\_normalize(}
+\NormalTok{  adata}\OperatorTok{=}\NormalTok{adata,}
+\NormalTok{  n\_top\_genes}\OperatorTok{=}\DecValTok{2000}\NormalTok{,}
+\NormalTok{  min\_shared\_counts}\OperatorTok{=}\DecValTok{20}\NormalTok{,}
+\NormalTok{)}
+\NormalTok{scvelo.pp.moments(adata)}
+\end{Highlighting}
+\end{Shaded}
+
+\subsection{scVelo model}\label{sec-methods-scvelo}
+
+We benchmarked the dynamical RNA velocity model implemented in scVelo
+\texttt{v0.2.4} for the pancreas and the four LARRY datasets using the
+same user options
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\NormalTok{scvelo.tl.recover\_dynamics(}
+\NormalTok{  data}\OperatorTok{=}\NormalTok{adata, }
+\NormalTok{  n\_jobs}\OperatorTok{=}\DecValTok{30}\NormalTok{,}
+\NormalTok{)}
+\NormalTok{scvelo.tl.velocity(}
+\NormalTok{  data}\OperatorTok{=}\NormalTok{adata, }
+\NormalTok{  mode}\OperatorTok{=}\StringTok{"dynamical"}\NormalTok{,}
+\NormalTok{)}
+\end{Highlighting}
+\end{Shaded}
+
+Then, we tested a set of user options, including the neighboring cell
+numbers and the top variable gene numbers, in the pancreas dataset to
+explore the stability of the scVelo dynamical model. For the fully
+mature PBMC dataset, we followed the notebook proposed by the original
+authors \url{https://scvelo.readthedocs.io/perspectives/Perspectives/},
+i.e., we used the stochastic RNA velocity model implemented in scVelo
+with the top three likelihood genes. The latent time from scVelo was
+computed using their provided function
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\NormalTok{scvelo.tl.latent\_time(}
+\NormalTok{  data}\OperatorTok{=}\NormalTok{adata,}
+\NormalTok{)}
+\end{Highlighting}
+\end{Shaded}
+
+\subsection{Trajectory
+inference}\label{sec-methods-trajectory-inference}
+
+\subsubsection{Velocity vector field}\label{velocity-vector-field}
+
+The velocity-based vector fields were generated using the scVelo
+function call
+
+\begin{Shaded}
+\begin{Highlighting}[]
+\NormalTok{scvelo.tl.velocity\_graph(adata)}
+\NormalTok{scvelo.tl.velocity\_embedding(}
+\NormalTok{  adata, }
+\NormalTok{  vkey}\OperatorTok{=}\StringTok{\textquotesingle{}velocity\textquotesingle{}}\NormalTok{,}
+\NormalTok{  basis}\OperatorTok{=}\StringTok{\textquotesingle{}embed\textquotesingle{}}\NormalTok{,}
+\NormalTok{)}
+\end{Highlighting}
+\end{Shaded}
+
+We used the default options for projecting the vector fields from scVelo
+models. Unlike scVelo, Pyro -Velocity uses statistics derived from
+posterior samples of the denoised spliced gene expression and posterior
+samples of the velocity estimation for building the cell state
+transition matrix estimates using cosine similarity. Pyro -Velocity uses
+the same projection method as scVelo for projecting the transition
+matrix into the two-dimensional vector field on the user-provided
+embedding space.
+
+\subsubsection{Clonal progression vector
+field}\label{clonal-progression-vector-field}
+
+No method exists for projecting clonal cell fates from the LARRY data
+and associated subsets thereof into a vector field. Thus, we designed a
+strategy for projecting clonal progression vector fields. Given cells
+sharing the same LARRY barcode, we considered the vectors connecting the
+embedding centroid of cells at consecutive time points and then
+leveraged Gaussian smoothing of nearest centroid directions to represent
+the clonal progression vector field informed by LARRY barcodes on a
+regular grid over the cell manifold.
+
 \subsection{Cell fate uncertainty
 estimation}\label{sec-methods-fate-uncertainty}