<p>“The Jupyter Notebook is an incredibly powerful tool for interactively developing and presenting data science projects. A notebook integrates code and its output into a single document that combines visualizations, narrative text, mathematical equations, and other rich media. The intuitive workflow promotes iterative and rapid development, making notebooks an increasingly popular choice at the heart of contemporary data science, analysis, and increasingly science at large.” - <a href="https://www.dataquest.io/blog/jupyter-notebook-tutorial/">dataquest</a></p>
<p>All of our lessons are presented in Jupyter notebooks (<code>.ipynb</code> files) because of their interactive nature. A notebook has two main components, a <code>kernel</code> and <code>cells</code>:</p>
<ul>
<li>A <code>kernel</code> interprets and executes the code. Here we use the Python kernel, but you can specify a kernel for another language, such as R.</li>
<li>A <code>cell</code> is a container for either text (Markdown) or code to be executed.</li>
</ul>
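<p>To make the kernel/cell split concrete, here is a minimal sketch of what an <code>.ipynb</code> file holds on disk: plain JSON (nbformat version 4), with the kernel declared once under <code>metadata.kernelspec</code> and the content stored as an ordered list of cells. The notebook contents and the <code>summarize</code> helper are hypothetical and only for illustration; real notebooks carry extra metadata, and the <code>nbformat</code> package is the usual way to read and validate them.</p>

```python
import json

# Hypothetical example notebook, written as the plain JSON an .ipynb file contains
# (nbformat 4.4 layout). The kernel is declared once in the metadata; the content
# is an ordered list of cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        # kernelspec names the kernel that will execute the code cells
        "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"},
    },
    "cells": [
        {   # a text cell, rendered as Markdown
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My analysis\n", "Some narrative text."],
        },
        {   # a code cell; the kernel fills in outputs and execution_count when it runs
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello from the kernel')"],
        },
    ],
}


def summarize(nb):
    """Hypothetical helper: one line per cell, giving its type and first source line."""
    return [
        c["cell_type"] + ": " + "".join(c["source"]).splitlines()[0]
        for c in nb["cells"]
    ]


if __name__ == "__main__":
    print(notebook["metadata"]["kernelspec"]["display_name"])
    for line in summarize(notebook):
        print(line)
    # json.dump(notebook, f, indent=1) would write this dict out as an .ipynb file.
```

<p>Because the file is just JSON, the kernel choice and the cells travel together, which is why a notebook opened on another machine knows which language to run.</p>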
<p>To run the Python code in a cell, press Shift + Enter. Try it with the code below.</p>
<span id="cb15-17"><a href="#cb15-17" aria-hidden="true" tabindex="-1"></a>sc.pp.scale(adata, max_value<span class="op">=</span><span class="dv">10</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>/usr/local/lib/python3.10/dist-packages/scanpy/preprocessing/_normalization.py:170: UserWarning: Received a view of an AnnData. Making a copy.
<h2 class="anchored" data-anchor-id="scanpy---pca-and-umap-clustering">scanpy - PCA and UMAP clustering</h2>
<p>PCA stands for principal component analysis. Principal components (PCs) are new axes, each a weighted combination of the original genes, chosen so that the first PC captures the most variance in the data, the second captures the most remaining variance while being orthogonal to the first, and so on. PCs are often used to reduce the dimensionality of a dataset, and the resulting PC coordinates can serve as inputs to machine learning or regression models. See <a href="https://towardsdatascience.com/a-step-by-step-introduction-to-pca-c0d78e26a0dd">A Step-By-Step Introduction to PCA</a> for a more detailed overview. Let’s calculate the PCs and plot the cells on the first two PCs, colored by CST3 expression.</p>
<div class="sourceCode cell-code" id="cb17"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="co"># compute the PCs; we inspect them later to decide how many PCs to use when building the neighborhood graph</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a>sc.tl.pca(adata)</span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a><span class="co"># pl = plotting: show each cell on the first two principal components, colored by CST3 expression</span></span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a>sc.pl.pca(adata, color<span class="op">=</span><span class="st">'CST3'</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>In the figure above, each dot is a cell plotted against the first two PCs, and its color reflects that cell’s CST3 expression. Based on these two PCs and CST3 expression alone, the cells appear to fall into three or four groups.</p>
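<p>To show what “calculating the PCs” means under the hood, here is a minimal NumPy sketch of PCA via the singular value decomposition of a centered cells × genes matrix. The toy data are made up, and this illustrates the idea only; it is not scanpy’s implementation. In scanpy, <code>sc.tl.pca</code> stores the analogous results in <code>adata.obsm['X_pca']</code> (per-cell coordinates) and <code>adata.uns['pca']['variance_ratio']</code>.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 100 "cells" x 5 "genes", built so that most of the
# variation lies along one hidden direction, plus a little noise.
latent = rng.normal(size=(100, 1))
loadings = np.array([[3.0, 2.0, 0.0, 0.0, 1.0]])
X = latent @ loadings + 0.3 * rng.normal(size=(100, 5))


def pca(X, n_components):
    """Return (scores, components, explained_variance_ratio) for the top PCs."""
    Xc = X - X.mean(axis=0)                      # center each gene (column)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # PC axes as rows in gene space (sign is arbitrary)
    scores = Xc @ components.T                   # each cell's coordinates on those axes
    var = S**2 / (X.shape[0] - 1)                # variance captured by each PC
    ratio = var / var.sum()                      # fraction of total variance
    return scores, components, ratio[:n_components]


scores, components, ratio = pca(X, n_components=2)
# scores[:, 0] vs scores[:, 1] is what a PC1-vs-PC2 scatter plot shows
print(scores.shape)
print(ratio)   # PC1 should capture most of the variance for this toy data
```

<p>Centering first matters: without it, the first singular vector would mostly point toward the mean expression profile rather than along the direction of greatest variation between cells.</p>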
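<p>Now, let’s create an elbow plot, which shows the fraction of total variance (the variance ratio) captured by each PC. The point where the curve bends and flattens out, the “elbow”, suggests how many of the leading PCs to use for clustering: PCs beyond it add little additional variance and are more likely to reflect noise.</p>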
<div class="sourceCode cell-code" id="cb18"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="co"># log=True plots the variance ratio on a logarithmic scale</span></span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a>sc.pl.pca_variance_ratio(adata, log<span class="op">=</span><span class="va">True</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
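<p>The elbow plot above draws the variance ratio that <code>sc.tl.pca</code> stored. As a sketch, the same kind of quantity (fraction of total variance per PC) can be computed from scratch with NumPy, together with one simple way to locate the elbow automatically. The chord-distance “knee” rule and the toy data are assumptions for illustration, not something scanpy does; in practice the elbow is usually read off the plot by eye.</p>

```python
import numpy as np


def variance_ratio(X):
    """Fraction of total variance captured by each PC of X (rows = cells, columns = genes)."""
    Xc = X - X.mean(axis=0)                    # center each gene
    S = np.linalg.svd(Xc, compute_uv=False)    # singular values, largest first
    var = S**2                                 # proportional to the variance per PC
    return var / var.sum()


def elbow_index(ratio):
    """Hypothetical 'knee' heuristic (not part of scanpy).

    Draws a straight line from the first to the last point of the
    variance-ratio curve and returns the 1-based PC number where the
    curve sags furthest below that line, i.e. where it bends most sharply.
    """
    y = np.asarray(ratio, dtype=float)
    x = np.arange(len(y), dtype=float)
    slope = (y[-1] - y[0]) / (len(y) - 1)
    chord = y[0] + slope * x           # straight line between the two endpoints
    gap = chord - y                    # how far the curve dips below that line
    return int(np.argmax(gap)) + 1     # convert the 0-based index to a PC number


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: 200 cells x 20 genes driven by 3 hidden factors plus noise
    factors = rng.normal(size=(200, 3)) * np.array([3.0, 2.0, 1.5])
    X = factors @ rng.normal(size=(3, 20)) + 0.5 * rng.normal(size=(200, 20))
    ratio = variance_ratio(X)
    print(np.round(ratio[:6], 3))
    print("elbow at PC", elbow_index(ratio))
```

<p>This heuristic works on a linear scale; because the plot above uses <code>log=True</code>, the bend you see by eye can sit at a slightly different PC.</p>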
<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a><span class="co"># initial clustering: this step isn't in the official demo, but I think it was left out by mistake</span></span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a>sc.pl.umap(adata, color<span class="op">=</span>[<span class="st">'leiden'</span>, <span class="st">'CST3'</span>, <span class="st">'NKG7'</span>])</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>/usr/local/lib/python3.10/dist-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
<p>In-class exercise 1: In the <a href="#anndata">AnnData</a> section, instead of creating a <code>csr_matrix</code>, can we create a pandas DataFrame to make the data easier to inspect?</p>
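<p>One possible approach, sketched on a small hypothetical matrix rather than the lesson’s actual data: either densify the sparse matrix into an ordinary DataFrame, or keep it sparse with pandas’ sparse extension type. For an AnnData object, <code>adata.to_df()</code> returns a dense DataFrame of <code>adata.X</code> indexed by cell and gene names. The cell and gene names below are made up.</p>

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical toy counts: 3 cells x 4 genes, stored sparsely
counts = csr_matrix(np.array([
    [0, 2, 0, 1],
    [3, 0, 0, 0],
    [0, 0, 5, 0],
], dtype=np.float32))
cell_names = [f"cell_{i}" for i in range(counts.shape[0])]
gene_names = [f"gene_{j}" for j in range(counts.shape[1])]

# Option 1: densify. Easy to read, but memory grows with cells x genes,
# so this is best kept to small matrices or a slice of a real dataset.
df_dense = pd.DataFrame(counts.toarray(), index=cell_names, columns=gene_names)

# Option 2: keep the zeros implicit using pandas' sparse extension dtype.
df_sparse = pd.DataFrame.sparse.from_spmatrix(counts, index=cell_names, columns=gene_names)

print(df_dense)
print(df_sparse.sparse.density)   # fraction of explicitly stored (non-zero) entries
```

<p>The dense version is the most convenient for browsing; the sparse version matters once the matrix has thousands of cells and genes, where most entries are zero.</p>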