fixup! Add tutorial for structural alphabets

biotite-dev · Nov 2, 2024 · ec8ed77
1 parent f135b31
commit ec8ed77
Showing 1 changed file with 60 additions and 20 deletions.
diff --git a/doc/tutorial/structure/alphabet.rst b/doc/tutorial/structure/alphabet.rst
@@ -22,7 +22,11 @@ Just keep in mind that in the following examples the underlying structural alpha
 can be substituted with minimal modifications.
 
 Converting structures to sequences
------------------------------------
+----------------------------------
+We start by getting the structure of our protein of interest.
+In this case we will use ferredoxin from *E. coli* (PDB ID: ``2ZVS``).
+After filtering out all non-amino acid residues, we create the *3Di* sequence for each
+chain with :func:`to_3di()`.
 
 .. jupyter-execute::
 
@@ -41,29 +45,50 @@ Converting structures to sequences
     print(structural_sequences)
     print(chain_starts)
 
-    .. jupyter-execute::
+Note that :func:`to_3di()` does not return a single :class:`I3DSequence` but a list
+of sequences, one for each chain in the structure.
+Accompanying the sequences, the function also returns the atom indices where each of the
+chains starts.
+As our structure contains only one chain, the desired 3Di sequence is the first and only
+element in the list.
+
+.. jupyter-execute::
 
     ec_ferredoxin_3di = structural_sequences[0]
     print(ec_ferredoxin_3di)
 
-Each symbol in the sequence corresponds to one residue in the structure,
-that can be extracted using the residue-level functionality of
-:mod:`biotite.structure`.
-
+Each symbol in this rather cryptic sequence corresponds to one residue in the structure.
+To get the corresponding residues as :class:`.AtomArray` objects we can use the
+residue-level functionality of :mod:`biotite.structure`.
+While the sequence is barely human-readable, its true power lies in its ability to
+be compared to *3Di* sequences from other proteins.
 
 Sequence alignments on structural alphabets
 -------------------------------------------
-
 As mentioned in the :doc:`sequence chapter <../sequence/encoding>`, the sequence
 based methods in :mod:`biotite.sequence` generally do not care about the type of
 sequence.
-This means that we can use any method we have learned so far on the structural
-sequences.
+This means that we can use any method we have learned so far on structural sequences as
+well.
 For the scope of this tutorial we will merely use :func:`align_optimal()` to find
 corresponding residues in two structures.
-As structure is much better conserved than its sequence, the alignment will even work
-on remote homologs with low sequence identity, where a classical sequence alignment
-would fail.
+As the structure of a protein is generally much better conserved than its sequence, the alignment of
+*3Di* sequences will even work on remote homologs with low amino acid sequence identity,
+where a classical sequence alignment would fail.
+To demonstrate this, we will compare the *E. coli* ferredoxin with the remotely similar
+ferredoxin from the thermophilic archaeon *S. tokodaii*.
+
+.. jupyter-execute::
+
+    pdbx_file = pdbx.BinaryCIFFile.read(rcsb.fetch("1XER", "bcif"))
+    st_ferredoxin = pdbx.get_structure(pdbx_file, model=1)
+    st_ferredoxin = st_ferredoxin[struc.filter_amino_acids(st_ferredoxin)]
+    st_ferredoxin_3di = strucalph.to_3di(st_ferredoxin)[0][0]
+
+To align the two 3Di sequences, we merely need a :class:`.SubstitutionMatrix` that
+matches the alphabet of the :class:`I3DSequence`.
+As for amino acid and nucleotide sequences, :mod:`biotite.sequence.align` provides
+one out of the box with :func:`.SubstitutionMatrix.std_3di_matrix()`.
 
 .. jupyter-execute::
 
@@ -85,19 +110,34 @@ would fail.
         ax, alignment, matrix=matrix, labels=["EC", "ST"], symbols_per_line=50
     )
 
+If you prefer coloring the symbols in the alignment by their type, you are in luck:
+:mod:`biotite.sequence.graphics` provides a
+:doc:`color scheme <../../examples/gallery/sequence/misc/color_schemes>` for each of the
+supported structural alphabets as well.
+
+.. jupyter-execute::
+
+    fig, ax = plt.subplots(figsize=(8.0, 2.0))
+    graphics.plot_alignment_type_based(
+        ax, alignment, labels=["EC", "ST"], symbols_per_line=50
+    )
+
 Example: Superimposing structures
 ---------------------------------
-
-Now that we know the aligned residues from the alignment as *anchors* to superimpose
-the structures.
+One typical use case of structural alphabets is superimposing structures of remote
+homologs.
+Here the challenge is finding the corresponding residues in the two structures, whose
+squared distance the superimposition algorithm should minimize.
+The solution is to use the alignment of the structural sequences:
+the ``CA`` atoms of the aligned residues serve as input for the superimposition.
 
 .. jupyter-execute::
 
     def rmsd_from_alignment(fixed, mobile, alignment):
         """
-        A very simple function that extracts corresponding residues (the 'anchors') from
-        an alignment and uses them to run a superimposition.
-        Finally the RMSD on the superimposed structures plus the number of anchors is
+        A very simple function that extracts corresponding residues (the 'anchors')
+        from an alignment and uses them to run a superimposition.
+        Finally the RMSD of the superimposed structures plus the number of anchors is
         returned.
         """
         alignment_codes = align.get_codes(alignment)
@@ -126,7 +166,7 @@ the structures.
     print("RMSD:", rmsd)
 
 Again, with a classical amino acid sequence based approach the accuracy of the
-superposition would be much lower:
+superimposition would be much lower:
 In this case fewer corresponding residues can be found from the amino acid sequence
 alignment and the RMSD between them is significantly higher.
 
@@ -151,6 +191,6 @@ alignment and the RMSD between them is significantly higher.
         ax, alignment, matrix=matrix, labels=["EC", "ST"], symbols_per_line=50
     )
 
-This shows only a small fraction of the power of structural alphabets.
+This shows only a small fraction of the versatility of structural alphabets.
 They can also be used to find structural homologs in a large database, to superimpose
 multiple structures at once and much more!
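
Note on the anchor selection described in the "Superimposing structures" hunk: the body of ``rmsd_from_alignment()`` is elided in this diff, so the following is only a minimal sketch of the idea, not the tutorial's implementation. It assumes the alignment is available as a list of per-column index pairs in which ``-1`` marks a gap; the helper name ``anchor_pairs`` and the toy trace are hypothetical and not part of Biotite.

```python
def anchor_pairs(trace):
    """
    Keep only the alignment columns where both sequences have a symbol.

    `trace` is assumed to be a list of (index_in_fixed, index_in_mobile)
    tuples, one per alignment column, with -1 marking a gap (assumption).
    """
    return [(i, j) for i, j in trace if i != -1 and j != -1]


# Hypothetical toy alignment with five columns, two of which contain a gap
trace = [(0, -1), (1, 0), (2, 1), (-1, 2), (3, 3)]
anchors = anchor_pairs(trace)

# Residue indices whose CA atoms would be passed to the superimposition
fixed_indices = [i for i, _ in anchors]
mobile_indices = [j for _, j in anchors]
print(anchors)
```

Only the gap-free columns survive, so both index lists have the same length and can be paired one to one when the ``CA`` coordinates are handed to a superimposition routine.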