fixup! Add tutorial for structural alphabets
padix-key committed Nov 2, 2024
1 parent f135b31 commit ec8ed77
Showing 1 changed file with 60 additions and 20 deletions.
80 changes: 60 additions & 20 deletions doc/tutorial/structure/alphabet.rst
@@ -22,7 +22,11 @@ Just keep in mind that in the following examples the underlying structural alphabet
can be substituted with minimal modifications.

Converting structures to sequences
- -----------------------------------
+ ----------------------------------
+ We start by getting the structure of our protein of interest.
+ In this case we will use ferredoxin from *E. coli* (PDB ID: ``2ZVS``).
+ After filtering out all non-amino acid residues, we create the *3Di* sequence for each
+ chain with :func:`to_3di()`.

.. jupyter-execute::

@@ -41,29 +45,50 @@ Converting structures to sequences
    print(structural_sequences)
    print(chain_starts)

- .. jupyter-execute::
+ Note that :func:`to_3di()` does not return a single :class:`I3DSequence` but a list
+ of sequences, one for each chain in the structure.
+ Along with the sequences, the function also returns the atom indices where each of the
+ chains starts.
+ As our structure contains only one chain, the desired *3Di* sequence is the first and only
+ element in the list.

+ .. jupyter-execute::

    ec_ferredoxin_3di = structural_sequences[0]
    print(ec_ferredoxin_3di)
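
The shape of this return value (one sequence per chain plus the position where each chain starts) can be illustrated without Biotite. The following is only a minimal sketch: ``split_chains`` is a hypothetical helper, the symbols are made up, and the start positions count residues here, whereas :func:`to_3di()` reports atom indices.

```python
def split_chains(chain_ids, symbols):
    """
    Hypothetical helper (not part of Biotite): split a flat run of
    per-residue symbols into one string per chain and record the
    position at which each chain starts.
    """
    sequences = []
    chain_starts = []
    for i, (chain_id, symbol) in enumerate(zip(chain_ids, symbols)):
        # A new chain begins at the first element and at every chain ID change
        if i == 0 or chain_id != chain_ids[i - 1]:
            sequences.append("")
            chain_starts.append(i)
        sequences[-1] += symbol
    return sequences, chain_starts


# Made-up input: three residues in chain A followed by two in chain B
sequences, chain_starts = split_chains(
    ["A", "A", "A", "B", "B"],
    ["D", "P", "V", "L", "Q"],
)
print(sequences)
print(chain_starts)
```

For a single-chain structure both lists have exactly one element, which is why indexing with ``[0]`` is sufficient in the tutorial code.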

- Each symbol in the sequence corresponds to one residue in the structure,
- that can be extracted using the residue-level functionality of
- :mod:`biotite.structure`.

+ Each symbol in this rather cryptic sequence corresponds to one residue in the structure.
+ To get the corresponding residues as :class:`.AtomArray` objects we can use the
+ residue-level functionality of :mod:`biotite.structure`.
+ While the sequence is barely human-readable, its true power lies in its ability to
+ be compared to *3Di* sequences from other proteins.

Sequence alignments on structural alphabets
-------------------------------------------

As mentioned in the :doc:`sequence chapter <../sequence/encoding>`, the sequence-based
methods in :mod:`biotite.sequence` generally do not care about the type of
sequence.
- This means that we can use any method we have learned so far on the structural
- sequences.
+ This means that we can use any method we have learned so far on structural sequences as
+ well.
For the scope of this tutorial we will merely use :func:`align_optimal()` to find
corresponding residues in two structures.
- As structure is much better conserved than its sequence, the alignment will even work
- on remote homologs with low sequence identity, where a classical sequence alignment
- would fail.
+ As protein structure is generally much better conserved than sequence, the alignment of
+ *3Di* sequences will even work on remote homologs with low amino acid sequence identity,
+ where a classical sequence alignment would fail.
+ To demonstrate this, we will compare the *E. coli* ferredoxin with the remotely similar
+ ferredoxin from the thermophilic archaeon *S. tokodaii*.

+ .. jupyter-execute::

+     pdbx_file = pdbx.BinaryCIFFile.read(rcsb.fetch("1XER", "bcif"))
+     st_ferredoxin = pdbx.get_structure(pdbx_file, model=1)
+     st_ferredoxin = st_ferredoxin[struc.filter_amino_acids(st_ferredoxin)]
+     st_ferredoxin_3di = strucalph.to_3di(st_ferredoxin)[0][0]

+ To align the two *3Di* sequences, we merely need a :class:`.SubstitutionMatrix` that
+ matches the alphabet of the :class:`I3DSequence`.
+ As for amino acid and nucleotide sequences, :mod:`biotite.sequence.align` provides
+ one out of the box with :func:`.SubstitutionMatrix.std_3di_matrix()`.

.. jupyter-execute::

@@ -85,19 +110,34 @@ would fail.
        ax, alignment, matrix=matrix, labels=["EC", "ST"], symbols_per_line=50
    )
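
To make concrete why the alignment method does not depend on the alphabet, here is a minimal, library-independent sketch of a global alignment (Needleman-Wunsch with a linear gap penalty). It is not the implementation behind :func:`align_optimal()`. The function names, the gap penalty and the toy scoring function are assumptions made for illustration, and the scores are not those of the real *3Di* substitution matrix.

```python
def global_align(seq1, seq2, score, gap=-2):
    """
    Needleman-Wunsch global alignment with a linear gap penalty.
    Works on any pair of symbol sequences, as long as 'score' can
    rate a pair of symbols.
    Returns the total score and a trace of (i, j) index pairs,
    where -1 marks a gap.
    """
    n, m = len(seq1), len(seq2)
    # table[i][j]: best score for aligning seq1[:i] with seq2[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * gap
    for j in range(1, m + 1):
        table[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = max(
                table[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1]),
                table[i - 1][j] + gap,
                table[i][j - 1] + gap,
            )
    # Walk back from the bottom-right corner to recover one optimal path
    trace = []
    i, j = n, m
    while i > 0 or j > 0:
        if (
            i > 0
            and j > 0
            and table[i][j]
            == table[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1])
        ):
            trace.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and table[i][j] == table[i - 1][j] + gap:
            trace.append((i - 1, -1))
            i -= 1
        else:
            trace.append((-1, j - 1))
            j -= 1
    trace.reverse()
    return table[n][m], trace


def toy_score(a, b):
    # Made-up scores: reward identical symbols, penalize all others equally
    return 2 if a == b else -1


# Two made-up structural sequences that differ by one inserted symbol
total, trace = global_align("DPVLQ", "DVLQ", toy_score)
print(total)
print(trace)
```

Swapping ``toy_score`` for a lookup in a matrix defined over the *3Di* alphabet is all it would take to adapt this sketch to structural sequences. That is the role the :class:`.SubstitutionMatrix` plays in the tutorial code.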

+ If you prefer coloring the symbols in the alignment by their type, you are in luck:
+ :mod:`biotite.sequence.graphics` provides a
+ :doc:`color scheme <../../examples/gallery/sequence/misc/color_schemes>` for each of the
+ supported structural alphabets as well.

+ .. jupyter-execute::

+     fig, ax = plt.subplots(figsize=(8.0, 2.0))
+     graphics.plot_alignment_type_based(
+         ax, alignment, labels=["EC", "ST"], symbols_per_line=50
+     )

Example: Superimposing structures
---------------------------------

- Now that we know the aligned residues from the alignment as *anchors* to superimpose
- the structures.
+ One typical use case of structural alphabets is superimposing structures of remote
+ homologs.
+ Here the challenge is finding the corresponding residues in the two structures, whose
+ squared distances the superimposition algorithm should minimize.
+ The solution is to use the alignment of the structural sequences:
+ One simply inputs the ``CA`` atoms of the aligned residues.

.. jupyter-execute::

    def rmsd_from_alignment(fixed, mobile, alignment):
        """
-         A very simple function that extracts corresponding residues (the 'anchors') from
-         an alignment and uses them to run a superimposition.
-         Finally the RMSD on the superimposed structures plus the number of anchors is
+         A very simple function that extracts corresponding residues (the 'anchors')
+         from an alignment and uses them to run a superimposition.
+         Finally the RMSD of the superimposed structures plus the number of anchors is
        returned.
        """
        alignment_codes = align.get_codes(alignment)
@@ -126,7 +166,7 @@ the structures.
    print("RMSD:", rmsd)
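
The two bookkeeping steps around the superimposition can be sketched in plain Python: picking the gap-free alignment columns as anchors, and computing the RMSD over the anchored coordinates. This is a simplified illustration under stated assumptions. Gaps are assumed to be marked with ``-1``, the trace and coordinates are made up, ``anchors_from_trace`` and ``rmsd`` are hypothetical helpers, and the superimposition itself (which Biotite performs in the tutorial code) is left out.

```python
import math


def anchors_from_trace(trace):
    """Keep only the gap-free alignment columns (gaps are assumed to be -1)."""
    return [(i, j) for i, j in trace if i != -1 and j != -1]


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("Need two non-empty coordinate lists of equal length")
    squared_sum = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Made-up alignment trace: the second and fourth column contain a gap
trace = [(0, 0), (1, -1), (2, 1), (-1, 2), (3, 3)]
# Made-up CA coordinates, one (x, y, z) tuple per residue
fixed_ca = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
mobile_ca = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0), (9.0, 9.0, 9.0), (3.0, 0.0, 1.0)]

anchors = anchors_from_trace(trace)
value = rmsd(
    [fixed_ca[i] for i, _ in anchors],
    [mobile_ca[j] for _, j in anchors],
)
print(anchors)
print(value)
```

Residues that fall into gap columns, such as the outlier at index 2 of ``mobile_ca``, do not contribute to the RMSD. This is why the number of anchors is worth reporting alongside it.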

Again, with a classical amino acid sequence-based approach the accuracy of the
- superposition would be much lower:
+ superimposition would be much lower:
In this case fewer corresponding residues can be found from the amino acid sequence
alignment and the RMSD between them is significantly higher.

@@ -151,6 +191,6 @@ alignment and the RMSD between them is significantly higher.
        ax, alignment, matrix=matrix, labels=["EC", "ST"], symbols_per_line=50
    )

- This shows only a small fraction of the power of structural alphabets.
+ This shows only a small fraction of the versatility of structural alphabets.
They can also be used to find structural homologs in a large database, to superimpose
multiple structures at once and much more!
