diff --git a/forward_sims.md b/forward_sims.md
index c90b186d..b0cef078 100644
--- a/forward_sims.md
+++ b/forward_sims.md
@@ -11,26 +11,476 @@ kernelspec:
   name: python3
 ---
 
+```{currentmodule} tskit
+```
+
 (sec_tskit_forward_simulations)=
 
-# _Building a forward simulator_
+# Building a forward simulator
 
-% remove underscores in title when tutorial is complete or near-complete
+This tutorial shows how tskit can be used to build your own forward-time, tree sequence simulator from scratch.
+The simulator will use the discrete-time Wright-Fisher model, and track individuals
+along with their genomes, storing inherited genomic regions as well as the full pedigree.
 
-This tutorial shows how tskit can be used to
-build your own forwards-in-time tree sequence recording simulator from scratch.
+The code in this tutorial is broken into separate functions for clarity and
+to make it easier to modify for your own purposes; a simpler and substantially
+condensed forward-simulator is coded as a single function at the top of the
+{ref}`sec_completing_forwards_simulations` tutorial.
 
 :::{note}
 If you are simply trying to obtain a tree sequence which is
-the result of a forwards-in-time simulation, this can be done by using one of the
-highly capable forwards-in-time genetic simulators that already exist, such as
+the result of a forward-time simulation, this can be done by using one of the
+highly capable forward-time genetic simulators that already exist, such as
 [SLiM](https://messerlab.org/slim/) or [fwdpy11](https://github.com/molpopgen/fwdpy11).
 Documentation and tutorials for these tools exist on their respective websites. This
-tutorial focusses instead on illustrating the general principles that lie behind such
+tutorial is instead intended to illustrate the general principles that lie behind such
 simulators.
 :::
 
-:::{todo}
-Add details on building a forward simulator (see issue
-[#14](https://github.com/tskit-dev/tutorials/issues/14))
+We will focus on the case of diploids, in which each {ref}`individual<sec_nodes_or_individuals>` contains 2 genomes,
+but the concepts used here generalize to any ploidy, if you are willing to do the book-keeping.
+The individuals themselves are not strictly necessary for representing genetic genealogies
+(it's the genomes which are important), but they are needed during the simulation,
+and so we record them in the resulting output for completeness.
+
+## Definitions
+
+Before we can make any progress, we require a few definitions.
+
+A *node* represents a genome at a point in time (often we imagine this as the "birth time" of the genome).
+It can be described by a tuple, `(id, time)`, where `id` is a unique integer,
+and `time` reflects the birth time of that `id`. When generating a tree sequence,
+this will be stored in a row of the {ref}`sec_node_table_definition`.
+
+A *diploid individual* is a group of two nodes. During simulation,
+a simple and efficient grouping assigns sequential pairs of node IDs to an individual.
+It can be helpful (but not strictly necessary) to store individuals within the tree sequence
+as rows of the the {ref}`sec_individual_table_definition` (a node can then be assigned to an
+individual by storing that individual's id in the appropriate row of the node table).
+
+An *edge* reflects a transmission event between nodes.  An edge is a tuple `(Left, Right, Parent, Child)`
+whose meaning is "Child genome $C$ inherited the genomic interval $[L, R)$ from $Parent genome $P$".
+In a tree sequence this is stored in a row of the {ref}`sec_edge_table_definition`.
+
+The *time*, in the discrete-time Wright-Fisher (WF) model which we will simulate, is measured in
+integer generations. To match the tskit notion of time, we record time in *generations ago*:
+i.e. for a simple simulation of G generations, we start the simulation at generation $G-1$
+and count down until we reach generation 0 (the current-day).
+
+The *population* consists of $N$ diploid individuals ($2N$ nodes) at a particular time $t$.
+At the start, the population will have no known ancestry, but subsequently
+each individual will be formed by choosing (at random) two parent individuals from the
+population in the previous generation.
+
+## Approach
+
+We will generate edges, nodes, and individuals forwards in time, adding them to the relevant `tskit` tables.
+To aid efficiency, we will also see how to "simplify" the tables into the minimal set of nodes and edges that
+describe the history of the sample. Finally, these tables can be exported into an immutable
+{class}`TreeSequence` for storing or analysis.
+
+### Setup
+First, we'll import the necessary libraries and define some general parameters.
+The [numpy](https://numpy.org/doc/stable/) library will be used to produce random numbers.
+
+```{code-cell} ipython3
+import tskit
+import numpy as np
+
+random_seed = 2
+random = np.random.default_rng(random_seed)  # A random number generator for general use
+
+L = 50_000  # The sequence length = 50 Kb
+```
+
+### Core steps
+
+ The core simulation function generates a new population by repeating the following steps:
+ 1. Pick a pair of parent individuals at random from the population in the previous generation.
+ 2. Create a new child individual with two genomes (an individual ID will be created
+    by {meth}`adding a row to the individual table<IndividualTable.add_row>`
+    and two nodes IDs will be created by
+    {meth}`adding two rows to the node table<NodeTable.add_row>`. If desired,
+    we can also provide the parent IDs when adding to the individual table, which
+    will result in storing the entire genealogical pedigree.
+ 3. Add the inheritance paths for each child genome using an `add_inheritance_paths()`
+    function (defined later).
+
+ We will also define a simpler function, involving just step 2, to create the initial
+ population.
+
+ For convenience, the population will be stored in a Python dictionary that maps
+ the individual ID to the IDs of its two genomes. Note that in lower-level
+ languages such as C, rather than use a mapping of individual ID to two genomes,
+ we could instead rely on the fact that genome IDs are allocated sequentially, and
+ pick a random pair of genomes from the previous population directly,
+ without using individual IDs.
+
+
+```{code-cell} ipython3
+# For visualising unsimplified tree sequences, it can help to flag all nodes as samples
+default_node_flags = tskit.NODE_IS_SAMPLE
+
+def make_diploid(tables, time, parent_individuals=None) -> tuple[int, tuple[int, int]]:
+    """
+    Make an individual and its diploid genomes by adding to tables, returning the IDs.
+    Specifying parent_individuals is optional but results in the pedigree being stored.
+    """
+    individual_id = tables.individuals.add_row(parents=parent_individuals)
+    return individual_id, (
+        tables.nodes.add_row(default_node_flags, time, individual=individual_id),
+        tables.nodes.add_row(default_node_flags, time, individual=individual_id),
+    )
+
+def new_population(tables, time, prev_pop, recombination_rate) -> dict[int, tuple[int, int]]:
+    pop = {}  # fill with individual_ID: (maternal_genome_ID, paternal_genome_ID)
+
+    # Cache the list of individual IDs in the previous population, for efficiency
+    prev_individuals = np.array([i for i in prev_pop.keys()], dtype=np.int32)
+
+    for _ in range(len(prev_pop)):
+        # 1. Pick two individual parent IDs at random, `replace=True` allows selfing
+        mother_and_father = random.choice(prev_individuals, 2, replace=True)
+
+        # 2. Get 1 new individual ID + 2 new node IDs
+        child_id, child_genomes = make_diploid(tables, time, mother_and_father)
+        pop[child_id] = child_genomes  # store the genome IDs
+        
+        # 3. Add inheritance paths to both child genomes
+        for child_genome, parent_individual in zip(child_genomes, mother_and_father):
+            parent_genomes = prev_pop[parent_individual]
+            add_inheritance_paths(tables, parent_genomes, child_genome, recombination_rate)
+    return pop
+
+def initialise_population(tables, time, size) -> dict[int, tuple[int, int]]:
+    # Just return a dictionary by repeating step 2 above
+    return dict(make_diploid(tables, time) for _ in range(size))
+```
+
+:::{note}
+For simplicity, the code above assumes any parent can be a mother or a father (i.e. this is a hermaphrodite species).
+It also allows the same parent to be chosen as a mother and as a father (i.e. "selfing" is allowed),
+which gives simpler theoretical results. This is easy to change if required.
+:::
+
+Our forward-time simulator simply involves repeatedly running the `next_population()` routine,
+replacing the old population with the new one. For efficiency reasons, `tskit` has strict requirements
+for the order of edges in the edge table, so we need to {meth}`~TableCollection.sort` the tables before we output the final tree sequence.
+
+```{code-cell} ipython3
+def forward_WF(num_diploids, seq_len, generations, recombination_rate=0, random_seed=2):
+    """
+    Run a forward-time Wright Fisher simulation of a diploid population, returning
+    a tree sequence representing the genetic genealogy of the simulated genomes.
+    """
+    global random
+    random = np.random.default_rng(random_seed) 
+    tables = tskit.TableCollection(seq_len)
+    tables.time_units = "generations"  # optional, but helpful when plotting
+
+    pop = initialise_population(tables, generations, num_diploids)
+    while generations > 0:
+        generations = generations - 1
+        pop = new_population(tables, generations, pop, recombination_rate)
+
+    tables.sort()
+    return tables.tree_sequence()
+```
+
+### Inheritance without recombination
+
+The final piece of the simulation is to define the `add_inheritance_paths()` function,
+which saves the inheritance paths in the {ref}`sec_edge_table_definition`.
+For reference, the simplest case (a small focal region in which there is no recombination)
+can be coded as follows:
+
+```{code-cell} ipython3
+def add_inheritance_paths(tables, parent_nodes, child_node, recombination_rate):
+    "Add inheritance paths from a randomly chosen parent genome to the child genome."
+    assert recombination_rate == 0
+    left, right = [20_000, 21_000]  # only define inheritance in this focal region
+    inherit_from = random.integers(2)  # randomly choose the 1st or the 2nd parent node
+    tables.edges.add_row(left, right, parent_nodes[inherit_from], child_node)
+```
+
+### Inheritance with recombination
+
+Recombination adds complexity to the inheritance paths from a child to its parents,
+that the child inherits a mosaic of the two genomes present in each parent.
+
+The exact details of the mosaic will depend on the model of recombination you
+wish to implement. For instance, a simple model such as that in the
+{ref}`sec_completing_forwards_simulations` tutorial might assume exactly one
+crossover per chromosome. A complex model might allow not just multiple
+crossovers with e.g. recombination "hotspots", but also non-crossover events
+such as {ref}`msprime:sec_ancestry_gene_conversion`.
+
+Below is a redefined `add_inheritance_paths()` function of intermediate complexity,
+which models recombination as a uniform Poisson process along a genomic interval.
+We generate a set of "breakpoints" along the genome, then allocates edges from the
+child genome to one or the other genome from the parent, with left and right positions
+reflecting the breakpoint positions. Note that real recombination rates are usually
+such that they result in relatively
+few breakpoints per chromosome (in humans, around 1 or 2).
+
+```{code-cell} ipython3
+def add_inheritance_paths(tables, parent_genomes, child_genome, recombination_rate):
+    "Add paths from parent genomes to the child genome, with crossover recombination."
+    L = tables.sequence_length
+    num_recombinations = random.poisson(recombination_rate * L)
+    breakpoints = random.uniform(0, L, size=num_recombinations)
+    breakpoints = np.concatenate(([0], np.unique(breakpoints), [L]))
+    inherit_from = random.integers(2)  # starting parental genome
+
+    # iterate over pairs of ([0, b1], [b1, b2], [b2, b3], ... [bN, L])
+    for left, right in zip(breakpoints[:-1], breakpoints[1:]):
+        tables.edges.add_row(
+            left, right, parent_genomes[inherit_from], child_genome)
+        inherit_from = 1 - inherit_from  # switch to other parent genome
+```
+
+:::{note}
+Above, breakpoint positions occur on a continuous line (i.e. "infinite breakpoint positions"),
+to match population genetic theory. It is relatively easy to alter this to
+allow recombinations only at integer positions
 :::
+
+### Basic examples
+
+Now we can test the `forward_WF()` function for a single generation with a small
+population size of 6 diploids, and print out the resulting tree sequence. For simplicity,
+we will omit recombination for now.
+
+```{code-cell} ipython3
+ts = forward_WF(6, L, generations=1)
+ts.draw_svg(y_axis=True, size=(500, 200))
+```
+
+It looks like it is working correctly: all 12 genomes (6 diploids) in the current generation at time=0 trace back to a 
+genome in the initial generation at time=1. Note that not all individuals in the initial generation have passed on genetic
+material at this genomic position (they appear as isolated nodes at the top of the plot).
+
+Now let's simulate for a longer time period, and set a few helpful plotting parameters.
+
+:::{note}
+By convention we plot the most recent generation at the bottom of the plot
+(i.e. perversely, each "tree" has leaves towards the bottom, and roots at the top)
+:::
+
+```{code-cell} ipython3
+ts = forward_WF(6, L, generations=20)
+
+graphics_params = {
+    "y_axis": True,
+    "y_label": f"Time ({ts.time_units} ago)",
+    "y_ticks": {i: 'Current' if i==0 else str(i) for i in range(21)},
+}
+ts.draw_svg(size=(1200, 400), **graphics_params)
+```
+
+This is starting to look like a real genealogy! Clearly, however, there are many
+"extinct" lineages that have not made it to the current day.
+
+(sec_tskit_forward_simulations_simplification)=
+
+## Simplification
+
+The key to efficent forward-time genealogical simulation is the process of {ref}`sec_simplification`,
+which can reduce much of the complexity shown in the tree above.
+Typically, we want to remove all the lineages that do not contribute to the current day genomes.
+We do this via the {meth}`~tskit.TreeSequence.simplify` method, specifying that only the nodes
+in the current generation are "samples".
+
+```{code-cell} ipython3
+current_day_genomes = ts.samples(time=0)
+simplified_ts = ts.simplify(current_day_genomes, keep_unary=True, filter_nodes=False)
+simplified_ts.draw_svg(size=(600, 400), **graphics_params)
+```
+
+(sec_tskit_forward_simulations_simplification_id_changes)=
+
+### ID changes
+
+We just simplified with `filter_nodes=False`, meaning that the tree sequence retained
+all nodes even after simplification, even those that are no longer part of
+the genealogy. By default (if `filter_nodes` is not specified), these nodes are removed,
+which changes the node IDs.
+
+```{code-cell} ipython3
+simplified_ts = ts.simplify(current_day_genomes, keep_unary=True)
+simplified_ts.draw_svg(size=(600, 300), **graphics_params)
+```
+
+You can see that the list of nodes passed to {meth}`~tskit.TreeSequence.simplify` (i.e. the current-day genomes)
+have become the first nodes in the table, numbered from 0..11;
+the remaining (internal) nodes have been renumbered from youngest to oldest.
+
+### Removing intermediate nodes
+
+The `keep_unary=True` parameter meant that we kept intermediate ("unary") nodes,
+even those that do not not represent branch-points in the tree.
+Often these are also unneeded, and by default we remove those too, although
+this will mean that we lose track of the pedigree of the individuals
+(which is stored in the parents column of the {ref}`sec_individual_table_definition`).
+Since we are removing more nodes, the node IDs of non-samples will again change. 
+
+```{code-cell} ipython3
+simplified_ts = ts.simplify(current_day_genomes)
+simplified_ts.draw_svg(size=(400, 300), y_axis=True)
+```
+
+This is now looking much more like a "normal" genetic genealogy (a "gene tree"),
+in which all the sample genomes trace back to a single common ancestor.
+
+## Recombination
+
+If we pass a non-zero recombination rate to the `forward_WF()` function, different regions
+of the genome may have different ancestries. This results in multiple trees along the genome.
+
+```{code-cell} ipython3
+rho = 1e-7
+ts = forward_WF(6, L, generations=50, recombination_rate=rho)
+print(f"A recombination rate of {rho} has created {ts.num_trees} trees over {ts.sequence_length} bp")
+```
+
+Here's how the full (unsimplified) genealogy looks (labels omitted for clarity):
+
+```{code-cell} ipython3
+graphics_params["y_ticks"] = [0, 10, 20, 30, 40 ,50]
+ts.draw_svg(size=(1000, 400), node_labels={}, symbol_size=2, **graphics_params)
+```
+
+Because we are showing the extinct lineages and the recombinations associated with them,
+this plot has many trees and nodes, and is rather confusing.
+However, {ref}`sec_tskit_forward_simulations_simplification` can be equally applied to a
+recombinant genealogy, and will reduce the genealogy to something more managable for
+analysis and visualization.
+
+```{code-cell} ipython3
+ts = forward_WF(6, L, generations=100, recombination_rate=rho)
+simplified_ts = ts.simplify(ts.samples(time=0))
+graphics_params["y_ticks"] = [0, 10, 20, 30, 40 ,50]
+simplified_ts.draw_svg(size=(1000, 300), **graphics_params)
+```
+
+
+### Larger simulations
+
+So far we have only simulated a relatively small population size (12 genomes / 6 diploids)
+for a short time. We can easily simulate for longer times, and increase the population size.
+In this case, simplification
+can have a dramatic effect on the disk storage and memory required:
+
+```{code-cell}
+from datetime import datetime
+
+population_size = 100
+gens = 1000  # ten times the population size
+L = 500_000  # ten times the sequence length
+
+start = datetime.now()
+large_ts = forward_WF(population_size, L, gens, recombination_rate=rho)
+print(
+    f"Simulated {population_size} individuals ({population_size * 2} genomes)",
+    f"for {gens} generations in {(datetime.now() - start).seconds:.1f} seconds",
+)
+```
+
+```{code-cell}
+print(f"Full tree sequence including dead lineages: {large_ts.nbytes/1024/1024:.2f} MB")
+current_day_genomes = large_ts.samples(time=0)
+simplified_ts = large_ts.simplify(current_day_genomes, keep_input_roots=True)
+print(
+    f"Tree sequence of current-day individuals: {simplified_ts.nbytes/1024/1024:.2f} MB,",
+    f"{simplified_ts.num_trees} trees."
+)
+print(
+    "Simplification has reduced the size by a factor of",
+    f"{large_ts.nbytes / simplified_ts.nbytes:.2f}"
+)
+```
+
+The most obvious improvement when simulating genealogies in forward time
+is therefore to carry out regular simplification steps *during* the simulation,
+rather than just at the end. This is
+described in the next tutorial: 
+
+
+(sec_forward_simulations_ensuring_coalescence)=
+
+## Ensuring coalescence
+
+Even though the simulation above was run for hundreds of generations, there are still
+trees with multiple roots in some regions of the genome. 1000 generations is not therefore
+long enough to capture the ancestry back to a single common ancestor
+(i.e. to ensure "full coalescence" of all local trees):
+
+```{code-cell}
+from matplotlib import pyplot as plt
+plt.figure(figsize=(7, 2))
+plt.stairs(
+    [tree.num_roots for tree in simplified_ts.trees()],
+    simplified_ts.breakpoints(as_array=True),
+    baseline=None,
+)
+plt.xlabel("Genome position (bp)")
+plt.ylabel("Number of roots")
+plt.show()
+```
+
+Most of our trees have one root, but a handful have two. 
+What is happening is that even though we simulated for $10N$ generations,
+and the theoretical expectation for the time to the most recent common ancestor
+is $E[tMRCA]=4N$, the variance is pretty big, such that some marginal
+trees are not completely coalesced.  Let's take a look at the distribution of
+tMRCAs using `msprime` to carry out an equivalent backward-time simulation
+(which is much faster, so we can replicate it, say, 1000 times).
+
+```{code-cell}
+import msprime
+
+tmrca=[]
+sequence_length = 50_000
+rho = 5e-7
+for sim in msprime.sim_ancestry(
+    100,
+    sequence_length=sequence_length,
+    recombination_rate=rho,
+    random_seed=42,
+    num_replicates=1000,
+):
+    for tree in sim.trees():
+        tmrca.append(tree.time(tree.root))
+
+plt.figure(figsize=(7, 3))
+plt.hist(tmrca, bins=30, density=True)
+plt.xlabel("tMRCA (units of N generations)")
+plt.ylabel("Proportion of trees")
+plt.show()
+```
+
+So, some fraction of local trees have very large tMRCAs! And of course
+the larger the population, the longer the time needed to ensure full coalescence.
+Hence for large forward simulation models, the number of generations required 
+to ensure full coalescence can be prohibitive.
+
+Uncoalesced local trees matter, because they result in some pairs of
+samples which are not connected, and so the genetic distance betweem
+them is undefined (another way to imagine this is that we cannot spot
+mutations that occur above the local roots of the tree). Hence the
+simulation is not capturing the full genetic diversity within the sample.
+
+A powerful way to get around this problem is *recapitation*,
+in which an alternative technique, such as backward-in-time coalescent simulation,
+is used to to fill in the "head" of the tree sequence. In other words,
+we can use a fast backward-time simulator such as `msprime` to simulate the
+prior genealogy of the oldest nodes in the simplified tree sequence.
+Details are described in the {ref}`sec_completing_forwards_simulations`
+tutorial.
+
+## More complex forward-simulations
+
+The next tutorial shows the principles behind more complex simulations,
+including e.g. regular simplification during the simulation,
+adding mutations, and adding metadata.
+It also details several extra tips and tricks we have learned when building forward simulators.
diff --git a/more_forward_sims.md b/more_forward_sims.md
new file mode 100644
index 00000000..932c63c1
--- /dev/null
+++ b/more_forward_sims.md
@@ -0,0 +1,245 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.12
+    jupytext_version: 1.9.1
+kernelspec:
+  display_name: Python 3
+  language: python
+  name: python3
+---
+
+```{currentmodule} tskit
+```
+
+(sec_tskit_more_forward_simulations)=
+
+# _Advanced forward simulations_
+
+% remove underscores in title when tutorial is complete or near-complete
+
+:::{todo}
+Add further details on building a forward simulator
+(see issue [#14](https://github.com/tskit-dev/tutorials/issues/14))
+:::
+
+In the {ref}`previous tutorial<sec_tskit_forward_simulations>`, we developed the following
+basic forward-time Wright-Fisher (WF) simulator (refer back to that tutorial for a
+detailed run through of the code):
+
+```{code-cell}
+import tskit
+import numpy as np
+
+random_seed = 6
+random = np.random.default_rng(random_seed)  # A random number generator for general use
+
+L = 50_000  # The sequence length: 50 Kb
+
+def add_inheritance_paths(tables, parent_genomes, child_genome, recombination_rate):
+    "Add paths from parent genomes to the child genome, with crossover recombination"
+    L = tables.sequence_length
+    num_recombinations = random.poisson(recombination_rate * L)
+    breakpoints = random.integers(0, L - 1, size=num_recombinations)
+    break_pos, counts = np.unique(breakpoints, return_counts=True)
+    crossovers = break_pos[counts % 2 == 1]  # no crossover if e.g. 2 breaks at same pos
+    left_positions = np.insert(crossovers, 0, 0)
+    right_positions = np.append(crossovers, L)
+
+    inherit_from = random.integers(2)
+    for left, right in zip(left_positions, right_positions):
+        tables.edges.add_row(
+            left, right, parent_genomes[inherit_from], child_genome)
+        inherit_from = 1 - inherit_from  # switch to other parent genome
+
+def make_diploid(tables, time, parent_individuals=None):
+    individual_id = tables.individuals.add_row(parents=parent_individuals)
+    return individual_id, (
+        tables.nodes.add_row(time=time, individual=individual_id),
+        tables.nodes.add_row(time=time, individual=individual_id),
+    )
+
+def new_population(tables, time, prev_pop, recombination_rate):
+    pop = {}
+    prev_individuals = np.array([i for i in prev_pop.keys()], dtype=np.int32)
+    for _ in range(len(prev_pop)):
+        mother_and_father = random.choice(prev_individuals, 2, replace=True)
+        child_id, child_genomes = make_diploid(tables, time, mother_and_father)
+        pop[child_id] = child_genomes  # store the genome IDs
+        for child_genome, parent_individual in zip(child_genomes, mother_and_father):
+            parent_genomes = prev_pop[parent_individual]
+            add_inheritance_paths(tables, parent_genomes, child_genome, recombination_rate)
+    return pop
+
+def initialise_population(tables, time, size) -> dict:
+    return dict(make_diploid(tables, time) for _ in range(size))
+
+def forward_WF(num_diploids, seq_len, generations, recombination_rate=0, random_seed=7):
+    global random
+    random = np.random.default_rng(random_seed) 
+    tables = tskit.TableCollection(seq_len)
+    tables.time_units = "generations"
+
+    pop = initialise_population(tables, generations, num_diploids)
+    while generations > 0:
+        generations = generations - 1
+        pop = new_population(tables, generations, pop, recombination_rate)
+
+    tables.sort()
+    return tables.tree_sequence()
+```
+
+## Repeated simplification
+
+We will now implement the WF simulation above but with simplification happening at
+repeated intervals. It is helpful to think of this as regular "garbage collection",
+as what we're really doing is getting rid of extinct lineages while also "trimming"
+extant lineages down to a minimal representation.
+
+:::{caution}
+Regular garbage collection forces us to reckon with the fact that simplification
+{ref}`changes the node IDs <sec_tskit_forward_simulations_simplification_id_changes>`.
+We therefore need to remap any node (and individual) IDs that are used outside of
+`tskit`. In the implementation described here, those IDs are stored in the `pop`
+variable.
+:::
+
+```{code-cell}
+def simplify_tables(tables, samples, pop) -> dict[int, tuple[int, int]]:
+    """
+    Simplify the tables with respect to the given samples, returning a
+    population dict in which individual and nodes are remapped.
+    """
+    tables.sort()
+    node_map = tables.simplify(samples, keep_input_roots=True)
+    
+    nodes_individual = tables.nodes.individual
+    remapped_pop = {}
+    for node1, node2 in pop.values():
+        node1, node2 = node_map[[node1, node2]]  # remap
+        assert nodes_individual[node1] == nodes_individual[node2]  # sanity check
+        remapped_pop[nodes_individual[node1]] = (node1, node2)
+    return remapped_pop
+
+
+def forward_WF(
+    num_diploids,
+    seq_len,
+    generations,
+    recombination_rate=None,
+    simplification_interval=None,  # default to simplifying only at end
+    show=None,
+):
+    tables = tskit.TableCollection(seq_len)
+    tables.time_units = "generations"  # optional, but helpful when plotting
+    if simplification_interval is None:
+        simplification_interval = generations
+    simplify_mod = generations % simplification_interval
+
+    pop = initialise_population(tables, generations, num_diploids)
+    while generations > 0:
+        generations = generations - 1
+        pop = new_population(tables, generations, pop, recombination_rate)
+        if generations > 0 and generations % simplification_interval == simplify_mod:
+            current_nodes = [u for nodes in pop.values() for u in nodes]
+            pop = simplify_tables(tables, current_nodes, pop)
+            if show:
+                print("Simplified", generations, "generations before end")
+
+    pop = simplify_tables(tables, [u for nodes in pop.values() for u in nodes], pop)
+    if show:
+        print("Final simplification")
+    return tables.tree_sequence()
+
+ts = forward_WF(6, L, generations=100, simplification_interval=25, show=True)
+ts.draw_svg(size=(800, 200))
+```
+
+### Invariance to simplification interval
+A critical concept to keep in mind is that the simulation itself is the only random component.
+The simplification algorithm is deterministic given a set of (nodes, edges) satisfying
+`tskit`'s sorting requirements. Therefore, the results of our new `forward_WF()` function
+must be the same for all simplification intervals
+
+:::{note}
+This invariance property only holds in some cases.
+We discuss this in more detail below when we add in mutation.
+:::
+
+```{code-cell}
+ts = diploid_sim(10, 500, simplification_interval=1, random_seed=42)
+
+# Iterate over a range of odd and even simplification intervals.
+print("Testing invariance to simplification interval")
+test_intervals = list(range(2, 500, 33))
+for i in test_intervals:
+    # Make sure each new sim starts with same random seed!
+    ts_test = diploid_sim(10, 500, simplification_interval=i, random_seed=42)
+    assert ts.equals(ts_test, ignore_provenance=True)
+print(f"Intervals {test_intervals} passed")
+```
+
+:::{tip}
+Testing your own code using loops like the one above is a very
+good way to identify subtle bugs in book-keeping.
+:::
+
+### Summary
+
+* Simplifying during a simulation changes IDs in the tree sequence tables, so we need to remap
+entities that store any of these IDs between generations.
+* Our code to carry out simplification gets called both during the simulation and at the end.
+It's therefore worth encapsulating it into a class or function for easier code re-use and testing.
+
+#### Technical notes
+
+We have found that it is possible to write a simulation where the results differ
+by simplification interval, but appear correct in distribution. 
+By this we mean that looking at distributions of numbers of mutations, their frequencies, etc.,
+match predictions from analytical theory.  However, our experience is that such simulations
+contain bugs and that the summaries being used for testing are too crude to catch them.
+For example, they may affect the variance in a subtle way that would require millions
+of simulations to catch.  Often what is going on is that parent/offspring relationships
+are not being properly recorded, resulting in lineages that either persist too long or
+not long enough.  (In other words, the variance in offspring number per diploid is no
+longer what it should be, meaning you've changed the effective population size.)
+Thus, please make sure you get the **same** `tskit` tables out of a simulation for
+any simplification interval.
+
+
+## Mutations
+
+In this section, we will add mutation to our simulation.  Mutations will occur according to the
+infinitely-many sites model, which means that a new mutation cannot arise at a currently-mutated
+position. $\theta = 4N\mu$ is the scaled mutation rate, and is equal to twice the expected number
+of new mutations per generation.  The parameter $\mu$ is the expected number of new mutations
+per gamete, per generation. Mutation positions will be uniformly distributed along the genome.
+
+Adding mutations changes the complexity of the simulation quite a bit, because now we must
+add to and simplify [site tables](sec_edge_table_definition) and
+[mutation tables](sec_mutation_table_definition) instances. We might also
+want to add *metadata* to the sites or mutations, recording details such as
+the selection coefficient of a mutation, or the type of mutation (e.g., synonymous vs. non-synonymous).
+
+We will write a mutation function here which we will re-use in future examples.
+
+:::{note}
+We will be treating mutations as neutral.  Doing so is odd, as one big
+selling point of `tskit` is the ability to skip the tracking of neutral mutations
+in forward simulations.  However, tracking neutral mutations plus metadata is the
+same as tracking selected mutations and their metadata, and being able to do neat
+things like put your selected mutations onto a figure of the genealogy
+is one of several possible use cases.
+:::
+
+:::{todo}
+The rest of this tutorial is still under construction, and needs porting from
+[this workbook](https://github.com/tskit-dev/tutorials/blob/main/old-content/notebooks/wfforward.ipynb).
+This will primarily deal with sites and mutations (and mutational metadata).
+We could also include details on selection, if that seems sensible.
+
+The section in that workbook on "Starting with a prior history" should be put in
+the {ref}`sec_completing_forwards_simulations` tutorial.
+:::
\ No newline at end of file