From 9650d23e77032ebbd65fc60e50571498eb7263d6 Mon Sep 17 00:00:00 2001
From: nicholas-leonard <nick@nikopia.org>
Date: Mon, 10 Aug 2015 21:42:01 -0400
Subject: [PATCH 1/2] doc: close anchor tags and add mkdocs config for readthedocs

---
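Note: most of this patch is one mechanical rewrite, `<a name="x"/>` to
`<a name="x"></a>`. HTML ignores the trailing slash on non-void elements, so
the self-closing form leaves the anchor open. The patch does not say how the
rewrite was produced. The sketch below is only one hypothetical way to
reproduce it, and the names `SELF_CLOSING_ANCHOR` and `fix_anchors` are
illustrative assumptions, not part of the repository.

```python
import re

# Hypothetical helper, not part of torch/nn: matches a self-closing
# named anchor such as <a name="nn.Module"/>.
SELF_CLOSING_ANCHOR = re.compile(r'<a name="([^"]+)"\s*/>')


def fix_anchors(text: str) -> str:
    """Rewrite <a name="x"/> as <a name="x"></a>; other text is untouched."""
    return SELF_CLOSING_ANCHOR.sub(r'<a name="\1"></a>', text)


sample = '<a name="nn.Module"/>\n## Module ##\n'
print(fix_anchors(sample), end="")
```

Anchors that are already closed do not match the pattern, so running the
helper on a file twice leaves it unchanged after the first pass.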
 README.md          |  2 +-
 doc/containers.md  | 25 ++++++++++---------
 doc/convolution.md | 51 +++++++++++++++++++-------------------
 doc/criterion.md   | 42 +++++++++++++++----------------
 doc/index.md       | 23 +++++++++++++++++
 doc/module.md      | 48 ++++++++++++++++++------------------
 doc/overview.md    | 14 +++++++----
 doc/simple.md      | 61 +++++++++++++++++++++++-----------------------
 doc/table.md       | 35 +++++++++++++-------------
 doc/training.md    | 14 +++++------
 doc/transfer.md    | 32 ++++++++++++------------
 mkdocs.yml         | 18 ++++++++++++++
 12 files changed, 207 insertions(+), 158 deletions(-)
 create mode 100644 doc/index.md
 create mode 100644 mkdocs.yml
diff --git a/README.md b/README.md
index 907be66a3..378a4409d 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
 [![Build Status](https://travis-ci.org/torch/nn.svg?branch=master)](https://travis-ci.org/torch/nn)
-<a name="nn.dok"/>
+<a name="nn.dok"></a>
 # Neural Network Package #
 
 This package provides an easy and modular way to build and train simple or complex neural networks using [Torch](https://github.com/torch/torch7/blob/master/README.md):
diff --git a/doc/containers.md b/doc/containers.md
index d691f4133..8d02ab96b 100644
--- a/doc/containers.md
+++ b/doc/containers.md
@@ -1,6 +1,7 @@
-<a name="nn.Containers"/>
+<a name="nn.Containers"></a>
 # Containers #
 Complex neural networks are easily built using container classes:
+
  * [Container](#nn.Container) : abstract class inherited by containers ;
    * [Sequential](#nn.Sequential) : plugs layers in a feed-forward fully connected manner ;
    * [Parallel](#nn.Parallel) : applies its `ith` child module to the  `ith` slice of the input Tensor ;
@@ -9,7 +10,7 @@ Complex neural networks are easily built using container classes:
  
 See also the [Table Containers](#nn.TableContainers) for manipulating tables of [Tensors](https://github.com/torch/torch7/blob/master/doc/tensor.md).
 
-<a name="nn.Container"/>
+<a name="nn.Container"></a>
 ## Container ##
 
 This is an abstract [Module](module.md#nn.Module) class which declares methods defined in all containers.
@@ -17,19 +18,19 @@ It reimplements many of the Module methods such that calls are propagated to the
 contained modules. For example, a call to [zeroGradParameters](module.md#nn.Module.zeroGradParameters)
 will be propagated to all contained modules.
 
-<a name="nn.Container.add"/>
+<a name="nn.Container.add"></a>
 ### add(module) ###
 Adds the given `module` to the container. The order is important
 
-<a name="nn.Container.get"/>
+<a name="nn.Container.get"></a>
 ### get(index) ###
 Returns the contained modules at index `index`.
 
-<a name="nn.Container.size"/>
+<a name="nn.Container.size"></a>
 ### size() ###
 Returns the number of contained modules.
 
-<a name="nn.Sequential"/>
+<a name="nn.Sequential"></a>
 ## Sequential ##
 
 Sequential provides a means to plug layers together
@@ -51,7 +52,7 @@ which gives the output:
 [torch.Tensor of dimension 1]
 ```
 
-<a name="nn.Sequential.remove"/>
+<a name="nn.Sequential.remove"></a>
 ### remove([index]) ###
 
 Remove the module at the given `index`. If `index` is not specified, remove the last layer.
@@ -71,7 +72,7 @@ nn.Sequential {
 ```
 
 
-<a name="nn.Sequential.insert"/>
+<a name="nn.Sequential.insert"></a>
 ### insert(module, [index]) ###
 
 Inserts the given `module` at the given `index`. If `index` is not specified, the incremented length of the sequence is used and so this is equivalent to use `add(module)`.
@@ -92,7 +93,7 @@ nn.Sequential {
 
 
 
-<a name="nn.Parallel"/>
+<a name="nn.Parallel"></a>
 ## Parallel ##
 
 `module` = `Parallel(inputDimension,outputDimension)`
@@ -149,7 +150,7 @@ end
 ```
 
 
-<a name="nn.Concat"/>
+<a name="nn.Concat"></a>
 ## Concat ##
 
 ```lua
@@ -179,7 +180,7 @@ which gives the output:
 [torch.Tensor of dimension 10]
 ```
 
-<a name="nn.DepthConcat"/>
+<a name="nn.DepthConcat"></a>
 ## DepthConcat ##
 
 ```lua
@@ -273,7 +274,7 @@ module output tensors non-`dim` sizes aren't all odd or even.
 Such that in order to keep the mappings aligned, one need 
 only ensure that these be all odd (or even).
 
-<a name="nn.TableContainers"/>
+<a name="nn.TableContainers"></a>
 ## Table Containers ##
 While the above containers are used for manipulating input [Tensors](https://github.com/torch/torch7/blob/master/doc/tensor.md), table containers are used for manipulating tables :
  * [ConcatTable](table.md#nn.ConcatTable)
diff --git a/doc/convolution.md b/doc/convolution.md
index 8d9e77bf6..4f716c639 100755
--- a/doc/convolution.md
+++ b/doc/convolution.md
@@ -1,7 +1,8 @@
-<a name="nn.convlayers.dok"/>
+<a name="nn.convlayers.dok"></a>
 # Convolutional layers #
 
 A convolution is an integral that expresses the amount of overlap of one function `g` as it is shifted over another function `f`. It therefore "blends" one function with another. The neural network package supports convolution, pooling, subsampling and other relevant facilities. These are divided base on the dimensionality of the input and output [Tensors](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor):
+
  * [Temporal Modules](#nn.TemporalModules) apply to sequences with a one-dimensional relationship
 (e.g. sequences of words, phonemes and letters. Strings of some kind).
    * [TemporalConvolution](#nn.TemporalConvolution) : a 1D convolution over an input sequence ;
@@ -25,7 +26,7 @@ a kernel for computing the weighted average in a neighborhood ;
    * [VolumetricMaxPooling](#nn.VolumetricMaxPooling) : a 3D max-pooling operation over an input video.
    * [VolumetricAveragePooling](#nn.VolumetricAveragePooling) : a 3D average-pooling operation over an input video.
 
-<a name="nn.TemporalModules"/>
+<a name="nn.TemporalModules"></a>
 ## Temporal Modules ##
 Excluding an optional first batch dimension, temporal layers expect a 2D Tensor as input. The
 first dimension is the number of frames in the sequence (e.g. `nInputFrame`), the last dimenstion
@@ -35,7 +36,7 @@ of dimensions, although the size of each dimension may change. These are commonl
 Note: The [LookupTable](#nn.LookupTable) is special in that while it does output a temporal Tensor of size `nOutputFrame x outputFrameSize`,
 its input is a 1D Tensor of indices of size `nIndices`. Again, this is excluding the option first batch dimension.
 
-<a name="nn.TemporalConvolution"/>
+<a name="nn.TemporalConvolution"></a>
 ## TemporalConvolution ##
 
 ```lua
@@ -121,7 +122,7 @@ which gives:
 -0.63871422284166
 ```
 
-<a name="nn.TemporalMaxPooling"/>
+<a name="nn.TemporalMaxPooling"></a>
 ## TemporalMaxPooling ##
 
 ```lua
@@ -139,7 +140,7 @@ If the input sequence is a 2D tensor of dimension `nInputFrame x inputFrameSize`
 nOutputFrame = (nInputFrame - kW) / dW + 1
 ```
 
-<a name="nn.TemporalSubSampling"/>
+<a name="nn.TemporalSubSampling"></a>
 ## TemporalSubSampling ##
 
 ```lua
@@ -175,7 +176,7 @@ The output value of the layer can be precisely described as:
 output[i][t] = bias[i] + weight[i] * sum_{k=1}^kW input[i][dW*(t-1)+k)]
 ```
 
-<a name="nn.LookupTable"/>
+<a name="nn.LookupTable"></a>
 ## LookupTable ##
 
 ```lua
@@ -253,13 +254,13 @@ Outputs something like:
 [torch.DoubleTensor of dimension 2x4x3]
 ```
 
-<a name="nn.SpatialModules"/>
+<a name="nn.SpatialModules"></a>
 ## Spatial Modules ##
 Excluding and optional batch dimension, spatial layers expect a 3D Tensor as input. The
 first dimension is the number of features (e.g. `frameSize`), the last two dimenstions
 are spatial (e.g. `height x width`). These are commonly used for processing images.
 
-<a name="nn.SpatialConvolution"/>
+<a name="nn.SpatialConvolution"></a>
 ### SpatialConvolution ###
 
 ```lua
@@ -303,7 +304,7 @@ output[i][j][k] = bias[k]
 ```
 
 
-<a name="nn.SpatialConvolutionMap"/>
+<a name="nn.SpatialConvolutionMap"></a>
 ### SpatialConvolutionMap ###
 
 ```lua
@@ -317,7 +318,7 @@ connection table between input and output features. The
 using a [full connection table](#nn.tables.full). One can specify
 different types of connection tables.
 
-<a name="nn.tables.full"/>
+<a name="nn.tables.full"></a>
 #### Full Connection Table ####
 
 ```lua
@@ -327,7 +328,7 @@ table = nn.tables.full(nin,nout)
 This is a precomputed table that specifies connections between every
 input and output node.
 
-<a name="nn.tables.onetoone"/>
+<a name="nn.tables.onetoone"></a>
 #### One to One Connection Table ####
 
 ```lua
@@ -337,7 +338,7 @@ table = nn.tables.oneToOne(n)
 This is a precomputed table that specifies a single connection to each
 output node from corresponding input node.
 
-<a name="nn.tables.random"/>
+<a name="nn.tables.random"></a>
 #### Random Connection Table ####
 
 ```lua
@@ -348,7 +349,7 @@ This table is randomly populated such that each output unit has
 `nto` incoming connections. The algorihtm tries to assign uniform
 number of outgoing connections to each input node if possible.
 
-<a name="nn.SpatialLPPooling"/>
+<a name="nn.SpatialLPPooling"></a>
 ### SpatialLPPooling ###
 
 ```lua
@@ -357,7 +358,7 @@ module = nn.SpatialLPPooling(nInputPlane, pnorm, kW, kH, [dW], [dH])
 
 Computes the `p` norm in a convolutional manner on a set of 2D input planes.
 
-<a name="nn.SpatialMaxPooling"/>
+<a name="nn.SpatialMaxPooling"></a>
 ### SpatialMaxPooling ###
 
 ```lua
@@ -379,7 +380,7 @@ oheight = op((height + 2*padH - kH) / dH + 1)
 `op` is a rounding operator. By default, it is `floor`. It can be changed
 by calling `:ceil()` or `:floor()` methods.
 
-<a name="nn.SpatialAveragePooling"/>
+<a name="nn.SpatialAveragePooling"></a>
 ### SpatialAveragePooling ###
 
 ```lua
@@ -390,7 +391,7 @@ Applies 2D average-pooling operation in `kWxkH` regions by step size
 `dWxdH` steps. The number of output features is equal to the number of
 input planes.
 
-<a name="nn.SpatialAdaptiveMaxPooling"/>
+<a name="nn.SpatialAdaptiveMaxPooling"></a>
 ### SpatialAdaptiveMaxPooling ###
 
 ```lua
@@ -413,7 +414,7 @@ y_i_start = floor((i   /oheight) * iheight)
 y_i_end   = ceil(((i+1)/oheight) * iheight)
 ```
 
-<a name="nn.SpatialSubSampling"/>
+<a name="nn.SpatialSubSampling"></a>
 ### SpatialSubSampling ###
 
 ```lua
@@ -454,7 +455,7 @@ output[i][j][k] = bias[k]
   + weight[k] sum_{s=1}^kW sum_{t=1}^kH input[dW*(i-1)+s)][dH*(j-1)+t][k]
 ```
 
-<a name="nn.SpatialUpSamplingNearest"/>
+<a name="nn.SpatialUpSamplingNearest"></a>
 ### SpatialUpSamplingNearest ###
 
 ```lua
@@ -475,7 +476,7 @@ output(u,v) = input(floor((u-1)/scale)+1, floor((v-1)/scale)+1)
 
 Where `u` and `v` are index from 1 (as per lua convention).  There are no learnable parameters.
 
-<a name="nn.SpatialZeroPadding"/>
+<a name="nn.SpatialZeroPadding"></a>
 ### SpatialZeroPadding ###
 
 ```lua
@@ -485,7 +486,7 @@ module = nn.SpatialZeroPadding(padLeft, padRight, padTop, padBottom)
 Each feature map of a given input is padded with specified number of
 zeros. If padding values are negative, then input is cropped.
 
-<a name="nn.SpatialSubtractiveNormalization"/>
+<a name="nn.SpatialSubtractiveNormalization"></a>
 ### SpatialSubtractiveNormalization ###
 
 ```lua
@@ -522,7 +523,7 @@ w2=image.display(processed)
 ```
 ![](image/lena.jpg)![](image/lenap.jpg)
 
-<a name="nn.SpatialBatchNormalization"/>
+<a name="nn.SpatialBatchNormalization"></a>
 ## SpatialBatchNormalization ##
 
 `module` = `nn.SpatialBatchNormalization(N [,eps] [, momentum] [,affine])`
@@ -565,13 +566,13 @@ A = torch.randn(b, m, h, w)
 C = model.forward(A)  -- C will be of size `b x m x h x w`
 ```
 
-<a name="nn.VolumetricModules"/>
+<a name="nn.VolumetricModules"></a>
 ## Volumetric Modules ##
 Excluding and optional batch dimension, volumetric layers expect a 4D Tensor as input. The
 first dimension is the number of features (e.g. `frameSize`), the second is sequential (e.g. `time`) and the
 last two dimenstions are spatial (e.g. `height x width`). These are commonly used for processing videos (sequences of images).
 
-<a name="nn.VolumetricConvolution"/>
+<a name="nn.VolumetricConvolution"></a>
 ### VolumetricConvolution ###
 
 ```lua
@@ -608,7 +609,7 @@ size `nOutputPlane x nInputPlane x kT x kH x kW`) and `self.bias` (Tensor of
 size `nOutputPlane`). The corresponding gradients can be found in
 `self.gradWeight` and `self.gradBias`.
 
-<a name="nn.VolumetricMaxPooling"/>
+<a name="nn.VolumetricMaxPooling"></a>
 ### VolumetricMaxPooling ###
 
 ```lua
@@ -619,7 +620,7 @@ Applies 3D max-pooling operation in `kTxkWxkH` regions by step size
 `dTxdWxdH` steps. The number of output features is equal to the number of
 input planes / dT.
 
-<a name="nn.VolumetricAveragePooling"/>
+<a name="nn.VolumetricAveragePooling"></a>
 ### VolumetricAveragePooling ###
 
 ```lua
diff --git a/doc/criterion.md b/doc/criterion.md
index 64e6d6326..4f89338c9 100755
--- a/doc/criterion.md
+++ b/doc/criterion.md
@@ -1,4 +1,4 @@
-<a name="nn.Criterions"/>
+<a name="nn.Criterions"></a>
 # Criterions #
 
 [`Criterions`](#nn.Criterion) are helpful to train a neural network. Given an input and a
@@ -24,13 +24,13 @@ target, they compute a gradient according to a given loss function.
   * [`ParallelCriterion`](#nn.ParallelCriterion) : a weighted sum of other criterions each applied to a different input and target;
   * [`MarginRankingCriterion`](#nn.MarginRankingCriterion): ranks two inputs;
 
-<a name="nn.Criterion"/>
+<a name="nn.Criterion"></a>
 ## Criterion ##
 
 This is an abstract class which declares methods defined in all criterions.
 This class is [serializable](https://github.com/torch/torch7/blob/master/doc/file.md#serialization-methods).
 
-<a name="nn.Criterion.forward"/>
+<a name="nn.Criterion.forward"></a>
 ### [output] forward(input, target) ###
 
 Given an `input` and a `target`, compute the loss function associated to the criterion and return the result.
@@ -41,7 +41,7 @@ The `output` returned should be a scalar in general.
 The state variable [`self.output`](#nn.Criterion.output) should be updated after a call to `forward()`.
 
 
-<a name="nn.Criterion.backward"/>
+<a name="nn.Criterion.backward"></a>
 ### [gradInput] backward(input, target) ###
 
 Given an `input` and a `target`, compute the gradients of the loss function associated to the criterion and return the result.
@@ -50,19 +50,19 @@ In general `input`, `target` and `gradInput` are [`Tensor`s](..:torch:tensor), b
 The state variable [`self.gradInput`](#nn.Criterion.gradInput) should be updated after a call to `backward()`.
 
 
-<a name="nn.Criterion.output"/>
+<a name="nn.Criterion.output"></a>
 ### State variable: output ###
 
 State variable which contains the result of the last [`forward(input, target)`](#nn.Criterion.forward) call.
 
 
-<a name="nn.Criterion.gradInput"/>
+<a name="nn.Criterion.gradInput"></a>
 ### State variable: gradInput ###
 
 State variable which contains the result of the last [`backward(input, target)`](#nn.Criterion.backward) call.
 
 
-<a name="nn.AbsCriterion"/>
+<a name="nn.AbsCriterion"></a>
 ## AbsCriterion ##
 
 ```lua
@@ -85,7 +85,7 @@ criterion.sizeAverage = false
 ```
 
 
-<a name="nn.ClassNLLCriterion"/>
+<a name="nn.ClassNLLCriterion"></a>
 ## ClassNLLCriterion ##
 
 ```lua
@@ -128,7 +128,7 @@ end
 ```
 
 
-<a name="nn.CrossEntropyCriterion"/>
+<a name="nn.CrossEntropyCriterion"></a>
 ## CrossEntropyCriterion ##
 
 ```lua
@@ -157,7 +157,7 @@ loss(x, class) = weights[class] * (-x[class] + log(\sum_j exp(x[j])))
 ```
 
 
-<a name="nn.DistKLDivCriterion"/>
+<a name="nn.DistKLDivCriterion"></a>
 ## DistKLDivCriterion ##
 
 ```lua
@@ -177,7 +177,7 @@ loss(x, target) = \sum(target_i * (log(target_i) - x_i))
 ```
 
 
-<a name="nn.BCECriterion"/>
+<a name="nn.BCECriterion"></a>
 ## BCECriterion
 
 ```lua
@@ -193,7 +193,7 @@ loss(t, o) = -(t * log(o) + (1 - t) * log(1 - o))
 This is used for measuring the error of a reconstruction in for example an auto-encoder.
 
 
-<a name="nn.MarginCriterion"/>
+<a name="nn.MarginCriterion"></a>
 ## MarginCriterion ##
 
 ```lua
@@ -256,7 +256,7 @@ gives the output:
 i.e. the mlp successfully separates the two data points such that they both have a `margin` of `1`, and hence a loss of `0`.
 
 
-<a name="nn.MultiMarginCriterion"/>
+<a name="nn.MultiMarginCriterion"></a>
 ## MultiMarginCriterion ##
 
 ```lua
@@ -281,7 +281,7 @@ mlp:add(nn.MulConstant(-1)) -- distance to similarity
 ```
 
 
-<a name="nn.MultiLabelMarginCriterion"/>
+<a name="nn.MultiLabelMarginCriterion"></a>
 ## MultiLabelMarginCriterion ##
 
 ```lua
@@ -309,7 +309,7 @@ criterion:forward(input, target)
 ```
 
 
-<a name="nn.MSECriterion"/>
+<a name="nn.MSECriterion"></a>
 ## MSECriterion ##
 
 ```lua
@@ -333,7 +333,7 @@ criterion.sizeAverage = false
 ```
 
 
-<a name="nn.MultiCriterion"/>
+<a name="nn.MultiCriterion"></a>
 ## MultiCriterion ##
 
 ```lua
@@ -360,7 +360,7 @@ mc = nn.MultiCriterion():add(nll, 0.5):add(nll2)
 output = mc:forward(input, target)
 ```
 
-<a name="nn.ParallelCriterion"/>
+<a name="nn.ParallelCriterion"></a>
 ## ParallelCriterion ##
 
 ```lua
@@ -390,7 +390,7 @@ output = pc:forward(input, target)
 ```
 
 
-<a name="nn.HingeEmbeddingCriterion"/>
+<a name="nn.HingeEmbeddingCriterion"></a>
 ## HingeEmbeddingCriterion ##
 
 ```lua
@@ -469,7 +469,7 @@ end
 ```
 
 
-<a name="nn.L1HingeEmbeddingCriterion"/>
+<a name="nn.L1HingeEmbeddingCriterion"></a>
 ## L1HingeEmbeddingCriterion ##
 
 ```lua
@@ -486,7 +486,7 @@ loss(x, y) = ⎨
 
 The `margin` has a default value of `1`, or can be set in the constructor.
 
-<a name="nn.CosineEmbeddingCriterion"/>
+<a name="nn.CosineEmbeddingCriterion"></a>
 ## CosineEmbeddingCriterion ##
 
 ```lua
@@ -508,7 +508,7 @@ loss(x, y) = ⎨
 ```
 
 
-<a name="nn.MarginRankingCriterion"/>
+<a name="nn.MarginRankingCriterion"></a>
 ## MarginRankingCriterion ##
 
 ```lua
diff --git a/doc/index.md b/doc/index.md
new file mode 100644
index 000000000..5c3616673
--- /dev/null
+++ b/doc/index.md
@@ -0,0 +1,23 @@
+[![Build Status](https://travis-ci.org/torch/nn.svg?branch=master)](https://travis-ci.org/torch/nn)
+<a name="nn.dok"></a>
+# Neural Network Package #
+
+This package provides an easy and modular way to build and train simple or complex neural networks using [Torch](https://github.com/torch/torch7/blob/master/README.md):
+
+  * Modules are the bricks used to build neural networks. Each is itself a neural network, but can be combined with other networks using containers to create complex neural networks:
+    * [Module](module.md#nn.Module) : abstract class inherited by all modules;
+    * [Containers](containers.md#nn.Containers) : container classes like [Sequential](containers.md#nn.Sequential), [Parallel](containers.md#nn.Parallel) and [Concat](containers.md#nn.Concat);
+    * [Transfer functions](transfer.md#nn.transfer.dok) : non-linear functions like [Tanh](transfer.md#nn.Tanh) and [Sigmoid](transfer.md#nn.Sigmoid);
+    * [Simple layers](simple.md#nn.simplelayers.dok) : like [Linear](simple.md#nn.Linear), [Mean](simple.md#nn.Mean), [Max](simple.md#nn.Max) and [Reshape](simple.md#nn.Reshape);
+    * [Table layers](table.md#nn.TableLayers) : layers for manipulating tables like [SplitTable](table.md#nn.SplitTable), [ConcatTable](table.md#nn.ConcatTable) and [JoinTable](table.md#nn.JoinTable);
+    * [Convolution layers](convolution.md#nn.convlayers.dok) : [Temporal](convolution.md#nn.TemporalModules), [Spatial](convolution.md#nn.SpatialModules) and [Volumetric](convolution.md#nn.VolumetricModules) convolutions;
+  * Criterions compute a gradient according to a given loss function given an input and a target:
+    * [Criterions](criterion.md#nn.Criterions) : a list of all criterions, including [Criterion](criterion.md#nn.Criterion), the abstract class;
+    * [MSECriterion](criterion.md#nn.MSECriterion) : the Mean Squared Error criterion used for regression;
+    * [ClassNLLCriterion](criterion.md#nn.ClassNLLCriterion) : the Negative Log Likelihood criterion used for classification;
+  * Additional documentation :
+    * [Overview](overview.md#nn.overview.dok) of the package essentials including modules, containers and training;
+    * [Training](training.md#nn.traningneuralnet.dok) : how to train a neural network using [StochasticGradient](training.md#nn.StochasticGradient);
+    * [Testing](testing.md) : how to test your modules.
+    * [Experimental Modules](https://github.com/clementfarabet/lua---nnx/blob/master/README.md) : a package containing experimental modules and criteria.
+
diff --git a/doc/module.md b/doc/module.md
index 50090c421..97e14a07c 100755
--- a/doc/module.md
+++ b/doc/module.md
@@ -1,4 +1,4 @@
-<a name="nn.Module"/>
+<a name="nn.Module"></a>
 ## Module ##
 
 `Module` is an abstract class which defines fundamental methods necessary
@@ -7,7 +7,7 @@ for a training a neural network. Modules are [serializable](https://github.com/t
 Modules contain two states variables: [output](#output) and
 [gradInput](#gradinput).
 
-<a name="nn.Module.forward"/>
+<a name="nn.Module.forward"></a>
 ### [output] forward(input) ###
 
 Takes an `input` object, and computes the corresponding `output` of the
@@ -24,7 +24,7 @@ implement [updateOutput(input)](#nn.Module.updateOutput)
 function. The forward module in the abstract parent class
 [Module](#nn.Module) will call `updateOutput(input)`.
 
-<a name="nn.Module.backward"/>
+<a name="nn.Module.backward"></a>
 ### [gradInput] backward(input, gradOutput) ###
 
 Performs a _backpropagation step_ through the module, with respect to the
@@ -52,14 +52,14 @@ is better to override
 [accGradParameters(input, gradOutput,scale)](#nn.Module.accGradParameters)
 functions.
 
-<a name="nn.Module.updateOutput"/>
+<a name="nn.Module.updateOutput"></a>
 ### updateOutput(input) ###
 
 Computes the output using the current parameter set of the class and
 input. This function returns the result which is stored in the
 [output](#output) field.
 
-<a name="nn.Module.updateGradInput"/>
+<a name="nn.Module.updateGradInput"></a>
 ### updateGradInput(input, gradOutput) ###
 
 Computing the gradient of the module with respect to its own
@@ -67,7 +67,7 @@ input. This is returned in `gradInput`. Also, the
 [gradInput](#gradinput) state variable is updated
 accordingly.
 
-<a name="nn.Module.accGradParameters"/>
+<a name="nn.Module.accGradParameters"></a>
 ### accGradParameters(input, gradOutput, scale) ###
 
 Computing the gradient of the module with respect to its
@@ -83,7 +83,7 @@ Zeroing this accumulation is achieved with
 the parameters according to this accumulation is done with
 [updateParameters()](#nn.Module.updateParameters).
 
-<a name="nn.Module.zeroGradParameters"/>
+<a name="nn.Module.zeroGradParameters"></a>
 ### zeroGradParameters() ###
 
 If the module has parameters, this will zero the accumulation of the
@@ -91,7 +91,7 @@ gradients with respect to these parameters, accumulated through
 [accGradParameters(input, gradOutput,scale)](#nn.Module.accGradParameters)
 calls. Otherwise, it does nothing.
 
-<a name="nn.Module.updateParameters"/>
+<a name="nn.Module.updateParameters"></a>
 ### updateParameters(learningRate) ###
 
 If the module has parameters, this will update these parameters, according
@@ -104,7 +104,7 @@ parameters = parameters - learningRate * gradients_wrt_parameters
 ```
 If the module does not have parameters, it does nothing.
 
-<a name="nn.Module.accUpdateGradParameters"/>
+<a name="nn.Module.accUpdateGradParameters"></a>
 ### accUpdateGradParameters(input, gradOutput, learningRate) ###
 
 This is a convenience module that performs two functions at
@@ -136,7 +136,7 @@ As it can be seen, the gradients are accumulated directly into
 weights. This assumption may not be true for a module that computes a
 nonlinear operation.
 
-<a name="nn.Module.share"/>
+<a name="nn.Module.share"></a>
 ### share(mlp,s1,s2,...,sn) ###
 
 This function modifies the parameters of the module named
@@ -174,7 +174,7 @@ print(mlp2:get(1).bias[1])
 
 ```
 
-<a name="nn.Module.clone"/>
+<a name="nn.Module.clone"></a>
 ### clone(mlp,...) ###
 
 Creates a deep copy of (i.e. not just a pointer to) the module,
@@ -205,29 +205,29 @@ print(mlp2:get(1).bias[1])
 
 ```
 
-<a name="nn.Module.type"/>
+<a name="nn.Module.type"></a>
 ### type(type) ###
 
 This function converts all the parameters of a module to the given
 `type`. The `type` can be one of the types defined for
 [torch.Tensor](https://github.com/torch/torch7/blob/master/doc/tensor.md).
 
-<a name="nn.Module.float"/>
+<a name="nn.Module.float"></a>
 ### float() ###
 
 Convenience method for calling [module:type('torch.FloatTensor')](#nn.Module.type)
 
-<a name="nn.Module.double"/>
+<a name="nn.Module.double"></a>
 ### double() ###
 
 Convenience method for calling [module:type('torch.DoubleTensor')](#nn.Module.type)
 
-<a name="nn.Module.cuda"/>
+<a name="nn.Module.cuda"></a>
 ### cuda() ###
 
 Convenience method for calling [module:type('torch.CudaTensor')](#nn.Module.type)
 
-<a name="nn.statevars.dok"/>
+<a name="nn.statevars.dok"></a>
 ### State Variables ###
 
 These state variables are useful objects if one wants to check the guts of
@@ -240,13 +240,13 @@ However, some special sub-classes
 like [table layers](table.md#nn.TableLayers) contain something else. Please,
 refer to each module specification for further information.
 
-<a name="nn.Module.output"/>
+<a name="nn.Module.output"></a>
 #### output ####
 
 This contains the output of the module, computed with the last call of
 [forward(input)](#nn.Module.forward).
 
-<a name="nn.Module.gradInput"/>
+<a name="nn.Module.gradInput"></a>
 #### gradInput ####
 
 This contains the gradients with respect to the inputs of the module, computed with the last call of
@@ -258,7 +258,7 @@ Some modules contain parameters (the ones that we actually want to
 train!). The name of these parameters, and gradients w.r.t these parameters
 are module dependent.
 
-<a name="nn.Module.parameters"/>
+<a name="nn.Module.parameters"></a>
 ### [{weights}, {gradWeights}] parameters() ###
 
 This function should returns two tables. One for the learnable
@@ -268,7 +268,7 @@ wrt to the learnable parameters `{gradWeights}`.
 Custom modules should override this function if they use learnable
 parameters that are stored in tensors.
 
-<a name="nn.Module.getParameters"/>
+<a name="nn.Module.getParameters"></a>
 ### [flatParameters, flatGradParameters] getParameters() ###
 
 This function returns two tensors. One for the flattened learnable
@@ -279,15 +279,15 @@ Custom modules should not override this function. They should instead override [
 
 This function will go over all the weights and gradWeights and make them view into a single tensor (one for weights and one for gradWeights). Since the storage of every weight and gradWeight is changed, this function should be called only once on a given network.
 
-<a name="nn.Module.training"/>
+<a name="nn.Module.training"></a>
 ### training() ###
 This sets the mode of the Module (or sub-modules) to `train=true`. This is useful for modules like [Dropout](simple.md#nn.Dropout) that have a different behaviour during training vs evaluation.
 
-<a name="nn.Module.evaluate"/>
+<a name="nn.Module.evaluate"></a>
 ### evaluate() ###
 This sets the mode of the Module (or sub-modules) to `train=false`. This is useful for modules like [Dropout](simple.md#nn.Dropout) that have a different behaviour during training vs evaluation.
 
-<a name="nn.Module.findModules"/>
+<a name="nn.Module.findModules"></a>
 ### findModules(typename) ###
 Find all instances of modules in the network of a certain `typename`.  It returns a flattened list of the matching nodes, as well as a flattened list of the container modules for each matching node.
 
@@ -331,7 +331,7 @@ for i = 1, #threshold_nodes do
 end
 ```
 
-<a name="nn.Module.listModules"/>
+<a name="nn.Module.listModules"></a>
 ### listModules() ###
 
 List all Modules instances in a network. Returns a flattened list of modules, 
diff --git a/doc/overview.md b/doc/overview.md
index c9eedaebc..6aec32176 100644
--- a/doc/overview.md
+++ b/doc/overview.md
@@ -1,4 +1,4 @@
-<a name="nn.overview.dok"/>
+<a name="nn.overview.dok"></a>
 # Overview #
 
 Each module of a network is composed of [Modules](module.md#nn.Modules) and there
@@ -23,31 +23,35 @@ easy with a simple for loop to [train a neural network yourself](training.md#nn.
 ## Detailed Overview ##
 This section provides a detailed overview of the neural network package. First the omnipresent [Module](#nn.overview.module) is examined, followed by some examples for [combining modules](#nn.overview.plugandplay) together. The last part explores facilities for [training a neural network](#nn.overview.training).
 
-<a name="nn.overview.module"/>
+<a name="nn.overview.module"></a>
 ### Module ###
 
 A neural network is called a [Module](module.md#nn.Module) (or simply
 _module_ in this documentation) in Torch. `Module` is an abstract
 class which defines four main methods:
+
   * [forward(input)](module.md#nn.Module.forward) which computes the output of the module given the `input` [Tensor](https://github.com/torch/torch7/blob/master/doc/tensor.md).
   * [backward(input, gradOutput)](module.md#nn.Module.backward) which computes the gradients of the module with respect to its own parameters, and its own inputs.
   * [zeroGradParameters()](module.md#nn.Module.zeroGradParameters) which zeroes the gradient with respect to the parameters of the module.
   * [updateParameters(learningRate)](module.md#nn.Module.updateParameters) which updates the parameters after one has computed the gradients with `backward()`
 
 It also declares two members:
+
   * [output](module.md#nn.Module.output) which is the output returned by `forward()`.
   * [gradInput](module.md#nn.Module.gradInput) which contains the gradients with respect to the input of the module, computed in a `backward()`.
 
 Two other perhaps less used but handy methods are also defined:
+
   * [share(mlp,s1,s2,...,sn)](module.md#nn.Module.share) which makes this module share the parameters s1,..sn of the module `mlp`. This is useful if you want to have modules that share the same weights.
   * [clone(...)](module.md#nn.Module.clone) which produces a deep copy of (i.e. not just a pointer to) this Module, including the current state of its parameters (if any).
 
 Some important remarks:
+
   * `output` contains only valid values after a [forward(input)](module.md#nn.Module.forward).
   * `gradInput` contains only valid values after a [backward(input, gradOutput)](module.md#nn.Module.backward).
   * [backward(input, gradOutput)](module.md#nn.Module.backward) uses certain computations obtained during [forward(input)](module.md#nn.Module.forward). You _must_ call `forward()` before calling a `backward()`, on the _same_ `input`, or your gradients are going to be incorrect!
 
-<a name="nn.overview.plugandplay"/>
+<a name="nn.overview.plugandplay"></a>
 ### Plug and play ###
 
 Building a simple neural network can be achieved by constructing an available layer.
@@ -75,7 +79,7 @@ Of course, `Sequential` and `Concat` can contains other
 networks you ever dreamt of! See the [[#nn.Modules|complete list of
 available modules]].
 
-<a name="nn.overview.training"/>
+<a name="nn.overview.training"></a>
 ### Training a neural network ###
 
 Once you built your neural network, you have to choose a particular
@@ -114,7 +118,7 @@ are implemented.  [See an example](containers.md#nn.DoItStochasticGradient).
 to cut-and-paste it and create a variant to it adapted to your needs
 (if the constraints of `StochasticGradient` do not satisfy you).
 
-<a name="nn.overview.lowlevel"/>
+<a name="nn.overview.lowlevel"></a>
 #### Low Level Training ####
 
 If you want to program the `StochasticGradient` by hand, you
diff --git a/doc/simple.md b/doc/simple.md
index 6ef7ed28a..bc4881b4b 100755
--- a/doc/simple.md
+++ b/doc/simple.md
@@ -1,6 +1,7 @@
-<a name="nn.simplelayers.dok"/>
+<a name="nn.simplelayers.dok"></a>
 # Simple layers #
 Simple Modules are used for various tasks like adapting Tensor methods and providing affine transformations :
+
  * Parameterized Modules :
    * [Linear](#nn.Linear) : a linear transformation ;
    * [SparseLinear](#nn.SparseLinear) : a linear transformation with sparse inputs ;
@@ -36,7 +37,7 @@ Simple Modules are used for various tasks like adapting Tensor methods and provi
    * [Padding](#nn.Padding) : adds padding to a dimension ;
    * [L1Penalty](#nn.L1Penalty) : adds an L1 penalty to an input (for sparsity);
 
-<a name="nn.Linear"/>
+<a name="nn.Linear"></a>
 ## Linear ##
 
 ```lua
@@ -79,7 +80,7 @@ x = torch.Tensor(10) -- 10 inputs
 y = module:forward(x)
 ```
 
-<a name="nn.SparseLinear"/>
+<a name="nn.SparseLinear"></a>
 ## SparseLinear ##
 
 ```lua
@@ -113,7 +114,7 @@ x = torch.Tensor({ {1, 0.1}, {2, 0.3}, {10, 0.3}, {31, 0.2} })
 
 The first column contains indices, the second column contains values in a a vector where all other elements are zeros. The indices should not exceed the stated dimensions of the input to the layer (10000 in the example).
 
-<a name="nn.Dropout"/>
+<a name="nn.Dropout"></a>
 ## Dropout ##
 
 ```lua
@@ -183,7 +184,7 @@ We can return to training our model by first calling [Module:training()](module.
 
 When used, `Dropout` should normally be applied to the input of parameterized [Modules](module.md#nn.Module) like [Linear](#nn.Linear) or [SpatialConvolution](convolution.md#nn.SpatialConvolution). A `p` of `0.5` (the default) is usually okay for hidden layers. `Dropout` can sometimes be used successfully on the dataset inputs with a `p` around `0.2`. It sometimes works best following [Transfer](transfer.md) Modules like [ReLU](transfer.md#nn.ReLU). All this depends a great deal on the dataset so its up to the user to try different combinations.
 
-<a name="nn.SpatialDropout"/>
+<a name="nn.SpatialDropout"></a>
 ## SpatialDropout ##
 
 `module` = `nn.SpatialDropout(p)`
@@ -194,7 +195,7 @@ As described in the paper "Efficient Object Localization Using Convolutional Net
 
 ```nn.SpatialDropout``` accepts 3D or 4D inputs.  If the input is 3D than a layout of (features x height x width) is assumed and for 4D (batch x features x height x width) is assumed.
 
-<a name="nn.Abs"/>
+<a name="nn.Abs"></a>
 ## Abs ##
 
 ```lua
@@ -214,7 +215,7 @@ gnuplot.grid(true)
 ![](image/abs.png)
 
 
-<a name='nn.Add'/>
+<a name='nn.Add'></a>
 ## Add ##
 
 ```lua
@@ -264,7 +265,7 @@ gives the output:
 i.e. the network successfully learns the input `x` has been shifted to produce the output `y`.
 
 
-<a name="nn.Mul"/>
+<a name="nn.Mul"></a>
 ## Mul ##
 
 ```lua
@@ -309,7 +310,7 @@ gives the output:
 
 i.e. the network successfully learns the input `x` has been scaled by pi.
 
-<a name='nn.CMul'/>
+<a name='nn.CMul'></a>
 ## CMul ##
 
 ```lua
@@ -362,7 +363,7 @@ gives the output:
 i.e. the network successfully learns the input `x` has been scaled by those scaling factors to produce the output `y`.
 
 
-<a name="nn.Max"/>
+<a name="nn.Max"></a>
 ## Max ##
 
 ```lua
@@ -373,7 +374,7 @@ Applies a max operation over dimension `dimension`.
 Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2` then an `nxq` matrix would be output.
 
 
-<a name="nn.Min"/>
+<a name="nn.Min"></a>
 ## Min ##
 
 ```lua
@@ -384,7 +385,7 @@ Applies a min operation over dimension `dimension`.
 Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2` then an `nxq` matrix would be output.
 
 
-<a name="nn.Mean"/>
+<a name="nn.Mean"></a>
 ## Mean ##
 
 ```lua
@@ -394,7 +395,7 @@ module = nn.Mean(dimension)
 Applies a mean operation over dimension `dimension`.
 Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2` then an `nxq` matrix would be output.
 
-<a name="nn.Sum"/>
+<a name="nn.Sum"></a>
 ## Sum ##
 
 ```lua
@@ -405,7 +406,7 @@ Applies a sum operation over dimension `dimension`.
 Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2` then an `nxq` matrix would be output.
 
 
-<a name="nn.Euclidean"/>
+<a name="nn.Euclidean"></a>
 ## Euclidean ##
 
 ```lua
@@ -416,7 +417,7 @@ Outputs the Euclidean distance of the input to `outputSize` centers, i.e. this l
 
 The distance `y_j` between center `j` and input `x` is formulated as `y_j = || w_j - x ||`.
 
-<a name="nn.WeightedEuclidean"/>
+<a name="nn.WeightedEuclidean"></a>
 ## WeightedEuclidean ##
 
 ```lua
@@ -429,7 +430,7 @@ In other words, for each of the `outputSize` centers `w_j`, there is a diagonal
 
 The distance `y_j` between center `j` and input `x` is formulated as `y_j = || c_j * (w_j - x) ||`.
 
-<a name="nn.Identity"/>
+<a name="nn.Identity"></a>
 ## Identity ##
 
 ```lua
@@ -488,7 +489,7 @@ for i = 1, 100 do           -- Do a few training iterations
 end
 ```
 
-<a name="nn.Copy"/>
+<a name="nn.Copy"></a>
 ## Copy ##
 
 ```lua
@@ -498,7 +499,7 @@ module = nn.Copy(inputType, outputType, [forceCopy, dontCast])
 This layer copies the input to output with type casting from input type from `inputType` to `outputType`. Unless `forceCopy` is true, when the first two arguments are the same, the input isn't copied, only transfered as the output. The default `forceCopy` is false.
 When `dontCast` is true, a call to `nn.Copy:type(type)` will not cast the module's `output` and `gradInput` Tensors to the new type. The default is false.
 
-<a name="nn.Narrow"/>
+<a name="nn.Narrow"></a>
 ## Narrow ##
 
 ```lua
@@ -507,7 +508,7 @@ module = nn.Narrow(dimension, offset, length)
 
 Narrow is application of [narrow](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-narrowdim-index-size) operation in a module.
 
-<a name="nn.Replicate"/>
+<a name="nn.Replicate"></a>
 ## Replicate ##
 
 ```lua
@@ -552,7 +553,7 @@ This allows the module to replicate the same non-batch dimension `dim` for both
 ```
 
 
-<a name="nn.Reshape"/>
+<a name="nn.Reshape"></a>
 ## Reshape ##
 
 ```lua
@@ -640,7 +641,7 @@ Example:
 
 ```
 
-<a name="nn.View"/>
+<a name="nn.View"></a>
 ## View ##
 
 ```lua
@@ -723,7 +724,7 @@ Example 2:
 [torch.LongStorage of size 2]
 ```
 
-<a name="nn.Select"/>
+<a name="nn.Select"></a>
 ## Select ##
 
 ```lua
@@ -798,7 +799,7 @@ for i = 1, 10000 do     -- Train for a few iterations
 end
 ```
 
-<a name="nn.Exp"/>
+<a name="nn.Exp"></a>
 ## Exp ##
 
 ```lua
@@ -820,7 +821,7 @@ gnuplot.grid(true)
 ![](image/exp.png)
 
 
-<a name="nn.Square"/>
+<a name="nn.Square"></a>
 ## Square ##
 
 ```lua
@@ -842,7 +843,7 @@ gnuplot.grid(true)
 ![](image/square.png)
 
 
-<a name="nn.Sqrt"/>
+<a name="nn.Sqrt"></a>
 ## Sqrt ##
 
 ```lua
@@ -864,7 +865,7 @@ gnuplot.grid(true)
 ![](image/sqrt.png)
 
 
-<a name="nn.Power"/>
+<a name="nn.Power"></a>
 ## Power ##
 
 ```lua
@@ -886,7 +887,7 @@ gnuplot.grid(true)
 ![](image/power.png)
 
 
-<a name="nn.MM"/>
+<a name="nn.MM"></a>
 ## MM ##
 
 ```lua
@@ -905,7 +906,7 @@ C = model.forward({A, B})  -- C will be of size `b x m x n`
 ```
 
 
-<a name="nn.BatchNormalization"/>
+<a name="nn.BatchNormalization"></a>
 ## BatchNormalization ##
 
 ```lua
@@ -945,7 +946,7 @@ A = torch.randn(b, m)
 C = model.forward(A)  -- C will be of size `b x m`
 ```
 
-<a name="nn.Padding"/>
+<a name="nn.Padding"></a>
 ## Padding ##
 
 `module` = `nn.Padding(dim, pad [, nInputDim, value])`
@@ -978,7 +979,7 @@ module:forward(torch.randn(2, 3)) --batch input
 ```
 
 
-<a name="nn.L1Penalty"/>
+<a name="nn.L1Penalty"></a>
 ## L1Penalty ##
 
 ```lua
diff --git a/doc/table.md b/doc/table.md
index 91ea209c9..221e4c37b 100755
--- a/doc/table.md
+++ b/doc/table.md
@@ -1,8 +1,9 @@
-<a name="nn.TableLayers"/>
+<a name="nn.TableLayers"></a>
 # Table Layers #
 
 This set of modules allows the manipulation of `table`s through the layers of a neural network.
 This allows one to build very rich architectures:
+
  * `table` Container Modules encapsulate sub-Modules:
    * [`ConcatTable`](#nn.ConcatTable): applies each member module to the same input     [`Tensor`](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor) and outputs a `table`;
    * [`ParallelTable`](#nn.ParallelTable): applies the `i`-th member module to the `i`-th input and outputs a `table`;
@@ -35,7 +36,7 @@ pred = mlp:forward(t)
 pred = mlp:forward{x, y, z}      -- This is equivalent to the line before
 ```
 
-<a name="nn.ConcatTable"/>
+<a name="nn.ConcatTable"></a>
 ## ConcatTable ##
 
 ```lua
@@ -115,7 +116,7 @@ which gives the output (using [th](https://github.com/torch/trepl)):
 ```
 
 
-<a name="nn.ParallelTable"/>
+<a name="nn.ParallelTable"></a>
 ## ParallelTable ##
 
 ```lua
@@ -164,7 +165,7 @@ which gives the output:
 ```
 
 
-<a name="nn.SplitTable"/>
+<a name="nn.SplitTable"></a>
 ## SplitTable ##
 
 ```lua
@@ -399,7 +400,7 @@ end
 ```
 
 
-<a name="nn.JoinTable"/>
+<a name="nn.JoinTable"></a>
 ## JoinTable ##
 
 ```lua
@@ -534,7 +535,7 @@ end
 ```
 
 
-<a name='nn.MixtureTable'/>
+<a name='nn.MixtureTable'></a>
 ## MixtureTable ##
 
 `module` = `MixtureTable([dim])`
@@ -632,7 +633,7 @@ Forwarding a batch of 2 examples gives us something like this:
 
 ```
 
-<a name="nn.SelectTable"/>
+<a name="nn.SelectTable"></a>
 ## SelectTable ##
 
 `module` = `SelectTable(index)`
@@ -725,7 +726,7 @@ Example 2:
 
 ```
 
-<a name="nn.NarrowTable"/>
+<a name="nn.NarrowTable"></a>
 ## NarrowTable ##
 
 `module` = `NarrowTable(offset [, length])`
@@ -765,7 +766,7 @@ Example:
 
 ```
 
-<a name="nn.FlattenTable"/>
+<a name="nn.FlattenTable"></a>
 ## FlattenTable ##
 
 `module` = `FlattenTable()`
@@ -802,7 +803,7 @@ gives the output:
 }
 ```
 
-<a name="nn.PairwiseDistance"/>
+<a name="nn.PairwiseDistance"></a>
 ## PairwiseDistance ##
 
 `module` = `PairwiseDistance(p)` creates a module that takes a `table` of two vectors as input and outputs the distance between them using the `p`-norm.
@@ -885,7 +886,7 @@ end
 
 ```
 
-<a name="nn.DotProduct"/>
+<a name="nn.DotProduct"></a>
 ## DotProduct ##
 
 `module` = `DotProduct()` creates a module that takes a `table` of two vectors as input and outputs the dot product between them.
@@ -978,7 +979,7 @@ end
 ```
 
 
-<a name="nn.CosineDistance"/>
+<a name="nn.CosineDistance"></a>
 ## CosineDistance ##
 
 `module` = `CosineDistance()` creates a module that takes a `table` of two vectors (or matrices if in batch mode) as input and outputs the cosine distance between them.
@@ -1065,7 +1066,7 @@ end
 
 
 
-<a name="nn.CriterionTable"/>
+<a name="nn.CriterionTable"></a>
 ## CriterionTable ##
 
 `module` = `CriterionTable(criterion)`
@@ -1115,7 +1116,7 @@ for i = 1, 20 do                                 -- Train for a few iterations
 end
 ```
 
-<a name="nn.CAddTable"/>
+<a name="nn.CAddTable"></a>
 ## CAddTable ##
 
 Takes a `table` of `Tensor`s and outputs summation of all `Tensor`s.
@@ -1157,7 +1158,7 @@ m = nn.CAddTable()
 ```
 
 
-<a name="nn.CSubTable"/>
+<a name="nn.CSubTable"></a>
 ## CSubTable ##
 
 Takes a `table` with two `Tensor` and returns the component-wise
@@ -1174,7 +1175,7 @@ m = nn.CSubTable()
 [torch.DoubleTensor of dimension 5]
 ```
 
-<a name="nn.CMulTable"/>
+<a name="nn.CMulTable"></a>
 ## CMulTable ##
 
 Takes a `table` of `Tensor`s and outputs the multiplication of all of them.
@@ -1192,7 +1193,7 @@ m = nn.CMulTable()
 
 ```
 
-<a name="nn.CDivTable"/>
+<a name="nn.CDivTable"></a>
 ## CDivTable ##
 
 Takes a `table` with two `Tensor` and returns the component-wise
diff --git a/doc/training.md b/doc/training.md
index 016c7c1ca..1a126d3e1 100644
--- a/doc/training.md
+++ b/doc/training.md
@@ -1,4 +1,4 @@
-<a name="nn.traningneuralnet.dok"/>
+<a name="nn.traningneuralnet.dok"></a>
 # Training a neural network #
 
 Training a neural network is easy with a [simple `for` loop](#nn.DoItYourself).
@@ -7,19 +7,19 @@ want sometimes a quick way of training neural
 networks. [StochasticGradient](#nn.StochasticGradient), a simple class
 which does the job for you is provided as standard.
 
-<a name="nn.StochasticGradient.dok"/>
+<a name="nn.StochasticGradient.dok"></a>
 ## StochasticGradient ##
 
 `StochasticGradient` is a high-level class for training [neural networks](#nn.Module), using a stochastic gradient
 algorithm. This class is [serializable](https://github.com/torch/torch7/blob/master/doc/serialization.md#serialization).
 
-<a name="nn.StochasticGradient"/>
+<a name="nn.StochasticGradient"></a>
 ### StochasticGradient(module, criterion) ###
 
 Create a `StochasticGradient` class, using the given [Module](module.md#nn.Module) and [Criterion](criterion.md#nn.Criterion).
 The class contains [several parameters](#nn.StochasticGradientParameters) you might want to set after initialization.
 
-<a name="nn.StochasticGradientTrain"/>
+<a name="nn.StochasticGradientTrain"></a>
 ### train(dataset) ###
 
 Train the module and criterion given in the
@@ -42,7 +42,7 @@ Such a dataset is easily constructed by using Lua tables, but it could any `C` o
 for example, as long as required operators/methods are implemented. 
 [See an example](#nn.DoItStochasticGradient).
 
-<a name="nn.StochasticGradientParameters"/>
+<a name="nn.StochasticGradientParameters"></a>
 ### Parameters ###
 
 `StochasticGradient` has several field which have an impact on a call to [train()](#nn.StochasticGradientTrain).
@@ -54,7 +54,7 @@ for example, as long as required operators/methods are implemented.
   * `hookExample`: A possible hook function which will be called (if non-nil) during training after each example forwarded and backwarded through the network. The function takes `(self, example)` as parameters. Default is `nil`.
   * `hookIteration`: A possible hook function which will be called (if non-nil) during training after a complete pass over the dataset. The function takes `(self, iteration)` as parameters. Default is `nil`.
 
-<a name="nn.DoItStochasticGradient"/>
+<a name="nn.DoItStochasticGradient"></a>
 ## Example of training using StochasticGradient ##
 
 We show an example here on a classical XOR problem.
@@ -134,7 +134,7 @@ You should see something like:
 [torch.Tensor of dimension 1]
 ```
 
-<a name="nn.DoItYourself"/>
+<a name="nn.DoItYourself"></a>
 ## Example of manual training of a neural network ##
 
 We show an example here on a classical XOR problem.
diff --git a/doc/transfer.md b/doc/transfer.md
index c03017de1..6b3be00de 100755
--- a/doc/transfer.md
+++ b/doc/transfer.md
@@ -1,8 +1,8 @@
-<a name="nn.transfer.dok"/>
+<a name="nn.transfer.dok"></a>
 # Transfer Function Layers #
 Transfer functions are normally used to introduce a non-linearity after a parameterized layer like [Linear](simple.md#nn.Linear) and  [SpatialConvolution](convolution.md#nn.SpatialConvolution). Non-linearities allows for dividing the problem space into more complex regions than what a simple logistic regressor would permit.
 
-<a name="nn.HardTanh"/>
+<a name="nn.HardTanh"></a>
 ## HardTanh ##
 
 Applies the `HardTanh` function element-wise to the input Tensor,
@@ -26,7 +26,7 @@ gnuplot.grid(true)
 ![](image/htanh.png)
 
 
-<a name="nn.HardShrink"/>
+<a name="nn.HardShrink"></a>
 ## HardShrink ##
 
 `module = nn.HardShrink(lambda)`
@@ -51,7 +51,7 @@ gnuplot.grid(true)
 ```
 ![](image/hshrink.png)
 
-<a name="nn.SoftShrink"/>
+<a name="nn.SoftShrink"></a>
 ## SoftShrink ##
 
 `module = nn.SoftShrink(lambda)`
@@ -77,7 +77,7 @@ gnuplot.grid(true)
 ![](image/sshrink.png)
 
 
-<a name="nn.SoftMax"/>
+<a name="nn.SoftMax"></a>
 ## SoftMax ##
 
 Applies the `Softmax` function to an n-dimensional input Tensor,
@@ -99,7 +99,7 @@ gnuplot.grid(true)
 
 Note that this module doesn't work directly with [ClassNLLCriterion](criterion.md#nn.ClassNLLCriterion), which expects the `nn.Log` to be computed between the `SoftMax` and itself. Use [LogSoftMax](#nn.LogSoftMax) instead (it's faster).
 
-<a name="nn.SoftMin"/>
+<a name="nn.SoftMin"></a>
 ## SoftMin ##
 
 Applies the `Softmin` function to an n-dimensional input Tensor,
@@ -119,7 +119,7 @@ gnuplot.grid(true)
 ```
 ![](image/softmin.png)
 
-<a name="nn.SoftPlus"/>
+<a name="nn.SoftPlus"></a>
 ### SoftPlus ###
 
 Applies the `SoftPlus` function to an n-dimensioanl input Tensor.
@@ -138,7 +138,7 @@ gnuplot.grid(true)
 ```
 ![](image/softplus.png)
 
-<a name="nn.SoftSign"/>
+<a name="nn.SoftSign"></a>
 ## SoftSign ##
 
 Applies the `SoftSign` function to an n-dimensioanl input Tensor.
@@ -156,7 +156,7 @@ gnuplot.grid(true)
 ```
 ![](image/softsign.png)
 
-<a name="nn.LogSigmoid"/>
+<a name="nn.LogSigmoid"></a>
 ## LogSigmoid ##
 
 Applies the `LogSigmoid` function to an n-dimensional input Tensor.
@@ -176,7 +176,7 @@ gnuplot.grid(true)
 ![](image/logsigmoid.png)
 
 
-<a name="nn.LogSoftMax"/>
+<a name="nn.LogSoftMax"></a>
 ## LogSoftMax ##
 
 Applies the `LogSoftmax` function to an n-dimensional input Tensor.
@@ -195,7 +195,7 @@ gnuplot.grid(true)
 ```
 ![](image/logsoftmax.png)
 
-<a name="nn.Sigmoid"/>
+<a name="nn.Sigmoid"></a>
 ## Sigmoid ##
 
 Applies the `Sigmoid` function element-wise to the input Tensor,
@@ -214,7 +214,7 @@ gnuplot.grid(true)
 ```
 ![](image/sigmoid.png)
 
-<a name="nn.Tanh"/>
+<a name="nn.Tanh"></a>
 ## Tanh ##
 
 Applies the `Tanh` function element-wise to the input Tensor,
@@ -231,7 +231,7 @@ gnuplot.grid(true)
 ```
 ![](image/tanh.png)
 
-<a name="nn.ReLU"/>
+<a name="nn.ReLU"></a>
 ## ReLU ##
 
 Applies the rectified linear unit (`ReLU`) function element-wise to the input Tensor,
@@ -253,7 +253,7 @@ gnuplot.grid(true)
 ```
 ![](image/relu.png)
 
-<a name="nn.PReLU"/>
+<a name="nn.PReLU"></a>
 ## PReLU ##
 
 Applies parametric ReLU, which parameter varies the slope of the negative part:
@@ -267,7 +267,7 @@ Note that weight decay should not be used on it. For reference see http://arxiv.
 
 ![](image/prelu.png)
 
-<a name="nn.AddConstant"/>
+<a name="nn.AddConstant"></a>
 ## AddConstant ##
 
 Adds a (non-learnable) scalar constant.  This module is sometimes useful for debuggging purposes:  `f(x)` = `x + k`, where `k` is a scalar.
@@ -278,7 +278,7 @@ m=nn.AddConstant(k,true) -- true = in-place, false = keeping separate state.
 ```
 In-place mode restores the original input value after the backward pass, allowing it's use after other in-place modules, like [MulConstant](#nn.MulConstant).
 
-<a name="nn.MulConstant"/>
+<a name="nn.MulConstant"></a>
 ## MulConstant ##
 
 Multiplies input tensor by a (non-learnable) scalar constant.  This module is sometimes useful for debuggging purposes:  `f(x)` = `k * x`, where `k` is a scalar.
diff --git a/mkdocs.yml b/mkdocs.yml
new file mode 100644
index 000000000..f38456dca
--- /dev/null
+++ b/mkdocs.yml
@@ -0,0 +1,18 @@
+site_name: nn
+theme : simplex
+repo_url : https://github.com/torch/nn
+use_directory_urls : false
+markdown_extensions: [extra]
+docs_dir : doc
+pages:
+- [index.md, Home]
+- [module.md, Modules, Module Interface]
+- [containers.md, Modules, Containers]
+- [transfer.md, Modules, Transfer Functions]
+- [simple.md, Modules, Simple Layers]
+- [table.md, Modules, Table Layers]
+- [convolution.md, Modules, Convolution Layers]
+- [criterion.md, Criterion, Criterions]
+- [overview.md, Additional Documentation, Overview]
+- [training.md, Additional Documentation, Training]
+- [testing.md, Additional Documentation, Testing]

From 0e05ac975476fff3ecf75894595d60ba04b5e0d6 Mon Sep 17 00:00:00 2001
From: nicholas-leonard <nick@nikopia.org>
Date: Mon, 10 Aug 2015 22:15:15 -0400
Subject: [PATCH 2/2] fix lists

---
 doc/containers.md  | 10 +++----
 doc/convolution.md | 40 +++++++++++++--------------
 doc/criterion.md   | 38 +++++++++++++-------------
 doc/simple.md      | 68 +++++++++++++++++++++++-----------------------
 doc/table.md       | 42 ++++++++++++++--------------
 5 files changed, 99 insertions(+), 99 deletions(-)

diff --git a/doc/containers.md b/doc/containers.md
index 8d02ab96b..9a8360761 100644
--- a/doc/containers.md
+++ b/doc/containers.md
@@ -2,11 +2,11 @@
 # Containers #
 Complex neural networks are easily built using container classes:
 
- * [Container](#nn.Container) : abstract class inherited by containers ;
-   * [Sequential](#nn.Sequential) : plugs layers in a feed-forward fully connected manner ;
-   * [Parallel](#nn.Parallel) : applies its `ith` child module to the  `ith` slice of the input Tensor ;
-   * [Concat](#nn.Concat) : concatenates in one layer several modules along dimension `dim` ;
-     * [DepthConcat](#nn.DepthConcat) : like Concat, but adds zero-padding when non-`dim` sizes don't match;
+  * [Container](#nn.Container) : abstract class inherited by containers ;
+    * [Sequential](#nn.Sequential) : plugs layers in a feed-forward fully connected manner ;
+    * [Parallel](#nn.Parallel) : applies its `ith` child module to the  `ith` slice of the input Tensor ;
+    * [Concat](#nn.Concat) : concatenates in one layer several modules along dimension `dim` ;
+      * [DepthConcat](#nn.DepthConcat) : like Concat, but adds zero-padding when non-`dim` sizes don't match;
  
 See also the [Table Containers](#nn.TableContainers) for manipulating tables of [Tensors](https://github.com/torch/torch7/blob/master/doc/tensor.md).
 
diff --git a/doc/convolution.md b/doc/convolution.md
index 4f716c639..54b8da9cd 100755
--- a/doc/convolution.md
+++ b/doc/convolution.md
@@ -3,28 +3,28 @@
 
 A convolution is an integral that expresses the amount of overlap of one function `g` as it is shifted over another function `f`. It therefore "blends" one function with another. The neural network package supports convolution, pooling, subsampling and other relevant facilities. These are divided base on the dimensionality of the input and output [Tensors](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor):
  
- * [Temporal Modules](#nn.TemporalModules) apply to sequences with a one-dimensional relationship
+  * [Temporal Modules](#nn.TemporalModules) apply to sequences with a one-dimensional relationship
 (e.g. sequences of words, phonemes and letters. Strings of some kind).
-   * [TemporalConvolution](#nn.TemporalConvolution) : a 1D convolution over an input sequence ;
-   * [TemporalSubSampling](#nn.TemporalSubSampling) : a 1D sub-sampling over an input sequence ;
-   * [TemporalMaxPooling](#nn.TemporalMaxPooling) : a 1D max-pooling operation over an input sequence ;
-   * [LookupTable](#nn.LookupTable) : a convolution of width `1`, commonly used for word embeddings ;
- * [Spatial Modules](#nn.SpatialModules) apply to inputs with two-dimensional relationships (e.g. images):
-   * [SpatialConvolution](#nn.SpatialConvolution) : a 2D convolution over an input image ;
-   * [SpatialSubSampling](#nn.SpatialSubSampling) : a 2D sub-sampling over an input image ;
-   * [SpatialMaxPooling](#nn.SpatialMaxPooling) : a 2D max-pooling operation over an input image ;
-   * [SpatialAveragePooling](#nn.SpatialAveragePooling) : a 2D average-pooling operation over an input image ;
-   * [SpatialAdaptiveMaxPooling](#nn.SpatialAdaptiveMaxPooling) : a 2D max-pooling operation which adapts its parameters dynamically such that the output is of fixed size ;
-   * [SpatialLPPooling](#nn.SpatialLPPooling) : computes the `p` norm in a convolutional manner on a set of input images ;
-   * [SpatialConvolutionMap](#nn.SpatialConvolutionMap) : a 2D convolution that uses a generic connection table ;
-   * [SpatialZeroPadding](#nn.SpatialZeroPadding) : padds a feature map with specified number of zeros ;
-   * [SpatialSubtractiveNormalization](#nn.SpatialSubtractiveNormalization) : a spatial subtraction operation on a series of 2D inputs using
-   * [SpatialBatchNormalization](#nn.SpatialBatchNormalization): mean/std normalization over the mini-batch inputs and pixels, with an optional affine transform that follows
+    * [TemporalConvolution](#nn.TemporalConvolution) : a 1D convolution over an input sequence ;
+    * [TemporalSubSampling](#nn.TemporalSubSampling) : a 1D sub-sampling over an input sequence ;
+    * [TemporalMaxPooling](#nn.TemporalMaxPooling) : a 1D max-pooling operation over an input sequence ;
+    * [LookupTable](#nn.LookupTable) : a convolution of width `1`, commonly used for word embeddings ;
+  * [Spatial Modules](#nn.SpatialModules) apply to inputs with two-dimensional relationships (e.g. images):
+    * [SpatialConvolution](#nn.SpatialConvolution) : a 2D convolution over an input image ;
+    * [SpatialSubSampling](#nn.SpatialSubSampling) : a 2D sub-sampling over an input image ;
+    * [SpatialMaxPooling](#nn.SpatialMaxPooling) : a 2D max-pooling operation over an input image ;
+    * [SpatialAveragePooling](#nn.SpatialAveragePooling) : a 2D average-pooling operation over an input image ;
+    * [SpatialAdaptiveMaxPooling](#nn.SpatialAdaptiveMaxPooling) : a 2D max-pooling operation which adapts its parameters dynamically such that the output is of fixed size ;
+    * [SpatialLPPooling](#nn.SpatialLPPooling) : computes the `p` norm in a convolutional manner on a set of input images ;
+    * [SpatialConvolutionMap](#nn.SpatialConvolutionMap) : a 2D convolution that uses a generic connection table ;
+    * [SpatialZeroPadding](#nn.SpatialZeroPadding) : pads a feature map with a specified number of zeros ;
+    * [SpatialSubtractiveNormalization](#nn.SpatialSubtractiveNormalization) : a spatial subtraction operation on a series of 2D inputs using
+    * [SpatialBatchNormalization](#nn.SpatialBatchNormalization): mean/std normalization over the mini-batch inputs and pixels, with an optional affine transform that follows
 a kernel for computing the weighted average in a neighborhood ;
- * [Volumetric Modules](#nn.VolumetricModules) apply to inputs with three-dimensional relationships (e.g. videos) :
-   * [VolumetricConvolution](#nn.VolumetricConvolution) : a 3D convolution over an input video (a sequence of images) ;
-   * [VolumetricMaxPooling](#nn.VolumetricMaxPooling) : a 3D max-pooling operation over an input video.
-   * [VolumetricAveragePooling](#nn.VolumetricAveragePooling) : a 3D average-pooling operation over an input video.
+  * [Volumetric Modules](#nn.VolumetricModules) apply to inputs with three-dimensional relationships (e.g. videos) :
+    * [VolumetricConvolution](#nn.VolumetricConvolution) : a 3D convolution over an input video (a sequence of images) ;
+    * [VolumetricMaxPooling](#nn.VolumetricMaxPooling) : a 3D max-pooling operation over an input video.
+    * [VolumetricAveragePooling](#nn.VolumetricAveragePooling) : a 3D average-pooling operation over an input video.
 
 <a name="nn.TemporalModules"></a>
 ## Temporal Modules ##
diff --git a/doc/criterion.md b/doc/criterion.md
index 4f89338c9..292893874 100755
--- a/doc/criterion.md
+++ b/doc/criterion.md
@@ -4,25 +4,25 @@
 [`Criterions`](#nn.Criterion) are helpful to train a neural network. Given an input and a
 target, they compute a gradient according to a given loss function.
 
- * Classification criterions:
-  * [`BCECriterion`](#nn.BCECriterion): binary cross-entropy (two-class version of [`ClassNLLCriterion`](#nn.ClassNLLCriterion));
-  * [`ClassNLLCriterion`](#nn.ClassNLLCriterion): negative log-likelihood for [`LogSoftMax`](transfer.md#nn.LogSoftMax) (multi-class);
-  * [`CrossEntropyCriterion`](#nn.CrossEntropyCriterion): combines [`LogSoftMax`](transfer.md#nn.LogSoftMax) and [`ClassNLLCriterion`](#nn.ClassNLLCriterion);
-  * [`MarginCriterion`](#nn.MarginCriterion): two class margin-based loss;
-  * [`MultiMarginCriterion`](#nn.MultiMarginCriterion): multi-class margin-based loss;
-  * [`MultiLabelMarginCriterion`](#nn.MultiLabelMarginCriterion): multi-class multi-classification margin-based loss;
- * Regression criterions:
-  * [`AbsCriterion`](#nn.AbsCriterion): measures the mean absolute value of the element-wise difference between input;
-  * [`MSECriterion`](#nn.MSECriterion): mean square error (a classic);
-  * [`DistKLDivCriterion`](#nn.DistKLDivCriterion): Kullback–Leibler divergence (for fitting continuous probability distributions);
- * Embedding criterions (measuring whether two inputs are similar or dissimilar):
-  * [`HingeEmbeddingCriterion`](#nn.HingeEmbeddingCriterion): takes a distance as input;
-  * [`L1HingeEmbeddingCriterion`](#nn.L1HingeEmbeddingCriterion): L1 distance between two inputs;
-  * [`CosineEmbeddingCriterion`](#nn.CosineEmbeddingCriterion): cosine distance between two inputs;
- * Miscelaneus criterions:
-  * [`MultiCriterion`](#nn.MultiCriterion) : a weighted sum of other criterions each applied to the same input and target;
-  * [`ParallelCriterion`](#nn.ParallelCriterion) : a weighted sum of other criterions each applied to a different input and target;
-  * [`MarginRankingCriterion`](#nn.MarginRankingCriterion): ranks two inputs;
+  * Classification criterions:
+    * [`BCECriterion`](#nn.BCECriterion): binary cross-entropy (two-class version of [`ClassNLLCriterion`](#nn.ClassNLLCriterion));
+    * [`ClassNLLCriterion`](#nn.ClassNLLCriterion): negative log-likelihood for [`LogSoftMax`](transfer.md#nn.LogSoftMax) (multi-class);
+    * [`CrossEntropyCriterion`](#nn.CrossEntropyCriterion): combines [`LogSoftMax`](transfer.md#nn.LogSoftMax) and [`ClassNLLCriterion`](#nn.ClassNLLCriterion);
+    * [`MarginCriterion`](#nn.MarginCriterion): two class margin-based loss;
+    * [`MultiMarginCriterion`](#nn.MultiMarginCriterion): multi-class margin-based loss;
+    * [`MultiLabelMarginCriterion`](#nn.MultiLabelMarginCriterion): multi-class multi-classification margin-based loss;
+  * Regression criterions:
+    * [`AbsCriterion`](#nn.AbsCriterion): measures the mean absolute value of the element-wise difference between input and target;
+    * [`MSECriterion`](#nn.MSECriterion): mean square error (a classic);
+    * [`DistKLDivCriterion`](#nn.DistKLDivCriterion): Kullback–Leibler divergence (for fitting continuous probability distributions);
+  * Embedding criterions (measuring whether two inputs are similar or dissimilar):
+    * [`HingeEmbeddingCriterion`](#nn.HingeEmbeddingCriterion): takes a distance as input;
+    * [`L1HingeEmbeddingCriterion`](#nn.L1HingeEmbeddingCriterion): L1 distance between two inputs;
+    * [`CosineEmbeddingCriterion`](#nn.CosineEmbeddingCriterion): cosine distance between two inputs;
+  * Miscellaneous criterions:
+    * [`MultiCriterion`](#nn.MultiCriterion) : a weighted sum of other criterions each applied to the same input and target;
+    * [`ParallelCriterion`](#nn.ParallelCriterion) : a weighted sum of other criterions each applied to a different input and target;
+    * [`MarginRankingCriterion`](#nn.MarginRankingCriterion): ranks two inputs;
 
 <a name="nn.Criterion"></a>
 ## Criterion ##
diff --git a/doc/simple.md b/doc/simple.md
index bc4881b4b..ebb2d2fe9 100755
--- a/doc/simple.md
+++ b/doc/simple.md
@@ -2,40 +2,40 @@
 # Simple layers #
 Simple Modules are used for various tasks like adapting Tensor methods and providing affine transformations :
 
- * Parameterized Modules :
-   * [Linear](#nn.Linear) : a linear transformation ;
-   * [SparseLinear](#nn.SparseLinear) : a linear transformation with sparse inputs ;
-   * [Add](#nn.Add) : adds a bias term to the incoming data ;
-   * [Mul](#nn.Mul) : multiply a single scalar factor to the incoming data ;
-   * [CMul](#nn.CMul) : a component-wise multiplication to the incoming data ;
-   * [CDiv](#nn.CDiv) : a component-wise division to the incoming data ;
-   * [Euclidean](#nn.Euclidean) : the euclidean distance of the input to `k` mean centers ;
-   * [WeightedEuclidean](#nn.WeightedEuclidean) : similar to [Euclidean](#nn.Euclidean), but additionally learns a diagonal covariance matrix ;
- * Modules that adapt basic Tensor methods :
-   * [Copy](#nn.Copy) : a [copy](https://github.com/torch/torch7/blob/master/doc/tensor.md#torch.Tensor.copy) of the input with [type](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-or-string-typetype) casting ;
-   * [Narrow](#nn.Narrow) : a [narrow](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-narrowdim-index-size) operation over a given dimension ;
-   * [Replicate](#nn.Replicate) : [repeats](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-repeattensorresult-sizes) input `n` times along its first dimension ;
-   * [Reshape](#nn.Reshape) : a [reshape](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchreshaperes-x-m-n) of the inputs ;
-   * [View](#nn.View) : a [view](https://github.com/torch/torch7/blob/master/doc/tensor.md#result-viewresult-tensor-sizes) of the inputs ;
-   * [Select](#nn.Select) : a [select](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-selectdim-index) over a given dimension ;
- * Modules that adapt mathematical Tensor methods :
-   * [Max](#nn.Max) : a [max](https://github.com/torch/torch7/blob/master/doc/maths.md#torch.max) operation over a given dimension ;
-   * [Min](#nn.Min) : a [min](https://github.com/torch/torch7/blob/master/doc/maths.md#torchminresval-resind-x) operation over a given dimension ;
-   * [Mean](#nn.Mean) : a [mean](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchmeanres-x-dim) operation over a given dimension ;
-   * [Sum](#nn.Sum) : a [sum](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchsumres-x) operation over a given dimension ;
-   * [Exp](#nn.Exp) : an element-wise [exp](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchexpres-x) operation ;
-   * [Abs](#nn.Abs) : an element-wise [abs](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchabsres-x) operation ;
-   * [Power](#nn.Power) : an element-wise [pow](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchpowres-x) operation ;
-   * [Square](#nn.Square) : an element-wise square operation ;
-   * [Sqrt](#nn.Sqrt) : an element-wise [sqrt](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchsqrtres-x) operation ;
-   * [MM](#nn.MM) : matrix-matrix multiplication (also supports batches of matrices) ;
- * Miscellaneous Modules :
-   * [BatchNormalization](#nn.BatchNormalization) - mean/std normalization over the mini-batch inputs (with an optional affine transform) ;
-   * [Identity](#nn.Identity) : forward input as-is to output (useful with [ParallelTable](table.md#nn.ParallelTable));
-   * [Dropout](#nn.Dropout) : masks parts of the `input` using binary samples from a [bernoulli](http://en.wikipedia.org/wiki/Bernoulli_distribution) distribution ;
-   * [SpatialDropout](#nn.SpatialDropout) : Same as Dropout but for spatial inputs where adjacent pixels are strongly correlated ;
-   * [Padding](#nn.Padding) : adds padding to a dimension ;
-   * [L1Penalty](#nn.L1Penalty) : adds an L1 penalty to an input (for sparsity);
+  * Parameterized Modules :
+    * [Linear](#nn.Linear) : a linear transformation ;
+    * [SparseLinear](#nn.SparseLinear) : a linear transformation with sparse inputs ;
+    * [Add](#nn.Add) : adds a bias term to the incoming data ;
+    * [Mul](#nn.Mul) : multiplies the incoming data by a single scalar factor ;
+    * [CMul](#nn.CMul) : a component-wise multiplication of the incoming data ;
+    * [CDiv](#nn.CDiv) : a component-wise division of the incoming data ;
+    * [Euclidean](#nn.Euclidean) : the euclidean distance of the input to `k` mean centers ;
+    * [WeightedEuclidean](#nn.WeightedEuclidean) : similar to [Euclidean](#nn.Euclidean), but additionally learns a diagonal covariance matrix ;
+  * Modules that adapt basic Tensor methods :
+    * [Copy](#nn.Copy) : a [copy](https://github.com/torch/torch7/blob/master/doc/tensor.md#torch.Tensor.copy) of the input with [type](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-or-string-typetype) casting ;
+    * [Narrow](#nn.Narrow) : a [narrow](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-narrowdim-index-size) operation over a given dimension ;
+    * [Replicate](#nn.Replicate) : [repeats](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-repeattensorresult-sizes) input `n` times along its first dimension ;
+    * [Reshape](#nn.Reshape) : a [reshape](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchreshaperes-x-m-n) of the inputs ;
+    * [View](#nn.View) : a [view](https://github.com/torch/torch7/blob/master/doc/tensor.md#result-viewresult-tensor-sizes) of the inputs ;
+    * [Select](#nn.Select) : a [select](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-selectdim-index) over a given dimension ;
+  * Modules that adapt mathematical Tensor methods :
+    * [Max](#nn.Max) : a [max](https://github.com/torch/torch7/blob/master/doc/maths.md#torch.max) operation over a given dimension ;
+    * [Min](#nn.Min) : a [min](https://github.com/torch/torch7/blob/master/doc/maths.md#torchminresval-resind-x) operation over a given dimension ;
+    * [Mean](#nn.Mean) : a [mean](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchmeanres-x-dim) operation over a given dimension ;
+    * [Sum](#nn.Sum) : a [sum](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchsumres-x) operation over a given dimension ;
+    * [Exp](#nn.Exp) : an element-wise [exp](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchexpres-x) operation ;
+    * [Abs](#nn.Abs) : an element-wise [abs](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchabsres-x) operation ;
+    * [Power](#nn.Power) : an element-wise [pow](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchpowres-x) operation ;
+    * [Square](#nn.Square) : an element-wise square operation ;
+    * [Sqrt](#nn.Sqrt) : an element-wise [sqrt](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchsqrtres-x) operation ;
+    * [MM](#nn.MM) : matrix-matrix multiplication (also supports batches of matrices) ;
+  * Miscellaneous Modules :
+    * [BatchNormalization](#nn.BatchNormalization) : mean/std normalization over the mini-batch inputs (with an optional affine transform) ;
+    * [Identity](#nn.Identity) : forward input as-is to output (useful with [ParallelTable](table.md#nn.ParallelTable));
+    * [Dropout](#nn.Dropout) : masks parts of the `input` using binary samples from a [bernoulli](http://en.wikipedia.org/wiki/Bernoulli_distribution) distribution ;
+    * [SpatialDropout](#nn.SpatialDropout) : same as Dropout but for spatial inputs where adjacent pixels are strongly correlated ;
+    * [Padding](#nn.Padding) : adds padding to a dimension ;
+    * [L1Penalty](#nn.L1Penalty) : adds an L1 penalty to an input (for sparsity);
 
 <a name="nn.Linear"></a>
 ## Linear ##
diff --git a/doc/table.md b/doc/table.md
index 221e4c37b..61d108543 100755
--- a/doc/table.md
+++ b/doc/table.md
@@ -4,27 +4,27 @@
 This set of modules allows the manipulation of `table`s through the layers of a neural network.
 This allows one to build very rich architectures:
 
- * `table` Container Modules encapsulate sub-Modules:
-   * [`ConcatTable`](#nn.ConcatTable): applies each member module to the same input     [`Tensor`](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor) and outputs a `table`;
-   * [`ParallelTable`](#nn.ParallelTable): applies the `i`-th member module to the `i`-th input and outputs a `table`;
- * Table Conversion Modules convert between `table`s and `Tensor`s or `table`s:
-   * [`SplitTable`](#nn.SplitTable): splits a `Tensor` into a `table` of `Tensor`s;
-   * [`JoinTable`](#nn.JoinTable): joins a `table` of `Tensor`s into a `Tensor`;
-   * [`MixtureTable`](#nn.MixtureTable): mixture of experts weighted by a gater;
-   * [`SelectTable`](#nn.SelectTable): select one element from a `table`;
-   * [`NarrowTable`](#nn.NarrowTable): select a slice of elements from a `table`;
-   * [`FlattenTable`](#nn.FlattenTable): flattens a nested `table` hierarchy;
- * Pair Modules compute a measure like distance or similarity from a pair (`table`) of input `Tensor`s:
-   * [`PairwiseDistance`](#nn.PairwiseDistance): outputs the `p`-norm. distance between inputs;
-   * [`DotProduct`](#nn.DotProduct): outputs the dot product (similarity) between inputs;
-   * [`CosineDistance`](#nn.CosineDistance): outputs the cosine distance between inputs;
- * CMath Modules perform element-wise operations on a `table` of `Tensor`s:
-   * [`CAddTable`](#nn.CAddTable): addition of input `Tensor`s;
-   * [`CSubTable`](#nn.CSubTable): substraction of input `Tensor`s;
-   * [`CMulTable`](#nn.CMulTable): multiplication of input `Tensor`s;
-   * [`CDivTable`](#nn.CDivTable): division of input `Tensor`s;
- * `Table` of Criteria:
-   * [`CriterionTable`](#nn.CriterionTable): wraps a [Criterion](criterion.md#nn.Criterion) so that it can accept a `table` of inputs.
+  * `table` Container Modules encapsulate sub-Modules:
+    * [`ConcatTable`](#nn.ConcatTable): applies each member module to the same input [`Tensor`](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor) and outputs a `table`;
+    * [`ParallelTable`](#nn.ParallelTable): applies the `i`-th member module to the `i`-th input and outputs a `table`;
+  * Table Conversion Modules convert between `table`s and `Tensor`s or `table`s:
+    * [`SplitTable`](#nn.SplitTable): splits a `Tensor` into a `table` of `Tensor`s;
+    * [`JoinTable`](#nn.JoinTable): joins a `table` of `Tensor`s into a `Tensor`;
+    * [`MixtureTable`](#nn.MixtureTable): mixture of experts weighted by a gater;
+    * [`SelectTable`](#nn.SelectTable): select one element from a `table`;
+    * [`NarrowTable`](#nn.NarrowTable): select a slice of elements from a `table`;
+    * [`FlattenTable`](#nn.FlattenTable): flattens a nested `table` hierarchy;
+  * Pair Modules compute a measure like distance or similarity from a pair (`table`) of input `Tensor`s:
+    * [`PairwiseDistance`](#nn.PairwiseDistance): outputs the `p`-norm distance between inputs;
+    * [`DotProduct`](#nn.DotProduct): outputs the dot product (similarity) between inputs;
+    * [`CosineDistance`](#nn.CosineDistance): outputs the cosine distance between inputs;
+  * CMath Modules perform element-wise operations on a `table` of `Tensor`s:
+    * [`CAddTable`](#nn.CAddTable): addition of input `Tensor`s;
+    * [`CSubTable`](#nn.CSubTable): subtraction of input `Tensor`s;
+    * [`CMulTable`](#nn.CMulTable): multiplication of input `Tensor`s;
+    * [`CDivTable`](#nn.CDivTable): division of input `Tensor`s;
+  * `Table` of Criteria:
+    * [`CriterionTable`](#nn.CriterionTable): wraps a [Criterion](criterion.md#nn.Criterion) so that it can accept a `table` of inputs.
 
 `table`-based modules work by supporting `forward()` and `backward()` methods that can accept `table`s as inputs.
 It turns out that the usual [`Sequential`](containers.md#nn.Sequential) module can do this, so all that is needed is other child modules that take advantage of such `table`s.