From 9650d23e77032ebbd65fc60e50571498eb7263d6 Mon Sep 17 00:00:00 2001 From: nicholas-leonard Date: Mon, 10 Aug 2015 21:42:01 -0400 Subject: [PATCH 1/2] doc readthedocs --- README.md | 2 +- doc/containers.md | 25 ++++++++++--------- doc/convolution.md | 51 +++++++++++++++++++------------------- doc/criterion.md | 42 +++++++++++++++---------------- doc/index.md | 23 +++++++++++++++++ doc/module.md | 48 ++++++++++++++++++------------------ doc/overview.md | 14 +++++++---- doc/simple.md | 61 +++++++++++++++++++++++----------------------- doc/table.md | 35 +++++++++++++------------- doc/training.md | 14 +++++------ doc/transfer.md | 32 ++++++++++++------------ mkdocs.yml | 18 ++++++++++++++ 12 files changed, 207 insertions(+), 158 deletions(-) create mode 100644 doc/index.md create mode 100644 mkdocs.yml diff --git a/README.md b/README.md index 907be66a3..378a4409d 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@ [![Build Status](https://travis-ci.org/torch/nn.svg?branch=master)](https://travis-ci.org/torch/nn) - + # Neural Network Package # This package provides an easy and modular way to build and train simple or complex neural networks using [Torch](https://github.com/torch/torch7/blob/master/README.md): diff --git a/doc/containers.md b/doc/containers.md index d691f4133..8d02ab96b 100644 --- a/doc/containers.md +++ b/doc/containers.md @@ -1,6 +1,7 @@ - + # Containers # Complex neural networks are easily built using container classes: + * [Container](#nn.Container) : abstract class inherited by containers ; * [Sequential](#nn.Sequential) : plugs layers in a feed-forward fully connected manner ; * [Parallel](#nn.Parallel) : applies its `ith` child module to the `ith` slice of the input Tensor ; @@ -9,7 +10,7 @@ Complex neural networks are easily built using container classes: See also the [Table Containers](#nn.TableContainers) for manipulating tables of [Tensors](https://github.com/torch/torch7/blob/master/doc/tensor.md). 
- + ## Container ## This is an abstract [Module](module.md#nn.Module) class which declares methods defined in all containers. @@ -17,19 +18,19 @@ It reimplements many of the Module methods such that calls are propagated to the contained modules. For example, a call to [zeroGradParameters](module.md#nn.Module.zeroGradParameters) will be propagated to all contained modules. - + ### add(module) ### Adds the given `module` to the container. The order is important - + ### get(index) ### Returns the contained modules at index `index`. - + ### size() ### Returns the number of contained modules. - + ## Sequential ## Sequential provides a means to plug layers together @@ -51,7 +52,7 @@ which gives the output: [torch.Tensor of dimension 1] ``` - + ### remove([index]) ### Remove the module at the given `index`. If `index` is not specified, remove the last layer. @@ -71,7 +72,7 @@ nn.Sequential { ``` - + ### insert(module, [index]) ### Inserts the given `module` at the given `index`. If `index` is not specified, the incremented length of the sequence is used and so this is equivalent to use `add(module)`. @@ -92,7 +93,7 @@ nn.Sequential { - + ## Parallel ## `module` = `Parallel(inputDimension,outputDimension)` @@ -149,7 +150,7 @@ end ``` - + ## Concat ## ```lua @@ -179,7 +180,7 @@ which gives the output: [torch.Tensor of dimension 10] ``` - + ## DepthConcat ## ```lua @@ -273,7 +274,7 @@ module output tensors non-`dim` sizes aren't all odd or even. Such that in order to keep the mappings aligned, one need only ensure that these be all odd (or even). 
- + ## Table Containers ## While the above containers are used for manipulating input [Tensors](https://github.com/torch/torch7/blob/master/doc/tensor.md), table containers are used for manipulating tables : * [ConcatTable](table.md#nn.ConcatTable) diff --git a/doc/convolution.md b/doc/convolution.md index 8d9e77bf6..4f716c639 100755 --- a/doc/convolution.md +++ b/doc/convolution.md @@ -1,7 +1,8 @@ - + # Convolutional layers # A convolution is an integral that expresses the amount of overlap of one function `g` as it is shifted over another function `f`. It therefore "blends" one function with another. The neural network package supports convolution, pooling, subsampling and other relevant facilities. These are divided base on the dimensionality of the input and output [Tensors](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor): + * [Temporal Modules](#nn.TemporalModules) apply to sequences with a one-dimensional relationship (e.g. sequences of words, phonemes and letters. Strings of some kind). * [TemporalConvolution](#nn.TemporalConvolution) : a 1D convolution over an input sequence ; @@ -25,7 +26,7 @@ a kernel for computing the weighted average in a neighborhood ; * [VolumetricMaxPooling](#nn.VolumetricMaxPooling) : a 3D max-pooling operation over an input video. * [VolumetricAveragePooling](#nn.VolumetricAveragePooling) : a 3D average-pooling operation over an input video. - + ## Temporal Modules ## Excluding an optional first batch dimension, temporal layers expect a 2D Tensor as input. The first dimension is the number of frames in the sequence (e.g. `nInputFrame`), the last dimenstion @@ -35,7 +36,7 @@ of dimensions, although the size of each dimension may change. These are commonl Note: The [LookupTable](#nn.LookupTable) is special in that while it does output a temporal Tensor of size `nOutputFrame x outputFrameSize`, its input is a 1D Tensor of indices of size `nIndices`. Again, this is excluding the option first batch dimension. 
- + ## TemporalConvolution ## ```lua @@ -121,7 +122,7 @@ which gives: -0.63871422284166 ``` - + ## TemporalMaxPooling ## ```lua @@ -139,7 +140,7 @@ If the input sequence is a 2D tensor of dimension `nInputFrame x inputFrameSize` nOutputFrame = (nInputFrame - kW) / dW + 1 ``` - + ## TemporalSubSampling ## ```lua @@ -175,7 +176,7 @@ The output value of the layer can be precisely described as: output[i][t] = bias[i] + weight[i] * sum_{k=1}^kW input[i][dW*(t-1)+k)] ``` - + ## LookupTable ## ```lua @@ -253,13 +254,13 @@ Outputs something like: [torch.DoubleTensor of dimension 2x4x3] ``` - + ## Spatial Modules ## Excluding and optional batch dimension, spatial layers expect a 3D Tensor as input. The first dimension is the number of features (e.g. `frameSize`), the last two dimenstions are spatial (e.g. `height x width`). These are commonly used for processing images. - + ### SpatialConvolution ### ```lua @@ -303,7 +304,7 @@ output[i][j][k] = bias[k] ``` - + ### SpatialConvolutionMap ### ```lua @@ -317,7 +318,7 @@ connection table between input and output features. The using a [full connection table](#nn.tables.full). One can specify different types of connection tables. - + #### Full Connection Table #### ```lua @@ -327,7 +328,7 @@ table = nn.tables.full(nin,nout) This is a precomputed table that specifies connections between every input and output node. - + #### One to One Connection Table #### ```lua @@ -337,7 +338,7 @@ table = nn.tables.oneToOne(n) This is a precomputed table that specifies a single connection to each output node from corresponding input node. - + #### Random Connection Table #### ```lua @@ -348,7 +349,7 @@ This table is randomly populated such that each output unit has `nto` incoming connections. The algorihtm tries to assign uniform number of outgoing connections to each input node if possible. 
- + ### SpatialLPPooling ### ```lua @@ -357,7 +358,7 @@ module = nn.SpatialLPPooling(nInputPlane, pnorm, kW, kH, [dW], [dH]) Computes the `p` norm in a convolutional manner on a set of 2D input planes. - + ### SpatialMaxPooling ### ```lua @@ -379,7 +380,7 @@ oheight = op((height + 2*padH - kH) / dH + 1) `op` is a rounding operator. By default, it is `floor`. It can be changed by calling `:ceil()` or `:floor()` methods. - + ### SpatialAveragePooling ### ```lua @@ -390,7 +391,7 @@ Applies 2D average-pooling operation in `kWxkH` regions by step size `dWxdH` steps. The number of output features is equal to the number of input planes. - + ### SpatialAdaptiveMaxPooling ### ```lua @@ -413,7 +414,7 @@ y_i_start = floor((i /oheight) * iheight) y_i_end = ceil(((i+1)/oheight) * iheight) ``` - + ### SpatialSubSampling ### ```lua @@ -454,7 +455,7 @@ output[i][j][k] = bias[k] + weight[k] sum_{s=1}^kW sum_{t=1}^kH input[dW*(i-1)+s)][dH*(j-1)+t][k] ``` - + ### SpatialUpSamplingNearest ### ```lua @@ -475,7 +476,7 @@ output(u,v) = input(floor((u-1)/scale)+1, floor((v-1)/scale)+1) Where `u` and `v` are index from 1 (as per lua convention). There are no learnable parameters. - + ### SpatialZeroPadding ### ```lua @@ -485,7 +486,7 @@ module = nn.SpatialZeroPadding(padLeft, padRight, padTop, padBottom) Each feature map of a given input is padded with specified number of zeros. If padding values are negative, then input is cropped. - + ### SpatialSubtractiveNormalization ### ```lua @@ -522,7 +523,7 @@ w2=image.display(processed) ``` ![](image/lena.jpg)![](image/lenap.jpg) - + ## SpatialBatchNormalization ## `module` = `nn.SpatialBatchNormalization(N [,eps] [, momentum] [,affine])` @@ -565,13 +566,13 @@ A = torch.randn(b, m, h, w) C = model.forward(A) -- C will be of size `b x m x h x w` ``` - + ## Volumetric Modules ## Excluding and optional batch dimension, volumetric layers expect a 4D Tensor as input. The first dimension is the number of features (e.g. 
`frameSize`), the second is sequential (e.g. `time`) and the last two dimenstions are spatial (e.g. `height x width`). These are commonly used for processing videos (sequences of images). - + ### VolumetricConvolution ### ```lua @@ -608,7 +609,7 @@ size `nOutputPlane x nInputPlane x kT x kH x kW`) and `self.bias` (Tensor of size `nOutputPlane`). The corresponding gradients can be found in `self.gradWeight` and `self.gradBias`. - + ### VolumetricMaxPooling ### ```lua @@ -619,7 +620,7 @@ Applies 3D max-pooling operation in `kTxkWxkH` regions by step size `dTxdWxdH` steps. The number of output features is equal to the number of input planes / dT. - + ### VolumetricAveragePooling ### ```lua diff --git a/doc/criterion.md b/doc/criterion.md index 64e6d6326..4f89338c9 100755 --- a/doc/criterion.md +++ b/doc/criterion.md @@ -1,4 +1,4 @@ - + # Criterions # [`Criterions`](#nn.Criterion) are helpful to train a neural network. Given an input and a @@ -24,13 +24,13 @@ target, they compute a gradient according to a given loss function. * [`ParallelCriterion`](#nn.ParallelCriterion) : a weighted sum of other criterions each applied to a different input and target; * [`MarginRankingCriterion`](#nn.MarginRankingCriterion): ranks two inputs; - + ## Criterion ## This is an abstract class which declares methods defined in all criterions. This class is [serializable](https://github.com/torch/torch7/blob/master/doc/file.md#serialization-methods). - + ### [output] forward(input, target) ### Given an `input` and a `target`, compute the loss function associated to the criterion and return the result. @@ -41,7 +41,7 @@ The `output` returned should be a scalar in general. The state variable [`self.output`](#nn.Criterion.output) should be updated after a call to `forward()`. - + ### [gradInput] backward(input, target) ### Given an `input` and a `target`, compute the gradients of the loss function associated to the criterion and return the result. 
@@ -50,19 +50,19 @@ In general `input`, `target` and `gradInput` are [`Tensor`s](..:torch:tensor), b The state variable [`self.gradInput`](#nn.Criterion.gradInput) should be updated after a call to `backward()`. - + ### State variable: output ### State variable which contains the result of the last [`forward(input, target)`](#nn.Criterion.forward) call. - + ### State variable: gradInput ### State variable which contains the result of the last [`backward(input, target)`](#nn.Criterion.backward) call. - + ## AbsCriterion ## ```lua @@ -85,7 +85,7 @@ criterion.sizeAverage = false ``` - + ## ClassNLLCriterion ## ```lua @@ -128,7 +128,7 @@ end ``` - + ## CrossEntropyCriterion ## ```lua @@ -157,7 +157,7 @@ loss(x, class) = weights[class] * (-x[class] + log(\sum_j exp(x[j]))) ``` - + ## DistKLDivCriterion ## ```lua @@ -177,7 +177,7 @@ loss(x, target) = \sum(target_i * (log(target_i) - x_i)) ``` - + ## BCECriterion ```lua @@ -193,7 +193,7 @@ loss(t, o) = -(t * log(o) + (1 - t) * log(1 - o)) This is used for measuring the error of a reconstruction in for example an auto-encoder. - + ## MarginCriterion ## ```lua @@ -256,7 +256,7 @@ gives the output: i.e. the mlp successfully separates the two data points such that they both have a `margin` of `1`, and hence a loss of `0`. 
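The zero loss quoted above can be checked by hand. The following is a minimal plain-Lua sketch, assuming the hinge form `max(0, margin - y * x)` averaged over the elements; that formula is not visible in this excerpt, so treat it as an assumption. `marginLoss` is a hypothetical helper, not part of `nn`, and plain Lua arrays stand in for Tensors.

```lua
-- Hypothetical helper (not an nn function): averaged hinge loss over plain Lua arrays.
-- Assumes loss = sum_i max(0, margin - y[i] * x[i]) / #x, with targets y[i] in {1, -1}.
local function marginLoss(x, y, margin)
  margin = margin or 1
  local total = 0
  for i = 1, #x do
    total = total + math.max(0, margin - y[i] * x[i])
  end
  return total / #x
end

-- Both outputs sit exactly on their margin, so nothing is penalised.
local onMargin = marginLoss({1, -1}, {1, -1})
-- An output on the wrong side of the boundary is penalised: 1 - (1 * -0.5) = 1.5.
local wrongSide = marginLoss({-0.5}, {1})
print(onMargin, wrongSide)
```

With outputs of `1` and `-1` for targets `1` and `-1`, each term is `max(0, 1 - 1)`, which is why the loss vanishes.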
- + ## MultiMarginCriterion ## ```lua @@ -281,7 +281,7 @@ mlp:add(nn.MulConstant(-1)) -- distance to similarity ``` - + ## MultiLabelMarginCriterion ## ```lua @@ -309,7 +309,7 @@ criterion:forward(input, target) ``` - + ## MSECriterion ## ```lua @@ -333,7 +333,7 @@ criterion.sizeAverage = false ``` - + ## MultiCriterion ## ```lua @@ -360,7 +360,7 @@ mc = nn.MultiCriterion():add(nll, 0.5):add(nll2) output = mc:forward(input, target) ``` - + ## ParallelCriterion ## ```lua @@ -390,7 +390,7 @@ output = pc:forward(input, target) ``` - + ## HingeEmbeddingCriterion ## ```lua @@ -469,7 +469,7 @@ end ``` - + ## L1HingeEmbeddingCriterion ## ```lua @@ -486,7 +486,7 @@ loss(x, y) = ⎨ The `margin` has a default value of `1`, or can be set in the constructor. - + ## CosineEmbeddingCriterion ## ```lua @@ -508,7 +508,7 @@ loss(x, y) = ⎨ ``` - + ## MarginRankingCriterion ## ```lua diff --git a/doc/index.md b/doc/index.md new file mode 100644 index 000000000..5c3616673 --- /dev/null +++ b/doc/index.md @@ -0,0 +1,23 @@ +[![Build Status](https://travis-ci.org/torch/nn.svg?branch=master)](https://travis-ci.org/torch/nn) + +# Neural Network Package # + +This package provides an easy and modular way to build and train simple or complex neural networks using [Torch](https://github.com/torch/torch7/blob/master/README.md): + + * Modules are the bricks used to build neural networks. 
Each are themselves neural networks, but can be combined with other networks using containers to create complex neural networks: + * [Module](module.md#nn.Module) : abstract class inherited by all modules; + * [Containers](containers.md#nn.Containers) : container classes like [Sequential](containers.md#nn.Sequential), [Parallel](containers.md#nn.Parallel) and [Concat](containers.md#nn.Concat); + * [Transfer functions](transfer.md#nn.transfer.dok) : non-linear functions like [Tanh](transfer.md#nn.Tanh) and [Sigmoid](transfer.md#nn.Sigmoid); + * [Simple layers](simple.md#nn.simplelayers.dok) : like [Linear](simple.md#nn.Linear), [Mean](simple.md#nn.Mean), [Max](simple.md#nn.Max) and [Reshape](simple.md#nn.Reshape); + * [Table layers](table.md#nn.TableLayers) : layers for manipulating tables like [SplitTable](table.md#nn.SplitTable), [ConcatTable](table.md#nn.ConcatTable) and [JoinTable](table.md#nn.JoinTable); + * [Convolution layers](convolution.md#nn.convlayers.dok) : [Temporal](convolution.md#nn.TemporalModules), [Spatial](convolution.md#nn.SpatialModules) and [Volumetric](convolution.md#nn.VolumetricModules) convolutions ; + * Criterions compute a gradient according to a given loss function given an input and a target: + * [Criterions](criterion.md#nn.Criterions) : a list of all criterions, including [Criterion](criterion.md#nn.Criterion), the abstract class; + * [MSECriterion](criterion.md#nn.MSECriterion) : the Mean Squared Error criterion used for regression; + * [ClassNLLCriterion](criterion.md#nn.ClassNLLCriterion) : the Negative Log Likelihood criterion used for classification; + * Additional documentation : + * [Overview](overview.md#nn.overview.dok) of the package essentials including modules, containers and training; + * [Training](training.md#nn.traningneuralnet.dok) : how to train a neural network using [StochasticGradient](training.md#nn.StochasticGradient); + * [Testing](testing.md) : how to test your modules. 
+ * [Experimental Modules](https://github.com/clementfarabet/lua---nnx/blob/master/README.md) : a package containing experimental modules and criteria. + diff --git a/doc/module.md b/doc/module.md index 50090c421..97e14a07c 100755 --- a/doc/module.md +++ b/doc/module.md @@ -1,4 +1,4 @@ - + ## Module ## `Module` is an abstract class which defines fundamental methods necessary @@ -7,7 +7,7 @@ for a training a neural network. Modules are [serializable](https://github.com/t Modules contain two states variables: [output](#output) and [gradInput](#gradinput). - + ### [output] forward(input) ### Takes an `input` object, and computes the corresponding `output` of the @@ -24,7 +24,7 @@ implement [updateOutput(input)](#nn.Module.updateOutput) function. The forward module in the abstract parent class [Module](#nn.Module) will call `updateOutput(input)`. - + ### [gradInput] backward(input, gradOutput) ### Performs a _backpropagation step_ through the module, with respect to the @@ -52,14 +52,14 @@ is better to override [accGradParameters(input, gradOutput,scale)](#nn.Module.accGradParameters) functions. - + ### updateOutput(input) ### Computes the output using the current parameter set of the class and input. This function returns the result which is stored in the [output](#output) field. - + ### updateGradInput(input, gradOutput) ### Computing the gradient of the module with respect to its own @@ -67,7 +67,7 @@ input. This is returned in `gradInput`. Also, the [gradInput](#gradinput) state variable is updated accordingly. - + ### accGradParameters(input, gradOutput, scale) ### Computing the gradient of the module with respect to its @@ -83,7 +83,7 @@ Zeroing this accumulation is achieved with the parameters according to this accumulation is done with [updateParameters()](#nn.Module.updateParameters). 
- + ### zeroGradParameters() ### If the module has parameters, this will zero the accumulation of the @@ -91,7 +91,7 @@ gradients with respect to these parameters, accumulated through [accGradParameters(input, gradOutput,scale)](#nn.Module.accGradParameters) calls. Otherwise, it does nothing. - + ### updateParameters(learningRate) ### If the module has parameters, this will update these parameters, according @@ -104,7 +104,7 @@ parameters = parameters - learningRate * gradients_wrt_parameters ``` If the module does not have parameters, it does nothing. - + ### accUpdateGradParameters(input, gradOutput, learningRate) ### This is a convenience module that performs two functions at @@ -136,7 +136,7 @@ As it can be seen, the gradients are accumulated directly into weights. This assumption may not be true for a module that computes a nonlinear operation. - + ### share(mlp,s1,s2,...,sn) ### This function modifies the parameters of the module named @@ -174,7 +174,7 @@ print(mlp2:get(1).bias[1]) ``` - + ### clone(mlp,...) ### Creates a deep copy of (i.e. not just a pointer to) the module, @@ -205,29 +205,29 @@ print(mlp2:get(1).bias[1]) ``` - + ### type(type) ### This function converts all the parameters of a module to the given `type`. The `type` can be one of the types defined for [torch.Tensor](https://github.com/torch/torch7/blob/master/doc/tensor.md). - + ### float() ### Convenience method for calling [module:type('torch.FloatTensor')](#nn.Module.type) - + ### double() ### Convenience method for calling [module:type('torch.DoubleTensor')](#nn.Module.type) - + ### cuda() ### Convenience method for calling [module:type('torch.CudaTensor')](#nn.Module.type) - + ### State Variables ### These state variables are useful objects if one wants to check the guts of @@ -240,13 +240,13 @@ However, some special sub-classes like [table layers](table.md#nn.TableLayers) contain something else. Please, refer to each module specification for further information. 
- + #### output #### This contains the output of the module, computed with the last call of [forward(input)](#nn.Module.forward). - + #### gradInput #### This contains the gradients with respect to the inputs of the module, computed with the last call of @@ -258,7 +258,7 @@ Some modules contain parameters (the ones that we actually want to train!). The name of these parameters, and gradients w.r.t these parameters are module dependent. - + ### [{weights}, {gradWeights}] parameters() ### This function should returns two tables. One for the learnable @@ -268,7 +268,7 @@ wrt to the learnable parameters `{gradWeights}`. Custom modules should override this function if they use learnable parameters that are stored in tensors. - + ### [flatParameters, flatGradParameters] getParameters() ### This function returns two tensors. One for the flattened learnable @@ -279,15 +279,15 @@ Custom modules should not override this function. They should instead override [ This function will go over all the weights and gradWeights and make them view into a single tensor (one for weights and one for gradWeights). Since the storage of every weight and gradWeight is changed, this function should be called only once on a given network. - + ### training() ### This sets the mode of the Module (or sub-modules) to `train=true`. This is useful for modules like [Dropout](simple.md#nn.Dropout) that have a different behaviour during training vs evaluation. - + ### evaluate() ### This sets the mode of the Module (or sub-modules) to `train=false`. This is useful for modules like [Dropout](simple.md#nn.Dropout) that have a different behaviour during training vs evaluation. - + ### findModules(typename) ### Find all instances of modules in the network of a certain `typename`. It returns a flattened list of the matching nodes, as well as a flattened list of the container modules for each matching node. 
@@ -331,7 +331,7 @@ for i = 1, #threshold_nodes do end ``` - + ### listModules() ### List all Modules instances in a network. Returns a flattened list of modules, diff --git a/doc/overview.md b/doc/overview.md index c9eedaebc..6aec32176 100644 --- a/doc/overview.md +++ b/doc/overview.md @@ -1,4 +1,4 @@ - + # Overview # Each module of a network is composed of [Modules](module.md#nn.Modules) and there @@ -23,31 +23,35 @@ easy with a simple for loop to [train a neural network yourself](training.md#nn. ## Detailed Overview ## This section provides a detailed overview of the neural network package. First the omnipresent [Module](#nn.overview.module) is examined, followed by some examples for [combining modules](#nn.overview.plugandplay) together. The last part explores facilities for [training a neural network](#nn.overview.training). - + ### Module ### A neural network is called a [Module](module.md#nn.Module) (or simply _module_ in this documentation) in Torch. `Module` is an abstract class which defines four main methods: + * [forward(input)](module.md#nn.Module.forward) which computes the output of the module given the `input` [Tensor](https://github.com/torch/torch7/blob/master/doc/tensor.md). * [backward(input, gradOutput)](module.md#nn.Module.backward) which computes the gradients of the module with respect to its own parameters, and its own inputs. * [zeroGradParameters()](module.md#nn.Module.zeroGradParameters) which zeroes the gradient with respect to the parameters of the module. * [updateParameters(learningRate)](module.md#nn.Module.updateParameters) which updates the parameters after one has computed the gradients with `backward()` It also declares two members: + * [output](module.md#nn.Module.output) which is the output returned by `forward()`. * [gradInput](module.md#nn.Module.gradInput) which contains the gradients with respect to the input of the module, computed in a `backward()`. 
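The four methods and two members listed above can be mimicked in a few lines of plain Lua. This is only a sketch of the calling convention: `Scale` is a hypothetical scalar module computing `y = w * x`, not a class from `nn`, and real modules work on Tensors rather than numbers. The update step assumes the rule `parameters = parameters - learningRate * gradients`.

```lua
-- Hypothetical toy module (not part of nn): y = weight * x on plain numbers.
local Scale = {}
Scale.__index = Scale

function Scale.new(w)
  return setmetatable({ weight = w, gradWeight = 0 }, Scale)
end

-- forward(input): compute and store the output.
function Scale:forward(input)
  self.output = self.weight * input
  return self.output
end

-- backward(input, gradOutput): store the gradient w.r.t. the input and
-- accumulate the gradient w.r.t. the parameter.
function Scale:backward(input, gradOutput)
  self.gradInput = self.weight * gradOutput
  self.gradWeight = self.gradWeight + input * gradOutput
  return self.gradInput
end

function Scale:zeroGradParameters()
  self.gradWeight = 0
end

function Scale:updateParameters(learningRate)
  self.weight = self.weight - learningRate * self.gradWeight
end

-- One gradient step towards the target 6 for input 2, starting from weight 1.
local m = Scale.new(1)
m:zeroGradParameters()
local out = m:forward(2)        -- 1 * 2 = 2
m:backward(2, out - 6)          -- gradOutput of 0.5 * (out - target)^2
m:updateParameters(0.1)
print(m.output, m.gradInput, m.weight)
```

Note that `backward()` is called with the same input that was given to `forward()`, which is the order the real modules also expect.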
Two other perhaps less used but handy methods are also defined: + * [share(mlp,s1,s2,...,sn)](module.md#nn.Module.share) which makes this module share the parameters s1,..sn of the module `mlp`. This is useful if you want to have modules that share the same weights. * [clone(...)](module.md#nn.Module.clone) which produces a deep copy of (i.e. not just a pointer to) this Module, including the current state of its parameters (if any). Some important remarks: + * `output` contains only valid values after a [forward(input)](module.md#nn.Module.forward). * `gradInput` contains only valid values after a [backward(input, gradOutput)](module.md#nn.Module.backward). * [backward(input, gradOutput)](module.md#nn.Module.backward) uses certain computations obtained during [forward(input)](module.md#nn.Module.forward). You _must_ call `forward()` before calling a `backward()`, on the _same_ `input`, or your gradients are going to be incorrect! - + ### Plug and play ### Building a simple neural network can be achieved by constructing an available layer. @@ -75,7 +79,7 @@ Of course, `Sequential` and `Concat` can contains other networks you ever dreamt of! See the [[#nn.Modules|complete list of available modules]]. - + ### Training a neural network ### Once you built your neural network, you have to choose a particular @@ -114,7 +118,7 @@ are implemented. [See an example](containers.md#nn.DoItStochasticGradient). to cut-and-paste it and create a variant to it adapted to your needs (if the constraints of `StochasticGradient` do not satisfy you). 
- + #### Low Level Training #### If you want to program the `StochasticGradient` by hand, you diff --git a/doc/simple.md b/doc/simple.md index 6ef7ed28a..bc4881b4b 100755 --- a/doc/simple.md +++ b/doc/simple.md @@ -1,6 +1,7 @@ - + # Simple layers # Simple Modules are used for various tasks like adapting Tensor methods and providing affine transformations : + * Parameterized Modules : * [Linear](#nn.Linear) : a linear transformation ; * [SparseLinear](#nn.SparseLinear) : a linear transformation with sparse inputs ; @@ -36,7 +37,7 @@ Simple Modules are used for various tasks like adapting Tensor methods and provi * [Padding](#nn.Padding) : adds padding to a dimension ; * [L1Penalty](#nn.L1Penalty) : adds an L1 penalty to an input (for sparsity); - + ## Linear ## ```lua @@ -79,7 +80,7 @@ x = torch.Tensor(10) -- 10 inputs y = module:forward(x) ``` - + ## SparseLinear ## ```lua @@ -113,7 +114,7 @@ x = torch.Tensor({ {1, 0.1}, {2, 0.3}, {10, 0.3}, {31, 0.2} }) The first column contains indices, the second column contains values in a a vector where all other elements are zeros. The indices should not exceed the stated dimensions of the input to the layer (10000 in the example). - + ## Dropout ## ```lua @@ -183,7 +184,7 @@ We can return to training our model by first calling [Module:training()](module. When used, `Dropout` should normally be applied to the input of parameterized [Modules](module.md#nn.Module) like [Linear](#nn.Linear) or [SpatialConvolution](convolution.md#nn.SpatialConvolution). A `p` of `0.5` (the default) is usually okay for hidden layers. `Dropout` can sometimes be used successfully on the dataset inputs with a `p` around `0.2`. It sometimes works best following [Transfer](transfer.md) Modules like [ReLU](transfer.md#nn.ReLU). All this depends a great deal on the dataset so its up to the user to try different combinations. 
- + ## SpatialDropout ## `module` = `nn.SpatialDropout(p)` @@ -194,7 +195,7 @@ As described in the paper "Efficient Object Localization Using Convolutional Net ```nn.SpatialDropout``` accepts 3D or 4D inputs. If the input is 3D than a layout of (features x height x width) is assumed and for 4D (batch x features x height x width) is assumed. - + ## Abs ## ```lua @@ -214,7 +215,7 @@ gnuplot.grid(true) ![](image/abs.png) - + ## Add ## ```lua @@ -264,7 +265,7 @@ gives the output: i.e. the network successfully learns the input `x` has been shifted to produce the output `y`. - + ## Mul ## ```lua @@ -309,7 +310,7 @@ gives the output: i.e. the network successfully learns the input `x` has been scaled by pi. - + ## CMul ## ```lua @@ -362,7 +363,7 @@ gives the output: i.e. the network successfully learns the input `x` has been scaled by those scaling factors to produce the output `y`. - + ## Max ## ```lua @@ -373,7 +374,7 @@ Applies a max operation over dimension `dimension`. Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2` then an `nxq` matrix would be output. - + ## Min ## ```lua @@ -384,7 +385,7 @@ Applies a min operation over dimension `dimension`. Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2` then an `nxq` matrix would be output. - + ## Mean ## ```lua @@ -394,7 +395,7 @@ module = nn.Mean(dimension) Applies a mean operation over dimension `dimension`. Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2` then an `nxq` matrix would be output. - + ## Sum ## ```lua @@ -405,7 +406,7 @@ Applies a sum operation over dimension `dimension`. Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2` then an `nxq` matrix would be output. - + ## Euclidean ## ```lua @@ -416,7 +417,7 @@ Outputs the Euclidean distance of the input to `outputSize` centers, i.e. this l The distance `y_j` between center `j` and input `x` is formulated as `y_j = || w_j - x ||`. 
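As a quick illustration of `y_j = || w_j - x ||`, here is a plain-Lua sketch that computes the distance from one input to each center. `euclideanToCenters` is a hypothetical helper working on Lua arrays; it says nothing about how `nn.Euclidean` lays out its weights internally.

```lua
-- Hypothetical helper (not an nn function): distance from x to every center w_j.
local function euclideanToCenters(x, centers)
  local y = {}
  for j, w in ipairs(centers) do
    local sumSq = 0
    for i = 1, #x do
      local d = w[i] - x[i]
      sumSq = sumSq + d * d
    end
    y[j] = math.sqrt(sumSq)
  end
  return y
end

-- Two centers, so the output has two entries (one per center).
local y = euclideanToCenters({0, 0}, { {3, 4}, {0, 0} })
print(y[1], y[2])
```

The first center is at distance 5 from the origin (a 3-4-5 triangle) and the second coincides with the input, so its distance is zero.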
- + ## WeightedEuclidean ## ```lua @@ -429,7 +430,7 @@ In other words, for each of the `outputSize` centers `w_j`, there is a diagonal The distance `y_j` between center `j` and input `x` is formulated as `y_j = || c_j * (w_j - x) ||`. - + ## Identity ## ```lua @@ -488,7 +489,7 @@ for i = 1, 100 do -- Do a few training iterations end ``` - + ## Copy ## ```lua @@ -498,7 +499,7 @@ module = nn.Copy(inputType, outputType, [forceCopy, dontCast]) This layer copies the input to output with type casting from input type from `inputType` to `outputType`. Unless `forceCopy` is true, when the first two arguments are the same, the input isn't copied, only transfered as the output. The default `forceCopy` is false. When `dontCast` is true, a call to `nn.Copy:type(type)` will not cast the module's `output` and `gradInput` Tensors to the new type. The default is false. - + ## Narrow ## ```lua @@ -507,7 +508,7 @@ module = nn.Narrow(dimension, offset, length) Narrow is application of [narrow](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-narrowdim-index-size) operation in a module. 
- + ## Replicate ## ```lua @@ -552,7 +553,7 @@ This allows the module to replicate the same non-batch dimension `dim` for both ``` - + ## Reshape ## ```lua @@ -640,7 +641,7 @@ Example: ``` - + ## View ## ```lua @@ -723,7 +724,7 @@ Example 2: [torch.LongStorage of size 2] ``` - + ## Select ## ```lua @@ -798,7 +799,7 @@ for i = 1, 10000 do -- Train for a few iterations end ``` - + ## Exp ## ```lua @@ -820,7 +821,7 @@ gnuplot.grid(true) ![](image/exp.png) - + ## Square ## ```lua @@ -842,7 +843,7 @@ gnuplot.grid(true) ![](image/square.png) - + ## Sqrt ## ```lua @@ -864,7 +865,7 @@ gnuplot.grid(true) ![](image/sqrt.png) - + ## Power ## ```lua @@ -886,7 +887,7 @@ gnuplot.grid(true) ![](image/power.png) - + ## MM ## ```lua @@ -905,7 +906,7 @@ C = model.forward({A, B}) -- C will be of size `b x m x n` ``` - + ## BatchNormalization ## ```lua @@ -945,7 +946,7 @@ A = torch.randn(b, m) C = model.forward(A) -- C will be of size `b x m` ``` - + ## Padding ## `module` = `nn.Padding(dim, pad [, nInputDim, value])` @@ -978,7 +979,7 @@ module:forward(torch.randn(2, 3)) --batch input ``` - + ## L1Penalty ## ```lua diff --git a/doc/table.md b/doc/table.md index 91ea209c9..221e4c37b 100755 --- a/doc/table.md +++ b/doc/table.md @@ -1,8 +1,9 @@ - + # Table Layers # This set of modules allows the manipulation of `table`s through the layers of a neural network. 
This allows one to build very rich architectures: + * `table` Container Modules encapsulate sub-Modules: * [`ConcatTable`](#nn.ConcatTable): applies each member module to the same input [`Tensor`](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor) and outputs a `table`; * [`ParallelTable`](#nn.ParallelTable): applies the `i`-th member module to the `i`-th input and outputs a `table`; @@ -35,7 +36,7 @@ pred = mlp:forward(t) pred = mlp:forward{x, y, z} -- This is equivalent to the line before ``` - + ## ConcatTable ## ```lua @@ -115,7 +116,7 @@ which gives the output (using [th](https://github.com/torch/trepl)): ``` - + ## ParallelTable ## ```lua @@ -164,7 +165,7 @@ which gives the output: ``` - + ## SplitTable ## ```lua @@ -399,7 +400,7 @@ end ``` - + ## JoinTable ## ```lua @@ -534,7 +535,7 @@ end ``` - + ## MixtureTable ## `module` = `MixtureTable([dim])` @@ -632,7 +633,7 @@ Forwarding a batch of 2 examples gives us something like this: ``` - + ## SelectTable ## `module` = `SelectTable(index)` @@ -725,7 +726,7 @@ Example 2: ``` - + ## NarrowTable ## `module` = `NarrowTable(offset [, length])` @@ -765,7 +766,7 @@ Example: ``` - + ## FlattenTable ## `module` = `FlattenTable()` @@ -802,7 +803,7 @@ gives the output: } ``` - + ## PairwiseDistance ## `module` = `PairwiseDistance(p)` creates a module that takes a `table` of two vectors as input and outputs the distance between them using the `p`-norm. @@ -885,7 +886,7 @@ end ``` - + ## DotProduct ## `module` = `DotProduct()` creates a module that takes a `table` of two vectors as input and outputs the dot product between them. @@ -978,7 +979,7 @@ end ``` - + ## CosineDistance ## `module` = `CosineDistance()` creates a module that takes a `table` of two vectors (or matrices if in batch mode) as input and outputs the cosine distance between them. 
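For intuition about what such a module returns, the sketch below computes the cosine of the angle between two vectors in plain Lua. It assumes the output is `x·y / (|x| |y|)`, which this excerpt does not spell out; `cosine` is a hypothetical helper and Lua arrays stand in for Tensors.

```lua
-- Hypothetical helper (not an nn function): cosine between two plain Lua arrays.
-- Assumes both vectors have the same length and non-zero norm.
local function cosine(x, y)
  local dot, normX, normY = 0, 0, 0
  for i = 1, #x do
    dot = dot + x[i] * y[i]
    normX = normX + x[i] * x[i]
    normY = normY + y[i] * y[i]
  end
  return dot / (math.sqrt(normX) * math.sqrt(normY))
end

local orthogonal = cosine({1, 0}, {0, 1})   -- perpendicular vectors
local opposite   = cosine({1, 0}, {-1, 0})  -- vectors pointing in opposite directions
local parallel   = cosine({1, 2}, {2, 4})   -- same direction, different length
print(orthogonal, opposite, parallel)
```

Under that assumption the value depends only on direction, not on length, so rescaling either input leaves it unchanged up to rounding.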
@@ -1065,7 +1066,7 @@ end - + ## CriterionTable ## `module` = `CriterionTable(criterion)` @@ -1115,7 +1116,7 @@ for i = 1, 20 do -- Train for a few iterations end ``` - + ## CAddTable ## Takes a `table` of `Tensor`s and outputs summation of all `Tensor`s. @@ -1157,7 +1158,7 @@ m = nn.CAddTable() ``` - + ## CSubTable ## Takes a `table` with two `Tensor` and returns the component-wise @@ -1174,7 +1175,7 @@ m = nn.CSubTable() [torch.DoubleTensor of dimension 5] ``` - + ## CMulTable ## Takes a `table` of `Tensor`s and outputs the multiplication of all of them. @@ -1192,7 +1193,7 @@ m = nn.CMulTable() ``` - + ## CDivTable ## Takes a `table` with two `Tensor` and returns the component-wise diff --git a/doc/training.md b/doc/training.md index 016c7c1ca..1a126d3e1 100644 --- a/doc/training.md +++ b/doc/training.md @@ -1,4 +1,4 @@ - + # Training a neural network # Training a neural network is easy with a [simple `for` loop](#nn.DoItYourself). @@ -7,19 +7,19 @@ want sometimes a quick way of training neural networks. [StochasticGradient](#nn.StochasticGradient), a simple class which does the job for you is provided as standard. - + ## StochasticGradient ## `StochasticGradient` is a high-level class for training [neural networks](#nn.Module), using a stochastic gradient algorithm. This class is [serializable](https://github.com/torch/torch7/blob/master/doc/serialization.md#serialization). - + ### StochasticGradient(module, criterion) ### Create a `StochasticGradient` class, using the given [Module](module.md#nn.Module) and [Criterion](criterion.md#nn.Criterion). The class contains [several parameters](#nn.StochasticGradientParameters) you might want to set after initialization. - + ### train(dataset) ### Train the module and criterion given in the @@ -42,7 +42,7 @@ Such a dataset is easily constructed by using Lua tables, but it could any `C` o for example, as long as required operators/methods are implemented. [See an example](#nn.DoItStochasticGradient). 
- + ### Parameters ### `StochasticGradient` has several field which have an impact on a call to [train()](#nn.StochasticGradientTrain). @@ -54,7 +54,7 @@ for example, as long as required operators/methods are implemented. * `hookExample`: A possible hook function which will be called (if non-nil) during training after each example forwarded and backwarded through the network. The function takes `(self, example)` as parameters. Default is `nil`. * `hookIteration`: A possible hook function which will be called (if non-nil) during training after a complete pass over the dataset. The function takes `(self, iteration)` as parameters. Default is `nil`. - + ## Example of training using StochasticGradient ## We show an example here on a classical XOR problem. @@ -134,7 +134,7 @@ You should see something like: [torch.Tensor of dimension 1] ``` - + ## Example of manual training of a neural network ## We show an example here on a classical XOR problem. diff --git a/doc/transfer.md b/doc/transfer.md index c03017de1..6b3be00de 100755 --- a/doc/transfer.md +++ b/doc/transfer.md @@ -1,8 +1,8 @@ - + # Transfer Function Layers # Transfer functions are normally used to introduce a non-linearity after a parameterized layer like [Linear](simple.md#nn.Linear) and [SpatialConvolution](convolution.md#nn.SpatialConvolution). Non-linearities allows for dividing the problem space into more complex regions than what a simple logistic regressor would permit. 
- + ## HardTanh ## Applies the `HardTanh` function element-wise to the input Tensor, @@ -26,7 +26,7 @@ gnuplot.grid(true) ![](image/htanh.png) - + ## HardShrink ## `module = nn.HardShrink(lambda)` @@ -51,7 +51,7 @@ gnuplot.grid(true) ``` ![](image/hshrink.png) - + ## SoftShrink ## `module = nn.SoftShrink(lambda)` @@ -77,7 +77,7 @@ gnuplot.grid(true) ![](image/sshrink.png) - + ## SoftMax ## Applies the `Softmax` function to an n-dimensional input Tensor, @@ -99,7 +99,7 @@ gnuplot.grid(true) Note that this module doesn't work directly with [ClassNLLCriterion](criterion.md#nn.ClassNLLCriterion), which expects the `nn.Log` to be computed between the `SoftMax` and itself. Use [LogSoftMax](#nn.LogSoftMax) instead (it's faster). - + ## SoftMin ## Applies the `Softmin` function to an n-dimensional input Tensor, @@ -119,7 +119,7 @@ gnuplot.grid(true) ``` ![](image/softmin.png) - + ### SoftPlus ### Applies the `SoftPlus` function to an n-dimensioanl input Tensor. @@ -138,7 +138,7 @@ gnuplot.grid(true) ``` ![](image/softplus.png) - + ## SoftSign ## Applies the `SoftSign` function to an n-dimensioanl input Tensor. @@ -156,7 +156,7 @@ gnuplot.grid(true) ``` ![](image/softsign.png) - + ## LogSigmoid ## Applies the `LogSigmoid` function to an n-dimensional input Tensor. @@ -176,7 +176,7 @@ gnuplot.grid(true) ![](image/logsigmoid.png) - + ## LogSoftMax ## Applies the `LogSoftmax` function to an n-dimensional input Tensor. 
@@ -195,7 +195,7 @@ gnuplot.grid(true) ``` ![](image/logsoftmax.png) - + ## Sigmoid ## Applies the `Sigmoid` function element-wise to the input Tensor, @@ -214,7 +214,7 @@ gnuplot.grid(true) ``` ![](image/sigmoid.png) - + ## Tanh ## Applies the `Tanh` function element-wise to the input Tensor, @@ -231,7 +231,7 @@ gnuplot.grid(true) ``` ![](image/tanh.png) - + ## ReLU ## Applies the rectified linear unit (`ReLU`) function element-wise to the input Tensor, @@ -253,7 +253,7 @@ gnuplot.grid(true) ``` ![](image/relu.png) - + ## PReLU ## Applies parametric ReLU, which parameter varies the slope of the negative part: @@ -267,7 +267,7 @@ Note that weight decay should not be used on it. For reference see http://arxiv. ![](image/prelu.png) - + ## AddConstant ## Adds a (non-learnable) scalar constant. This module is sometimes useful for debuggging purposes: `f(x)` = `x + k`, where `k` is a scalar. @@ -278,7 +278,7 @@ m=nn.AddConstant(k,true) -- true = in-place, false = keeping separate state. ``` In-place mode restores the original input value after the backward pass, allowing it's use after other in-place modules, like [MulConstant](#nn.MulConstant). - + ## MulConstant ## Multiplies input tensor by a (non-learnable) scalar constant. This module is sometimes useful for debuggging purposes: `f(x)` = `k * x`, where `k` is a scalar. 
diff --git a/mkdocs.yml b/mkdocs.yml new file mode 100644 index 000000000..f38456dca --- /dev/null +++ b/mkdocs.yml @@ -0,0 +1,18 @@ +site_name: nn +theme : simplex +repo_url : https://github.com/torch/nn +use_directory_urls : false +markdown_extensions: [extra] +docs_dir : doc +pages: +- [index.md, Home] +- [module.md, Modules, Module Interface] +- [containers.md, Modules, Containers] +- [transfer.md, Modules, Transfer Functions] +- [simple.md, Modules, Simple Layers] +- [table.md, Modules, Table Layers] +- [convolution.md, Modules, Convolution Layers] +- [criterion.md, Criterion, Criterions] +- [overview.md, Additional Documentation, Overview] +- [training.md, Additional Documentation, Training] +- [testing.md, Additional Documentation, Testing] From 0e05ac975476fff3ecf75894595d60ba04b5e0d6 Mon Sep 17 00:00:00 2001 From: nicholas-leonard Date: Mon, 10 Aug 2015 22:15:15 -0400 Subject: [PATCH 2/2] fix lists --- doc/containers.md | 10 +++---- doc/convolution.md | 40 +++++++++++++-------------- doc/criterion.md | 38 +++++++++++++------------- doc/simple.md | 68 +++++++++++++++++++++++----------------------- doc/table.md | 42 ++++++++++++++-------------- 5 files changed, 99 insertions(+), 99 deletions(-) diff --git a/doc/containers.md b/doc/containers.md index 8d02ab96b..9a8360761 100644 --- a/doc/containers.md +++ b/doc/containers.md @@ -2,11 +2,11 @@ # Containers # Complex neural networks are easily built using container classes: - * [Container](#nn.Container) : abstract class inherited by containers ; - * [Sequential](#nn.Sequential) : plugs layers in a feed-forward fully connected manner ; - * [Parallel](#nn.Parallel) : applies its `ith` child module to the `ith` slice of the input Tensor ; - * [Concat](#nn.Concat) : concatenates in one layer several modules along dimension `dim` ; - * [DepthConcat](#nn.DepthConcat) : like Concat, but adds zero-padding when non-`dim` sizes don't match; + * [Container](#nn.Container) : abstract class inherited by containers ; + * 
[Sequential](#nn.Sequential) : plugs layers in a feed-forward fully connected manner ; + * [Parallel](#nn.Parallel) : applies its `ith` child module to the `ith` slice of the input Tensor ; + * [Concat](#nn.Concat) : concatenates in one layer several modules along dimension `dim` ; + * [DepthConcat](#nn.DepthConcat) : like Concat, but adds zero-padding when non-`dim` sizes don't match; See also the [Table Containers](#nn.TableContainers) for manipulating tables of [Tensors](https://github.com/torch/torch7/blob/master/doc/tensor.md). diff --git a/doc/convolution.md b/doc/convolution.md index 4f716c639..54b8da9cd 100755 --- a/doc/convolution.md +++ b/doc/convolution.md @@ -3,28 +3,28 @@ A convolution is an integral that expresses the amount of overlap of one function `g` as it is shifted over another function `f`. It therefore "blends" one function with another. The neural network package supports convolution, pooling, subsampling and other relevant facilities. These are divided base on the dimensionality of the input and output [Tensors](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor): - * [Temporal Modules](#nn.TemporalModules) apply to sequences with a one-dimensional relationship + * [Temporal Modules](#nn.TemporalModules) apply to sequences with a one-dimensional relationship (e.g. sequences of words, phonemes and letters. Strings of some kind). - * [TemporalConvolution](#nn.TemporalConvolution) : a 1D convolution over an input sequence ; - * [TemporalSubSampling](#nn.TemporalSubSampling) : a 1D sub-sampling over an input sequence ; - * [TemporalMaxPooling](#nn.TemporalMaxPooling) : a 1D max-pooling operation over an input sequence ; - * [LookupTable](#nn.LookupTable) : a convolution of width `1`, commonly used for word embeddings ; - * [Spatial Modules](#nn.SpatialModules) apply to inputs with two-dimensional relationships (e.g. 
images): - * [SpatialConvolution](#nn.SpatialConvolution) : a 2D convolution over an input image ; - * [SpatialSubSampling](#nn.SpatialSubSampling) : a 2D sub-sampling over an input image ; - * [SpatialMaxPooling](#nn.SpatialMaxPooling) : a 2D max-pooling operation over an input image ; - * [SpatialAveragePooling](#nn.SpatialAveragePooling) : a 2D average-pooling operation over an input image ; - * [SpatialAdaptiveMaxPooling](#nn.SpatialAdaptiveMaxPooling) : a 2D max-pooling operation which adapts its parameters dynamically such that the output is of fixed size ; - * [SpatialLPPooling](#nn.SpatialLPPooling) : computes the `p` norm in a convolutional manner on a set of input images ; - * [SpatialConvolutionMap](#nn.SpatialConvolutionMap) : a 2D convolution that uses a generic connection table ; - * [SpatialZeroPadding](#nn.SpatialZeroPadding) : padds a feature map with specified number of zeros ; - * [SpatialSubtractiveNormalization](#nn.SpatialSubtractiveNormalization) : a spatial subtraction operation on a series of 2D inputs using - * [SpatialBatchNormalization](#nn.SpatialBatchNormalization): mean/std normalization over the mini-batch inputs and pixels, with an optional affine transform that follows + * [TemporalConvolution](#nn.TemporalConvolution) : a 1D convolution over an input sequence ; + * [TemporalSubSampling](#nn.TemporalSubSampling) : a 1D sub-sampling over an input sequence ; + * [TemporalMaxPooling](#nn.TemporalMaxPooling) : a 1D max-pooling operation over an input sequence ; + * [LookupTable](#nn.LookupTable) : a convolution of width `1`, commonly used for word embeddings ; + * [Spatial Modules](#nn.SpatialModules) apply to inputs with two-dimensional relationships (e.g. 
images): + * [SpatialConvolution](#nn.SpatialConvolution) : a 2D convolution over an input image ; + * [SpatialSubSampling](#nn.SpatialSubSampling) : a 2D sub-sampling over an input image ; + * [SpatialMaxPooling](#nn.SpatialMaxPooling) : a 2D max-pooling operation over an input image ; + * [SpatialAveragePooling](#nn.SpatialAveragePooling) : a 2D average-pooling operation over an input image ; + * [SpatialAdaptiveMaxPooling](#nn.SpatialAdaptiveMaxPooling) : a 2D max-pooling operation which adapts its parameters dynamically such that the output is of fixed size ; + * [SpatialLPPooling](#nn.SpatialLPPooling) : computes the `p` norm in a convolutional manner on a set of input images ; + * [SpatialConvolutionMap](#nn.SpatialConvolutionMap) : a 2D convolution that uses a generic connection table ; + * [SpatialZeroPadding](#nn.SpatialZeroPadding) : padds a feature map with specified number of zeros ; + * [SpatialSubtractiveNormalization](#nn.SpatialSubtractiveNormalization) : a spatial subtraction operation on a series of 2D inputs using + * [SpatialBatchNormalization](#nn.SpatialBatchNormalization): mean/std normalization over the mini-batch inputs and pixels, with an optional affine transform that follows a kernel for computing the weighted average in a neighborhood ; - * [Volumetric Modules](#nn.VolumetricModules) apply to inputs with three-dimensional relationships (e.g. videos) : - * [VolumetricConvolution](#nn.VolumetricConvolution) : a 3D convolution over an input video (a sequence of images) ; - * [VolumetricMaxPooling](#nn.VolumetricMaxPooling) : a 3D max-pooling operation over an input video. - * [VolumetricAveragePooling](#nn.VolumetricAveragePooling) : a 3D average-pooling operation over an input video. + * [Volumetric Modules](#nn.VolumetricModules) apply to inputs with three-dimensional relationships (e.g. 
videos) : + * [VolumetricConvolution](#nn.VolumetricConvolution) : a 3D convolution over an input video (a sequence of images) ; + * [VolumetricMaxPooling](#nn.VolumetricMaxPooling) : a 3D max-pooling operation over an input video. + * [VolumetricAveragePooling](#nn.VolumetricAveragePooling) : a 3D average-pooling operation over an input video. ## Temporal Modules ## diff --git a/doc/criterion.md b/doc/criterion.md index 4f89338c9..292893874 100755 --- a/doc/criterion.md +++ b/doc/criterion.md @@ -4,25 +4,25 @@ [`Criterions`](#nn.Criterion) are helpful to train a neural network. Given an input and a target, they compute a gradient according to a given loss function. - * Classification criterions: - * [`BCECriterion`](#nn.BCECriterion): binary cross-entropy (two-class version of [`ClassNLLCriterion`](#nn.ClassNLLCriterion)); - * [`ClassNLLCriterion`](#nn.ClassNLLCriterion): negative log-likelihood for [`LogSoftMax`](transfer.md#nn.LogSoftMax) (multi-class); - * [`CrossEntropyCriterion`](#nn.CrossEntropyCriterion): combines [`LogSoftMax`](transfer.md#nn.LogSoftMax) and [`ClassNLLCriterion`](#nn.ClassNLLCriterion); - * [`MarginCriterion`](#nn.MarginCriterion): two class margin-based loss; - * [`MultiMarginCriterion`](#nn.MultiMarginCriterion): multi-class margin-based loss; - * [`MultiLabelMarginCriterion`](#nn.MultiLabelMarginCriterion): multi-class multi-classification margin-based loss; - * Regression criterions: - * [`AbsCriterion`](#nn.AbsCriterion): measures the mean absolute value of the element-wise difference between input; - * [`MSECriterion`](#nn.MSECriterion): mean square error (a classic); - * [`DistKLDivCriterion`](#nn.DistKLDivCriterion): Kullback–Leibler divergence (for fitting continuous probability distributions); - * Embedding criterions (measuring whether two inputs are similar or dissimilar): - * [`HingeEmbeddingCriterion`](#nn.HingeEmbeddingCriterion): takes a distance as input; - * [`L1HingeEmbeddingCriterion`](#nn.L1HingeEmbeddingCriterion): L1 
distance between two inputs; - * [`CosineEmbeddingCriterion`](#nn.CosineEmbeddingCriterion): cosine distance between two inputs; - * Miscelaneus criterions: - * [`MultiCriterion`](#nn.MultiCriterion) : a weighted sum of other criterions each applied to the same input and target; - * [`ParallelCriterion`](#nn.ParallelCriterion) : a weighted sum of other criterions each applied to a different input and target; - * [`MarginRankingCriterion`](#nn.MarginRankingCriterion): ranks two inputs; + * Classification criterions: + * [`BCECriterion`](#nn.BCECriterion): binary cross-entropy (two-class version of [`ClassNLLCriterion`](#nn.ClassNLLCriterion)); + * [`ClassNLLCriterion`](#nn.ClassNLLCriterion): negative log-likelihood for [`LogSoftMax`](transfer.md#nn.LogSoftMax) (multi-class); + * [`CrossEntropyCriterion`](#nn.CrossEntropyCriterion): combines [`LogSoftMax`](transfer.md#nn.LogSoftMax) and [`ClassNLLCriterion`](#nn.ClassNLLCriterion); + * [`MarginCriterion`](#nn.MarginCriterion): two class margin-based loss; + * [`MultiMarginCriterion`](#nn.MultiMarginCriterion): multi-class margin-based loss; + * [`MultiLabelMarginCriterion`](#nn.MultiLabelMarginCriterion): multi-class multi-classification margin-based loss; + * Regression criterions: + * [`AbsCriterion`](#nn.AbsCriterion): measures the mean absolute value of the element-wise difference between input; + * [`MSECriterion`](#nn.MSECriterion): mean square error (a classic); + * [`DistKLDivCriterion`](#nn.DistKLDivCriterion): Kullback–Leibler divergence (for fitting continuous probability distributions); + * Embedding criterions (measuring whether two inputs are similar or dissimilar): + * [`HingeEmbeddingCriterion`](#nn.HingeEmbeddingCriterion): takes a distance as input; + * [`L1HingeEmbeddingCriterion`](#nn.L1HingeEmbeddingCriterion): L1 distance between two inputs; + * [`CosineEmbeddingCriterion`](#nn.CosineEmbeddingCriterion): cosine distance between two inputs; + * Miscelaneus criterions: + * 
[`MultiCriterion`](#nn.MultiCriterion) : a weighted sum of other criterions each applied to the same input and target; + * [`ParallelCriterion`](#nn.ParallelCriterion) : a weighted sum of other criterions each applied to a different input and target; + * [`MarginRankingCriterion`](#nn.MarginRankingCriterion): ranks two inputs; ## Criterion ## diff --git a/doc/simple.md b/doc/simple.md index bc4881b4b..ebb2d2fe9 100755 --- a/doc/simple.md +++ b/doc/simple.md @@ -2,40 +2,40 @@ # Simple layers # Simple Modules are used for various tasks like adapting Tensor methods and providing affine transformations : - * Parameterized Modules : - * [Linear](#nn.Linear) : a linear transformation ; - * [SparseLinear](#nn.SparseLinear) : a linear transformation with sparse inputs ; - * [Add](#nn.Add) : adds a bias term to the incoming data ; - * [Mul](#nn.Mul) : multiply a single scalar factor to the incoming data ; - * [CMul](#nn.CMul) : a component-wise multiplication to the incoming data ; - * [CDiv](#nn.CDiv) : a component-wise division to the incoming data ; - * [Euclidean](#nn.Euclidean) : the euclidean distance of the input to `k` mean centers ; - * [WeightedEuclidean](#nn.WeightedEuclidean) : similar to [Euclidean](#nn.Euclidean), but additionally learns a diagonal covariance matrix ; - * Modules that adapt basic Tensor methods : - * [Copy](#nn.Copy) : a [copy](https://github.com/torch/torch7/blob/master/doc/tensor.md#torch.Tensor.copy) of the input with [type](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-or-string-typetype) casting ; - * [Narrow](#nn.Narrow) : a [narrow](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-narrowdim-index-size) operation over a given dimension ; - * [Replicate](#nn.Replicate) : [repeats](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-repeattensorresult-sizes) input `n` times along its first dimension ; - * [Reshape](#nn.Reshape) : a 
[reshape](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchreshaperes-x-m-n) of the inputs ; - * [View](#nn.View) : a [view](https://github.com/torch/torch7/blob/master/doc/tensor.md#result-viewresult-tensor-sizes) of the inputs ; - * [Select](#nn.Select) : a [select](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-selectdim-index) over a given dimension ; - * Modules that adapt mathematical Tensor methods : - * [Max](#nn.Max) : a [max](https://github.com/torch/torch7/blob/master/doc/maths.md#torch.max) operation over a given dimension ; - * [Min](#nn.Min) : a [min](https://github.com/torch/torch7/blob/master/doc/maths.md#torchminresval-resind-x) operation over a given dimension ; - * [Mean](#nn.Mean) : a [mean](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchmeanres-x-dim) operation over a given dimension ; - * [Sum](#nn.Sum) : a [sum](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchsumres-x) operation over a given dimension ; - * [Exp](#nn.Exp) : an element-wise [exp](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchexpres-x) operation ; - * [Abs](#nn.Abs) : an element-wise [abs](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchabsres-x) operation ; - * [Power](#nn.Power) : an element-wise [pow](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchpowres-x) operation ; - * [Square](#nn.Square) : an element-wise square operation ; - * [Sqrt](#nn.Sqrt) : an element-wise [sqrt](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchsqrtres-x) operation ; - * [MM](#nn.MM) : matrix-matrix multiplication (also supports batches of matrices) ; - * Miscellaneous Modules : - * [BatchNormalization](#nn.BatchNormalization) - mean/std normalization over the mini-batch inputs (with an optional affine transform) ; - * [Identity](#nn.Identity) : forward input as-is to output (useful with [ParallelTable](table.md#nn.ParallelTable)); - * 
[Dropout](#nn.Dropout) : masks parts of the `input` using binary samples from a [bernoulli](http://en.wikipedia.org/wiki/Bernoulli_distribution) distribution ; - * [SpatialDropout](#nn.SpatialDropout) : Same as Dropout but for spatial inputs where adjacent pixels are strongly correlated ; - * [Padding](#nn.Padding) : adds padding to a dimension ; - * [L1Penalty](#nn.L1Penalty) : adds an L1 penalty to an input (for sparsity); + * Parameterized Modules : + * [Linear](#nn.Linear) : a linear transformation ; + * [SparseLinear](#nn.SparseLinear) : a linear transformation with sparse inputs ; + * [Add](#nn.Add) : adds a bias term to the incoming data ; + * [Mul](#nn.Mul) : multiply a single scalar factor to the incoming data ; + * [CMul](#nn.CMul) : a component-wise multiplication to the incoming data ; + * [CDiv](#nn.CDiv) : a component-wise division to the incoming data ; + * [Euclidean](#nn.Euclidean) : the euclidean distance of the input to `k` mean centers ; + * [WeightedEuclidean](#nn.WeightedEuclidean) : similar to [Euclidean](#nn.Euclidean), but additionally learns a diagonal covariance matrix ; + * Modules that adapt basic Tensor methods : + * [Copy](#nn.Copy) : a [copy](https://github.com/torch/torch7/blob/master/doc/tensor.md#torch.Tensor.copy) of the input with [type](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-or-string-typetype) casting ; + * [Narrow](#nn.Narrow) : a [narrow](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-narrowdim-index-size) operation over a given dimension ; + * [Replicate](#nn.Replicate) : [repeats](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-repeattensorresult-sizes) input `n` times along its first dimension ; + * [Reshape](#nn.Reshape) : a [reshape](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchreshaperes-x-m-n) of the inputs ; + * [View](#nn.View) : a [view](https://github.com/torch/torch7/blob/master/doc/tensor.md#result-viewresult-tensor-sizes) of the 
inputs ; + * [Select](#nn.Select) : a [select](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-selectdim-index) over a given dimension ; + * Modules that adapt mathematical Tensor methods : + * [Max](#nn.Max) : a [max](https://github.com/torch/torch7/blob/master/doc/maths.md#torch.max) operation over a given dimension ; + * [Min](#nn.Min) : a [min](https://github.com/torch/torch7/blob/master/doc/maths.md#torchminresval-resind-x) operation over a given dimension ; + * [Mean](#nn.Mean) : a [mean](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchmeanres-x-dim) operation over a given dimension ; + * [Sum](#nn.Sum) : a [sum](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchsumres-x) operation over a given dimension ; + * [Exp](#nn.Exp) : an element-wise [exp](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchexpres-x) operation ; + * [Abs](#nn.Abs) : an element-wise [abs](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchabsres-x) operation ; + * [Power](#nn.Power) : an element-wise [pow](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchpowres-x) operation ; + * [Square](#nn.Square) : an element-wise square operation ; + * [Sqrt](#nn.Sqrt) : an element-wise [sqrt](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchsqrtres-x) operation ; + * [MM](#nn.MM) : matrix-matrix multiplication (also supports batches of matrices) ; + * Miscellaneous Modules : + * [BatchNormalization](#nn.BatchNormalization) - mean/std normalization over the mini-batch inputs (with an optional affine transform) ; + * [Identity](#nn.Identity) : forward input as-is to output (useful with [ParallelTable](table.md#nn.ParallelTable)); + * [Dropout](#nn.Dropout) : masks parts of the `input` using binary samples from a [bernoulli](http://en.wikipedia.org/wiki/Bernoulli_distribution) distribution ; + * [SpatialDropout](#nn.SpatialDropout) : Same as Dropout but for spatial inputs where adjacent 
pixels are strongly correlated ; + * [Padding](#nn.Padding) : adds padding to a dimension ; + * [L1Penalty](#nn.L1Penalty) : adds an L1 penalty to an input (for sparsity); ## Linear ## diff --git a/doc/table.md b/doc/table.md index 221e4c37b..61d108543 100755 --- a/doc/table.md +++ b/doc/table.md @@ -4,27 +4,27 @@ This set of modules allows the manipulation of `table`s through the layers of a neural network. This allows one to build very rich architectures: - * `table` Container Modules encapsulate sub-Modules: - * [`ConcatTable`](#nn.ConcatTable): applies each member module to the same input [`Tensor`](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor) and outputs a `table`; - * [`ParallelTable`](#nn.ParallelTable): applies the `i`-th member module to the `i`-th input and outputs a `table`; - * Table Conversion Modules convert between `table`s and `Tensor`s or `table`s: - * [`SplitTable`](#nn.SplitTable): splits a `Tensor` into a `table` of `Tensor`s; - * [`JoinTable`](#nn.JoinTable): joins a `table` of `Tensor`s into a `Tensor`; - * [`MixtureTable`](#nn.MixtureTable): mixture of experts weighted by a gater; - * [`SelectTable`](#nn.SelectTable): select one element from a `table`; - * [`NarrowTable`](#nn.NarrowTable): select a slice of elements from a `table`; - * [`FlattenTable`](#nn.FlattenTable): flattens a nested `table` hierarchy; - * Pair Modules compute a measure like distance or similarity from a pair (`table`) of input `Tensor`s: - * [`PairwiseDistance`](#nn.PairwiseDistance): outputs the `p`-norm. 
distance between inputs; - * [`DotProduct`](#nn.DotProduct): outputs the dot product (similarity) between inputs; - * [`CosineDistance`](#nn.CosineDistance): outputs the cosine distance between inputs; - * CMath Modules perform element-wise operations on a `table` of `Tensor`s: - * [`CAddTable`](#nn.CAddTable): addition of input `Tensor`s; - * [`CSubTable`](#nn.CSubTable): substraction of input `Tensor`s; - * [`CMulTable`](#nn.CMulTable): multiplication of input `Tensor`s; - * [`CDivTable`](#nn.CDivTable): division of input `Tensor`s; - * `Table` of Criteria: - * [`CriterionTable`](#nn.CriterionTable): wraps a [Criterion](criterion.md#nn.Criterion) so that it can accept a `table` of inputs. + * `table` Container Modules encapsulate sub-Modules: + * [`ConcatTable`](#nn.ConcatTable): applies each member module to the same input [`Tensor`](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor) and outputs a `table`; + * [`ParallelTable`](#nn.ParallelTable): applies the `i`-th member module to the `i`-th input and outputs a `table`; + * Table Conversion Modules convert between `table`s and `Tensor`s or `table`s: + * [`SplitTable`](#nn.SplitTable): splits a `Tensor` into a `table` of `Tensor`s; + * [`JoinTable`](#nn.JoinTable): joins a `table` of `Tensor`s into a `Tensor`; + * [`MixtureTable`](#nn.MixtureTable): mixture of experts weighted by a gater; + * [`SelectTable`](#nn.SelectTable): select one element from a `table`; + * [`NarrowTable`](#nn.NarrowTable): select a slice of elements from a `table`; + * [`FlattenTable`](#nn.FlattenTable): flattens a nested `table` hierarchy; + * Pair Modules compute a measure like distance or similarity from a pair (`table`) of input `Tensor`s: + * [`PairwiseDistance`](#nn.PairwiseDistance): outputs the `p`-norm. 
distance between inputs; + * [`DotProduct`](#nn.DotProduct): outputs the dot product (similarity) between inputs; + * [`CosineDistance`](#nn.CosineDistance): outputs the cosine distance between inputs; + * CMath Modules perform element-wise operations on a `table` of `Tensor`s: + * [`CAddTable`](#nn.CAddTable): addition of input `Tensor`s; + * [`CSubTable`](#nn.CSubTable): substraction of input `Tensor`s; + * [`CMulTable`](#nn.CMulTable): multiplication of input `Tensor`s; + * [`CDivTable`](#nn.CDivTable): division of input `Tensor`s; + * `Table` of Criteria: + * [`CriterionTable`](#nn.CriterionTable): wraps a [Criterion](criterion.md#nn.Criterion) so that it can accept a `table` of inputs. `table`-based modules work by supporting `forward()` and `backward()` methods that can accept `table`s as inputs. It turns out that the usual [`Sequential`](containers.md#nn.Sequential) module can do this, so all that is needed is other child modules that take advantage of such `table`s.