Feed Forward Neural Network on MNIST

Example of training a feed forward neural network on the MNIST handwritten digit dataset.

Results

The code below produces the following output, which is quite similar to the results obtained with an RBM.

1    0.1     0.0337166666667         0.0396
2    0.1     0.023                   0.0285
3    0.1     0.0198666666667         0.0276
4    0.1     0.0154                  0.0264
5    0.1     0.01385                 0.0239
6    0.1     0.01255                 0.0219
7    0.1     0.012                   0.0229
8    0.1     0.00926666666667        0.0207
9    0.1     0.0117                  0.0237
10   0.1     0.00881666666667        0.0214
11   0.1     0.007                   0.0191
12   0.1     0.00778333333333        0.0199
13   0.1     0.0067                  0.0183
14   0.1     0.00666666666667        0.0194
15   0.1     0.00665                 0.0197
16   0.1     0.00583333333333        0.0197
17   0.1     0.00563333333333        0.0193
18   0.1     0.005                   0.0181
19   0.1     0.00471666666667        0.0186
20   0.1     0.00431666666667        0.0191

The columns show the epoch, the learning rate, the training error, and the test error.

See also RBM_MNIST_big.

Source code

''' Toy example using FNN on MNIST.

    :Version:
        3.0

    :Date:
        25.05.2019

    :Author:
        Jan Melchior

    :Contact:
        pydeep@gmail.com

    :License:

        Copyright (C) 2019  Jan Melchior

        This program is free software: you can redistribute it and/or modify
        it under the terms of the GNU General Public License as published by
        the Free Software Foundation, either version 3 of the License, or
        (at your option) any later version.

        This program is distributed in the hope that it will be useful,
        but WITHOUT ANY WARRANTY; without even the implied warranty of
        MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
        GNU General Public License for more details.

        You should have received a copy of the GNU General Public License
        along with this program.  If not, see <http://www.gnu.org/licenses/>.

'''

import numpy as numx

import pydeep.fnn.model as MODEL
import pydeep.fnn.layer as LAYER
import pydeep.fnn.trainer as TRAINER
import pydeep.base.activationfunction as ACT
import pydeep.base.costfunction as COST
import pydeep.base.corruptor as CORR
import pydeep.misc.io as IO
import pydeep.base.numpyextension as npExt


# Set random seed (optional)
numx.random.seed(42)


# Load the MNIST data, merge the training and validation sets, and convert the labels to a one-hot encoding
train_data, train_label, valid_data, valid_label, test_data, test_label = IO.load_mnist("mnist.pkl.gz", False)
train_data = numx.vstack((train_data, valid_data))
train_label = numx.hstack((train_label, valid_label)).T
train_label = npExt.get_binary_label(train_label)
test_label = npExt.get_binary_label(test_label)

# Create model
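# Hidden layer: 784 inputs -> 1000 exponential linear units, centered via an
# offset initialized to the data mean; output layer: 1000 -> 10 softmax units.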
l1 = LAYER.FullConnLayer(input_dim=train_data.shape[1],
                         output_dim=1000,
                         activation_function=ACT.ExponentialLinear(),
                         initial_weights='AUTO',
                         initial_bias=0.0,
                         initial_offset=numx.mean(train_data, axis=0).reshape(1, train_data.shape[1]),
                         connections=None,
                         dtype=numx.float64)
l2 = LAYER.FullConnLayer(input_dim=1000,
                         output_dim=train_label.shape[1],
                         activation_function=ACT.SoftMax(),
                         initial_weights='AUTO',
                         initial_bias=0.0,
                         initial_offset=0.0,
                         connections=None,
                         dtype=numx.float64)
model = MODEL.Model([l1, l2])

# Choose an Optimizer
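# ADAGDTrainer adapts the step size per parameter during training;
# GDTrainer performs plain (stochastic) gradient descent.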
trainer = TRAINER.ADAGDTrainer(model)
#trainer = TRAINER.GDTrainer(model)

# Train model
max_epochs = 20
batch_size = 20
eps = 0.1
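# eps is the learning rate that is passed to the trainer as epsilon for every layer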
print 'Training'
for epoch in range(1, max_epochs + 1):
    train_data, train_label = npExt.shuffle_dataset(train_data, train_label)
    for b in range(0, train_data.shape[0], batch_size):
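        # Per-layer hyperparameter lists: the cross-entropy cost is applied to the
        # output layer only; the corruptor applies 20% dropout to the input,
        # 50% to the hidden layer, and none to the output.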
        trainer.train(data=train_data[b:b + batch_size, :],
                      labels=[None, train_label[b:b + batch_size, :]],
                      costs=[None, COST.CrossEntropyError()],
                      reg_costs=[0.0, 1.0],
                      # momentum=[0.0] * model.num_layers,
                      epsilon=[eps] * model.num_layers,
                      update_offsets=[0.0] * model.num_layers,
                      corruptor=[CORR.Dropout(0.2), CORR.Dropout(0.5), None],
                      reg_L1Norm=[0.0] * model.num_layers,
                      reg_L2Norm=[0.0] * model.num_layers,
                      reg_sparseness=[0.0] * model.num_layers,
                      desired_sparseness=[0.0] * model.num_layers,
                      costs_sparseness=[None] * model.num_layers,
                      restrict_gradient=[0.0] * model.num_layers,
                      restriction_norm='Mat')
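    # Print the epoch, the learning rate, and the training / test error rates
    # (fraction of samples whose maximal network output does not match the label).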
    print epoch, '\t', eps, '\t',
    print numx.mean(npExt.compare_index_of_max(model.forward_propagate(train_data), train_label)), '\t',
    print numx.mean(npExt.compare_index_of_max(model.forward_propagate(test_data), test_label))
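
Once training has finished, the model can be stored and reused for prediction. The lines below are a minimal sketch rather than part of the original example: they assume that pydeep.misc.io provides a save_object helper and reuse the forward_propagate call from the listing above; the file name fnn_mnist.pkl is arbitrary.

# Save the trained network to disk (save_object is an assumed helper from pydeep.misc.io).
IO.save_object(model, "fnn_mnist.pkl")

# Predict digit classes for the first ten test images: the predicted class
# is the index of the largest softmax output.
predictions = numx.argmax(model.forward_propagate(test_data[0:10, :]), axis=1)
print predictions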