I know I probably haven’t been posting as frequently as many of you would like or even at my normal quality because… well, like for many of you, this year has just sucked!

Someone I’ve known my whole life died recently, not from the virus though it didn’t help things.

She went in for a “routine” procedure where they needed to use general anesthesia and there were “complications” during the procedure. Something to do with her heart but if I’m being honest, I don’t know all the details at this time.

Also, I’m not sure how by anyone’s definition anything involving anesthesia is routine?

An ambulance was called and she was rushed to the hospital. Long story short, despite being otherwise fine when she went in, she never woke up from her coma. 😥

The hospital is/was on lockdown like everywhere else, so friends and family were unable to visit her before she died.

Her family intends to sue the Dr. for malpractice, personally… I think they should!

To add insult to injury, she was cremated without a funeral due to the whole pandemic social distancing BS that I’m just about ready to tell the government to go fuck itself over! 😦

I’m sorry, do my harsh words offend you? SHE DIED ALONE! That offends me!

Going forward, my advice… any procedure where they need to administer general anesthesia to you… or maybe any procedure at all… make sure it’s in a hospital or hospital adjacent (NOT A CLINIC) because those minutes waiting for an ambulance really do mean your life!

And if your doctor is like, “No worries this is routine… I’ve done this a thousand times”, maybe think carefully before putting your trust in that person.

Yes, we want doctors that are confident in their ability to treat us but make sure that it is confidence and not complacent hubris!

Further, no procedure is truly “routine” and a doctor, of all people, should know that and act accordingly!

“Primum non nocere”

~Hippocrates… (allegedly)

Regardless of the historical veracity of that quote, does the spirit of that principle still not apply?

Look, I’m not saying this to detract from the important life saving work doctors and medical workers do every day, it’s just that this is part of what’s going on in my life right now (and for many of you as well) and I’m sharing because I guess that’s what you do when you have a blog.

Additionally, less close to home though still another terrible loss, John Horton Conway, notable math hero to geeks and nerds alike, died from complications of Covid-19. 😦

I’ve previously written a little about Conway’s work in my ancestor simulations series of posts.

Mysterious Game of Life Posts:

But that only scratches the surface of his work and famously Conway’s Game of Life was perhaps his least favorite but most well known work among non-mathematicians and it would both amuse and bug him if I only mentioned his game of life here so I’m not going to list his other accomplishments.

I’ll have a little chuckle off camera on his behalf. 😛

He really was a math genius and you would learn a lot of interesting, not to mention surreal… but I’ve said too much, ideas by reading about his accomplishments, which I encourage you to do!

In any case, people I know and admire need to stop dying because it’s killing me… not to mention my ratings and readership because I keep talking about it! 😛

I may have a terribly dark sense of humor at times, but going forward I demand strict adherence from all of you to the Oasis Doctrine! 😥

Oh, and speaking of pretentious art…

The OCR 2 Wallpaper

The original OCR didn’t exactly have a wallpaper but I did create an image/logo to go along with the project and its blog posts:

For the reason you might think, I made it look like an eye… because it looks like a non-evil HAL 9000! 😛

Also, I like the idea of depicting a robotic eye in relation to AI and neural networks because, even though I am not superstitious in any way, it carries some of the symbology of Illuminati, “The gaze of the Beholder”, “The Eye of Providence”, “The Evil Eye”, The Eye of Horus, The Eye of Ra, Eye of newt and needle… sorry. 😛

In this case, the eye of a robot invokes a sense of literal “Deus ex machina” (God from the machine) and it illustrates some people’s fears of “The Singularity” and of the possibility of an intelligence that is so much greater than our own that it calls into question our ability to even comprehend it… hmmm… is that too Lovecraftian? 😛

Anyway, because I enjoy the thought provoking symbology (maybe it’s just me), I wanted to keep the same concept of the robot eye but update it to look a little less like a simple cartoon to subtly imply it’s a more advanced version of OCR but that it still fundamentally does the same thing, which is most of the reasoning behind this wallpaper.

In any case, I hope you enjoy it.

OCR 2 Wallpaper

If you’d like the wallpaper with the feature image text here’s that version.

OCR 2 Wallpaper (with text)

So I guess having shared a few of the recent tragedies in my personal life and a couple of wallpapers, we should probably get mogating and talk about the point of today’s post!

We’re going to look at doing hand-written number (0-9) Optical Character Recognition using the MNIST database.

OCR 2 – The MNIST Dataset with PHP and FANN

I was recently contacted by a full-stack developer who wanted advice on creating his own OCR system for “stickers on internal vehicles”.

I think he means, some kind of warehouse robots?

He had seen my OCR ANN and seemingly preferred to work with PHP over Python, which if I’m being honest… I can’t exactly argue with!

PHP is C++ for the web and powers something like 80% of the websites whose server-side language is known (going by W3Techs’ usage surveys), so it should come as no surprise to anyone (even though it does) that there are people who want to use it to build bots! 😛

But, if you would rather work with a different language there is a better than decent chance FANN has bindings for it, so you should be able to use the ANNs even if you are not using PHP.

So anyway, he gave me a dollar for my advice through Patreon and we had a brief conversation over messaging where I offered him a few suggestions and walked him through getting started.

Ultimately, because he lacks an AI/ML background and/or a sufficient familiarity with an AI/ML workflow he wasn’t very confident about proceeding so I recommended he follow my existing tutorials which should help him learn the basics of how to proceed.

Now here’s the thing, even among people who like my content and value my efforts, few people are generous enough to give me money for my advice and when they do, I genuinely appreciate it! 🙂

So, as a thank you I want to offer another (more complete) example of how to use a neural network to do OCR.

If he followed my advice, he should be fairly close to being ready for a more complete real world OCR ANN example (assuming he is still reading 😛 ) but if not, his loss is still your gain!

Today’s code implements OCR using the MNIST dataset. I demonstrate a basic form of pooling (though the stride is not adjustable as is) and I show convolutions using the GD image library’s image convolution function. I also include 17 demonstration kernel matrices that you can experiment with, though not all are relevant or necessary for this project.

This is still very basic but everything you need to get started experimenting with OCR is here.

Having said that, in all honesty, to accomplish your goal requires building your own dataset and modifying the code I present here to meet your needs.

Neither is exactly hard, but both will require significant time and dedication to testing and refining your processes.

Obviously that’s not something I can cover in a single post or even assist you with for only a dollar, but since so few people show me the kindness and consideration you have, at a time of shrinking economies no less, I wanted to offer you this working OCR prototype to help you along your way.

Our Method

1. Download the MNIST dataset (link below, but it’s in the GitHub repo too).

2. Unpack/Export the data from the files to images and labels.

(technically we could even skip the images and go directly to a training file but I think it’s nice to have the images and labels in a human viewable format)

3. Create training and test data from images and labels.

4. Train the network.

5. Test the network.

The MNIST Dataset

MNIST stands for Modified National Institute of Standards and Technology database.

And since I’m still recovering from last night’s food poisoning due to the Chicken à la Nauseam we’re just going to use Wikipedia’s introduction to MNIST.

It’s easily as good as anything I could write and doesn’t require me to actually write it so…

Wikipedia says:

“It’s a large database of handwritten digits that is commonly used for training various image processing systems.[1][2]”

It also says:

“It was created by “re-mixing” the samples from NIST’s original datasets. The creators felt that since NIST’s training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments.[5] Furthermore, the black and white images from NIST were normalized to fit into a 28×28 pixel bounding box and anti-aliased, which introduced grayscale levels.[5]”

Here’s 500 pseudo-random MNIST sample images:

I randomly selected 500 1’s, 3’s and 7’s and composited them into this 1337 animation. 😛

500 random 1337 MNIST images.

Seriously though, today we will be training a bot to identify which hand-written number (0-9) each 28×28 px image contains and then test the bot using images it hasn’t previously seen.

Our bot will learn using all 60,000 labeled training images and we’ll test it using the 10,000 labeled test images.

Here’s the wiki article if you would like to learn more about the database.

MNIST WIKI: https://en.wikipedia.org/wiki/MNIST_database

And as I said above, I’ve included the database in the GitHub repo but you can download it again from the original source if you prefer.

Original MNIST Download: http://yann.lecun.com/exdb/mnist/

Unpack the Dataset to Images

The first thing we have to do is unpack/export the label and image data from the database files.

The database file format, as provided by the creator, is as follows (copied and pasted):

The data is stored in a very simple file format designed for storing vectors and multidimensional matrices. General info on this format is given at the end of this page, but you don’t need to read that to use the data files.

All the integers in the files are stored in the MSB first (high endian) format used by most non-Intel processors. Users of Intel processors and other low-endian machines must flip the bytes of the header.

There are 4 files:

train-images-idx3-ubyte: training set images
train-labels-idx1-ubyte: training set labels
t10k-images-idx3-ubyte:  test set images
t10k-labels-idx1-ubyte:  test set labels

The training set contains 60000 examples, and the test set 10000 examples.

The first 5000 examples of the test set are taken from the original NIST training set. The last 5000 are taken from the original NIST test set. The first 5000 are cleaner and easier than the last 5000.

TRAINING SET LABEL FILE (train-labels-idx1-ubyte):

[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000801(2049) magic number (MSB first)
0004     32 bit integer  60000            number of items
0008     unsigned byte   ??               label
0009     unsigned byte   ??               label
........
xxxx     unsigned byte   ??               label

The labels values are 0 to 9.

TRAINING SET IMAGE FILE (train-images-idx3-ubyte):

[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000803(2051) magic number
0004     32 bit integer  60000            number of images
0008     32 bit integer  28               number of rows
0012     32 bit integer  28               number of columns
0016     unsigned byte   ??               pixel
0017     unsigned byte   ??               pixel
........
xxxx     unsigned byte   ??               pixel

Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).

TEST SET LABEL FILE (t10k-labels-idx1-ubyte):

[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000801(2049) magic number (MSB first)
0004     32 bit integer  10000            number of items
0008     unsigned byte   ??               label
0009     unsigned byte   ??               label
........
xxxx     unsigned byte   ??               label

The labels values are 0 to 9.

TEST SET IMAGE FILE (t10k-images-idx3-ubyte):

[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000803(2051) magic number
0004     32 bit integer  10000            number of images
0008     32 bit integer  28               number of rows
0012     32 bit integer  28               number of columns
0016     unsigned byte   ??               pixel
0017     unsigned byte   ??               pixel
........
xxxx     unsigned byte   ??               pixel

Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).
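To make the header layout above a little more concrete, here is a minimal sketch of decoding the image file header with PHP’s unpack(). The function name parseIdxImageHeader and the array keys are my own choices for illustration and are not part of the extraction script in this post; the 'N' format code is what handles the MSB-first integers.

```php
<?php
// Hypothetical helper (not from the repo): decode the 16-byte header of an
// IDX3 image file. 'N' tells unpack() to read an unsigned 32-bit big-endian
// (MSB first) integer, so no manual byte flipping is needed.
function parseIdxImageHeader(string $bytes): array
{
    if (strlen($bytes) < 16) {
        throw new InvalidArgumentException('An IDX3 header needs at least 16 bytes.');
    }

    // Named keys make the result easier to read than numeric indexes.
    $header = unpack('Nmagic/Nimages/Nrows/Ncolumns', substr($bytes, 0, 16));

    if ($header['magic'] !== 2051) { // 0x00000803 marks an image file
        throw new UnexpectedValueException('Unexpected magic number: ' . $header['magic']);
    }

    return $header;
}

// Usage with a synthetic header shaped like the training image file.
$fake_header = pack('N4', 2051, 60000, 28, 28);
$header = parseIdxImageHeader($fake_header);
echo $header['images'] . ' images of ' . $header['rows'] . 'x' . $header['columns'] . PHP_EOL;
```

With a real file you would pass in the first 16 bytes read from train-images-idx3-ubyte instead of the synthetic string.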


One thing to note: when I generate the images I invert the background and foreground colors, because I find it convenient/preferable to depict the pixels that contain the digit data as 255 (white) and the “empty” pixels as 0 (black), which lines up with the RGB color scheme better. The script also thresholds the pixels (any value above 0 becomes white), so the exported images are pure black and white rather than grayscale. Technically the inversion doesn’t matter because we will pre-process the images before creating the dataset, so we could do it then.

So, with this file format in mind we can extract the images using this code.

Note: This code is not optimized so it requires a 64 bit OS + PHP install but it could be made to work on a 32 bit machine with code refactoring.

Each image is extracted from the database files and saved in the .png file format.

Each label is saved to a space & newline delimited .txt file.

Labels are stored as:

Image_File_Name label

That looks like this:

0.png 5
1.png 0
2.png 4
3.png 1
4.png 9
5.png 2
...
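If you want to read that label format back in later, something along these lines should work. This is only a sketch: parseLabelLines is a name I made up for illustration, and it assumes exactly one space between the file name and the label, as in the sample above.

```php
<?php
// Hypothetical helper: turn "file label" lines into a [file => label] map.
function parseLabelLines(string $text): array
{
    $labels = [];
    // \R matches any newline style, so Windows and Unix files both work.
    foreach (preg_split('/\R/', trim($text)) as $line) {
        if ($line === '') {
            continue; // skip blank lines
        }
        [$file, $label] = explode(' ', $line, 2);
        $labels[$file] = (int) $label;
    }
    return $labels;
}

$sample = "0.png 5\n1.png 0\n2.png 4";
$labels = parseLabelLines($sample);
echo '2.png is a ' . $labels['2.png'] . PHP_EOL;
```

For the real data you would feed it the contents of the labels .txt file, e.g. via file_get_contents().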

Here’s the code.
ExtractMINSTToImages.php:

<?php


/*

This code requires a 64 Bit PHP installation and if you see an error like this:

VirtualAlloc() failed: [0x00000008] Not enough memory resources are available to process this command.


VirtualAlloc() failed: [0x00000008] Not enough memory resources are available to process this command.

PHP Fatal error:  Out of memory (allocated 813694976) (tried to allocate 805306376 bytes) in ... ExtractMINSTToImages.php on line 20

Install PHP 64 Bit: 

https://www.php.net/downloads
https://windows.php.net/download/

Instead of reading/processing the entire database file into memory...

processing the data image by image would make it possible for a 32 bit OS & PHP installation 

to extract the images but I got lazy and just did it all in memory. 😛

*/


ini_set("max_execution_time", "-1");
ini_set('memory_limit','-1');
set_time_limit(0);


/////////////////////////////////////////////////////////////////////////////////
// Functions

function GetFileBytes($file_path){
    
    $file_handle = fopen($file_path, 'rb'); // Open File in binary mode (the 'b' matters for portability)

    $raw_bytes = array(); // Raw bytes will be read 1 at a time into this array
                          // until the entire file has been read
                          
    while (!feof($file_handle)) { // From now till the end of the file
            $raw_bytes[] = fread($file_handle, 1); // Read 1 Byte of data - EOF
    }
    
    fclose($file_handle); // Close File
    
    return $raw_bytes; // Return raw byte data
}


function ExtractLabelFromByteData($labels_file, $save_directory, $save_file){
    
    $raw_bytes = GetFileBytes($labels_file);
    
    // magic number (MSB first) - Bytes 0-3 (32 bits)
    $magic_number = $raw_bytes[0]
                    . $raw_bytes[1]
                    . $raw_bytes[2]
                    . $raw_bytes[3];
                    
    // number of items - Bytes 4-7 (32 bits)
    $number_of_items = unpack('N', $raw_bytes[4]
                    . $raw_bytes[5]
                    . $raw_bytes[6]
                    . $raw_bytes[7]); // 32 bit Unsigned Int
    $number_of_items = $number_of_items[1];

    $labels = array();
    $curr_label = 0;
    for($bit = 8; $bit < $number_of_items + 8; $bit++){
        $b1 = $raw_bytes[$bit];
        $label = unpack('C', $b1);
        $labels[] = "$curr_label.png " . $label[1];
        $curr_label++;
    }
        
    // Create Labels File
    $unpacked_labels_file = fopen($save_directory . DIRECTORY_SEPARATOR . $save_file, 'w');
    fwrite($unpacked_labels_file, implode(PHP_EOL, $labels)); // Write Labels
    fclose($unpacked_labels_file); // Close File
    
    // Free memory
    $raw_bytes = NULL;
    unset($raw_bytes);
}


function ExtractImagesFromByteData($images_file, $save_directory){
    
    // Open File and Get Bytes
    $raw_bytes = GetFileBytes($images_file);
    
    // magic number (MSB first) - Bytes 0-3 (32 bits)
    $magic_number = $raw_bytes[0]
                    . $raw_bytes[1]
                    . $raw_bytes[2]
                    . $raw_bytes[3];
                    
    // number of images - Bytes 4-7 (32 bits)
    $number_of_images = unpack('N', $raw_bytes[4]
                    . $raw_bytes[5]
                    . $raw_bytes[6]
                    . $raw_bytes[7]); // 32 bit Unsigned Int
    $number_of_images = $number_of_images[1];
    
    
    // number of rows - Bytes 8-11 (32 bits)
    $number_of_rows = unpack('N', $raw_bytes[8]
                    . $raw_bytes[9]
                    . $raw_bytes[10]
                    . $raw_bytes[11]); // 32 bit Unsigned Int
                                    
    $number_of_rows = $number_of_rows[1];
    
    // number of columns - Bytes 12-15 (32 bits)
    $number_of_columns = unpack('N', $raw_bytes[12]
                    . $raw_bytes[13]
                    . $raw_bytes[14]
                    . $raw_bytes[15]); // 32 bit Unsigned Int
    $number_of_columns = $number_of_columns[1];

    $bytes_per_image = $number_of_rows * $number_of_columns;


    $current_bit = 16;
    for($curr_image = 0; $curr_image < $number_of_images; $curr_image++){
        $pixels = array();
        for($bit = 0; $bit < $bytes_per_image; $bit++){
            $pixel = unpack('C', $raw_bytes[$current_bit]);
            $pixels[] = $pixel[1];
            $current_bit++;
        }

        $im = imagecreate($number_of_columns, $number_of_rows);

        // Sets background to black
        $background = imagecolorallocate($im, 0, 0, 0);

        // Allocate colors
        $white = imagecolorallocate($im, 255, 255, 255);
        $black = imagecolorallocate($im, 0, 0, 0);
        
        $curr_pixel = 0;
        for($row = 0; $row < $number_of_rows; $row++){
            for($col = 0; $col < $number_of_columns; $col++){
                
                if($pixels[$curr_pixel] > 0){
                    $color = $white;
                }
                else{
                    $color = $black;
                }
                
                imagesetpixel($im, $col, $row, $color);
                $curr_pixel++;
            }
        }

        imagepng($im, $save_directory . DIRECTORY_SEPARATOR . "$curr_image.png");
        imagedestroy($im);
    
    } // for curr image
    
    // Free memory
    $raw_bytes = NULL;
    unset($raw_bytes);
}


// / Functions
/////////////////////////////////////////////////////////////////////////////////

// Paths
// Packed Labels
$training_labels_file = __DIR__ . DIRECTORY_SEPARATOR . 'Training Data' . DIRECTORY_SEPARATOR . 'train-labels.idx1-ubyte';
$test_labels_file = __DIR__ . DIRECTORY_SEPARATOR . 'Training Data' . DIRECTORY_SEPARATOR . 't10k-labels.idx1-ubyte';

// Packed Images
$training_images_file = __DIR__ . DIRECTORY_SEPARATOR . 'Training Data' . DIRECTORY_SEPARATOR . 'train-images.idx3-ubyte';
$test_images_file = __DIR__ . DIRECTORY_SEPARATOR . 'Training Data' . DIRECTORY_SEPARATOR . 't10k-images.idx3-ubyte';

// Where to Unpack Data To
$train_directory = __DIR__ . DIRECTORY_SEPARATOR . 'Training Images' . DIRECTORY_SEPARATOR . 'train';
$test_directory = __DIR__ . DIRECTORY_SEPARATOR .  'Training Images' . DIRECTORY_SEPARATOR . 'test';


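// Make sure the locations we're unpacking the images to exist
// (is_dir() is checked first because mkdir() fails on an existing directory)
if (!is_dir($train_directory) && !mkdir($train_directory, 0777, true)) {
    die("Failed to create $train_directory");
}

if (!is_dir($test_directory) && !mkdir($test_directory, 0777, true)) {
    die("Failed to create $test_directory");
}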



//////////////////////////
// Labels               //
//////////////////////////

// Extract labels from bytes to minst_train_labels.txt
ExtractLabelFromByteData($training_labels_file, $train_directory, 'minst_train_labels.txt');
ExtractLabelFromByteData($test_labels_file, $test_directory, 'minst_test_labels.txt');



//////////////////////////
// Images               //
//////////////////////////
    
// Extract images from bytes to 0.png, 1.png 2.png, ...
ExtractImagesFromByteData($training_images_file, $train_directory);
ExtractImagesFromByteData($test_images_file, $test_directory);


echo 'All Done!' . PHP_EOL;
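As mentioned in the comment at the top of the script, processing the data image by image would avoid holding the whole file in memory. Here is a rough sketch of what that could look like using a generator. The function streamIdxImages is hypothetical (it is not in the repo), and the demo uses a tiny in-memory stream rather than the real database file.

```php
<?php
// Hypothetical generator: yields one image at a time from an open IDX3
// stream so that only a single image's bytes are held in memory.
function streamIdxImages($handle): Generator
{
    $header = unpack('Nmagic/Ncount/Nrows/Ncols', fread($handle, 16));
    $bytes_per_image = $header['rows'] * $header['cols'];

    for ($i = 0; $i < $header['count']; $i++) {
        $chunk = fread($handle, $bytes_per_image);
        if ($chunk === false || strlen($chunk) !== $bytes_per_image) {
            throw new RuntimeException("Image $i is truncated.");
        }
        // 'C*' unpacks every byte as an unsigned char; the keys start at 1,
        // so array_values() re-indexes the pixels from 0.
        yield $i => array_values(unpack('C*', $chunk));
    }
}

// Demo on an in-memory stream holding two tiny 2x2 "images".
$stream = fopen('php://memory', 'w+b');
fwrite($stream, pack('N4', 2051, 2, 2, 2) . pack('C*', 0, 255, 255, 0, 9, 8, 7, 6));
rewind($stream);

$images = iterator_to_array(streamIdxImages($stream));
fclose($stream);
echo count($images) . ' images read' . PHP_EOL;
```

In a real run you would loop over the generator with foreach and write each PNG as you go, instead of collecting everything with iterator_to_array().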

Create Training and Test Files

With the training images extracted from the database we next need to use the images to create training and test datasets.

To create our datasets we first decide on what and how many “pre-process” uh… “layers” we want to use on the images.

Our basic set of tools consists of convolution, pooling and flattening layers.

We could implement ReLU etc… but in this case it’s not needed because the images are already B/W and none of our pre-process layers will introduce any grayscale/shading.

Convolution Layers

Convolution layers help us do things to the image like:

  • Edge Detection
  • Image Blurring
  • Image Sharpening
  • Image Distortion
  • Pixel Shifting
  • Pixel Embossing

Convolutions can help us simplify or enhance images and/or make important features, like edges or the area of the image content, stand out, which can help the ANN better learn the visual tasks we want it to.

I keep meaning to write a post dedicated to convolution so I won’t go into too much detail on it here, especially considering we are outsourcing the image convolution to the GD library’s implementation. Unfortunately that implementation is limited to a 3×3 convolution kernel/matrix; it would be nice to be able to specify a larger size like 5×5 or 7×7 etc…

Convolutions produce images that are altered from the original in some desired way.

The kernel matrix is “moved” (iterated) over the image one pixel at a time. At each position, the 3×3 block of pixel values under the kernel is multiplied element by element with the values in the kernel matrix and the products are added together.

The result is new pixel values.
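To show the idea without any library, here is a small sketch that applies a 3×3 kernel to a plain 2D array of pixel values. This is not how the code in this post does it (that is outsourced to GD); convolve3x3 is a made-up helper, and the zero padding at the borders and the clamping to 0-255 are my own assumptions. It also applies the kernel as written, without flipping it, which strictly speaking makes it a cross-correlation.

```php
<?php
// Hypothetical helper: apply a 3x3 kernel to a 2D array of pixel values.
// Pixels outside the image are treated as 0 and results are clamped to 0-255.
function convolve3x3(array $image, array $kernel): array
{
    $rows = count($image);
    $cols = count($image[0]);
    $output = [];

    for ($y = 0; $y < $rows; $y++) {
        for ($x = 0; $x < $cols; $x++) {
            $sum = 0;
            for ($ky = -1; $ky <= 1; $ky++) {
                for ($kx = -1; $kx <= 1; $kx++) {
                    $pixel = $image[$y + $ky][$x + $kx] ?? 0; // zero padding
                    $sum += $pixel * $kernel[$ky + 1][$kx + 1];
                }
            }
            $output[$y][$x] = max(0, min(255, $sum));
        }
    }
    return $output;
}

// A single white pixel in the middle of a black 3x3 image.
$image = [
    [0,   0, 0],
    [0, 255, 0],
    [0,   0, 0],
];

$identity     = [[0, 0, 0], [0, 1, 0], [0, 0, 0]];
$edge_outline = [[0, 1, 0], [1, -4, 1], [0, 1, 0]];

$same  = convolve3x3($image, $identity);     // unchanged copy of the image
$edges = convolve3x3($image, $edge_outline); // the pixel's four neighbors light up
```

The identity kernel passes the image through untouched, while the edge_outline kernel darkens the center pixel and brightens its four direct neighbors.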

Here are the 17 example kernels I include:

  • identity
  • edge_outline
  • edge_horizontal
  • edge_vertical
  • area
  • emboss_1
  • emboss_2
  • emboss_3
  • emboss_4
  • shift_north_west
  • shift_north
  • shift_north_east
  • shift_east
  • shift_south_east
  • shift_south
  • shift_south_west
  • shift_west

Here is an example of what these convolution layers do to an image:

Example of what kernel convolution layers do to an image.

Think of convolutions like the ANN is squinting and moving its head around to get a better look at the thing it’s looking at. 😛

Pooling Layers

Pooling layers help us reduce the “dimensionality” of an image, meaning that if we have a large image we can “pool” the image’s pixels to make it smaller by “throwing away” some hopefully “unimportant” pixels/information without losing the “important” information.

A common way to decide if a pixel is important or not is to say that the “brightest” or most colorful pixels are most important and this is called “Max Pooling”.

Alternatives to max pooling are “Min Pooling”, which is where you want the “darkest” or least colorful pixels and “Average Pooling” where you average the values of all the pixels inside the pooling matrix.

For this project I used max pooling.

The way pooling works is by considering small groups of pixels, e.g. 3×3, 5×5, 7×7 etc…, and then only the “Max” value of each group moves forward.

Pooling results in a smaller image so there is usually some loss of detail information but the hope is that the “important” pixels will remain.

A smaller pooling matrix means a less “aggressive” reduction in image size: a 2×2 pooling matrix halves the image’s width and height (leaving a quarter of the pixels), while a 5×5 matrix reduces each dimension to 1/5th of its original size.

Here’s an example of what happens to a 100×100 px image of a stick figure using a 2×2 pooling matrix.

Example of pooling an image of a stick figure.

Notice that after pooling a second time, we’ve reduced the image so much that all facial detail on the stick figure has been lost.

A third pooling layer would reduce the image so much that only white pixels would remain.

Pooling makes it possible for your ANN to process larger images by reducing the number of input neurons required for the bot to view the image. However, it’s important to note that pooling does cause some loss of pixel information (like the face of the stick figure… and eventually the entire stick figure), and as such it is possible to over-pool your images.

Pool carefully my friends! 😛
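Here is a minimal sketch of max pooling on a plain 2D array, in case seeing it in code helps. The maxPool function is hypothetical and is not the pooling function from my Functions.php; it assumes non-overlapping groups and an image whose width and height divide evenly by the pool size.

```php
<?php
// Hypothetical helper: non-overlapping max pooling over a 2D pixel array.
// Assumes the image width and height are exact multiples of $size.
function maxPool(array $image, int $size): array
{
    $rows = count($image);
    $cols = count($image[0]);
    if ($rows % $size !== 0 || $cols % $size !== 0) {
        throw new InvalidArgumentException('Image size must be divisible by the pool size.');
    }

    $pooled = [];
    for ($y = 0; $y < $rows; $y += $size) {
        $pooled_row = [];
        for ($x = 0; $x < $cols; $x += $size) {
            $max = $image[$y][$x]; // start with the group's first pixel
            for ($py = 0; $py < $size; $py++) {
                for ($px = 0; $px < $size; $px++) {
                    $max = max($max, $image[$y + $py][$x + $px]);
                }
            }
            $pooled_row[] = $max; // only the brightest pixel moves forward
        }
        $pooled[] = $pooled_row;
    }
    return $pooled;
}

// A 4x4 image pooled with a 2x2 matrix becomes a 2x2 image.
$image = [
    [0,   0, 255,   0],
    [0, 255,   0,   0],
    [0,   0,   0,   0],
    [0,   0,   0, 255],
];
$pooled = maxPool($image, 2);
```

Swapping max() for min() or an average of the group would give you min pooling or average pooling respectively.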

The Flattening Layer

ALWAYS FLATTEN and when you do, always flatten last. 😉

Flattening takes the pixel rows and stacks them together to make one long string of pixels.

These pixels are what we use as input values for the ANN.

Example:

ABCDE

FGHIJ

KLMNO

PQRST

UVWXY

Each letter represents a pixel in an image that is 5×5.

Flattened the pixel data looks like this:

ABCDEFGHIJKLMNOPQRSTUVWXY
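In PHP, flattening a 2D array of rows can be sketched in one line. The flattenRows name is just for illustration; the real code works on pixels read from the processed images rather than letters.

```php
<?php
// Hypothetical helper: flatten a 2D array of rows into one 1D list, row by row.
function flattenRows(array $rows): array
{
    return array_merge(...$rows);
}

// Build the 5x5 letter grid from the example, one array of letters per row.
$grid = array_map('str_split', ['ABCDE', 'FGHIJ', 'KLMNO', 'PQRST', 'UVWXY']);
$flat = flattenRows($grid);
echo implode('', $flat) . PHP_EOL;
```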

Once we’re done flattening all the images that resulted from our convolution and pooling layers, the black pixels are converted from zeros to -1’s and the label is encoded as our desired output value.

Example Outputs:

0 = 1000000000
1 = 0100000000
2 = 0010000000
3 = 0001000000
4 = 0000100000
5 = 0000010000
6 = 0000001000
7 = 0000000100
8 = 0000000010
9 = 0000000001

Though the zeros are encoded as -1’s.
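A quick sketch of that label encoding, with the zeros already swapped for -1’s. The encodeLabel function is a hypothetical helper for illustration only.

```php
<?php
// Hypothetical helper: encode a digit label as 10 outputs, using 1 for the
// matching digit and -1 (instead of 0) for every other position.
function encodeLabel(int $label, int $classes = 10): array
{
    if ($label < 0 || $label >= $classes) {
        throw new OutOfRangeException("Label $label is outside 0-" . ($classes - 1));
    }
    $outputs = array_fill(0, $classes, -1);
    $outputs[$label] = 1;
    return $outputs;
}

$encoded = encodeLabel(3);
echo implode(' ', $encoded) . PHP_EOL;
```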

Once all the images are processed we count the data and write the FANN header to the .data files and transfer the buffered data from our temp file to the data file.
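For reference, a FANN training file starts with a header line holding the number of training pairs, the number of inputs and the number of outputs, followed by alternating lines of inputs and outputs. Here is a small sketch of building that text; buildFannData is a made-up helper and, unlike my script, it keeps everything in memory instead of buffering to a temp file.

```php
<?php
// Hypothetical helper: build the text of a FANN training file from
// [inputs, outputs] pairs. Line 1 is the header: pairs, inputs, outputs.
function buildFannData(array $pairs): string
{
    $num_inputs = count($pairs[0][0]);
    $num_outputs = count($pairs[0][1]);

    $lines = [count($pairs) . " $num_inputs $num_outputs"];
    foreach ($pairs as [$inputs, $outputs]) {
        $lines[] = implode(' ', $inputs);  // one line of input values
        $lines[] = implode(' ', $outputs); // one line of desired outputs
    }
    return implode(PHP_EOL, $lines) . PHP_EOL;
}

// Two toy samples with 4 inputs and 2 outputs each.
$data = buildFannData([
    [[1, -1, -1, 1], [1, -1]],
    [[-1, 1, 1, -1], [-1, 1]],
]);
echo $data;
```

The header is the reason for the temp file: the number of pairs has to be written first, but it is only known once all the images have been counted.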

GenerateMnistTrainingData.php:

<?php

ini_set("max_execution_time", "-1");
ini_set('memory_limit','-1');
set_time_limit(0);

include('Functions.php');

// Where are the images and labels
$train_directory = __DIR__ . DIRECTORY_SEPARATOR . 'Training Images' . DIRECTORY_SEPARATOR . 'train';
$test_directory = __DIR__ . DIRECTORY_SEPARATOR .  'Training Images' .  DIRECTORY_SEPARATOR . 'test';


// Where is the data
$train_save_directory = __DIR__ . DIRECTORY_SEPARATOR . 'Training Data';
$test_save_directory = __DIR__ . DIRECTORY_SEPARATOR .  'Training Data';


// Convolution_kernels we might want to use
$convolution_kernels = array(
    // This is just a pass-through of the original image
    'identity'=>array(array(0, 0, 0), 
                      array(0, 1, 0), 
                      array(0, 0, 0)
                     ),
    // Shift kernels - basically... these shift the image 1 pixel in the specified direction
    'shift_north_west'=>array(array(4, 0, 0),
                              array(0, -1, 0),
                              array(0, 0, -4)
                             ),
    'shift_north'=>array(array(0, 4, 0),
                         array(0, -1, 0),
                         array(0, -4, 0)
                        ),
    'shift_north_east'=>array(array(0, 0, 4),
                              array(0, -1, 0),
                              array(-4, 0, 0)
                             ),
    'shift_east'=>array(array(0, 0, 0),
                        array(-4, -1, 4),
                        array(0, 0, 0)
                       ),
    'shift_south_east'=>array(array(-4, 0, 0),
                              array(0, -1, 0),
                              array(0, 0, 4)
                             ),
    'shift_south'=>array(array(0, -4, 0),
                         array(0, -1, 0),
                         array(0, 4, 0)
                        ),
    'shift_south_west'=>array(array(0, 0, -4),
                              array(0, -1, 0),
                              array(4, 0, 0)
                             ),
    'shift_west'=>array(array(0, 0, 0),
                        array(4, -1, -4),
                        array(0, 0, 0)
                       ),
    // Emboss kernels - These highlight and shadow angles and boundaries https://en.wikipedia.org/wiki/Image_embossing
    'emboss_1'=>array(array(-2, -1, 0),
                      array(-1, 0, 1),
                      array(0, 1, 2)
                     ),
    'emboss_2'=>array(array(0, -1, -2),
                      array(1, 0, -1),
                      array(2, 1, 0)
                     ),
    'emboss_3'=>array(array(2, 1, 0),
                      array(1, 0, -1),
                      array(0, -1, -2)
                     ),
    'emboss_4'=>array(array(0, 1, 2), 
                      array(-1, 0, 1),
                      array(-2, -1, 0)
                     ),
    // Edges - Find the edges and outline - https://en.wikipedia.org/wiki/Kernel_(image_processing)
    'edge_outline'=>array(array(0, 1, 0),
                          array(1, -4, 1),
                          array(0, 1, 0)
                         ),
    'edge_horizontal'=>array(array(-1, -1, -1),
                             array(2, 2, 2),
                             array(-1, -1, -1)
                            ),
    'edge_vertical'=>array(array(-1, 2, -1),
                           array(-1, 2, -1),
                           array(-1, 2, -1)
                           ),
    // Area kernel
    'area'=>array(array(-4, 4, -4),
                  array(4, -4, 4),
                  array(-4, 4, -4)
                 )
); // / convolution_kernels


// List of Layers
// Note that Flattening should always be last
$layers = array(CONVOLUTION_LAYER, 
                POOLING_LAYER, 
                //CONVOLUTION_LAYER, 
                //POOLING_LAYER, 
                FLATTENING_LAYER); // Always flatten, and always last
                


// List of kernels we will actually use
// Uncomment the ones you want but each one adds to the size 
// of the ANN input layer and increases the size of the training 
// data. Additionally, it will result in slower training, however
// experiment with different kernels and your dataset to see which 
// (if any) work best with your dataset.
$kernels = array(
                 'identity'=>$convolution_kernels['identity'],
                 //'edge_outline'=>$convolution_kernels['edge_outline'],
                 //'edge_horizontal'=>$convolution_kernels['edge_horizontal'], 
                 //'edge_vertical'=>$convolution_kernels['edge_vertical'], 
                 'area'=>$convolution_kernels['area']
                 //'emboss_1'=>$convolution_kernels['emboss_1'],
                 //'emboss_2'=>$convolution_kernels['emboss_2'],
                 //'emboss_3'=>$convolution_kernels['emboss_3'],
                 //'emboss_4'=>$convolution_kernels['emboss_4'],
                 //'shift_north_west'=>$convolution_kernels['shift_north_west'],
                 //'shift_north'=>$convolution_kernels['shift_north'],
                 //'shift_north_east'=>$convolution_kernels['shift_north_east'],
                 //'shift_east'=>$convolution_kernels['shift_east'],
                 //'shift_south_east'=>$convolution_kernels['shift_south_east'],
                 //'shift_south'=>$convolution_kernels['shift_south'],
                 //'shift_south_west'=>$convolution_kernels['shift_south_west'],
                 //'shift_west'=>$convolution_kernels['shift_west']
                );



// Pool size - We use only 1 pooling layer before flattening.
// As written, the pooling function isn't that robust: it doesn't
// handle empty pixels on the edge at all. This is because it currently
// divides the image into equal sized groups that must line up completely
// inside the pool matrix. The pooling function will attempt to adjust
// the matrix size for you in one direction (up) if the size selected
// results in a bad configuration.
//
// Adding an adjustable stride (not that difficult at all) and selecting
// and implementing an empty pixel strategy to handle matrix grid sizes
// that result in empty pixels (moderately difficult) would greatly improve
// this implementation and would probably be one of the first things on
// the to do list.
//
// So... given the aforementioned... pooling the MNIST dataset at the
// minimum pool size of 2 is only possible a maximum of 2 times
// (28 -> 14 -> 7). There are alternatives to adding stride... but add
// pooling stride!
$pooling_size = 2; // This is the number of pixels
                   // in 1D of our 2D pooling matrix
                   //
                   // More pixels per pool means a smaller 
                   // output pooled image because more pixels
                   // are pooled into fewer pixels.
                   //
                   // e.g. using an image that is 28 x 28 (like the MNIST image set)
                   // 
                   // 2 = 28 / 2x2 matrix = 14 - 14x14 pooled image (196 input neurons required)
                   // 4 = 28 / 4x4 matrix = 7 - 7x7 pooled image    (49 input neurons required)
                   // 7 = 28 / 7x7 matrix = 4 - 4x4 pooled image    (16 input neurons required)
                   // 14 = 28 / 14x14 matrix = 2 - 2x2 pooled image (4 input neurons required)

$pooling_method = MAX_POOL;// Options:
                           // MAX_POOL
                           // MIN_POOL
                           // AVG_POOL

echo 'Generating training and test data from labels and images...' . PHP_EOL;

// Generate  minst.train.data file from the images and labels
GenerateDatasetFromLabeledImages($train_directory, 
                                'minst_train_labels.txt',
                                $train_save_directory,
                                'minst.train.data', 
                                $layers,
                                $kernels, 
                                $pooling_method, 
                                $pooling_size);

// Generate  minst.test.data file from the images and labels    
GenerateDatasetFromLabeledImages($test_directory, 
                                'minst_test_labels.txt',
                                $test_save_directory,                                
                                'minst.test.data', 
                                $layers, 
                                $kernels, 
                                $pooling_method, 
                                $pooling_size);

echo 'All Done!' . PHP_EOL;
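The pooling notes in the script above mention that an adjustable stride and an empty pixel strategy are still on the to do list. Here is a minimal sketch of what that could look like. `PoolWithStride()` is a hypothetical helper (it is not part of Functions.php), it uses plain strings instead of the MAX_POOL / MIN_POOL / AVG_POOL constants so it can run on its own, and clipping the window at the edge is just one possible empty pixel strategy.

```php
<?php
// Hypothetical helper, not part of Functions.php: max/min/avg pooling
// over a 2D array of numbers with an adjustable stride. Windows that
// hang over the edge are clipped to the pixels that actually exist.
function PoolWithStride(array $image, int $size, int $stride, string $method = 'max'): array
{
    $height = count($image);
    $width  = count($image[0]);
    $pooled = array();

    for ($y = 0; $y < $height; $y += $stride) {
        $row = array();
        for ($x = 0; $x < $width; $x += $stride) {
            // Collect the pixels that fall inside this window
            $window = array();
            for ($wy = $y; $wy < min($y + $size, $height); $wy++) {
                for ($wx = $x; $wx < min($x + $size, $width); $wx++) {
                    $window[] = $image[$wy][$wx];
                }
            }
            if ($method === 'min') {
                $row[] = min($window);
            } elseif ($method === 'avg') {
                $row[] = array_sum($window) / count($window);
            } else {
                $row[] = max($window);
            }
        }
        $pooled[] = $row;
    }
    return $pooled;
}

// A 5x5 image does not divide evenly into 2x2 groups
$image = array(
    array(1, 2, 3, 4, 5),
    array(6, 7, 8, 9, 10),
    array(11, 12, 13, 14, 15),
    array(16, 17, 18, 19, 20),
    array(21, 22, 23, 24, 25),
);
$pooled = PoolWithStride($image, 2, 2, 'max');

foreach ($pooled as $row) {
    echo implode(' ', $row) . PHP_EOL;
}
```

With a size of 2 and a stride of 2 the 5×5 image becomes a 3×3 pooled image, where the last row and column come from the clipped windows.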

4. Train the OCR Network.

Training is accomplished by creating a neural network with an input layer, an output layer and some number of hidden layers between them and then “teaching” it using our training data.

Input Neurons

My MNIST Database ANN has 392 input neurons.

This value is determined by starting with the input image size: 28×28 px = 784 pixels.

The input image is convolved once into two separate images (identity & area) that are 28×28 px for a total of (28×28)*2 = 1568 total pixels.

Each image is pooled once resulting in two 14×14 px images and (14×14)*2 = 392 total pixels when the images are flattened and combined.
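If you change the kernels or the pool size, the input count changes too, so here is a small sketch of the arithmetic above. `CountInputNeurons()` is a hypothetical helper (not part of the project) and it assumes a square image, a convolution that keeps the image size the same, and a pool size that divides the image evenly.

```php
<?php
// Hypothetical helper: number of ANN inputs for a square image after
// one size-preserving convolution layer and one pooling layer.
function CountInputNeurons(int $image_size, int $num_kernels, int $pooling_size): int
{
    if ($image_size % $pooling_size !== 0) {
        throw new InvalidArgumentException('Pool size must divide the image size evenly.');
    }
    $pooled_size = intdiv($image_size, $pooling_size); // e.g. 28 / 2 = 14
    return $pooled_size * $pooled_size * $num_kernels; // e.g. 14 * 14 * 2
}

echo CountInputNeurons(28, 2, 2) . PHP_EOL; // 392 (identity & area, 2x2 pool)
echo CountInputNeurons(28, 1, 4) . PHP_EOL; // 49  (one kernel, 4x4 pool)
```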

Hidden Neurons

My ANN has 512 hidden neurons in a single hidden layer, and there isn’t much to say about them other than that their “activation function” is FANN_SIGMOID_SYMMETRIC.

I picked 512 because it’s a nice large number that should do well enough with this dataset without taking forever to train… and 256 seemed a little dumb after training. 😛

Output Neurons

The number of output neurons is determined by the number and nature of the possible answers the ANN can respond with.

In this case, the ANN can answer 0 – 9 by setting the matching output neuron to “high” or 1 while leaving the incorrect answers “low” or close to -1.

The position of the output neuron represents its value:

Output Neuron Answer Positions
0 1 2 3 4 5 6 7 8 9

This means that for example, an idealized answer of 5 from the ANN would be similar to:

Example Answer: 5
-1 -1 -1 -1 -1 1 -1 -1 -1 -1

This is because the 1 is in the 5’s place.

In reality, the ANN’s response will not be perfect -1’s and 1’s. The real value each neuron answers with will be a floating point number between -1 and 1, like 0.3 or -0.9265.
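To make the answer positions concrete, here is a minimal sketch of going from a digit to its idealized output and from a messy real output back to a digit. `EncodeDigit()` and `DecodeAnswer()` are hypothetical names used only for illustration, and the `$raw` values are made up.

```php
<?php
// Hypothetical helpers for illustration only.

// Build the idealized output for a digit: 1 in its position, -1 elsewhere.
function EncodeDigit(int $digit, int $num_outputs = 10): array
{
    $output = array_fill(0, $num_outputs, -1);
    $output[$digit] = 1;
    return $output;
}

// The answer is the position of the largest output value.
function DecodeAnswer(array $outputs): int
{
    return array_search(max($outputs), $outputs);
}

echo implode(' ', EncodeDigit(5)) . PHP_EOL; // -1 -1 -1 -1 -1 1 -1 -1 -1 -1

// Made-up, imperfect outputs such as a trained ANN might produce
$raw = array(-0.98, -0.91, -0.87, -0.12, -0.95, 0.83, -0.76, -0.99, -0.4, -0.93);
echo DecodeAnswer($raw) . PHP_EOL; // 5
```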

During training, the ANN looks at the input pixels for each training example and tries to compute a value that is as close to the desired output as possible by reducing what’s called the “Mean Squared Error” or MSE.

Each “epoch” (training cycle), the ANN views the training data and computes its answer for each training example. It can then use the MSE as a guide to how bad its answers were.
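For anyone who wants to see the idea as code, this is a minimal sketch of a textbook mean squared error over a couple of made-up examples. `MeanSquaredError()` is a hypothetical helper; FANN computes its own value internally and its exact scaling may differ from this, so treat it as an illustration of the concept rather than a re-creation of what fann_train_epoch() returns.

```php
<?php
// Hypothetical helper: textbook mean squared error over a set of examples.
// $ideal and $actual are arrays of equal-length output arrays.
function MeanSquaredError(array $ideal, array $actual): float
{
    $sum = 0.0;
    $count = 0;
    foreach ($ideal as $i => $ideal_outputs) {
        foreach ($ideal_outputs as $j => $target) {
            $difference = $target - $actual[$i][$j];
            $sum += $difference * $difference; // square the error
            $count++;
        }
    }
    return $sum / $count; // the mean of the squared errors
}

// Two made-up training examples with three outputs each
$ideal  = array(array(-1, 1, -1), array(1, -1, -1));
$actual = array(array(-1, 0.5, -1), array(0, -1, -0.5));

echo MeanSquaredError($ideal, $actual) . PHP_EOL; // 0.25
```

The squared errors here are 0.25, 1 and 0.25 (the other three outputs are perfect), so the sum is 1.5 and the mean over 6 outputs is 0.25.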

After each epoch, the ANN updates the “weights” for the hidden neurons using the Resilient Backpropagation (Rprop) algorithm.

While training, the MSE is used to decide when training snapshots (backups) of the ANN should be saved.

MSE is generally a good mechanism to test if your ANN is improving, but only to a point. It is the “mean” of the squared error across all the output neurons and training examples, and as such there is a point where, even though the MSE continues to descend, the actual accuracy of the ANN starts to drop because any error present in the dataset or ANN gets distributed across all the neurons.

Training with our dataset teaches the OCR ANN to identify the handwritten numbers in the MNIST training images.

Each training snapshot overwrites the previous one under the name “minst.ocr.train.net”, and the final ANN is saved with the name “minst.ocr.final.net”.

A snapshot save log is kept as a newline delimited text file that lists the MSE for each save; it can be used to generate charts that show the error descending as a form of “gradient descent”.

Example of the MSE descending during training:

MNIST OCR, MSE descending during training.

TrainMnist.php:

<?php


ini_set("max_execution_time", "-1");
ini_set('memory_limit','-1');
set_time_limit(0);

include('Functions.php');


// Training Variables
$desired_error = 0.001;
$max_epochs = 500000;
$current_epoch = 0;
$epochs_between_saves = 5; // Minimum number of epochs between saves
$epochs_since_last_save = 0;

// Training Data
$name  = 'minst.ocr';
$path = __DIR__ . DIRECTORY_SEPARATOR . 'ANNs';
@mkdir($path, 0777);

$data =  __DIR__ . DIRECTORY_SEPARATOR . 'Training Data' . DIRECTORY_SEPARATOR . 'minst.train.data';

// Initialize pseudo mse (mean squared error) to a number greater than the desired_error
// this is what the network is trying to minimize.
$pseudo_mse_result = $desired_error * 10000; // 1
$best_mse = $pseudo_mse_result; // keep the last best seen MSE network score here

// Initialize ANN
$num_input = 392;
$num_output = 10;

$hidden_layers = array(1=>512 // First Hidden Layer - 512 neurons
                       // Add More layers as needed (don't forget the commas)
                      );

$layers = array($num_input, $hidden_layers, $num_output);
$layers = FlattenANNLayers($layers);
$num_layers = count($layers);

// Create ANN
$ann = fann_create_standard_array($num_layers, $layers);


if($ann){
  
  $log = fopen($path . DIRECTORY_SEPARATOR . "traning_save_log.$name.txt", 'w');

  
  // Configure the ANN
  fann_set_activation_function_hidden($ann, FANN_SIGMOID_SYMMETRIC); // FANN_SIGMOID_SYMMETRIC
  fann_set_activation_function_output($ann, FANN_SIGMOID_SYMMETRIC); // FANN_SIGMOID_SYMMETRIC
 
  echo 'Loading data from: ' . $data . PHP_EOL;

  // Read training data
  $train_data = fann_read_train_from_file($data);
  if(!$train_data){
      die('Could not read training data from: ' . $data . PHP_EOL);
  }
  
  echo 'Training ANN... '. $name . PHP_EOL;
  
  echo "Inputs: $num_input" . PHP_EOL;
  echo 'Hidden Layers: ' . count($hidden_layers) . PHP_EOL;
  foreach($hidden_layers as $hidden_layer=>$neuron_count){
      echo "H$hidden_layer: $neuron_count" . PHP_EOL;
  }
  echo "Outputs: $num_output" . PHP_EOL;
  echo str_repeat('-', 50) . PHP_EOL;
   
 
 
  // Check if pseudo_mse_result is greater than our desired_error
  // if so keep training so long as we are also under max_epochs
  while(($pseudo_mse_result > $desired_error) && ($current_epoch < $max_epochs)){
      $current_epoch++;
      $epochs_since_last_save++; 
     
      // See: http://php.net/manual/en/function.fann-train-epoch.php
      // Train one epoch
      //
      // One epoch is where all of the training data is considered
      // exactly once.
      //
      // This function returns the MSE as it is calculated either
      // before or during the actual training. This is not the actual
      // MSE after the training epoch, but calculating that would
      // require going through the entire training set once more, so
      // it is more than adequate to use this value during training.
      $pseudo_mse_result = fann_train_epoch($ann, $train_data);
      
      echo "$name " . $current_epoch . ' : ' . $pseudo_mse_result . PHP_EOL; // report
       
      // If we haven't saved the ANN in a while...
      // and the current network is better than the previous best network
      // as defined by the current MSE being less than the last best MSE
      // Save it!
      if(($epochs_since_last_save >= $epochs_between_saves) && ($pseudo_mse_result < $best_mse)){
       
        $best_mse = $pseudo_mse_result; // we have a new best_mse
       
        // Save a Snapshot of the ANN
        fann_save($ann, $path . DIRECTORY_SEPARATOR . "$name.train.net");
        echo "Saved $name ANN." . PHP_EOL; // report the save
        $epochs_since_last_save = 0; // reset the count
        
        fwrite($log, $pseudo_mse_result . PHP_EOL);
      }
 
  } // While we're training

  echo 'Training Complete! Saving Final Network.'  . PHP_EOL;
 
  // Save the final network
  fann_save($ann, $path . DIRECTORY_SEPARATOR . "$name.final.net"); 
  fann_destroy_train($train_data); // free the training data
  fann_destroy($ann); // free memory
  fclose($log);
}
echo 'All Done!' . PHP_EOL;
?>

5. Test the OCR Network.

When we test the ANN we use images it has never seen so we can see how well it will do with real data that it doesn’t already know.

Test data is processed using the same convolution and pooling layers we used during training.

For ease of use, we pre-processed the training and test data into FANN .data files rather than doing those steps at “run time”. When actually using an ANN like this for real world purposes, you will likely not want to generate a data file and will instead process the data in memory before feeding it to your bot.

Testing consists of passing the input pixel data through the ANN and evaluating its answer compared to the known answer / label data.

I wanted to use my ANN Visualizer to show you what the structure of the MNIST OCR ANN looks like however the image is way too big to publish here! 😛

Here’s the Visualizer ANN “stats report” though:

That’s a lot of connections!

I’ve included the image in the GitHub repo “Project Images” folder if you are interested.

It is interesting to note that I was able to achieve 83% accuracy on the MNIST dataset without convolving, using only a single 2×2 pooling layer, which reduced the images from 28×28 px to 14×14 px.

Ultimately, I settled on 1 convolution layer using 2 kernels and 1 pooling layer with a 2×2 matrix which allowed me to achieve a 94.08% accuracy.

As previously mentioned, the convolution kernels I used were identity (which is basically just the input image passed through) and the area kernel.

Considering I’ve never worked with this dataset prior to this project, I am pleased with these results. However, they are far from “state of the art”, with the top “error rate” being 0.17%, or 99.83% accurate, with previously unseen data.

I expect given more effort that the 94% could be improved upon.

The accuracy of 94.08% means that out of the 10,000 test images we showed the ANN, it incorrectly answered 592 times.

If we isolate the number of times the answer was wrong we can identify which numbers it struggles the most with.

MNIST OCR Error Totals by Number
0 1 2 3 4 5 6 7 8 9
46 34 47 84 45 54 29 53 98 102

We can plot this to a chart to make this information more intuitive:

MNIST OCR ANN Errors by Number

What this tells us is that the bot does the best with 6’s, 1’s and 4’s and the worst with 9’s, 8’s and 3’s. Note that the totals are counted by the wrong answer the ANN gave, not by the digit that was actually in the image.
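As a quick sanity check, the error totals in the table should add up to the 592 wrong answers and give back the 94.08% accuracy. This is a minimal sketch that just re-does that arithmetic with the numbers copied from the table; the variable names are only for illustration.

```php
<?php
// Error totals by number, copied from the table above
$errors_by_number = array(0=>46, 1=>34, 2=>47, 3=>84, 4=>45,
                          5=>54, 6=>29, 7=>53, 8=>98, 9=>102);
$total_tests = 10000; // number of MNIST test images

$total_errors = array_sum($errors_by_number);
$accuracy = ($total_tests - $total_errors) / $total_tests * 100;

// Sort from most errors to fewest, keeping the digit as the key
arsort($errors_by_number);
$worst_three = array_slice(array_keys($errors_by_number), 0, 3);

echo "Errors: $total_errors" . PHP_EOL;                         // Errors: 592
echo 'Accuracy: ' . round($accuracy, 2) . '%' . PHP_EOL;        // Accuracy: 94.08%
echo 'Most errors: ' . implode(', ', $worst_three) . PHP_EOL;   // Most errors: 9, 8, 3
```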

TestMnist.php:

<?php


ini_set("max_execution_time", "-1");
ini_set('memory_limit','-1');
set_time_limit(0);

//include('Functions.php');



$path = __DIR__ . DIRECTORY_SEPARATOR . 'ANNs' . DIRECTORY_SEPARATOR ;
$test_data =  __DIR__ . DIRECTORY_SEPARATOR . 'Training Data' . DIRECTORY_SEPARATOR . 'minst.test.data';


// CSV to log test results
$test_results_csv = fopen($path . 'results.csv', 'w');

// Load ANN
$ann_train_file = ($path . "minst.ocr.final.net"); // 94.08% Accuracy on test data
//$ann_train_file = ($path . "minst.ocr.train.net");   // Used for testing during training or if a final was not saved


if (!is_file($ann_train_file)){
    die("The .net file has not been created!" . PHP_EOL);
}

$ann = fann_create_from_file($ann_train_file);

if ($ann) {
    
    // Some variable to keep track of things
    $current_input = '';
    $current_output = '';
    $current_line = 0;
    
    // Open the test data file 
    $test_file = fopen($test_data, "r"); 
    fputcsv($test_results_csv, array('Ann Answer', 
                                     'Correct Answer',
                                     'Answered Correctly',
                                     'Raw Sum',
                                     'Ideal Sum',
                                     'Variance',
                                     'Ideal Output', 
                                     'Raw Output'));


    $temp_correct_score = 0;
    
    
    // While we have not reached the end of the test data set
    while(!feof($test_file))
    {
        $data = str_replace(array(PHP_EOL, "\n", "\r"), '', fgets($test_file)); // Remove those pesky end of lines
        
        
        // If there remains data after removing the EOL
        if($data != ''){
            
            //////////////////////////////////////
            // What Type of Data is this?       //
            //////////////////////////////////////
            
            // If this is the first line in a FANN data file
            // then...
            if($current_line == 0){ // data is the header
                $type = 'Header';
            }
            // Otherwise if dividing the current line number by two
            // leaves a remainder...
            elseif($current_line % 2 != 0){ // data is an input
                $type = 'Input';
                $current_input = $data;
            }
            // Otherwise the remainder was zero meaning that
            // this is of course...
            else{// an output
                $type = 'Output';
                $current_output = $data;
            }
            
            //////////////////////////////////////
            // If we have a complete data pair  //
            //////////////////////////////////////
            if($current_input != '' && $current_output != ''){        
            
                // Convert input string to array by using spaces as delimiters
                $input = explode(' ', $current_input);
                
                // ANN Calc inputs and store outputs in the result array
                $result = fann_run($ann, $input);
                
                // There are 10 outputs representing 
                // 0 - 9
                // [0,1,2,3,4,5,6,7,8,9]
                //
                // Which output contains the highest value? (the prediction/classification)
                $calc_digit = max($result); 
                
                // Look up the position of the Highest value in the array
                // it's key is the selection, in this case the actual digit
                // but it could be cat/no cat, dog/cat, red/green/blue,
                // lat/long, credit history, image contains license plate yes/no,
                // etc... whatever "classification" you assign the output to mean.
                // as long as there is correlation between inputs and outputs... this
                // should generally hold true so long as you process and train your 
                // model properly, though some systems can be incredibly complex
                // requiring multiple layers of processing and stacked network layers
                // which are the so called "deep" neural networks.
                
                $ann_answer = array_search($calc_digit, $result);// The ANN answer
                $raw_sum = array_sum($result);
                $raw_output = implode(' ', $result);
                
                // The correct answer is:
                $ideal_output = $current_output;
                $current_output = explode(' ', $current_output);
                $calc_digit = max($current_output);
                $correct_answer =  array_search($calc_digit, $current_output);
                $ideal_sum = array_sum($current_output);
                
                // Did the ANN answer correctly?
                $answered_correctly = -1;
                if($ann_answer == $correct_answer){
                    $answered_correctly = 1;
                }
                
                // Very roughly how far off were all the answers
                $variance = $ideal_sum - $raw_sum;
                
                // Log results to CSV for data science
                // happy fun times later!                    
                fputcsv($test_results_csv, array($ann_answer, 
                                                 $correct_answer,
                                                 $answered_correctly,
                                                 $raw_sum,
                                                 $ideal_sum,
                                                 $variance,
                                                 $ideal_output, 
                                                 $raw_output));
                                                 
                if($answered_correctly == 1){
                    $temp_correct_score+= $answered_correctly;
                }
                
                // Reset input and output set
                $current_input = '';
                $current_output = '';
            }
            $current_line++; // Next line
        }
    }
    fclose($test_file); // Close Test file
    
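    // Each test example is an input/output pair of lines after the header,
    // so count the pairs instead of assuming exactly 10,000 test images
    $total_tested = intdiv($current_line - 1, 2);
    if($total_tested > 0){
        echo 'Number Correct: ' . $temp_correct_score . ' of ' . $total_tested
             . ' (' . round($temp_correct_score / $total_tested * 100, 2) . '%)' . PHP_EOL;
    }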

    // Forcibly remove the neural network from this plane of existence
    fann_destroy($ann); 

    fclose($test_results_csv); // Close CSV
    
}else{
    die("Invalid file format" . PHP_EOL);
}



//////////////////////
// Happy Fun Times! //
//////////////////////

$errors_by_number = array(0=>0, 1=>0, 2=>0, 3=>0, 4=>0, 5=>0, 6=>0, 7=>0, 8=>0, 9=>0);

$n = 0;
if (($test_results_csv = fopen($path . "results.csv", "r")) !== FALSE){ // If we can read the results .csv file
    while (($results = fgetcsv($test_results_csv)) !== FALSE) {
        
        if($n > 0){

            /*
            $results[i] keys:
            
            key 0 = ann_answer, 
            key 1 = correct_answer,
            key 2 = answered_correctly,
            key 3 = raw_sum,
            key 4 = ideal_sum,
            key 5 = variance,
            key 6 = ideal_output, 
            key 7 = raw_output
            */
            // Check if ANN answer matched the correct answer
            if($results[0] != $results[1]){
                // What wrong answer (which number) was given
                // Log it
                $errors_by_number[$results[0]] += 1;
            }
        }
        
        $n++;
    }

    fclose($test_results_csv);

}

$errors_by_number_csv = fopen($path . 'errors_by_number.csv', 'w');
$results = fputcsv($errors_by_number_csv, array('0','1','2','3','4','5','6','7','8','9'));
$results = fputcsv($errors_by_number_csv, $errors_by_number);
fclose($errors_by_number_csv);



echo 'All Done!';

 

Where to go from here

  • Download the project from GitHub (link below).
  • Practice with this project and ask questions on things you don’t understand.
  • Not everything has been optimized to use as few system resources as possible, and doing so would improve performance.
  • Pooling doesn’t have a variable stride and cannot handle sizes that result in the matrix overlapping the edge of the image. Adding this would make this bot considerably more robust.
  • Currently the ANN answer is determined using a Max() on the output neurons to find the highest one. A SoftMax() would instead give us a score for every digit that could be sorted from most likely to least likely, which is arguably better if not at least more interesting! 😛
  • You can/should review the other neural network examples I’ve created.
  • Consider making your own datasets to practice with or finding some on kaggle.com (not a sponsor) and experimenting with someone else’s dataset like we did here today.
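For the SoftMax() idea in the list above, here is a minimal sketch. `SoftMax()` is a hypothetical helper (it is not in Functions.php and FANN doesn’t do this for you), and the `$raw` outputs are made up. The resulting scores add up to 1, so they read like percentages, but I wouldn’t treat them as true probabilities since the ANN was not trained with a softmax output.

```php
<?php
// Hypothetical helper: turn raw output values into scores that sum to 1.
function SoftMax(array $outputs): array
{
    $max = max($outputs); // subtract the max for numerical stability
    $scores = array();
    foreach ($outputs as $key => $value) {
        $scores[$key] = exp($value - $max);
    }
    $sum = array_sum($scores);
    foreach ($scores as $key => $value) {
        $scores[$key] = $value / $sum;
    }
    return $scores;
}

// Made-up outputs such as fann_run() might return for one image
$raw = array(-0.98, -0.91, -0.87, -0.12, -0.95, 0.83, -0.76, -0.99, -0.4, -0.93);

$scores = SoftMax($raw);
arsort($scores); // most likely first, the digit is kept as the key

// Show the three most likely digits
foreach (array_slice($scores, 0, 3, true) as $digit => $score) {
    echo $digit . ': ' . round($score * 100, 1) . '%' . PHP_EOL;
}
```

The top of the sorted list is still the same digit that Max() would have picked, but now you can also see the runners-up.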

Free Download on GitHub:

As always you can download this entire project including the dataset, code and images for free on my GitHub profile.

GitHub: MNIST OCR Repo

So… I guess that’s it for today. I truly hope this benefits you and that you enjoyed this post!

Questions, Comments? Leave them below and thanks for reading!


If you like what I do and want more content like this then please consider supporting me on Patreon for as little as $1 a month and cancel any time!

But if all you can do is, Like, Share, Comment and Subscribe, well… that’s cool too!

Much Love,
~Joy