Size is relative and statements like “too big” or “too small” (cough TWSS 😛 ) are not immutable properties of the universe, but rather simply a matter of perspective. Change your perspective just a little, Goldilocks, and you might find that your neural network thinks your data porridge tastes “just right”!

Err… what I mean is… we’re going to look at how to scale your data so that you can feed it to your neural network without causing it (or you) to have any unpleasant indigestion.

Specifically, let’s say you have a very important series of values (including strings of text) that you want to use as part of your training dataset, but they don’t fit within the yummy-data range of -1 to 1… so what do you do?

Well, honey bee, the short’n sweet answer is that as long as we scale the data “uniformly” (which simply means maintaining the ratio of “distance” between ALL values in our dataset) we can easily make our numbers any “size” we want and the neural network can just shut up and eat what we cooked for dinner, damn it!

Now, before I go into how we scale our data, I’ll present today’s wallpaper which is made from 100% ground up digital stuff and is thematically based on today’s topic. 🙂

Scaling Data

Here’s today’s wallpaper:

Scaling Data Wallpaper

And to go along with the wallpaper I’ll be providing several sample datasets for us to use today! 🙂

Here’s One of Our Datasets:

340.671 264.910 479.062 307.336 444.717 318.448 276.021 530.580
400.580 493.438 312.009 399.866 618.438 455.828 538.661 603.311
504.316 389.158 554.823 527.723 623.438 512.723 407.009 530.624
427.633 643.050 444.717 512.733 410.307 530.580 400.580 493.438
629.152 399.866 568.438 455.828 538.661 450.454 504.316 492.016
609.109 527.723 573.437 512.723 609.109 527.723 573.437 512.723

Now, you might be asking yourself, where did these woefully wonderful numbers come from since they don’t quite look like the numbers in my database? Hey look, they’re just numbers for us to use on this project, alright?!

I mean, it’s not like I used the cryptographically secure randomly rotating connection weights of a neural network like a “one-time pad” to encrypt a super-duper secret message as a… “find the key – I hid it, now you find it!” challenge to the creepy crypto cryptographers in the audience because… well, I mean… that would be… cryptic and possibly even a little “mysterious” (maybe like a Warehouse, or a Mathematical Game of Life or maybe even like a Blogging Award… right?)

But these numbers in the “totally-not-a-secret-message” above could say ANYTHING although you’ll never be able to decipher it, NEVER!

Phuck “The Quantum”, it’s secure! You have no way of EVER knowing what those numbers mean!!

Gee, what could it be?? Do they lead to more numbers? An algorithm? A P = NP proof? The mother of all bitcoin addresses??????? #NotSatoshi

I promise you it’s a secret and I will take it with me to my grave and it’s going to eat at you for the rest of your life and even on your death bed, whether before me or a thousand years after I’m long dead and gone, you will still be wondering what my message said and… YOU WILL NEVER KNOW!

Um… what I mean is that these numbers above can represent items in your warehouse inventory or stock market prices or even proteins in a gene sequence. 😛

Look, the sane among you will probably not get too caught up on where the numbers can or should come from and instead focus on the important thing which is that you have a value that you want to use to teach a neural network how to sit up, roll over and play fetch…. er, something like that.

So, before we get to the neural network stuff, let’s handle the special case of converting text to a number.

Converting Text to a Number

If your data is already in number form you can skip this step, but since most (though not all) bots exclusively prefer numbers, it’s up to us to convince them to eat strings of text if that’s what we have to work with. The way we do that is to substitute each non-numeric value with a known numeric value and create a dictionary/look-up table.

Firstly, there is no one right way to do this: your exact method will vary depending on your use case and, to some extent, the type of data you are working with. Sometimes you may need/want to convert individual characters, symbols or semiotics into numbers, but the concept is applied just the same when working with words, emotions, phrases, pictures of cats or when trying to crack “the nuclear codes” so that you can launch the North Korean ICBM’s into an L3 Lagrange point “parking” orbit where you plan to set up an Orion Pusher Plate refueling propellant depot called Joy’s Fissile Lube Gas’n Coffee with the intent to significantly mark up the price of the nuclear impulse fuel units, not to mention chips, coffee and soda!

You don’t ask me where the overpriced fuel comes from and I won’t ask you who you are, what you’re haulin’ or what planetoid you intend to set that sad rusty excuse of a spaceship down on. Oh, and just in case you intend to put a hole in my back with your “laser-blasters” when I turn around so you can rob me and take all of my previously illicitly acquired North Korean fuel units, just know that the station will be equipped with a dead woman’s switch that is constantly monitoring my vitals and in the event of my death, all the fuel units have been rigged to blow and I will most certainly take you and your ship with me!

Now, can I interest any of you in a dangly air freshener with a red countdown clock on it? I have both pine and cherry scented.

Hmmm… so anyway, because use case varies a lot I can’t show you every possible way to go about doing this. Instead, let’s just use a list of words in a string of text and convert the words to numbers without including duplicates, which allows us to “equate” a unique non-numeric input with a unique number.

So like, because I’m nice or something, I’ll present code that does this in both PHP and JavaScript and explain it after, but of course you can also do this by hand if you only have a few string values or you have nothing better to do.

PHP String to Number Example:

<?php
$data_as_a_string = "Red Red Adenosine Red Red Hamburger Orange Cetirizine Yellow Theophylline Green Chlorophyll Blue Thiazine Indigo Theobromine Violet Phenazinium Mercury Venus Earth Mars Jupiter Saturn Uranus Neptune Human Cat Dog Rat Mouse Horse Cow Pig Goat Chicken Duck Camel Oyster Hot-dog Potato Popcorn Pumpkin Parsnip Pizza Pansies Primrose Petunia Peony Passionflower Tulip Tonsil Fish Pants Shorts Short-Shorts Skirts Short-Skirts Shirts Sandals Backpacks Item_22 Item_42 Item_49 Item_384 Water Coffee Tea Soda-A Soda-B Soda-5000 Green-Soda Blue-Soda Red-Wine White-Wine Orange-Soda Grapefruit Mud Bear Kitten Glitter Mittens";

// Split the string on whitespace (spaces, tabs and line breaks) into an array of words
$data_as_an_array = preg_split("/[\s]+/", $data_as_a_string);

// Remove duplicates and re-index so each unique word gets a sequential numeric key
$unique_data = array_values(array_unique($data_as_an_array));

print_r($unique_data);

?>

JavaScript String to Number Example:

<script>
var data_as_a_string = "Red Red Adenosine Red Red Hamburger Orange Cetirizine Yellow Theophylline Green Chlorophyll Blue Thiazine Indigo Theobromine Violet Phenazinium Mercury Venus Earth Mars Jupiter Saturn Uranus Neptune Human Cat Dog Rat Mouse Horse Cow Pig Goat Chicken Duck Camel Oyster Hot-dog Potato Popcorn Pumpkin Parsnip Pizza Pansies Primrose Petunia Peony Passionflower Tulip Tonsil Fish Pants Shorts Short-Shorts Skirts Short-Skirts Shirts Sandals Backpacks Item_22 Item_42 Item_49 Item_384 Water Coffee Tea Soda-A Soda-B Soda-5000 Green-Soda Blue-Soda Red-Wine White-Wine Orange-Soda Grapefruit Mud Bear Kitten Glitter Mittens";

// Split the string on whitespace into an array of words
var data_as_an_array = data_as_a_string.split(/[\s]+/);

// Remove duplicates; a Set keeps only the first instance of each word
var unique_data = Array.from(new Set(data_as_an_array));

for(var key = 0; key < unique_data.length; key++){
  console.log('[' + key + '] => '  + unique_data[key]);
}
</script>

Results

[0] => Red
[1] => Adenosine
[2] => Hamburger
[3] => Orange
[4] => Cetirizine
[5] => Yellow
[6] => Theophylline
[7] => Green
[8] => Chlorophyll
[9] => Blue
[10] => Thiazine
[11] => Indigo
[12] => Theobromine
[13] => Violet
[14] => Phenazinium
[15] => Mercury
[16] => Venus
[17] => Earth
[18] => Mars
[19] => Jupiter
[20] => Saturn
[21] => Uranus
[22] => Neptune
[23] => Human
[24] => Cat
[25] => Dog
[26] => Rat
[27] => Mouse
[28] => Horse
[29] => Cow
[30] => Pig
[31] => Goat
[32] => Chicken
[33] => Duck
[34] => Camel
[35] => Oyster
[36] => Hot-dog
[37] => Potato
[38] => Popcorn
[39] => Pumpkin
[40] => Parsnip
[41] => Pizza
[42] => Pansies
[43] => Primrose
[44] => Petunia
[45] => Peony
[46] => Passionflower
[47] => Tulip
[48] => Tonsil
[49] => Fish
[50] => Pants
[51] => Shorts
[52] => Short-Shorts
[53] => Skirts
[54] => Short-Skirts
[55] => Shirts
[56] => Sandals
[57] => Backpacks
[58] => Item_22
[59] => Item_42
[60] => Item_49
[61] => Item_384
[62] => Water
[63] => Coffee
[64] => Tea
[65] => Soda-A
[66] => Soda-B
[67] => Soda-5000
[68] => Green-Soda
[69] => Blue-Soda
[70] => Red-Wine
[71] => White-Wine
[72] => Orange-Soda
[73] => Grapefruit
[74] => Mud
[75] => Bear
[76] => Kitten
[77] => Glitter
[78] => Mittens

I don’t know about you friend but I’m definitely in the market for some tulip tonsil fish pants and you would not believe how hard I’ve tried to obtain some mud bear kitten glitter mittens, so if you happen to know where I can acquire some, please leave a comment below. 😉

Anyway, what this code does is use RegEx to find “white space” (spaces, tabs and line breaks) as the boundaries between words in a string and then place the words (including duplicates) into an array.

Then, using language-specific methodologies, a new array is created that only contains the first instance of each unique string value. This results in a one-to-one relationship: each string value has a unique number (its array key), which you would substitute into your dataset in place of the word.

So instead of asking your neural network if it would like a Coffee, you’d ask if it wants a 63 and instead of a Blue-Soda you’d ask for a sixty… um.. I mean, where your dataset says Item_42 it would instead say 59.

Those “Green Pants”… they become a 7 and a 50, respectively. Isn’t that easy? 🙂
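To make the substitution concrete, here’s a minimal sketch (my own addition; the truncated dictionary and the tiny word dataset are just examples) that flips the dictionary so you can look a word’s number up and swap out every word in a dataset:

<?php
// Build a word => number lookup table from the unique word list above.
// $unique_data here is just a truncated stand-in for the full list.
$unique_data = ['Red', 'Adenosine', 'Hamburger', 'Orange', 'Cetirizine'];
$lookup = array_flip($unique_data); // ['Red' => 0, 'Adenosine' => 1, ...]

// A tiny example dataset of words we want to feed to the network
$word_dataset = ['Red', 'Orange', 'Cetirizine', 'Red'];

// Substitute each word with its dictionary number
$numeric_dataset = array_map(function($word) use ($lookup){
    return $lookup[$word];
}, $word_dataset);

print_r($numeric_dataset); // Array ( [0] => 0 [1] => 3 [2] => 4 [3] => 0 )
?>

Do the same thing to your real dataset and you end up with nothing but numbers, which is exactly what the next section expects.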

If you would like a more capable method that will accommodate more complex punctuation in strings, check out the Tokenize() function I wrote for my Tokenizing & Lexing Natural Language post. 😉

Scaling a Number

With your strings converted to numbers, let’s look at scaling our “totally-not-a-secret-message” sample numbers into the necessary range of -1 to 1.

As before, I will present the code in PHP and JavaScript and then we can discuss what’s going on after.

Scale Number PHP:

<?php

function Scale($dataset, $min_scaled_value, $max_scaled_value){

    $min_value = min($dataset);
    $max_value = max($dataset);

    foreach($dataset as &$n){
        $n = ($max_scaled_value - $min_scaled_value) * ($n - $min_value) / ($max_value - $min_value) + $min_scaled_value;
    }
    
    return $dataset;
}

// our dataset
$unscaled_dataset = array(340.671, 264.910, 479.062, 307.336, 444.717, 318.448, 276.021, 530.580, 400.580, 493.438, 312.009, 399.866, 618.438, 455.828, 538.661, 603.311, 504.316, 389.158, 554.823, 527.723, 623.438, 512.723, 407.009, 530.624, 427.633, 643.050, 444.717, 512.733, 410.307, 530.580, 400.580, 493.438, 629.152, 399.866, 568.438, 455.828, 538.661, 450.454, 504.316, 492.016, 609.109, 527.723, 573.437, 512.723, 609.109, 527.723, 573.437, 512.723);

$unscaled_min = min($unscaled_dataset); // smallest value in the dataset
$unscaled_max = max($unscaled_dataset); // largest value in the dataset

$scaled_dataset = Scale($unscaled_dataset, -1, 1); // scale data to a range of -1 to 1
$descaled_dataset = Scale($scaled_dataset, $unscaled_min, $unscaled_max); // scale data back to original range

// output scaled data 
echo 'Scaled data: ' . PHP_EOL;
for($key = 0; $key < count($scaled_dataset); $key++){
  echo '[' . $scaled_dataset[$key] . '] => '  . $unscaled_dataset[$key] . PHP_EOL;
}

// output descaled data 
echo 'Descaled data: ' . PHP_EOL;
for($key = 0; $key < count($descaled_dataset); $key++){
  echo '[' . $descaled_dataset[$key] . '] => '  . $scaled_dataset[$key] . PHP_EOL;
}

?>

Scale Number Javascript:

<script>

function Scale(dataset, min_scaled_value, max_scaled_value){

    var dataset = dataset.slice(); // no references to the array passed to the Scale() function
    
    var min_value = Math.min(...dataset);
    var max_value = Math.max(...dataset);
  
    dataset.forEach((n, key) => 
      dataset[key] = ((max_scaled_value - min_scaled_value) * (n - min_value) / (max_value - min_value) + min_scaled_value)
    );
      
    return dataset;
}


// our dataset
var unscaled_dataset = [340.671, 264.910, 479.062, 307.336, 444.717, 318.448, 276.021, 530.580, 400.580, 493.438, 312.009, 399.866, 618.438, 455.828, 538.661, 603.311, 504.316, 389.158, 554.823, 527.723, 623.438, 512.723, 407.009, 530.624, 427.633, 643.050, 444.717, 512.733, 410.307, 530.580, 400.580, 493.438, 629.152, 399.866, 568.438, 455.828, 538.661, 450.454, 504.316, 492.016, 609.109, 527.723, 573.437, 512.723, 609.109, 527.723, 573.437, 512.723];

var unscaled_min = Math.min(...unscaled_dataset); // smallest value in the dataset
var unscaled_max = Math.max(...unscaled_dataset); // largest value in the dataset

var scaled_dataset = Scale(unscaled_dataset, -1, 1); // scale data to a range of -1 to 1
var descaled_dataset = Scale(scaled_dataset, unscaled_min, unscaled_max); // scale data back to original range

// output scaled data 
console.log('Scaled data:');
for(key = 0; key < scaled_dataset.length; key++){
  console.log('[' + scaled_dataset[key] + '] => '  + unscaled_dataset[key]);
}

// output descaled data 
console.log('Descaled data:');
for(key = 0; key < descaled_dataset.length; key++){
  console.log('[' + descaled_dataset[key] + '] => '  + scaled_dataset[key]);
}

</script>

Results:

Scaled data:
[-0.5992965568308035] => 340.671
[-1] => 264.91
[0.13265986142698494] => 479.062
[-0.7756069180726715] => 307.336
[-0.04899243666366959] => 444.717
[-0.7168350346432539] => 318.448
[-0.9412334056169673] => 276.021
[0.40514095308615894] => 530.58
[-0.282435076955625] => 400.58
[0.20869519225683608] => 493.438
[-0.7508912043158619] => 312.009
[-0.28621145607446985] => 399.866
[0.8698259903739358] => 618.438
[0.009774157719363075] => 455.828
[0.44788173692283273] => 538.661
[0.789818585708997] => 603.311
[0.2662294388321784] => 504.316
[-0.3428465647643729] => 389.158
[0.5333633045961814] => 554.823
[0.39003014756439414] => 527.723
[0.8962712222986198] => 623.438
[0.31069445179034205] => 512.723
[-0.2484317977468662] => 407.009
[0.4053736711270961] => 530.624
[-0.13935050510392988] => 427.633
[1] => 643.05
[-0.04899243666366959] => 444.717
[0.3107473422541913] => 512.733
[-0.23098852276934456] => 410.307
[0.40514095308615894] => 530.58
[-0.282435076955625] => 400.58
[0.20869519225683608] => 493.438
[0.9264928333421489] => 629.152
[-0.28621145607446985] => 399.866
[0.6053736711270958] => 568.438
[0.009774157719363075] => 455.828
[0.44788173692283273] => 538.661
[-0.0186491775532871] => 450.454
[0.2662294388321784] => 504.316
[0.20117416829745616] => 492.016
[0.8204844766488606] => 609.109
[0.39003014756439414] => 527.723
[0.6318136140053952] => 573.437
[0.31069445179034205] => 512.723
[0.8204844766488606] => 609.109
[0.39003014756439414] => 527.723
[0.6318136140053952] => 573.437
[0.31069445179034205] => 512.723
Descaled data:
[340.671] => -0.5992965568308035
[264.91] => -1
[479.062] => 0.13265986142698494
[307.336] => -0.7756069180726715
[444.717] => -0.04899243666366959
[318.448] => -0.7168350346432539
[276.021] => -0.9412334056169673
[530.58] => 0.40514095308615894
[400.58] => -0.282435076955625
[493.438] => 0.20869519225683608
[312.009] => -0.7508912043158619
[399.866] => -0.28621145607446985
[618.438] => 0.8698259903739358
[455.828] => 0.009774157719363075
[538.661] => 0.44788173692283273
[603.311] => 0.789818585708997
[504.3159999999999] => 0.2662294388321784
[389.158] => -0.3428465647643729
[554.823] => 0.5333633045961814
[527.723] => 0.39003014756439414
[623.438] => 0.8962712222986198
[512.723] => 0.31069445179034205
[407.009] => -0.2484317977468662
[530.624] => 0.4053736711270961
[427.633] => -0.13935050510392988
[643.05] => 1
[444.717] => -0.04899243666366959
[512.733] => 0.3107473422541913
[410.307] => -0.23098852276934456
[530.58] => 0.40514095308615894
[400.58] => -0.282435076955625
[493.438] => 0.20869519225683608
[629.152] => 0.9264928333421489
[399.866] => -0.28621145607446985
[568.438] => 0.6053736711270958
[455.828] => 0.009774157719363075
[538.661] => 0.44788173692283273
[450.454] => -0.0186491775532871
[504.3159999999999] => 0.2662294388321784
[492.016] => 0.20117416829745616
[609.109] => 0.8204844766488606
[527.723] => 0.39003014756439414
[573.437] => 0.6318136140053952
[512.723] => 0.31069445179034205
[609.109] => 0.8204844766488606
[527.723] => 0.39003014756439414
[573.437] => 0.6318136140053952
[512.723] => 0.31069445179034205

As our results show, it’s quite easy to scale any value, even strings (after converting them to numbers), to a range that a neural network can accept but… why does this work?

Well, to help us understand, let’s walk through a set of scaling calculations together and let’s just use 1 to 10 as our dataset to make it easier to follow because logic knows with me explaining it, you’ll need all the help you can get! 😛

Additionally, let’s scale them by a factor of 10 (a nice round number) which will change one to a ten and ten to a one-hundred.

Here’s our equation again for reference:

scaled_value = ((max_scaled_value - min_scaled_value) * (n - min_value) / (max_value - min_value) + min_scaled_value)

And in order to calculate our equation we need to derive some values from our dataset.

Here’s our simplified dataset:
[1,2,3,4,5,6,7,8,9,10]

max_value = 10 (the largest value in the dataset)
min_value = 1 (the smallest value in the dataset)
max_scaled_value = 100 (the largest value we want in our scaled dataset)
min_scaled_value = 10 (the smallest value we want in our scaled dataset)

So, given all of these definitions, we can start by figuring out the size of the range of numbers we will be resizing to.

We take max_scaled_value - min_scaled_value to get our scaled range, and this is like measuring the space we want our number futon to fit inside before we buy it and try to carry it up the stairs by ourselves. Seriously, just get a couple of six-packs or a few bottles of wine and some pizzas and beg your non-existent friends to help!

//scaled_range_size = max_scaled_value - min_scaled_value;
scaled_range_size = 100 - 10; // = 90

Next, we will take the value we are currently scaling from our dataset and subtract from it the smallest value in our dataset, and we’ll just call this new value theta because I’m being obtuse… hahahahaha obscure and stupid math jokes! 😛

//theta = n - min_value
theta = 5 - 1; // if n = 5 (5 - 1 = 4) theta = 4

Now, we’ll take the maximum value of our dataset and subtract from it the minimum value of our dataset, and if I were to make another stupid math joke at this juncture I feel like I would be eating chopped red herrings mixed with billy goats under a damp bridge and screaming at every passerby “WHO’S THAT CLIP CLOPPING OVER MY BRIDGE?!?!?!” so… let’s just call this value RedBilly in honor of my latest gruff barbacoa meal!

//RedBilly = max_value - min_value
RedBilly = 10 - 1; // RedBilly = 9

RedBilly is like measuring our number futon’s original size before we scale it to fit.

Okay, so here’s where the math-magic-shrooms kick in… we ask the question, now wait for it… how much of RedBilly is theta, by dividing theta by RedBilly… I know, shivers, right? Eeee!!!

But we have to call this new value something and I’m inclined to call it Sigma due to the previous pretentious posited Greek naming convention and its association with “sum”, but many times when I think of sum I inevitably mentally recite “Sum Bitch!” in memory of my philandering phather who, in the words of Jane Powell was ah “Goin’ Cortin” with my female parental progenitor in a time before the world was graced with my presence. So sometimes, though rarely, when I think about him I remember being a child and sneaking into his workshop and quietly making my way over to his humble homemade two-by-four and plywood desk with a computer on it and staring at all the electronic circuits and components strewn about.

Do you know what it feels like to step on an IC Chip with small bare feet? I do! 😥

Anyway, I’d investigate all the books and papers on the shelves and all the other knick-knack paddy-whack bric-a-brac, paraphernalia and accoutrements (read that last word with a French accent because it sounds way nicer than it does in English 😉 ) in his workspace.

One unusual piece stood out and I asked him about it… it was a stuffed pickle plushy with long thin dangly legs and oversized stuffed orange sneakers made by Amtoy. It wore a cape with SP (which stood for “Super Pickle”) emblazoned in a shield draped about its neck. He said he won it at a carnival game for my mother when they were dating.

I asked him what its name was and he said… “RAMDOS” after his favorite Disk Operating System and an internal component of computers… sigh… so original! :-/

Later, after becoming an adult I learned that there was a psychedelic psychologist by the name of “Ram Dass” who operated in the Berkeley and Harvard crowds, who sadly just barely missed the pandemic and all the extra-special fun-time we just had last year.

Anyway, Ram Dass was heavily involved with the figuratively-literal dormouse Timothy Leary who according to Jefferson Airplane, had proclaimed “feed your head” and oh, boy oh boy, did he take that advice (both my dad and Ram Dass)! Which I guess looking back at it all now totally explains his EXTREMELY DEEP knowledge of the 1960’s American psychedelic sub-culture! I mean, his favorite song was unironically In-A-Gadda-Da-Vida for logic sake! No joke!

And if you’ve never had the chaotic pleasure of this rather bizarre indulgence of ancient “Americana” I’ll publish a link to the original black and white full 17 minute single-song music video here:

Let me tell you now honey, that is some serious medieval bangs of a haircut! 😛

Anyway, by 6 minutes and 24 seconds into In-A-Gadda-Da-Vida all the instruments fall away except for Ron Bushy on drums playing thee quintessential fiery psychedelic drum solo for like 2 minutes and 36 seconds. Nine minutes in, Ron returns to the original drum beat. Starting with the electric organ at 9:08 and then the guitars joining in at 10 minutes and 51 seconds, the stoners all continue skillfully though chaotically meandering around the melody which some may say is in the same vein as that time Handful of Peter lost the talent show:

But then… almost as if Iron Butterfly had planned it from the start, 12 minutes and 39 seconds into the song, the musicians officially about-face and more or less in full unison begin the crescendo of In-A-Gadda-Da-Vida and officially start down the path to the outro closer to the 15 minute mark.

Now after re-watching In-A-Gadda-Da-Vida I wonder if Mr. Bushy’s performance there was the inspiration for Jim Henson’s Muppet, Animal?

Hard In-A-Gadda-Da-Vida inspired tangential digression leading to an outro, since this IS going somewhere… one day my poor mother dragged all of us kids to her church and my abusive sum bitch phather stayed at home where he got high on his perpetually prescribed opioids for his “migraines” and burned down our house while trying to cook for himself in a stoned stupor.

Super Pickle was among the casualties that died that day and it was many years before I had a home again. 😥

So rather than Sigma for sum, lets call this variable SuperPickle in honor of the ashes of my lost childhood.

//SuperPickle = theta / RedBilly;
SuperPickle = 4 / 9; // SuperPickle = 0.44444444444444

Now that we have SuperPickle, all that is left to do is take our scaled_range_size and use SuperPickle as like… a “collimating factor”, kinda like adjusting a knob so that the “image” of the scaled number resolved/created by this action is not fuzzy. Basically, SuperPickle is the percentage of the range our value sits at, and by multiplying scaled_range_size by SuperPickle we take the whole scaled range and lop off everything above the point where our scaled number belongs within it.

//scaled_value = scaled_range_size * SuperPickle;
scaled_value = 90 * 0.44444444444444; // scaled_value = 40

Now, because momma always said… “there’ll be days like this”, “knock you out”, “don’t play ball in the house” and “measure twice cut once”, well… mostly just due to that last one actually, we have to add min_scaled_value to the scaled_value because otherwise it will be smaller than it should be… remember momma did say measure twice. What is interesting is that if the number futon is at the lower end of our dataset it will likely be occupying space inside the metaphorical “wall” (i.e. less than min_scaled_value), so by adding min_scaled_value we make sure that the value is the correct size and where it’s supposed to be.

//scaled_value = (scaled_range_size * SuperPickle) + min_scaled_value;
scaled_value = (90 * 0.44444444444444) + 10; // value 5 becomes scaled_value = 50
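And just so the round trip is clear, descaling is the very same equation with the two ranges swapped, which is why the Scale() functions above can also descale. Here’s a quick sketch (my own addition, same toy numbers) going from 50 back to 5:

<?php
// Same formula with the ranges swapped: the scaled range (10 to 100) is now
// the source and the original range (1 to 10) is the target.
$scaled_value = 50;
$descaled_value = (10 - 1) * ($scaled_value - 10) / (100 - 10) + 1;
echo $descaled_value; // 5 -- right back where we started
?>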

So now, here’s some PHP and JavaScript code that will demonstrate this fully, along with the results, so you can like mess around with it n’stuff. I recommend changing the data values (they don’t need to be in sequential order) and also playing with the scale by changing min_scaled_value and max_scaled_value. And obviously, because we can go from essentially any size to any size, the range you scale to does not need to be the same size as the range you are scaling from, as is demonstrated when scaling down to -1 to 1 or descaling the scaled values back to the original range.

PHP:

<?php
$data = [1,2,3,4,5,6,7,8,9,10];

$max_value = max($data); // 10
$min_value = min($data); // 1

// Scale dataset by a factor of 10
$max_scaled_value = $max_value * 10; // 10 * 10
$min_scaled_value = $min_value * 10; // 1 * 10

foreach($data as $value){
	
	$scaled_range_size = $max_scaled_value - $min_scaled_value;
	$theta = $value - $min_value;
	$RedBilly = $max_value - $min_value;
	$SuperPickle = $theta / $RedBilly;
	$scaled_value = ($scaled_range_size * $SuperPickle) + $min_scaled_value;

	echo "value[$value]" . PHP_EOL;
	echo "scaled_range_size = (max_scaled_value - min_scaled_value) = ($max_scaled_value - $min_scaled_value) = " . ($scaled_range_size) . PHP_EOL;
	echo "theta = (value - min_value) = ($value - $min_value) = " . ($theta) . PHP_EOL;
	echo "RedBilly = (max_value - min_value) = ($max_value - $min_value) = " . ($RedBilly) . PHP_EOL;
	echo "SuperPickle = (theta / RedBilly) = ($theta / $RedBilly) = " . ($SuperPickle) . PHP_EOL;
	echo "scaled_value = (scaled_range_size * SuperPickle) + min_scaled_value = ($scaled_range_size * $SuperPickle) + $min_scaled_value = " . (($scaled_range_size * $SuperPickle) + $min_scaled_value) . PHP_EOL . PHP_EOL;
}
?>

JavaScript:

var data = [1,2,3,4,5,6,7,8,9,10];

var max_value = Math.max(...data); // 10
var min_value = Math.min(...data); // 1

// Scale dataset by a factor of 10
var max_scaled_value = max_value * 10; // 10 * 10 = 100
var min_scaled_value = min_value * 10; // 1 * 10 = 10

data.forEach(function(value) {
	
	var scaled_range_size = max_scaled_value - min_scaled_value;
	var theta = value - min_value;
	var RedBilly = max_value - min_value;
	var SuperPickle = theta / RedBilly;
	var scaled_value = (scaled_range_size * SuperPickle) + min_scaled_value;
	
	console.log("value["+value+"]");
	console.log("scaled_range_size = (max_scaled_value - min_scaled_value) = ("+max_scaled_value+" - "+min_scaled_value+") = " + scaled_range_size);
	console.log("theta = (value - min_value) = ("+value+" - "+min_value+") = " + theta);
	console.log("RedBilly = (max_value - min_value) = ("+max_value+" - "+min_value+") = " + RedBilly);
	console.log("SuperPickle = (theta / RedBilly) = ("+theta+" / "+RedBilly+") = " + SuperPickle);
	console.log("scaled_value = (scaled_range_size * SuperPickle) + min_scaled_value = ("+scaled_range_size+" * "+SuperPickle+") + "+min_scaled_value+" = " + ((scaled_range_size * SuperPickle) + min_scaled_value));
    console.log("\n");
});

Results:

value[1]
scaled_range_size = (max_scaled_value - min_scaled_value) = (100 - 10) = 90
theta = (value - min_value) = (1 - 1) = 0
RedBilly = (max_value - min_value) = (10 - 1) = 9
SuperPickle = (theta / RedBilly) = (0 / 9) = 0
scaled_value = (scaled_range_size * SuperPickle) + min_scaled_value = (90 * 0) + 10 = 10

value[2]
scaled_range_size = (max_scaled_value - min_scaled_value) = (100 - 10) = 90
theta = (value - min_value) = (2 - 1) = 1
RedBilly = (max_value - min_value) = (10 - 1) = 9
SuperPickle = (theta / RedBilly) = (1 / 9) = 0.11111111111111
scaled_value = (scaled_range_size * SuperPickle) + min_scaled_value = (90 * 0.11111111111111) + 10 = 20

value[3]
scaled_range_size = (max_scaled_value - min_scaled_value) = (100 - 10) = 90
theta = (value - min_value) = (3 - 1) = 2
RedBilly = (max_value - min_value) = (10 - 1) = 9
SuperPickle = (theta / RedBilly) = (2 / 9) = 0.22222222222222
scaled_value = (scaled_range_size * SuperPickle) + min_scaled_value = (90 * 0.22222222222222) + 10 = 30

value[4]
scaled_range_size = (max_scaled_value - min_scaled_value) = (100 - 10) = 90
theta = (value - min_value) = (4 - 1) = 3
RedBilly = (max_value - min_value) = (10 - 1) = 9
SuperPickle = (theta / RedBilly) = (3 / 9) = 0.33333333333333
scaled_value = (scaled_range_size * SuperPickle) + min_scaled_value = (90 * 0.33333333333333) + 10 = 40

value[5]
scaled_range_size = (max_scaled_value - min_scaled_value) = (100 - 10) = 90
theta = (value - min_value) = (5 - 1) = 4
RedBilly = (max_value - min_value) = (10 - 1) = 9
SuperPickle = (theta / RedBilly) = (4 / 9) = 0.44444444444444
scaled_value = (scaled_range_size * SuperPickle) + min_scaled_value = (90 * 0.44444444444444) + 10 = 50

value[6]
scaled_range_size = (max_scaled_value - min_scaled_value) = (100 - 10) = 90
theta = (value - min_value) = (6 - 1) = 5
RedBilly = (max_value - min_value) = (10 - 1) = 9
SuperPickle = (theta / RedBilly) = (5 / 9) = 0.55555555555556
scaled_value = (scaled_range_size * SuperPickle) + min_scaled_value = (90 * 0.55555555555556) + 10 = 60

value[7]
scaled_range_size = (max_scaled_value - min_scaled_value) = (100 - 10) = 90
theta = (value - min_value) = (7 - 1) = 6
RedBilly = (max_value - min_value) = (10 - 1) = 9
SuperPickle = (theta / RedBilly) = (6 / 9) = 0.66666666666667
scaled_value = (scaled_range_size * SuperPickle) + min_scaled_value = (90 * 0.66666666666667) + 10 = 70

value[8]
scaled_range_size = (max_scaled_value - min_scaled_value) = (100 - 10) = 90
theta = (value - min_value) = (8 - 1) = 7
RedBilly = (max_value - min_value) = (10 - 1) = 9
SuperPickle = (theta / RedBilly) = (7 / 9) = 0.77777777777778
scaled_value = (scaled_range_size * SuperPickle) + min_scaled_value = (90 * 0.77777777777778) + 10 = 80

value[9]
scaled_range_size = (max_scaled_value - min_scaled_value) = (100 - 10) = 90
theta = (value - min_value) = (9 - 1) = 8
RedBilly = (max_value - min_value) = (10 - 1) = 9
SuperPickle = (theta / RedBilly) = (8 / 9) = 0.88888888888889
scaled_value = (scaled_range_size * SuperPickle) + min_scaled_value = (90 * 0.88888888888889) + 10 = 90

value[10]
scaled_range_size = (max_scaled_value - min_scaled_value) = (100 - 10) = 90
theta = (value - min_value) = (10 - 1) = 9
RedBilly = (max_value - min_value) = (10 - 1) = 9
SuperPickle = (theta / RedBilly) = (9 / 9) = 1
scaled_value = (scaled_range_size * SuperPickle) + min_scaled_value = (90 * 1) + 10 = 100
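One more quick sanity check before we move on. Remember from the top of the post that the scaling has to be “uniform”, meaning the relative “distances” between values are preserved, and here’s a tiny sketch (mine, not part of the walkthrough above) showing that the gap between two values takes up the same proportion of the range before and after scaling:

<?php
// The original toy dataset and its scaled-by-10 counterpart from above
$data   = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
$scaled = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100];

// Proportion of the range taken up by the gap between 7 and 9...
$before = ($data[8] - $data[6]) / (max($data) - min($data));         // (9 - 7) / 9
// ...and the gap between their scaled counterparts, 70 and 90
$after  = ($scaled[8] - $scaled[6]) / (max($scaled) - min($scaled)); // (90 - 70) / 90

echo $before . PHP_EOL; // 0.22222222222222
echo $after . PHP_EOL;  // 0.22222222222222 -- same proportion, so the "shape" of the data survives
?>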

Doin’ it For Realsies AKA Scaling XOR

So at this point you should have a firm understanding of the math behind how number scaling works 😛 but here’s the thing, quién no sabe: most neural network libraries have a built-in method and will scale your data for you, including FANN.

Which basically means everything above this point is academic review for your benefit (unless you use FANN.js, which I’ll explain more on shortly), except for maybe the string to number conversion. Although some libraries will convert and scale strings for you, FANN does not, so… if your data is a string you must manually convert it to a number using some methodology (as I demonstrate above) before giving your unscaled numbers to FANN (in PHP). After that it can scale and descale your values, and you can then just convert the output numbers back into strings if the output is meant to be in string format.
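And for that string round trip, the last step is on you too: whatever number fann_descale_output gives you back has to be matched to a dictionary key somehow. Here’s a hypothetical sketch (rounding to the nearest key is just one simple policy, not something FANN does for you):

<?php
// Your word dictionary from the string-to-number step (truncated here)
$unique_data = ['Red', 'Adenosine', 'Hamburger', 'Orange', 'Cetirizine'];

// Pretend this value came back from fann_descale_output()
$descaled_output = 2.87;

// Snap to the nearest dictionary key and look the word back up
$key = (int) round($descaled_output);
echo $unique_data[$key] ?? 'unknown'; // Orange
?>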

But just to give you a simple proof of concept since I’ve already thrown so much math at you today… let’s look at the simple case of an XOR neural network but where the data isn’t in the -1 to 1 range.

First the PHP version followed by a brief explanation and then I’ll talk about why I did not present a FANN scaling proof in JavaScript as well, though I will provide a workaround proof in JS.

Scaling Data PHP FANN:

<?php

// This example will use the XOR dataset with negative one represented
// as zero and one represented as one-hundred and demonstrate how to
// scale those values so that FANN can understand them and then how
// to de-scale the value FANN returns so that you can understand them.

// Scaling allows you to take raw data numbers like -1234.975 or 4502012
// in your dataset and convert them into an input/output range that
// your neural network can understand.

// De-scaling lets you take the scaled data and convert it back into
// the original range.

// scale_test.data
// Note the values are "raw" or un-scaled.
/*
4 2 1
0 0
0
0 100
100
100 0
100
100 100
0
*/

////////////////////
// Configure ANN  //
////////////////////
    
// New ANN
$ann = fann_create_standard_array(3, [2,3,1]);

// Set activation functions
fann_set_activation_function_hidden($ann, FANN_SIGMOID_SYMMETRIC);
fann_set_activation_function_output($ann, FANN_SIGMOID_SYMMETRIC);

// Read raw (un-scaled) training data from file
$train_data = fann_read_train_from_file("scale_test.data");

// Scale the data range to -1 to 1
fann_set_input_scaling_params($ann , $train_data, -1, 1);
fann_set_output_scaling_params($ann , $train_data, -1, 1);

///////////
// Train //
///////////

// Presumably you would train here (uncomment to perform training)...

// fann_train_on_data($ann, $train_data, 100, 10, 0.01);

// But it's not needed to test the scaling because the training file
// in this case is just used to compute/derive the scale range.
// However, doing the training will improve the answer the ANN gives
// in correlation to the training data.

//////////
// Test //
//////////

$raw_input = array(0, 100); // test XOR (0,100) input
$scaled_input = fann_scale_input ($ann , $raw_input); // scaled XOR (-1,1) input
$descaled_input = fann_descale_input ($ann , $scaled_input); // de-scaled XOR (0,100) input
$raw_output = fann_run($ann, $scaled_input); // get the answer/output from the ANN
$output_descale = fann_descale_output($ann, $raw_output); // de-scale the output

////////////////////
// Report Results //
////////////////////
echo 'The raw_input:' . PHP_EOL;
var_dump($raw_input);

echo 'The raw_input Scaled then De-Scaled (values are unchanged/correct):' . PHP_EOL;
var_dump($descaled_input);

echo 'The Scaled input:' . PHP_EOL;
var_dump($scaled_input);

echo "The raw_output of the ANN (Scaled input):" . PHP_EOL;
var_dump($raw_output);

echo 'The De-Scaled output:' . PHP_EOL;
var_dump($output_descale);


////////////////////
// Example Output //
////////////////////

/*
The raw_input:
array(2) {
  [0]=>
  float(0)
  [1]=>
  float(100)
}
The raw_input Scaled then De-Scaled (values are unchanged/correct):
array(2) {
  [0]=>
  float(0)
  [1]=>
  float(100)
}
The Scaled input:
array(2) {
  [0]=>
  float(-1)
  [1]=>
  float(1)
}
The raw_output of the ANN (Scaled input):
array(1) {
  [0]=>
  float(1)
}
The De-Scaled output:
array(1) {
  [0]=>
  float(100)
}
*/

Basically, with FANN PHP we just load our external data file and then use fann_set_input_scaling_params & fann_set_output_scaling_params to configure the ANN’s ability to scale, given the data range in the training data and the range we want.

After which we can use fann_scale_input, fann_descale_input, fann_scale_output and fann_descale_output at run time.

Now, if you don’t want to bother with runtime scaling and descaling of your training data (though you will still need to scale your data when deploying it to a live environment) you can use fann_scale_train_data to scale an entire set of training data at once. Here’s an example of using FANN PHP to read an unscaled training file, scale it and then save it.

How to Scale an Entire File:

<?php

// How to scale an existing unscaled training file and save it

$path = 'TrainingData' . DIRECTORY_SEPARATOR;

// Read raw (un-scaled) training data from file
$train_data = fann_read_train_from_file($path . "Training.data");

// Scale to a range of -1 to 1
fann_scale_train_data($train_data, -1, 1);

// Save the new scaled training data as a file
fann_save_train($train_data, $path . 'ScaledTraining.data');

You can also use fann_scale_input_train_data and fann_scale_output_train_data to scale the inputs and outputs in the training data separately to different scales if your circumstances need that level of control.
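For example, here’s a quick sketch of what that might look like (the file names are placeholders and the ranges are just an illustration; say, symmetric inputs with 0 to 1 outputs):

<?php
// A quick sketch (file names are placeholders) of scaling inputs and outputs
// to different ranges within the same training data.
$train_data = fann_read_train_from_file('Training.data');

// Inputs to -1..1 (e.g. for FANN_SIGMOID_SYMMETRIC)...
fann_scale_input_train_data($train_data, -1, 1);

// ...and outputs to 0..1 (e.g. for plain FANN_SIGMOID)
fann_scale_output_train_data($train_data, 0, 1);

fann_save_train($train_data, 'ScaledTraining.data');
?>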

Scaling Data FANN.JS

Okay, so like… JavaScript has seemingly lost its way. Now, I know some of you JS developers’ ears are bleeding and your pulse is starting to tick up and you’re feeling vulnerable and defensive.

Calm down, I’m not insulting you or your “lenguaje del amor“, only pointing out what I see.

See, modern JS developers are no longer hired for their expertise in JavaScript; they are hired for their familiarity with a specific library or framework that “obfuscates JavaScript away”, or for their abilities with a product that was created by a mega-company seeking to create a monopoly for itself by turning JavaScript into TypeScript, because that mega-corp has embraced the tactic of embrace, extend, extinguish… don’t think that the “extinguish” part won’t come.

Personally, rather than desperately trying to run as far away from my language of preference into the open arms of a company that does not have your best interests at heart (either as a developer or as a consumer), because it is actively trying to make you prefer their “flavor” of JS and all the tools and shortcuts they want to provide… (eventually sell?) to you, I would suggest the real issue with JavaScript is simply its lack of decent error messaging.

Now, you may suggest that that’s because the interpreter needs to fail “gracefully” and is built on decades of technologies cobbled together, and well… I don’t disagree… it’s just that… well, as I said, JavaScript has seemingly lost its way. 😦

You don’t see anybody in the PHP community seriously trying to replace PHP with something else despite the common theme among most developers that PHP is evil… well, okay we did see Mega-Corp Bajillionaire Mark Zuckerberg try to push for Hack Lang (because it benefited his bank account) but the PHP community pretty much overwhelmingly resisted the idea… I wonder why? :-/

Okay, enough proselytizing for today, just… if you’re a JS developer… try to think about what the real issues are with JS and ways you can help improve it that don’t include doing everything within your abilities to avoid writing JavaScript.

Now, I will say that JavaScript and neural networks is EXTREMELY geek-sexy! Like, all those “Let’s” and “Var’s” mixing with neurons… ohhh baby, that’s so hot!

Mmmm… I need a minute…… alright, I’m better.

So, the problem with FANN.js is best summed up by the developer of the language binding (Louis Stowasser) in the GitHub README for FANN.js:

“…not all functions have a binding.”

~Louis Stowasser

And seemingly that is the crux of the problem with FANN.js: the bindings to the functions that set the scaling parameters on a network are missing, even though the scaling functions themselves have bindings. If you try scaling without setting the parameters, the network doesn’t know what the hell to go do with itself, so it complains and issues a “FANN Error 18”, which is basically FANN saying “slow down, I haven’t been configured with scaling ranges yet”, so it’s unable to do the scaling.

As such, I am unable to properly demonstrate how to scale your data using the built-in FANN.js scaling functions. Instead I will provide a modified version of the scaling function from above that will work “out of the box” with FANN.js.

I will be the first to admit when I am wrong, and if anyone knows a way to scale data using FANN.js that I am unaware of, I will absolutely print a retraction and credit the source of the information.

In the meantime, here is the working FANN.js scaling code:

<meta charset="utf-8"/>
<script async src="../fann.js"></script>
<script>

var UNSCALED_XOR_DATA = [
    [[0, 0], [0]],
    [[100, 100], [0]],
    [[0, 100], [100]],
    [[100, 0], [100]]
];

function Scale(dataset, min_scaled_value, max_scaled_value){

    var dataset = JSON.parse(JSON.stringify(dataset)); // deep copy so we don't modify the nested arrays passed to the Scale() function

    var max_value = Math.max(...[].concat(...[].concat(...dataset)));
    var min_value = Math.min(...[].concat(...[].concat(...dataset)));
  

    // for all the unscaled data
    for (var i = 0; i < dataset.length; i++) {
        
        // for each input and output set
        for (var j = 0; j < dataset[i].length; j++) {
            
            // for each value in a set
            dataset[i][j].forEach((n, key) =>
                dataset[i][j][key] = ((max_scaled_value - min_scaled_value) * (n - min_value) / (max_value - min_value) + min_scaled_value)
            );
        }
    }  

    return dataset;
}

function XOR () {
    
    ////////////////////
    // Configure ANN  //
    ////////////////////
    
    // New ANN
    NN = FANN.create([2, 3, 1]);
    
    // Set activation functions
    NN.set_activation_function_hidden(FANN.SIGMOID_SYMMETRIC);
    NN.set_activation_function_output(FANN.SIGMOID_SYMMETRIC);
    
    
    var scaled_dataset = Scale(UNSCALED_XOR_DATA, -1, 1); // scale data to a range of -1 to 1
        
    // Read scaled training data
    var data = FANN.createTraining(scaled_dataset);
    
    NN.init_weights(data);

    ///////////
    // Train //
    ///////////
    NN.train_on_data(data, 1000, 10, 0.01);
    

    //////////
    // Test //
    //////////
    var results = "";
    results += " -1 , -1 => " + NN.run([-1, -1])[0] + '\n<br>';
    results += "-1 ,  1 => " + NN.run([-1, 1])[0] + '\n<br>';
    results += " 1 , -1 => " + NN.run([1, -1])[0] + '\n<br>';
    results += " 1 ,  1 => " + NN.run([1, 1])[0] + '\n<br>';
    
    ////////////////////
    // Output Results //
    ////////////////////
    document.getElementById('results').innerHTML = results;
}

FANN_ready = function () {
    XOR();
};
</script>
<div id='results'></div>

I have submitted an issue on the GitHub repo and provided this code there as a temporary solution; however, if you have a GitHub account and would like to see the scaling bindings added to FANN.js, you can “thumbs up” my comments over there and let the developer know that you would also like the scaling bindings added and maybe (just maybe) we can convince them to add them.

You can get a copy of the original FANN library in C here:  https://github.com/libfann/fann

You can get a copy of the PHP FANN library here:  https://github.com/bukka/php-fann

And you can get a copy of the FANN.js library here: https://github.com/louisstow/fann.js/

So with that, I will end this extra long post with the song White Rabbit because it’s all about scaling and things changing sizes and thematically it goes along with In-A-Gadda-Da-Vida from earlier. 😛


Poor Joy, when she was just small, because in her world… logic and proportion had fallen sloppy dead and the white knight was talking backward and the red queen was off with her head… but a hookah smoking caterpillar has given you the call… remember what the dormouse said: “help Joy feed her head” …through Patreon. 🙂

But if all you can do is Like, Share, Comment and Subscribe, well… that’s cool too!

Much Love,

~Joy