
Geek Girl Joy

Artificial Intelligence, Simulations & Software

Tag

Coding

How I Built A Working AutoDoc

Ever been a lone wanderer solely surviving in the Commonwealth Wasteland only to have some random asshole raider start taking potshots at you? Well I have!

To make matters worse, Bon Jovi showed up out of nowhere at the most inopportune moment, seemingly just to mock my hasty strategic withdrawal!

I used my last stimpak as I staggered through the door of my Red Rocket workshop where Dogmeat and Codsworth were waiting for me.

I changed then ate some ramen, Takahashi always makes the best pulled noodles!

I stared into the radstorm raging outside my window and in that moment I vowed:

“Never again would Bon Jovi mock my pain!”

~GeekGirlJoy

I needed to build a new kind of bot and this time “General Atomics Finest” just wouldn’t cut it!

No, I needed a neural network that could monitor my vitals and automatically heal me as soon as I started taking damage!

Continue reading “How I Built A Working AutoDoc”


Visualizing Your FANN Neural Network

At some point you will want a diagram of your FANN neural network.

Example Diagram

Programmatically generated diagram of XOR ANN
Programmatically generated XOR ANN Stats

Reasons May Include:

  • You need artwork for your fridge or cubicle and Van Gogh’s Starry Night was mysteriously unavailable!
  • You want an illustration to help potential investors understand some of the technical aspects of how your AI startup works.
  • You’re trying to convince the good people who enjoy your work to throw gobs of cash at your Patreon. 😛

But... your exact reasons may vary! 😉

Nonetheless, read on because I’m giving you 100% free & fully functional code and explaining how it works.

I’m not even asking for your email address!

 

Continue reading “Visualizing Your FANN Neural Network”

DUI Bot

After watching a video of a field sobriety test I knew I had to build a neural network that could pass it!

 

Continue reading “DUI Bot”

SVG Platformer

Can you go from art to program in one or two steps? Well, that’s what today’s post is about.

One of the cool things I remember about web development from years ago was Adobe Flash.

Before you boo me, hear me out!

I’m not saying Flash is better technologically than HTML5, The Name of Your Favorite JavaScript Framework, CSS, WebAssembly etc… However, one place Flash excelled was visual design & layout… an important part of the web!

The problem with many modern tools isn’t that they can’t convey design, it’s that they decouple the design and the development processes!

In practice this means writing code to describe the elements of your software like HTML and later writing more code to style the elements (like CSS, SASS or LESS), none of which is actually visual, though you definitely can get some great results!

Flash Builder (or whatever it was called) was half art studio and half IDE (Integrated Development Environment) where you could draw anything and it was an “object” and you could write code (ActionScript) to control its behavior. It wasn’t a mockup or illustration, it was the actual program!

As I recall, once the switch to ActionScript 3 was made the ability to store your code on the objects themselves was deprecated in favor of using references and listeners stored in the main keyframe timeline… I preferred keeping my code on the objects themselves but I digress.

Even with the change to where you stored your code you could still accomplish anything you wanted with the centralized keyframe code and some developers even found this easier to maintain than storing the code on the components.

You would set up your scripts on layered keyframes that extended to the last keyframe used in the project, or the last frame that needed that code, and by using a sort of “goto keyframe name or id” method you could actually build complex applications quite easily, and more importantly… visually!

That’s why all the games used to be made with Flash, you could basically draw a picture and then turn it into an animation or even a full program in a couple of hours. This meant you were free to experiment & push boundaries.

Now, yes of course there are visual workflows you can use today. There are WYSIWYG editors and CMS App Platforms like WordPress, Drupal & Joomla, not to mention the full-featured layout capabilities of site builder tools like Wix.

Fundamentally though these tools facilitate laying out HTML elements and applying CSS and maybe some JavaScript via a drag and drop interface. Which is significantly faster than doing visual development via code in my opinion, though I am not arguing it is inherently “better”.

Unlike the aforementioned tools which specialize in “page based” HTML applications, Flash was an element or object that you embed into your page that used Vector Graphics to create lossless re-sizable images, animations and applications.

Inside the Flash movie/app you could draw anything and you were not constrained to HTML elements but you were also not required to code the visual elements.

This made for a wonderfully rapid prototyping experience that I was unable to reproduce until I tried working with Unity 3D which describes itself as “the ultimate game development platform” though I’d go so far as to describe it as “the ultimate app development platform”.

Think about it, at the time of writing this Unity supports 25 Platforms including Desktops (Win/Mac/Linux), the mobile OSes, and all the major gaming consoles, not to mention the Smart TVs, Watches etc… Any platform you want your app on, including the web, well… Unity pretty much supports it right out of the box. Oh, and it’s free until you make $100K a year with it, not too shabby!

The catch? Well, it’s highly optimized but leans in the 3D gaming direction (though I’ve built 2D apps with Unity) so the applications it produces tend to have a larger size (as far as my tests go) than if you used PhoneGap/Cordova or went native. My guess is this is due to the embedded physics engine and graphics rendering code that gets packaged with the app, but I’m only guessing, and there are a few options that let you exclude unnecessary things from the compiled app.

Then again, you may be able to make use of those features in your app so it need not be a negative either.

In any case, the problem as I see it with Unity is that the builder isn’t readily available on Linux, but it will build for Linux ❓ Maybe they should build Unity with Unity so that it can Unity being Unity… 😛

I am aware they kinda released a limited version for Linux… but I could never seem to make it work right, and the truth is that it takes some decent (but not outrageous) resources to run the Unity builder application, so most micro computers are out. Sadly it won’t run on the ARM architecture either, so using a Raspberry Pi to do Unity development is just not happening.

Is there another way?

Well, there is a modern Vector Graphic format available for the web called SVG that is basically XML code that can be written in a text editor or it can be drawn using a program like Inkscape (free and what I use) or Adobe Illustrator if you prefer a commercial paid tool.

Since SVG is code, if you place the code inside your HTML (sadly not link to or embed) you don’t just get a static vector image but instead you get elements that are accessible via the DOM (Document Object Model) that you can manipulate using CSS and JavaScript.

That last part should really interest you if you enjoy rapid application prototyping!

Which is the origin of this project, I wanted to know… could I rapid prototype an application by just drawing a picture and writing some code?

Understand that I am not talking about drawing a picture, slicing it then building an app from sliced components or using the sliced images as placeholders or writing code to draw the slices onto a canvas context.

I challenged myself to see if I could only write code that was part of the core functionality and not basic graphic asset creation and certainly not the code to display it, just manipulate it.

I set about creating a very simple “proof of concept” a while back that is basically a “Die Rolling App” that you can view a live example of here: SVG Roller though I never wrote about it. Roller is half image and half app but very basic.

Roller consists mainly of showing and hiding elements in the SVG based on a button click and a random number… good but not all that flashy!
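For anyone curious what that show/hide logic might look like, here is a minimal sketch. It is not the actual Roller source: the function name rollDie and the idea of keeping one SVG group per die face in an array are assumptions made purely for illustration. The random source is passed in so it can be swapped out when testing.

```javascript
// Hypothetical sketch, not the real SVG Roller code.
// `faces` is an array of elements (one per die face), each exposing a
// `style` object the way DOM/SVG elements do.
function rollDie(faces, random) {
  random = random || Math.random;

  // Pick an index from 0 to faces.length - 1
  var rolled = Math.floor(random() * faces.length);

  // Show only the face that was rolled and hide all the others
  faces.forEach(function (face, index) {
    face.style.display = (index === rolled) ? 'inline' : 'none';
  });

  return rolled + 1; // the human readable die value
}
```

In a browser you would build the faces array from your own element ids using document.getElementById() and call rollDie(faces) from the button's click handler.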

Recently I have been wanting a better SVG application that would be more visual and expand on what I have already done but retain the simplicity of “Draw It then Code it”.

So, I opened up Inkscape and drew this image:


I grouped all the associated assets and gave them ids like “cloud1”, “player”, “coin2” etc… then saved the image as demo.svg and closed Inkscape.

Why a game? Well, it’s more visual than my SVG Roller and I think it illustrates more of what is possible with an SVG app.

After that I opened demo.svg with a text editor and copied the SVG code into the body tag of my HTML file (remember you can’t link to it; you have to include the code in the HTML).

I then wrote a little CSS that helps position the SVG on the page, applied a background color, disabled text highlighting and changed the cursor to the hand icon when the mouse is over a button, minimal CSS.

After that I wrote the JavaScript that turns the image into a playable application.

Game.js

Here is all the code that makes the SVG Platformer game demo work:

var keyEvents = {}; // keyboard state object

// Listen to keyboard outside of game loop to be less "blocky"
var onkeydown = onkeyup = function(key){
  key = key || event; // IE Fix 😦

  if(key.type == 'keydown'){
    keyEvents[key.keyCode] = true;
  }
  else{
    keyEvents[key.keyCode] = false; 
  }
  //console.log(keyEvents);
}


var game = document.getElementById('game'); // A reference to the SVG
if(game){
    game.addEventListener("load",function(){
    ///////////
    // Functions
        
    // Clear Instructions    
    // removes the instructions element
    function ClearInstructions() {
      instructions.remove();
    }

    // Get Position
    // This function gets the current (x,y) coordinates of the given object
    function GetPosition(object){
      var transformlist = object.transform.baseVal;
      var group = transformlist.getItem(0);
      var X = 0;
      var Y = 0;
      if (group.type == SVGTransform.SVG_TRANSFORM_TRANSLATE){
        X = group.matrix.e;
        Y = group.matrix.f;
      }
      return [X, Y];
    }
    
    // Collide
    // A basic box collision detector
    function Collide(element1, element2) {
      var collisionBox1 = element1.getBoundingClientRect();
      var collisionBox2 = element2.getBoundingClientRect();

      return !(collisionBox1.top > collisionBox2.bottom ||
        collisionBox1.right < collisionBox2.left ||
        collisionBox1.bottom < collisionBox2.top ||
        collisionBox1.left > collisionBox2.right);
    }

    // Inside
    // A basic inside box collision detector
    function Inside(element1, element2) {
      var collisionBox1 = element1.getBoundingClientRect();
      var collisionBox2 = element2.getBoundingClientRect();

      return (collisionBox1.top <= collisionBox2.bottom && 
        collisionBox1.bottom >= collisionBox2.top && 
        collisionBox1.left <= collisionBox2.right && 
        collisionBox1.right >= collisionBox2.left);
    }
      
    // Get Bank Total
    // Get the number of diamond or coins the player has
    function GetBankTotal(element){
      var currentValue = element.textContent;
      return parseInt(currentValue);
    }
    
    // Collect 
    // Increment the Coin or a Diamond "player bank"
    function Collect(element){
      element.textContent = GetBankTotal(element) + 1;
    }
      
    
            
    ///////////
    // Game play
    
    // Set the "constants"
    var step = 1;
    var jump = 20;
    var gravity = 1.5;

    // Setup references to the "named" SVG XML elements
    var gameOver = game.getElementById("gameover"); // A hidden "eater/detector" element below the play area to detect player death
    var instructions = game.getElementById('instructions');  // Instructions element
    var gameOverMenu = game.getElementById("gameovermenu");  // Game over screen element
    var player = game.getElementById("player");              // The player element
    var playerCoins = game.getElementById("playercoins");    // The "bank" element showing how many coins the player has collected
    var playerDiamonds = game.getElementById("playerdiamonds");// The "bank" element showing how many diamond the player has collected

    //  Setup references to the "named" SVG XML coin elements
    var coinPieces = ['coin1', 'coin2'];
    var coins = [];
    coinPieces.forEach(element => {
      coins.push(document.getElementById(element));
    });
        
    // Setup references to the "named" SVG XML diamond elements
    var diamondPieces = ['diamond1'];
    var diamond = [];
    diamondPieces.forEach(element => {
      diamond.push(document.getElementById(element));
    });

    // Setup references to the "named" SVG XML ground elements
    var terrainPieces = ['ground1', 'ground2', 'ground3', 'ground4'];
    var terrain = [];
    terrainPieces.forEach(element => {
      terrain.push(document.getElementById(element));
    });

    var winningBankTotal = diamondPieces.length + coinPieces.length;


    // Clear the instructions after 3 seconds
    setTimeout(ClearInstructions, 3000);


    // Redraw Game Loop
    var redrawRate = 30; // milliseconds
    var gameLoop = setInterval(function(){
      var fall = true; // always try to fall
      var allowedToJump = false;// disallow jumping because player might be falling
      var allowedToMove = true; // allow moving until the player is dead

      // Check for collisions with ground elements
      terrain.forEach(ground => {
        // If there is a collision with the ground
        if(Collide(player, ground)){
          fall = false; // Stop falling
          allowedToJump = true; // Allow jumping
        }
      });

      // if player fell below the ground
      if(Inside(player, gameOver)){
        fall = false; // stop falling
        allowedToJump = false; // don't allow jumping
        allowedToMove = false; // player is dead, stop player movement
        ClearInstructions(); // just in case
        gameOverMenu.style.display = "inline"; // show game over menu
      }


      // if there was no collision between a ground element and
      // the player
      if(fall === true){
        var position = GetPosition(player); // get updated player position
        allowedToJump = false; // don't allow jumping
        player.transform.baseVal.getItem(0).setTranslate(position[0], position[1] + gravity); // player falls
      }


      if(allowedToMove === true){
        var position = GetPosition(player); // get updated player position
        // keyboard movement
        // left || a
        if (keyEvents[37] === true || keyEvents[65] === true) {
          if(position[0] > -10){
            player.transform.baseVal.getItem(0).setTranslate(position[0] - step, position[1]);
          }
        }
        // right || d
        if (keyEvents[39] === true || keyEvents[68] === true) {
          if(position[0] < 140){
            player.transform.baseVal.getItem(0).setTranslate(position[0] + step, position[1]);
          }
        }
        // up || w || space
        if ((keyEvents[38] === true || keyEvents[87] === true || keyEvents[32] === true) && allowedToJump === true) {
          player.transform.baseVal.getItem(0).setTranslate(position[0], position[1] - jump);    
        }
        // down || s
        if (keyEvents[40] === true || keyEvents[83] === true) {
          //console.log("Down");
        }
      }


      // Item Collection        
      // Collect coins
      coins.forEach(coin => {
        if(Inside(player, coin)){
          coin.remove();
          Collect(playerCoins);
        }
      });

      // Collect diamond
      diamond.forEach(gem => {
        if(Inside(player, gem)){
          gem.remove();
          Collect(playerDiamonds);
        }
      }); 

      // Check for Win
      if((GetBankTotal(playerDiamonds) + GetBankTotal(playerCoins)) == winningBankTotal){
        clearInterval(gameLoop); // stop the game
        fall = false; // stop falling
        allowedToJump = false; // don't allow jumping
        allowedToMove = false; // player won, stop player movement
        gameOverMenu.style.display = "inline"; // show game over menu
        // Change Game Over text to You Win
        game.getElementById("gameovermessage").textContent = '  You Win!';
      }

    }, redrawRate); // Game Loop - redraw every 30 milliseconds
  }); // game load event listener
} // if game

 

As you can see from the code, it supports WASD as well as the arrow keys and spacebar for movement. There is a win condition if you collect the two coins and the single diamond. You lose if you fall into one of the two spike pits.
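If you want to experiment with the movement rules without loading the SVG, the per-tick logic can be pulled out into a plain function. This is a simplified sketch and not a drop-in replacement for the loop in Game.js: the name nextPosition and the { x, y } state shape are my own assumptions, while the step, jump, gravity and left/right limit values are copied from the code above.

```javascript
// Values taken from Game.js; everything else here is hypothetical.
var STEP = 1;      // horizontal movement per tick
var JUMP = 20;     // upward distance of a jump
var GRAVITY = 1.5; // downward movement per tick while falling
var MIN_X = -10;   // left limit used by the game
var MAX_X = 140;   // right limit used by the game

// pos:      { x: number, y: number } current translate of the player
// keys:     keyboard state object indexed by keyCode (like keyEvents)
// onGround: true when the player is colliding with a ground element
function nextPosition(pos, keys, onGround) {
  var x = pos.x;
  var y = pos.y;

  if (!onGround) { y += GRAVITY; }                                  // fall
  if ((keys[37] || keys[65]) && x > MIN_X) { x -= STEP; }           // left || a
  if ((keys[39] || keys[68]) && x < MAX_X) { x += STEP; }           // right || d
  if ((keys[38] || keys[87] || keys[32]) && onGround) { y -= JUMP; } // up || w || space

  return { x: x, y: y };
}
```

Keeping the rules in a function like this makes it easier to try different gravity or jump values before wiring the result back into setTranslate().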

Overall I am pleased with what I accomplished, however the collision detection could be improved and there is quite a bit of room for improving how items are collected. Enemies would be nice too, in addition to a larger/longer level, and maybe even a parallax scrolling effect on some background elements as the player moves. Again though, I am happy with how it turned out.
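On the collision detection point: if you expand the negation in Collide(), it ends up checking the same four conditions as Inside(), so both functions currently report plain overlap. One possible improvement, sketched here under my own assumptions, is to keep one overlap test and add a separate containment test. The names rectsOverlap and rectContains are hypothetical, and the rectangles are plain objects shaped like the result of getBoundingClientRect().

```javascript
// True when the two rectangles touch or overlap at all.
// This is the condition both Collide() and Inside() evaluate.
function rectsOverlap(a, b) {
  return a.left <= b.right &&
         a.right >= b.left &&
         a.top <= b.bottom &&
         a.bottom >= b.top;
}

// True only when `inner` sits completely within `outer`.
function rectContains(outer, inner) {
  return inner.left >= outer.left &&
         inner.right <= outer.right &&
         inner.top >= outer.top &&
         inner.bottom <= outer.bottom;
}
```

With DOM elements you would call these as rectsOverlap(player.getBoundingClientRect(), ground.getBoundingClientRect()). Whether item pickup should require overlap or full containment is a design choice that depends on how forgiving you want the game to feel.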

You can Play a Live Demo: Here

You can Get the Code: Here

You can find a list of all my other posts on my Topics and Posts page.

I hope you enjoyed reading about and playing this SVG game prototype.

Your financial support allows me to dedicate time to developing projects like this and while I am publishing them without cost, that isn’t to say they are free. It takes me a lot of time and effort to build and publish projects like this for your enjoyment.

So I ask that if you like my content, please support me on Patreon for as little as $1 or more a month.

Feel free to suggest a project you would like to see built in the comments and if it sounds interesting it might just get built and featured here on my blog for you to enjoy.

 

 

Much Love,

~Joy

Button CSS Generator

Today we’re going to talk about a limited “generative system” that “writes” CSS code though I wouldn’t call this a ‘bot’.

What this generator does is “explore” the existing possibilities of a predefined problem space using known ranges and random selection. 😉

I wrote some of my thoughts regarding “generative systems” in relation to programmers and software developers in my article The Death of the Programmer and I also implemented a generative “writer bot” that I trained and used to write my article A Halloween Tale.

However, in those cases I was mainly referring to semi-intelligent (in regard to the problem space) systems (“bots”) that are able to make choices on their own and/or are capable of learning, evolving or growing in some capacity based on input or state, but that is not what we are doing today.

This simple generative tool (the Button CSS generator I provide below) is useful, but it has no knowledge of what looks good (or bad) and no way to learn that either!

It can only provide you with options but it is far from replacing you.

You can view a Live Preview: Here 

Though in case you wanted it, here’s a screenshot:

There are only 3 files used to create the Button CSS generator: style.css, functions.js & index.html

Style.css

This is the CSS styles used by the index.html file. You will notice that there is a base button CSS template that all the buttons share when the page is first loaded.

The “hardcoded” styles are: text-align, text-decoration, display, padding, font-size.

Once the page is loaded however I use JavaScript to generate and apply additional CSS attributes which are responsible for the variations in each button that you see.

The style.css file is an external style sheet and is linked to using <link rel="stylesheet" type="text/css" href="style.css"> in the HTML file.

 

h1{
    width:100%;
    text-align: center;
}
    
button {
    /* hardcoded styles*/
    text-align: center;
    text-decoration: none;
    display: inline-block;
    padding: 15px 32px;
    font-size: 1.5em;
}

table{
    text-align: center;
    width:100%;
}

caption{
    font-size:2em;
}

#randomize{
    background-color: rgb(62, 176, 239); 
    color: rgb(138, 219, 215);
    border-color: rgb(28, 65, 39);
    border-width: 11px;
    border-style: dotted;
}

#show-css-wrapper{
    width: 100%;
    text-align: center;
    margin-left: auto;
    margin-right: auto;
}

#show-css-title{
    width: 100%;
    text-align: center;
    margin-left: auto;
    margin-right: auto;
    background-color: #999999;
}

#show-css{
    width: 40%;
    min-width: 420px;
    text-align: left;
    margin-left: auto;
    margin-right: auto;
    background-color: #eeeeee;
}

textarea{
    width: 100%;
    height: 200px;
}

 

Functions.js

This file contains all the functions that the generator will use to make new button CSS styles.

There is some inline JavaScript in the HTML file that makes use of these functions and I will discuss that in the Index.html section below.

There is a lot of similarity between most of the functions and although much of their functionality can be wrapped up into one function I intentionally implemented the prototype generator this way.

I could say the reason is that I wanted to avoid early optimization, but the real reason is that this way is just easier to read for someone who didn’t write the software and/or is just learning to code.

Though in truth SetBackgroundColor(element) or SetBorderWidth(element) is not significantly more readable than a more optimized generic function like SetCSS(element, cssProperty, optional [ value ]). If I was going to spend more time developing this system I would probably end up going that route, because the generic function would be far easier to maintain in the long run simply because it would be one function to maintain instead of many.
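To give a rough idea of what that generic route could look like, here is one possible sketch. Only the name SetCSS comes from the paragraph above; the randomValueFor lookup object and the helper names are assumptions of mine. Note that, unlike the Set functions in functions.js, this version expects a complete CSS value such as 'rgb(1, 2, 3)' or '3px' when a value is provided.

```javascript
// Hypothetical generic setter, not part of functions.js.
function randomInt(max) {
  return Math.floor(Math.random() * max); // 0 to max - 1
}

function randomRgb() {
  return 'rgb(' + randomInt(256) + ', ' + randomInt(256) + ', ' + randomInt(256) + ')';
}

var borderStyles = ['none', 'hidden', 'dotted', 'dashed', 'solid', 'double',
                    'groove', 'ridge', 'inset', 'outset', 'initial', 'inherit'];

// One random value generator per supported style property
var randomValueFor = {
  backgroundColor: randomRgb,
  color: randomRgb,
  borderColor: randomRgb,
  borderWidth: function () { return randomInt(14) + 'px'; },
  borderStyle: function () { return borderStyles[randomInt(borderStyles.length)]; }
};

// Set element.style[cssProperty] to value, or to a random value if none is given
function SetCSS(element, cssProperty, value) {
  if (value === undefined || value === null) {
    if (!Object.prototype.hasOwnProperty.call(randomValueFor, cssProperty)) {
      throw new Error('No random generator for "' + cssProperty + '"');
    }
    value = randomValueFor[cssProperty]();
  }
  element.style[cssProperty] = value;
  return value;
}
```

Adding a new property to the generator then means adding one entry to randomValueFor instead of writing another Set function.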

The functions.js file is an external script and is linked to the HTML file using the “src” (source) attribute on a script tag.

////////////////////////////////////
// Random Button CSS Generator Functions

// Will return 0 or 1
function CoinFlip(){
    return Math.floor(Math.random() * 2); //0/1
}
    

// Return a random string of RGB values e.g. "255, 255, 255"
function RandomRGBColor(){
    var r, g, b;
    
    r = Math.floor(Math.random() * 256);
    g = Math.floor(Math.random() * 256);
    b = Math.floor(Math.random() * 256);
    
    var color = r.toString() + ', ' + g.toString() + ', ' + b.toString();

    return color;
}

// Set SetBackgroundColor()
function SetBackgroundColor(element, color = null){
    // No color given so set to random
    if(color === null){
        element.style.backgroundColor = 'rgb(' + RandomRGBColor() + ')';
    }else{ // Set to provided color
        element.style.backgroundColor = 'rgb(' + color + ')';
    }
}

// Set SetFontColor()
function SetFontColor(element, color = null){
    // No color given so set to random
    if(color === null){
        element.style.color = 'rgb(' + RandomRGBColor() + ')';
    }else{ // Set to provided color
        element.style.color = 'rgb(' + color + ')';
    }
}

// Set SetBorderColor()
function SetBorderColor(element, color = null){
    // No color given so set to random
    if(color === null){
        element.style.borderColor = 'rgb(' + RandomRGBColor() + ')';
    }else{ // Set to provided color
        element.style.borderColor = 'rgb(' + color + ')';
    }
}

// Set SetBorderWidth()
function SetBorderWidth(element, width = null){
    // No width given so set to random
    if(width === null){
        element.style.borderWidth = Math.floor(Math.random() * 14).toString() +'px';
    }else{ // Set to provided width
        element.style.borderWidth = width.toString() + 'px';
    }
}

// Set SetBorderStyle()
function SetBorderStyle(element, style = null){
    // No style given so set to random
    if(style === null){
        
        var styles = ['none', 'hidden', 'dotted', 'dashed', 'solid', 'double', 'groove', 'ridge', 'inset', 'outset', 'initial', 'inherit'];
        
        element.style.borderStyle = styles[Math.floor(Math.random()*styles.length)];
            
    }else{ // Set to provided style
        element.style.borderStyle = style;
    }
}

// Get the CSS for the button that was clicked and show it on the page
function ShowCSS(element){
    var style = window.getComputedStyle(element);
    var backgroundColor = style.getPropertyValue('background-color');
    var color = style.getPropertyValue('color');
    var borderColor = style.getPropertyValue('border-color');
    var borderWidth = style.getPropertyValue('border-width');
    var borderStyle = style.getPropertyValue('border-style');
    var padding = style.getPropertyValue('padding');
    var textDecoration = style.getPropertyValue('text-decoration');
    var display = style.getPropertyValue('display');
    var fontSize = style.getPropertyValue('font-size');
    var css = '';
    css += 'button{\n';
    css += '    background-color: ' + backgroundColor.toString() + ';\n';
    css += '    color: ' + color.toString() + ';\n';
    css += '    border-color: ' + borderColor.toString() + ';\n';
    css += '    border-width: ' + borderWidth.toString() + ';\n';
    css += '    border-style: ' + borderStyle.toString() + ';\n';
    css += '    padding: ' + padding.toString() + ';\n';
    css += '    text-decoration: ' + textDecoration.toString() + ';\n';
    css += '    display: ' + display.toString() + ';\n';
    css += '    font-size: ' + fontSize.toString() + ';\n';
    css += '}\n';

    document.getElementById('css-styles').value = css; // textarea content is set through .value
}

Index.html

Index.html is the core that brings all the pieces of this software together!

You will notice that I used divisional elements (div tags) and a textarea to create the section at the top of the page where the CSS code is shown when you click one of the buttons.

Beneath that is the randomize button hyperlinked to nothing, followed by a table. Why a table and not a responsive grid? Eh… it was faster for the prototype mainly.

Inside each cell of the table is a button that has an onclick attribute that passes the element reference ‘this’ to the function ShowCSS().

Beyond this the only thing of consequence in the HTML file is the inline JavaScript that uses the code from the functions.js script to make everything work.

When the page loads I use an array of strings of element ids (the buttons) to locate & establish references to the DOM objects for each button and store those references in the buttons array.

Then, for each of the element references in the buttons array I set a background color as well as a font color randomly. After that I “flip a coin” to decide if a border should be applied to the button. If yes, I apply a random border color, width and style.

Once the code is done running all the buttons will be randomly styled and are ready for you to click them to get the CSS if you like them or click Randomize to get a new set of random buttons.

<html>
<head>
  <title>Random Button</title>
  <link rel="stylesheet" type="text/css" href="style.css"> 
</head>
<body>
  <!-- Title -->
  <h1>Random Button CSS Generator</h1>
     
    <!-- Show CSS -->
    <div id="show-css-wrapper">
        <div id="show-css">
            <h2 id="show-css-title">Button CSS</h2>
            <p>Select a button to view its CSS styles</p>
            <textarea id="css-styles">CSS will show here</textarea>        
        </div>
    </div>

    <!-- Table -->
    <table> 
      <tr>
          <td></td>
          <td>
              <h3>
                  <a href="">
                      <button id="randomize">
                          Randomize
                      </button>
                  </a>
              </h3>
          </td>
          <td></td>
      </tr>
         <tr>
          <td>&nbsp;</td>
          <td>&nbsp;</td>
          <td>&nbsp;</td>
      </tr>
      <tr>
        <td><button id="button-1" onclick="ShowCSS(this)">Button 1</button></td>
        <td><button id="button-2" onclick="ShowCSS(this)">Button 2</button></td>
        <td><button id="button-3" onclick="ShowCSS(this)">Button 3</button></td>
      </tr>
      <tr>
        <td><button id="button-4" onclick="ShowCSS(this)">Button 4</button></td>
        <td><button id="button-5" onclick="ShowCSS(this)">Button 5</button></td>
        <td><button id="button-6" onclick="ShowCSS(this)">Button 6</button></td>
      </tr>
        <tr>
        <td><button id="button-7" onclick="ShowCSS(this)">Button 7</button></td>
        <td><button id="button-8" onclick="ShowCSS(this)">Button 8</button></td>
        <td><button id="button-9" onclick="ShowCSS(this)">Button 9</button></td>
      </tr>
    </table>

 
    <!-- Include the Functions -->
    <script src="functions.js"></script>    
    <script>
    // Button Element ID's
    var elementsIDs = ['button-1', 'button-2', 'button-3', 'button-4', 'button-5', 'button-6', 'button-7', 'button-8', 'button-9'];

    var buttons = []; // Array to Store Button References

    // Setup Element References to the Buttons
    elementsIDs.forEach(element => {
      buttons.push(document.getElementById(element));
    });

    // Apply Random CSS to each Button
    buttons.forEach(element => {
      SetBackgroundColor(element); // Set Initial Background Colors
      SetFontColor(element);       // Set Initial Font Colors

      if(CoinFlip() == 1){ // Has border?
        SetBorderColor(element);
        SetBorderWidth(element);
        SetBorderStyle(element);
      }
       
    });

    </script>
</body>
</html>

 

Where to go from here

As a proof of concept I think I accomplished my goal of creating a functional Button CSS Generator however a more complete prototype would want to include features like allowing you to edit the CSS of the selected button and have the changes reflect on the button in real time or via an “update” button.
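As a starting point for the "edit the CSS and see it on the button" idea, here is a minimal sketch under a few assumptions: the text being edited has the same simple shape that ShowCSS() produces (one selector, one block, one declaration per line), and the function names parseDeclarations and applyDeclarations are hypothetical. It uses element.style.setProperty(), which accepts hyphenated property names like border-width.

```javascript
// Turn 'button{ color: red; border-width: 2px; }' into
// { 'color': 'red', 'border-width': '2px' }.
// This is a deliberately naive parser for ShowCSS()-style output only.
function parseDeclarations(cssText) {
  // Strip everything up to the first "{" and from the last "}" onward
  var body = cssText.replace(/^[^{]*\{/, '').replace(/\}[^}]*$/, '');
  var declarations = {};

  body.split(';').forEach(function (line) {
    var separator = line.indexOf(':');
    if (separator === -1) { return; } // blank or malformed line
    var property = line.slice(0, separator).trim();
    var value = line.slice(separator + 1).trim();
    if (property && value) {
      declarations[property] = value;
    }
  });

  return declarations;
}

// Apply each parsed declaration to the element's inline style
function applyDeclarations(element, declarations) {
  Object.keys(declarations).forEach(function (property) {
    element.style.setProperty(property, declarations[property]);
  });
}
```

In the page you could call applyDeclarations(selectedButton, parseDeclarations(textarea.value)) from an “update” button or an input event on the textarea, where selectedButton is whichever button was clicked last (tracking that is left out of this sketch).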

Also, perhaps the ability to “favorite” or save a button’s CSS style so that you can reload it again later or some kind of “history” feature would be nice so that you can keep the button styles you like.

Additionally, the CSS attributes the generator uses are hard coded and the property or attribute “min/max” ranges are set as well… A nice feature would be to have the option to edit/add or remove the CSS fields included in the generator and modify the ranges as well.

It would also be nice for the prototype to allow for the generation of vertical and horizontal menus using the button CSS and include the ability to edit the links or “onclick” actions and the text of the buttons.

The “ultimate” implementation of a generator like this would allow you to generate styles for all or some subset of the HTML elements not just buttons.

I will leave all these features for you to add to your own implementations at this time however if you like this project I may just add some or all of these features in the future.

You can view a Live Preview: Here

You can get the code on GitHub: Here

You can find a list of all my other posts on my Topics and Posts page.

I hope you enjoyed reading this article.


Much Love,

~Joy

 

Brute Force Password Breaking

Welcome, today we’re going to talk about “Brute Force” Password Breaking.

I know, it’s a controversial topic… though they say you should write about controversy if you want to get read, right? 😛 But to ensure it’s as controversial as possible I’m going to give you an actual working prototype that you can use to try a Brute Force attack against your own passwords! 😉

However, before you call me irresponsible consider the following.

There are plenty of methods for cracking passwords that are far more efficient than a Brute Force attack, and if you have to resort to Brute Force (trying all possible combinations against an unknown and likely long password) against a modern hashing algorithm then you are essentially screwed!

If the creator of the password chose a “simple”,  “common” or “guessable” password… like for example “123456789”, “Cat” or “Password”, Brute Force isn’t even necessary!

 

Rainbow Tables

A hacker can simply use a “Rainbow Table” (so colorful), which is basically a database, to lookup the pre-computed solution for the hash of your password and obtain the unhashed “Plaintext” of your password.

In most cases where a hacker can use a Rainbow Table they will save themselves significant time and effort simply because they don’t have to do any hashing (which cumulatively can take a lot of time), it’s just a matter of traversing a table and retrieving the associated data of the index that matches the hash provided, assuming of course that the hash was pre-computed and exists in the Rainbow Table.

For example, the hash output of the SHA1 algorithm for the word ‘Cat’ is cebe54c7626cb1cefaca5f7f5ea6c96b4a7a2882 and if a hacker was able to break into a database containing this hashed credential then they could reverse lookup the hashed password in seconds.

Clearly what makes a Rainbow Table so useful to a hacker is that it can take the insurmountable challenge of Brute Forcing a password and change it into something that is at the very least, manageable.
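To make the lookup idea concrete, here is a minimal sketch of a precomputed hash table in PHP. It is deliberately simplified: real Rainbow Tables use hash chains and reduction functions to save space, whereas this is just a plain array keyed by hash. The helper names BuildLookupTable() and ReverseLookup() and the word list are assumptions made up for the demo.

```php
<?php
// Simplified sketch: a plain precomputed lookup table, not a true chained
// Rainbow Table. BuildLookupTable() and ReverseLookup() are hypothetical names.

// Hash every candidate once, up front, and index the plaintext by its hash
function BuildLookupTable($words){
    $table = array();
    foreach($words as $word){
        $table[sha1($word)] = $word;
    }
    return $table;
}

// Breaking a hash is now just an array lookup, no hashing required
function ReverseLookup($table, $hash){
    return isset($table[$hash]) ? $table[$hash] : null;
}

$table = BuildLookupTable(array('123456789', 'Cat', 'Password'));

$stolen_hash = sha1('Cat'); // stands in for a hash taken from a breached database
$plaintext = ReverseLookup($table, $stolen_hash);

echo ($plaintext === null ? 'Not in table' : "Plaintext: $plaintext") . PHP_EOL;
```

The expensive part (hashing) happens once when the table is built, which is why a lookup only works for hashes that were pre-computed.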

There are techniques however, called Password ‘Salting‘ & ‘Peppering‘ which expose a severe weakness with Rainbow Tables… namely, if you cannot pre-compute a big database of all possible hashes then you are forced to resort to some other technique if not a Brute Force attack.

A Salt is some unique (and long) string value that can be added to a password before it is hashed to make building a Rainbow Table difficult if not impossible.

Here’s an example of how Salting works, lets take the insecure password we used above ‘Cat’, and look at what happens when we add a 30 character Salt to it prior to hashing the value.

Password: Cat

Salt: LPdjlEfrMhGkENHf3e4Lp7VZgXd77f

Hash(Password + Salt) = d73b50b3d80762f55a28a44e49568be064ee8208

Note: To ensure you get the maximum benefit from Salting your passwords you should use a different Salt for every password credential that you store in your database. If you don’t, it will be much easier for a hacker to steal your passwords.

As you can see, by including a Salt with the password when it is hashed the result produced is different than the word by itself. This different hash is what is stored in the database.

The benefit of doing this is simply that it is now extremely unlikely (improbable but not impossible) that the combination of the word Cat and this long randomly generated Salt string will have been pre-computed anywhere, so it doesn’t even matter if a hacker gets both the hash and the Salt because a Rainbow Table will have to be generated from scratch using the Salt, which can take horrendous amounts of time, potentially on the order of a human life time or even longer for some hashing algorithms, password lengths and of course depending on how much raw computation an attacker can field.
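Here is a minimal sketch of per-password Salting in PHP. MakeSalt() and SaltedHash() are hypothetical helper names, and SHA-1 is used only because the example above uses it. For real credentials PHP’s built in password_hash() and password_verify() functions handle the Salt for you.

```php
<?php
// Sketch only. MakeSalt() and SaltedHash() are hypothetical helper names.
// SHA-1 mirrors the example above; it is not a recommendation.

function MakeSalt($number_of_bytes = 15){
    // random_bytes() is cryptographically secure; 15 bytes become 30 hex chars
    return bin2hex(random_bytes($number_of_bytes));
}

function SaltedHash($password, $salt){
    return sha1($password . $salt); // Hash(Password + Salt)
}

$password = 'Cat';

// Two users with the same password get two different Salts...
$salt_a = MakeSalt();
$salt_b = MakeSalt();

// ...so their stored hashes differ from each other and from sha1('Cat')
$hash_a = SaltedHash($password, $salt_a);
$hash_b = SaltedHash($password, $salt_b);

echo "Unsalted: " . sha1($password) . PHP_EOL;
echo "User A:   $hash_a (Salt: $salt_a)" . PHP_EOL;
echo "User B:   $hash_b (Salt: $salt_b)" . PHP_EOL;

// password_hash() creates a unique Salt and stores it inside the returned
// string, where password_verify() finds it again
$stored = password_hash($password, PASSWORD_DEFAULT);
echo (password_verify($password, $stored) ? 'Verified' : 'Rejected') . PHP_EOL;
```

Because every user gets their own Salt, a table built for one Salt is useless against every other credential in the database.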

 

Dictionary Attack

A Dictionary Attack is similar to the concept of a Rainbow Table in that it also utilizes a database. Where they differ is that a Rainbow Table’s purpose is to store pre-computed hashes so you can just look up a password, whereas a Dictionary Attack still requires the attacker to break your password through hashing.

So from a hacker’s perspective a Rainbow Table is preferable to a Dictionary.

The purpose of a Dictionary, then, is to hold all the most likely passwords, which are combined with your Salt (if the attacker has it), or with generated candidate values in the case of a Pepper, and then hashed to generate a new Rainbow Table that is unique to that Salt or Pepper.

This is a form of Brute Force attack and takes time to generate though it is still better than “true” brute force because it relies on the idea that words mean things and we all share the same words and meanings.

Anytime there is a massive breach of user credentials where the passwords are compromised (i.e. the passwords were kept as plain text, hashed using a single unchanged Salt for every password, or hashed with a Salt that was simply too short), all the compromised passwords get added to Dictionaries (and Rainbow Tables) because the passwords were used by someone, so they are “known to be good” and are therefore more likely to be used by someone else.

Think of a Dictionary as a list of thousands to millions of “probable” common and known passwords.

The benefit of a Dictionary is that a hacker can focus on all the most likely passwords because people tend to think alike and of all the possible words that COULD be used for a password only a small subset will ever actually be selected by people.

Further, if the Dictionary Attack fails and the hacker must resort to True Brute Force, they can exclude the passwords in the Dictionary that they already tried and focus the Brute Force Attack on generating new, previously untried passwords.
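The difference from a table lookup is easiest to see in code. This is a minimal sketch, assuming the attacker already holds both the Salted hash and the Salt. DictionaryAttack() is a hypothetical helper name, and the word list and the use of SHA-1 are assumptions for the demo.

```php
<?php
// Sketch only. DictionaryAttack() is a hypothetical helper name and the
// word list, Salt and SHA-1 hashing are assumptions for the demo.

function DictionaryAttack($target_hash, $salt, $dictionary){
    foreach($dictionary as $candidate){
        // Unlike a table lookup, every candidate has to be hashed again
        if(hash_equals($target_hash, sha1($candidate . $salt))){
            return $candidate;
        }
    }
    return null; // not in the Dictionary, only True Brute Force is left
}

$dictionary = array('123456789', 'Password', 'Cat', 'Dogmeat');
$salt = 'LPdjlEfrMhGkENHf3e4Lp7VZgXd77f'; // the Salt from the example above

$target_hash = sha1('Cat' . $salt); // stands in for a leaked, Salted hash

$found = DictionaryAttack($target_hash, $salt, $dictionary);
echo ($found === null ? 'Not in Dictionary' : "Password: $found") . PHP_EOL;
```

Every candidate costs one hash per Salt, so the work has to be repeated for each credential that has its own Salt.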

 

True Brute Force

The thing is, if Rainbow Tables and Dictionary Attacks failed to break your credentials then most hackers will give up and even the skilled professionals are forced to question the real value of their target because in most cases it’s not worth the hassle!

Having to resort to Brute force means that they tried EVERYTHING else and failed!

Your servers proved to be secure, your protocols are working, your “wetware” er… IT staff isn’t giving out credentials over the phone… and that “dumpster dive” the hackers took at 3 AM to see if your staff is throwing out documents with “sensitive” information, proved useless…

True Brute Force means that you take all typable letters:

Upper Case letters:  ABCDEFGHIJKLMNOPQRSTUVWXYZ

Lower Case letters: abcdefghijklmnopqrstuvwxyz

Symbols: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

And while we’re at it why not include Numbers too: 0123456789

For a total of 94 possible characters and then starting with only 1 character, generate and iterate through all possible permutations until you give up or a solution is found!

This sounds easier to accomplish than it actually is!

Sure, generating the data is easy enough (I show you how and provide code below) but due to the sheer numbers involved it’s essentially an impossible task when considering the hash time is multiplied by all the combinations you have to try and a longer password means more combinations are required to break the hash.

This is why it is recommended by some technologists that your password include Upper and Lower case letters along with numbers and symbols and be longer than 15 characters!

Let’s do the math!

If there are 94 possible characters and a password is only 1 character long we would only need to try a maximum of 94^1 = 94 candidates before we can guarantee we have the password.

In the case of a 3 character long password (94^3 = 830,584) we would have to try a little less than 1 million combinations!

This is simple exponentiation with the base being the number of symbols possible and the exponent is the length of the password.

Here’s a table:

Password Length | Combinations | Calculation
1 | 94 | 94^1
2 | 8,836 | 94^2
3 | 830,584 | 94^3
4 | 78,074,896 | 94^4
5 | 7,339,040,224 | 94^5
6 | 689,869,781,056 | 94^6
7 | 64,847,759,419,264 | 94^7
8 | 6,095,689,385,410,816 | 94^8
9 | ≈ 572,994,802,228,617,000 | 94^9
10 | ≈ 53,861,511,409,490,000,000 | 94^10
11 | ≈ 5,062,982,072,492,060,000,000 | 94^11
12 | ≈ 475,920,314,814,253,000,000,000 | 94^12
13 | ≈ 44,736,509,592,539,800,000,000,000 | 94^13
14 | ≈ 4,205,231,901,698,740,000,000,000,000 | 94^14
15 | ≈ 395,291,798,759,682,000,000,000,000,000 | 94^15
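If you want to reproduce these numbers yourself, here is a minimal sketch. Combinations() and YearsToExhaust() are hypothetical helper names, and the guess rate is an assumed figure picked for illustration, not a benchmark of any real hardware. It also assumes a 64 bit build of PHP.

```php
<?php
// Sketch only. Combinations() and YearsToExhaust() are hypothetical helpers
// and the guess rate below is an assumed figure, not a benchmark.

function Combinations($number_of_symbols, $length){
    // Stays an exact integer while it fits in 64 bits (up to length 9 for
    // 94 symbols), after that PHP switches to a rounded float
    return $number_of_symbols ** $length;
}

function YearsToExhaust($combinations, $guesses_per_second){
    $seconds_per_year = 365.25 * 24 * 60 * 60;
    return $combinations / $guesses_per_second / $seconds_per_year;
}

$guesses_per_second = 1000000000; // assumption: one billion guesses per second

foreach(array(3, 8, 12, 15) as $length){
    $combinations = Combinations(94, $length);
    $years = YearsToExhaust($combinations, $guesses_per_second);
    printf("Length %2d: %.3e combinations, %.3e years worst case" . PHP_EOL,
           $length, $combinations, $years);
}
```

The switch from integer to float is also why the larger values in a table like this end up rounded rather than exact.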

 

Clearly a nice long password that contains upper and lower case letters along with numbers and symbols is definitely going to give your hacker a bad day though I personally prefer the idea of Passphrases which are a series of words in a phrase rather than a single word.

If the words you use in your passphrase are nice and long, include upper and lower case letters along with numbers and symbols, and the phrase isn’t well known (so that it’s not in a phrase dictionary) then you can be fairly confident that your account is secure for the foreseeable future.

Breaker Class

So as you can see, I am not helping anyone break anything by giving out this code, your passwords are safe! 😉

Regular readers will notice in the code below I used my AppTimer Class that I released over on my Benchmarking PHP article.

Breaker has no properties and only 3 methods (not including PHP’s Magic Methods).

Methods: GetSymbols, IncrementValues & Match

GetSymbols()

GetSymbols converts the numbers in an array to the char the number represents.

For example: The number 0 represents the exclamation symbol ! and the number 33 represents uppercase B
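Here is that mapping as a small standalone sketch. The free function below mirrors the GetSymbols() method, and the symbol array is built with array_map('chr', range(33, 126)), which is my own way of spelling out ASCII 33 through 126 for the demo.

```php
<?php
// Sketch only. The free function below mirrors Breaker::GetSymbols().

// ASCII 33 (!) through 126 (~), so index 0 is "!" and index 93 is "~"
$valid_symbols = array_map('chr', range(33, 126));

function GetSymbols($values, $symbols){
    foreach($values as &$value){
        if(isset($symbols[$value])){
            $value = $symbols[$value];
        }
    }
    unset($value); // break the reference left behind by foreach
    return $values;
}

// Positions are stored least significant first, so reverse before printing
$values = array(83, 64, 34); // t, a, C
$word = strrev(implode('', GetSymbols($values, $valid_symbols)));

echo $word . PHP_EOL; // prints "Cat"
```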

IncrementValues()

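IncrementValues takes an array of numbers and works like an odometer: it increments the first value by 1 unless that would exceed the allowable max, in which case it resets that value to 0 and the value to the right is created or incremented by 1. The array is stored least significant position first, which is why the script reverses the generated string with strrev() before testing it.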
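The carry logic is easier to follow in a stripped down form. This is a minimal sketch of the same odometer idea, not the IncrementValues() method itself: NextValues() is a hypothetical name, and unlike the method in the listing it never lets a position reach the out of range value 94.

```php
<?php
// Sketch only. NextValues() is a hypothetical alternative, not the
// IncrementValues() method from Breaker. Index 0 is the least significant
// position, matching how the values array is stored.

function NextValues($values, $number_of_valid_symbols){
    foreach($values as $key => $value){
        if($value + 1 < $number_of_valid_symbols){
            $values[$key] = $value + 1; // room left in this position, done
            return $values;
        }
        $values[$key] = 0; // position is maxed, reset it and carry on
    }
    $values[] = 0; // every position rolled over, so add a new position
    return $values;
}

// Count every candidate of length 1 and 2 for a 94 symbol alphabet
$values = array(0);
$candidates = 0;
while(count($values) <= 2){
    $candidates++;
    $values = NextValues($values, 94);
}

echo $candidates . PHP_EOL; // prints 8930 (94 + 8,836)
```

The count matches the first two rows of the combinations math, which is a handy sanity check that no candidate is skipped or generated twice.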

Match()

At this time, match just does a comparison though feel free to hash the values that are passed to this method to complete the Brute Force Attack program.

As is, this will only brute force plain text against plain text.
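One thing to watch when comparing candidate strings in PHP: the loose comparison operator (==) treats numeric looking strings as numbers, so two different strings can count as a match. The short sketch below shows the difference. StrictMatch() is a hypothetical name used only for this demo.

```php
<?php
// Sketch only: loose versus strict comparison of plain text candidates.

// Both strings look like numbers, so == compares them numerically
$loose  = ('1e3' == '1000');   // true, 1e3 and 1000 are the same number
$strict = ('1e3' === '1000');  // false, the strings are different

var_dump($loose);
var_dump($strict);

// Hypothetical strict version of a plain text comparison
function StrictMatch($password, $test_password){
    return $test_password === $password;
}

var_dump(StrictMatch('Cat', 'Cat')); // bool(true)
var_dump(StrictMatch('1000', '1e3')); // bool(false)
```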

Breaker.php

<?php
set_time_limit(0); // Disable the time limit on script execution

// Create Breaker Class 
//
// This tool is a demonstration of a "brute force" password breaker.
// This prototype is provided AS IS and for informational & educational
// purposes only! 
//
// Modern Password Hashing should have little fear of this code though
// for a minimum level of rationality I have excluded the parts that 
// would handle hashing the passwords to slow down the "Script Kiddies" 
// however any reasonably skilled PHP developer would have little trouble 
// adding their own hashing function to complete this prototype.
// 
// DO NOT USE THIS SOFTWARE TO VIOLATE THE LAW! COMPLY WITH ALL DIRECTION
// GIVEN TO YOU BY LAW ENFORCEMENT! ANY ILLEGAL OR MALICIOUS ACTIONS YOU 
// CHOOSE TO ENGAGE IN OUTSIDE OF AN EDUCATIONAL SETTING ARE YOUR OWN! 
class Breaker{

    function GetSymbols($values, $symbols){
        foreach($values as &$value){
            if(isset($symbols[$value])){
                $value = $symbols[$value];
            }
        }
        return $values;
    }

    function IncrementValues($values, $number_of_valid_symbols){
        foreach($values as $key=>&$value){
            // If this value is maxed
            if($values[$key] >= $number_of_valid_symbols){
                // Reset it to 0 and increment the next value
                $values[$key] = 0; // Reset this value
                if(!isset($values[$key+1])){
                    $values[$key+1] = '0';
                }else{
                    $values[$key+1]++; // Carry: increment the next value
                }
            }
            else{
                // If key greater than 0
                if($key > 0){
                    if($values[$key-1]>$number_of_valid_symbols){
                        // Increment this value
                        $values[$key]++;
                    }
                }
                else{
                    // Always Increment this value
                    $values[$key]++;
                }
            }            
        }
        return $values;
    }

    function Match($hash, $test_password){
        // Strict comparison: with == PHP compares numeric looking strings
        // as numbers, so '1e3' would "match" '1000'
        return $test_password === $hash;
    }

}



include('AppTimer.Class.php');     // Include AppTimer class file

$Timer = new AppTimer();           // Create Timer
$Timer->Start();                   // Start() Timer


$password_to_break = 'Cat'; 

// Concatenate all symbols explicitly
//$valid_symbols = "!\"#$%&'()*+,-./0123456789:;<=>?@"; // note the escaped double quote
//$valid_symbols .= "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`";
//$valid_symbols .= "abcdefghijklmnopqrstuvwxyz{|}~";
//$valid_symbols = str_split($valid_symbols); // split string into array

// Cleaner way to Create array of ASCII char 33 - 126
$valid_symbols = range(chr(33), chr(126)); // Shorter version of above
$number_of_valid_symbols = count($valid_symbols); // 94 chars

$length = 1; // Start at 1 digit length to try all possible combinations
             // This assumes the password length is unknown.
// If the length of the password is known then use the correct length i.e:
// $length = strlen($password_to_break); 
                                       
// Generate first plain text password to try
$values = str_split(strrev(str_repeat('0', $length)));
$PlainTextPasswordBreaker = new Breaker();
// Join the symbols into a string so the first candidate is compared as text
$test_password = strrev(implode('', $PlainTextPasswordBreaker->GetSymbols($values, $valid_symbols)));

while(!$PlainTextPasswordBreaker->Match($password_to_break, $test_password)){
    // We have not found the correct password so keep trying to generate it
    $values = $temp = $PlainTextPasswordBreaker->IncrementValues($values, $number_of_valid_symbols);
    $temp = $PlainTextPasswordBreaker->GetSymbols($values, $valid_symbols);
    $test_password = strrev(implode('', $temp));
    
    //echo $test_password . PHP_EOL; // Uncomment to watch breaker
                                   // Will make Breaker much slower
}


$Timer->Stop();             // Stop() Timer
$time = $Timer->Report();   // Report()


echo "Password: $test_password \nFound in: $time" . PHP_EOL;

As presented the output of the code will look something like this:

Password: Cat 
Found in: 5.8302 Seconds

If you uncomment the echo on line 105 inside the while loop you can watch each permutation get generated however echo will slow down the time it actually takes to find the password.

Here is what that would look like (note that I shortened the output to just the last few permutations before the solution was found):

...
Ca!
Ca"
Ca#
Ca$
Ca%
Ca&
Ca'
Ca(
Ca)
Ca*
Ca+
Ca,
Ca-
Ca.
Ca/
Ca0
Ca1
Ca2
Ca3
Ca4
Ca5
Ca6
Ca7
Ca8
Ca9
Ca:
Ca;
Ca<
Ca=
Ca>
Ca?
Ca@
CaA
CaB
CaC
CaD
CaE
CaF
CaG
CaH
CaI
CaJ
CaK
CaL
CaM
CaN
CaO
CaP
CaQ
CaR
CaS
CaT
CaU
CaV
CaW
CaX
CaY
CaZ
Ca[
Ca\
Ca]
Ca^
Ca_
Ca`
Caa
Cab
Cac
Cad
Cae
Caf
Cag
Cah
Cai
Caj
Cak
Cal
Cam
Can
Cao
Cap
Caq
Car
Cas
Cat
Password: Cat 
Found in: 14.9678 Seconds

 

USE BREAKER RESPONSIBLY FOR EDUCATIONAL PURPOSES ONLY!

You can find Breaker on my GitHub profile here.

You can find a list of all my other posts on my Topics and Posts page & I hope you enjoyed reading this article, if so please support me on Patreon for as little as $1 a month.

Your financial support allows me to dedicate time to developing awesome projects like this and while I am publishing them without cost, that isn’t to say they are free. I am doing this all by myself and it takes me a lot of time and effort to build and publish these projects for your enjoyment.

Your financial support means a lot to me and allows me to be able to afford to spend the time necessary to make great content for you.

So I ask again, please support me on Patreon for as little as $1 a month.

Feel free to suggest a project you would like to see built in the comments and if it sounds interesting it might just get built and featured here on my blog.

 

 

Much Love,

~Joy

Email Relationship Classifier Testing The Bot

Welcome back, today is the last post in this series and one that I know many of you have been eagerly awaiting… we’re finally going to test the bot!

So I think what I’m going to do is first give you the code to review and then after I will walk you through it and explain what’s going on.

Test.php

I have labeled the subheadings in this post after the section comments in the code to make it easier to review so you should refer back to this code while reading the article to aid in understanding.

<?php
// This function will load the human scored JSON class files
function LoadClassFile($file_name){
  // Get file contents
  $file_handle = fopen($file_name, 'r');
  $file_data = fread($file_handle, filesize($file_name));
  fclose($file_handle);
  return $file_data;
}

// We will pass our Results to this function to save so it can be reviewed later
function CreateResultsFile($file_name, $output_path, $results){
  
  // Write file contents
  $file_handle = fopen($output_path . basename($file_name), 'w');
  $file_data = fwrite($file_handle, $results);
  fclose($file_handle);
}



// Include Classes
function ClassAutoloader($class) {
  include 'Classes/' . $class . '.Class.php';
}
spl_autoload_register('ClassAutoloader');


// Instantiate Objects
$myTokenizer = new Tokenizer();
$myEmailFileManager = new FileManager();
$myJSONFileManager = new FileManager();
$myDatabaseManager = new DatabaseManager();


// No Configuration needed for the Tokenizer Object

// Configure FileManager Objects
$myEmailFileManager->Scan('DataCorpus/TestData');
$myJSONFileManager->Scan('DataCorpus/TestDataClassifications');
$number_of_testing_files = $myEmailFileManager->NumberOfFiles();
$number_of_JSON_files = $myJSONFileManager->NumberOfFiles();

// Configure DatabaseManager Object
$myDatabaseManager->SetCredentials(
  $server = 'localhost',
  $username = 'root',
  $password = 'password',
  $dbname = 'EmailRelationshipClassifier'
);


// Make sure the files are there and the number of files are the same
if(($number_of_testing_files != $number_of_JSON_files) 
   || ($number_of_testing_files == 0 || $number_of_JSON_files == 0) 
  ){
  die(PHP_EOL . 'ERROR! the number of testing files and classification files are not the same or are zero! Run CreateClassificationFiles.php first.');
}
else{
  // Loop Through Files
  for($current_file = 0; $current_file < $number_of_testing_files; $current_file++){
 
  
  $report_data = '';
  
  /////////////////////////
  // Bot Predict Classification
  /////////////////////////
  
  $file = $myEmailFileManager->NextFile();
  
  $myTokenizer->TokenizeFile($file);
    
  $report_data .= "Found Tokens:". PHP_EOL;
  // Loop Through Tokens
  foreach($myTokenizer->tokens as $word=>$count){
    $report_data .= "$word $count" . PHP_EOL;
    // Get word classification scores
    $myTokenizer->tokens[$word] = $myDatabaseManager->ScoreWord($word, $count);
    
    // Remove unknown word tokens
    if($myTokenizer->tokens[$word] == NULL){
    unset($myTokenizer->tokens[$word]);
    }
  }
  
  $report_data .= PHP_EOL . "Known Words:". PHP_EOL;
  $known_words = array_keys($myTokenizer->tokens);
  foreach($known_words as $word){
    $report_data .= $word . PHP_EOL;
  }
  
  $weights = array();
  // Sum tokens
  foreach($myTokenizer->tokens as $word=>$word_data){
    foreach($word_data as $class_name=>$class_count){
    @$weights[$class_name] += $class_count;
    }
  }
  $weights = array_diff($weights, array(0)); // remove 0 value classes

  // Sort into sender recipient groups
  foreach($weights as $class=>$count){
    // if key name contains -Sender add to the Sender key
    if(strstr($class, '-Sender')){
    $weights['Sender'][strstr($class, '-Sender', true)] = $count;
    }
    else{// if key name contains -Recipient add to the Recipient key
    $weights['Recipient'][strstr($class, '-Recipient', true)] = $count;
    }
    unset($weights[$class]); // remove the unsorted element
  }
  // sort arrays from more likely to less likely
  array_multisort($weights['Sender'], SORT_DESC);
  array_multisort($weights['Recipient'], SORT_DESC);



  /////////////////////////
  // Human Classified Data
  /////////////////////////
  $EmailClassifications = json_decode(LoadClassFile($myJSONFileManager->NextFile()), true);
  $EmailClassifications = array_diff($EmailClassifications, array(0)); // remove 0 value classes
  $sum = array_sum($EmailClassifications); // sum the total of all classes weights
  // sort into sender recipient groups
  // and convert values to percentages
  foreach($EmailClassifications as $class=>$count){
    // if key name contains -Sender add to the Sender key
    if(strstr($class, '-Sender')){
    $EmailClassifications['Sender'][strstr($class, '-Sender', true)] = $count;
    }
    else{// if key name contains -Recipient add to the Recipient key
    $EmailClassifications['Recipient'][strstr($class, '-Recipient', true)] = $count;
    }
    unset($EmailClassifications[$class]); // remove the unsorted element
  }
  // sort arrays
  array_multisort($EmailClassifications['Sender'], SORT_DESC);
  array_multisort($EmailClassifications['Recipient'], SORT_DESC);


  $report_data .= PHP_EOL;
  
  
  
  
  /////////////////////////
  // Report - Sender
  /////////////////////////

  $report_data .= PHP_EOL . "Predicted Sender Class & Score: " . PHP_EOL;
  $sum = array_sum($weights['Sender']); // sum the total of Sender weights
  foreach($weights['Sender'] as $class=>$count){
     $report_data .= "$class:  " . round(($count / $sum) * 100) . '%' . PHP_EOL;
  }
    

  $report_data .= PHP_EOL . "Human Scored Sender Class: " . PHP_EOL;
  $sum = array_sum($EmailClassifications['Sender']); // sum the total of Sender EmailClassifications
  foreach($EmailClassifications['Sender'] as $class=>$count){
     $report_data .= "$class:  " . round(($count / $sum) * 100) . '%' . PHP_EOL;
  }
  
  /////////////////////////
  // Report - Sender Mistakes
  /////////////////////////

  $report_data .= PHP_EOL . "Incorrect Predicted Sender Classes: " . PHP_EOL;
  $IPSC = array_keys(array_diff_key($weights['Sender'], $EmailClassifications['Sender']));
  if(count($IPSC) > 0){
    foreach($IPSC as $class){
     $report_data .= $class . PHP_EOL;
    }
  }else{
    $report_data .= 'None' . PHP_EOL;
  }
    
  $report_data .= PHP_EOL . "Missing Predicted Sender Classes: " . PHP_EOL;
  $MPSC = array_keys(array_diff_key($EmailClassifications['Sender'], $weights['Sender']));
  if(count($MPSC) > 0){
    foreach($MPSC as $class){
     $report_data .= $class . PHP_EOL;
    }
  }else{
    $report_data .= 'None' . PHP_EOL;
  }
  

  /////////////////////////
  // Report - Recipients
  /////////////////////////
  
  $sum = array_sum($weights['Recipient']); // sum the total of Recipient weights
  $report_data .= PHP_EOL . "Predicted Recipient Class & Score: " . PHP_EOL; 
  foreach($weights['Recipient'] as $class=>$count){
     $report_data .= "$class:  " . round(($count / $sum) * 100) . '%' . PHP_EOL;
  }
  

  $report_data .= PHP_EOL . "Human Scored Recipient Class: " . PHP_EOL; 
  $sum = array_sum($EmailClassifications['Recipient']); // sum the total of Recipient EmailClassifications
  foreach($EmailClassifications['Recipient'] as $class=>$count){
     $report_data .= "$class:  " . round(($count / $sum) * 100) . '%' . PHP_EOL;
  }
  
  
  /////////////////////////
  // Report - Recipient Mistakes
  /////////////////////////
  
  $report_data .= PHP_EOL . "Incorrect Predicted Recipient Classes: " . PHP_EOL;
  $IPRC = array_keys(array_diff_key($weights['Recipient'], $EmailClassifications['Recipient']));
  if(count($IPRC) > 0){
    foreach($IPRC as $class){
     $report_data .= $class . PHP_EOL;
    }
  }else{
    $report_data .= 'None' . PHP_EOL;
  }
  
  $report_data .= PHP_EOL . "Missing Predicted Recipient Classes: " . PHP_EOL;
  $MPRC = array_keys(array_diff_key($EmailClassifications['Recipient'], $weights['Recipient']));
  if(count($MPRC) > 0){
    foreach($MPRC as $class){
     $report_data .= $class . PHP_EOL;
    }
  }else{
    $report_data .= 'None' . PHP_EOL;
  }
  
  /////////////////////////
  // Report - Overall
  /////////////////////////
  
  // Compute Results
  $sum_prediction = count($weights['Sender']) + count($weights['Recipient']);
  $sum_prediction -= count($IPSC); // Penalize Incorrect Predicted Sender Classes
  $sum_prediction -= count($MPSC) / 2; // Penalize Missing Sender Classes at half a point each
  $sum_prediction -= count($IPRC); // Penalize Incorrect Predicted Recipient Classes
  $sum_prediction -= count($MPRC) / 2; // Penalize Missing Recipient Classes at half a point each
  $sum_actual = count($EmailClassifications['Sender']) + count($EmailClassifications['Recipient']);
  
  $report_data .= PHP_EOL . "Overall Accuracy: " . PHP_EOL;
  $report_data .= ($sum_prediction / $sum_actual) * 100 . '%' . PHP_EOL;
  
  CreateResultsFile($file, 'DataCorpus/TestResults/', $report_data);
  echo $report_data;
  }
}

echo PHP_EOL . 'Testing Complete!' . PHP_EOL;

 

You will probably recognize the first portion of this code from Train.php and in fact there are really only two differences in initializing the environment between the two scripts.

The first difference is that Test.php includes a function called CreateResultsFile() that we’ll use to save the report that the bot generates so we can review it later. The second is that the paths we provide to $myEmailFileManager & $myJSONFileManager are different from the ones used in Train.php.

Once the fail conditions around line 54 pass, the bot will step through all testing data beginning around line 61.

The first order of business is to generate the bot’s “prediction” of what relationship classes are present in the email.

Bot Predict Classification

The bot starts by Tokenizing the file, which means building a bag of words model for the email. The found tokens are then passed to the $myDatabaseManager Object, which uses its ScoreWord() method to scale the word class values using the information obtained during training. Unknown words are ignored and have no bearing on classifying the email in my implementation.

$myDatabaseManager->ScoreWord() method

For reference here is the ScoreWord() method for your review.

public function ScoreWord(&$word, &$count){
  
  if(count($this->classifications) == 0){
    $this->GetKnownClasses();
    $classifications = array();
    foreach($this->classifications as $class=>$value){
      $classifications["$class-Sender"] =  $value;
    }
    foreach($this->classifications as $class=>$value){
      $classifications["$class-Recipient"] =  $value;
    }
    $this->classifications = $classifications;
  }
  

  if($this->KnownWord($word)){
    $this->Connect();
    $sql = "SELECT * FROM `Words` WHERE `Word` LIKE '$word'";
    $result = $this->conn->query($sql);

    if ($result->num_rows > 0) {
    $word_data = $result->fetch_assoc();
    foreach($word_data as $key=>$value){
       if($key == 'ID'){
         unset($word_data["$key"]);
       }
       elseif($key == 'Word'){
         unset($word_data["$key"]);
       }
       else{
         $word_data[$key] *= ($count * $this->classifications[$key]);
       }
    }
    return $word_data;
    }
  }else{
    // unknown word... add it or ignore it
  }
}

 

Note that you could easily add new words found during test data to the bot knowledge base with zero relationship class affiliations and you could later manually update the word classes or do additional training to improve the bot’s “familiarity” with the word.

 

Then the $weights array is created to hold the prediction (the bot generated classifications) which is all the class counts summed and unnecessary elements removed.

Why $weights and not $prediction? I don’t know, maybe I was being $pretentious. 😛

The array is then sorted into sender and recipient groups, and each group is sorted from the highest scoring (most likely) class to the lowest.
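Here is a minimal standalone sketch of the summing and grouping step. The scores in $scored_tokens are made up for the demo (shaped like the arrays ScoreWord() returns), and it uses arsort() and a separate $grouped array instead of the array_multisort() calls in Test.php, so treat it as one possible way to do it rather than the bot’s exact code.

```php
<?php
// Sketch only. The scores are hypothetical and shaped like ScoreWord() output.
$scored_tokens = array(
    'YOU'  => array('Daughter-Sender' => 1, 'Child-Sender' => 2,
                    'Parent-Recipient' => 3, 'Mother-Recipient' => 0),
    'LOVE' => array('Daughter-Sender' => 1, 'Child-Sender' => 2,
                    'Parent-Recipient' => 1, 'Mother-Recipient' => 0),
);

// Sum the class counts of every known word
$weights = array();
foreach($scored_tokens as $word_data){
    foreach($word_data as $class_name => $class_count){
        $weights[$class_name] = ($weights[$class_name] ?? 0) + $class_count;
    }
}

// Split "Class-Sender" / "Class-Recipient" keys into two groups
$grouped = array('Sender' => array(), 'Recipient' => array());
foreach($weights as $class => $count){
    if($count == 0){
        continue; // drop classes that scored nothing
    }
    $pos = strrpos($class, '-'); // last hyphen separates class from role
    $grouped[substr($class, $pos + 1)][substr($class, 0, $pos)] = $count;
}

// Sort each group from more likely to less likely, keeping the class names
arsort($grouped['Sender']);
arsort($grouped['Recipient']);

print_r($grouped);
```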

 

Human Classified Data

Next the human generated classifications stored in JSON are loaded into the $EmailClassifications array and the values are sorted into sender and recipient groups as well.

At this point we have extracted enough information to begin generating the statistical portion of the $report.

 

Report – Sender

Beginning on line 147 we evaluate the Sender data, starting with the bot prediction. We add up the $sum “total count” of all the predicted weights, then determine what percentage each individual weight contributes to the overall prediction by dividing the weight value by the $sum and multiplying the result by 100.

This same process is repeated for the human classified data.
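As a small sketch of that calculation, here it is pulled out into a function. ToPercentages() is a hypothetical helper name and the counts are made up. The guard against an empty group is my own addition.

```php
<?php
// Sketch only. ToPercentages() is a hypothetical helper name.
function ToPercentages($counts){
    $sum = array_sum($counts);
    $percentages = array();
    foreach($counts as $class => $count){
        // Guard against dividing by zero when a group is empty or all zero
        $percentages[$class] = ($sum > 0) ? round(($count / $sum) * 100) : 0;
    }
    return $percentages;
}

$human_recipients = array('Mother' => 1, 'Father' => 1, 'Parent' => 1);

foreach(ToPercentages($human_recipients) as $class => $percent){
    echo "$class:  $percent%" . PHP_EOL;
}
```

Note that because each value is rounded on its own, the percentages will not always add up to exactly 100 (three equal classes come out as 33% each).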

 

Report – Sender Mistakes

We then evaluate the bot predicted sender data for mistakes by comparing the bot’s predicted classification $weights against the known human generated $EmailClassifications using the array_diff_key & array_keys PHP language functions to extract and store the “Incorrect Predicted Sender Classes” as the $IPSC array, so we can use them later during the final evaluation.

We then do the same but in reverse, comparing $EmailClassifications against $weights for the “Missing Predicted Sender Classes” and save them as the $MPSC array.
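Here is the comparison in isolation. The two arrays are hypothetical stand-ins for $weights['Sender'] and $EmailClassifications['Sender']. Only the keys matter, because array_diff_key() ignores the values.

```php
<?php
// Sketch only. These arrays are hypothetical stand-ins for
// $weights['Sender'] and $EmailClassifications['Sender'].
$predicted = array('Child' => 53, 'Daughter' => 47);
$human     = array('Son' => 50, 'Child' => 50);

// Classes the bot predicted that the human did not choose
$incorrect = array_keys(array_diff_key($predicted, $human));

// Classes the human chose that the bot did not predict
$missing = array_keys(array_diff_key($human, $predicted));

echo 'Incorrect: ' . implode(', ', $incorrect) . PHP_EOL; // prints "Incorrect: Daughter"
echo 'Missing: ' . implode(', ', $missing) . PHP_EOL;     // prints "Missing: Son"
```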

 

Report – Recipients & Recipients Mistakes

We repeat the same process we used for the Sender data for the Recipients data beginning on line 189, followed by processing any mistakes beginning on line 208, which results in the $IPRC (Incorrect Predicted Recipient Classes) & $MPRC (Missing Predicted Recipient Classes) arrays.

 

Report – Overall

The last portion of the report is to evaluate the “overall” accuracy using the data that we collected and generated while working on the report.

We start by creating a $sum_prediction variable and setting its value to the total count of weights present in the $weights array.

We then proceed to subtract “points” from this number for every incorrect or missing relationship class.

Incorrect predictions receive a full point penalty whereas  missing predictions are penalized as half a point.

My thought process being that it’s better (but not perfect) for the bot to miss a class and exclude it than to include an incorrect class.

You may wish to use a different scoring rubric depending on what the repercussions of incorrect or missing data are in your model. This method is provided as a simple example.

We then create a variable called $sum_actual and set its value to the total count of classes present in the email as classified by a human.

The final “Overall Accuracy” is computed by taking the $sum_prediction and dividing it by the $sum_actual and then multiplying against 100 to get a percent.
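The scoring rubric can be sketched as a single function. OverallAccuracy() is a hypothetical helper name, the zero guard is my own addition, and the numbers in the usage line are only an illustration: 5 predicted classes, 1 incorrect, 1 missing and 5 classes chosen by the human.

```php
<?php
// Sketch only. OverallAccuracy() is a hypothetical helper name.
function OverallAccuracy($predicted_count, $incorrect_count, $missing_count, $actual_count){
    if($actual_count == 0){
        return 0; // nothing to score against
    }
    // Incorrect classes cost a full point, missing classes half a point
    $score = $predicted_count - $incorrect_count - ($missing_count / 2);
    return ($score / $actual_count) * 100;
}

// (5 - 1 - 0.5) / 5 * 100
echo OverallAccuracy(5, 1, 1, 5) . '%' . PHP_EOL; // prints 70%
```

Keeping the penalties in one place like this makes it simple to try a different rubric, for example weighting a missing class the same as an incorrect one.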

We then save the report using the CreateResultsFile() function (it receives the report as its $results parameter) and echo the report to the screen as well.

Ideally the results would be captured to facilitate programmatic evaluation of the overall accuracy of the model, like in a csv or in a database, so that you can compare the results of all the test data. However, as this is only a prototype I went with a .txt dump of the individual report that the bot generates.

The output of this bot should look something like this:

 


Found Tokens:
YOU 1
WONT 1
BELIEVE 1
THIS 1
ITS 1
UNBELIEVABLE 1
DURING 1
THE 3
POSTGAME 1
CELEBRATION 1
MR 2
COACH 2
GOT 1
A 1
WHOLE 1
WATER 1
COOLER 1
DUMPED 1
ON 1
HIS 1
HEAD 1
EVERYONE 1
LAUGHED 1
AS 1
CHASED 1
TEAM 1
OFF 1
FIELD 1
LOVE 1
BOBBY 1

Known Words:
YOU
THE
LOVE


Predicted Sender Class & Score: 
Child:  53%
Daughter:  47%

Human Scored Sender Class: 
Son:  50%
Child:  50%

Incorrect Predicted Sender Classes: 
Daughter

Missing Predicted Sender Classes: 
Son

Predicted Recipient Class & Score: 
Parent:  36%
Mother:  32%
Father:  32%

Human Scored Recipient Class: 
Mother:  33%
Father:  33%
Parent:  33%

Incorrect Predicted Recipient Classes: 
None

Missing Predicted Recipient Classes: 
None

Overall Accuracy: 
70%

Testing Complete!

 

As it stands this bot is quite rough however you can improve it by modeling word bi-grams to account for the context the words are used in rather than just noting which words are present.

Additionally, I capitalize words and strip hyphens and apostrophes out of them, which reduces the number of words the bot learns (i.e. dont vs don’t vs Don’t vs Dont vs DoNt… all become DONT). This simplifies some things and reduces database storage requirements a bit. However, it does fail to properly model language because people can express meaning in ways that get removed by this processing, which can lower the accuracy of your model in the long run.
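To make that normalization concrete, here is a small sketch of the idea. NormalizeWord() is a hypothetical name and this is not the actual Tokenizer code, just one way the “everything becomes DONT” behavior could be written.

```php
<?php
// Hypothetical helper: strips hyphens and apostrophes, then capitalizes.
function NormalizeWord(string $word): string {
  // Remove hyphens plus straight and curly apostrophes.
  // The /u flag makes the curly apostrophe match as a single character.
  $stripped = preg_replace('/[-\'’]/u', '', $word);
  return strtoupper($stripped);
}

foreach (array('dont', 'don’t', "Don't", 'DoNt', 'post-game') as $word) {
  echo $word . ' => ' . NormalizeWord($word) . PHP_EOL;
}
```

If you decide you want to keep some of that lost meaning, this is the one place you would need to loosen the rules.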

You can find this bot and all its files on GitHub – emails not included.

I hope you enjoyed reading about building this bot. If so, please support me on Patreon for as little as $1 a month.

Your financial support allows me to dedicate time to developing awesome projects like this and while I am publishing them without cost, that isn’t to say they are free. I am doing this all by myself and it takes me a lot of time and effort to build and publish these projects for your enjoyment.

Your financial support means a lot to me and allows me to be able to afford to spend the time necessary to make great content for you.

So I ask again, please support me on Patreon for as little as $1 a month.

Feel free to suggest a project you would like to see built in the comments and if it sounds interesting it might just get built and featured here on my blog.

 

 

Much Love,

~Joy

 

Email Relationship Classifier Training The Bot

We’re in the “home stretch” and quickly approaching our goal of having a working Email Relationship Classifier Bot prototype.

Today we will cover building the training portion of the bot. This system implements “Supervised Learning” so you will need to have “hand classified” your “Data Corpus” as outlined in my posts A Bag of Words and Classifying Emails. If you’ve read the other posts in this series then you are ready to proceed.

 

Train.php

What you are going to love about this code is its simplicity!

It’s intentionally short and “high level” which was achieved by using our Class files (DatabaseManager.Class.php, FileManager.Class.php and Tokenizer.Class.php), which we covered in my post Class Files, to create Objects that “act” upon the data encapsulated inside them. This means we can just ask our Objects to do complex work in just a few lines of code.

Code

<?php

// This function will load the human scored JSON class files
function LoadClassFile($file_name){
  // Get file contents
  $file_handle = fopen($file_name, 'r');
  $file_data = fread($file_handle, filesize($file_name));
  fclose($file_handle);
  return $file_data;
}


// Include Classes
function ClassAutoloader($class) {
    include 'Classes/' . $class . '.Class.php';
}
spl_autoload_register('ClassAutoloader');


// Instantiate Objects
$myTokenizer = new Tokenizer();
$myEmailFileManager = new FileManager();
$myJSONFileManager = new FileManager();
$myDatabaseManager = new DatabaseManager();


// No Configuration needed for the Tokenizer Object

// Configure FileManager Objects
$myEmailFileManager->Scan('DataCorpus/TrainingData');
$myJSONFileManager->Scan('DataCorpus/TrainingDataClassifications');
$number_of_training_files = $myEmailFileManager->NumberOfFiles();
$number_of_JSON_files = $myJSONFileManager->NumberOfFiles();

// Configure DatabaseManager Object
$myDatabaseManager->SetCredentials(
  $server = 'localhost', 
  $username = 'root', 
  $password = 'password', 
  $dbname = 'EmailRelationshipClassifier'
);


// Make sure the files are there and the number of training files is
// the same as the number of JSON Class files.
if(($number_of_training_files != $number_of_JSON_files) 
   || ($number_of_training_files == 0 || $number_of_JSON_files == 0) ){
  die(PHP_EOL . 'ERROR! the number of training files and classification files are not the same or are zero! Run CreateClassificationFiles.php first.');
}
else{
  // Loop Through Files
  for($current_file = 0; $current_file < $number_of_training_files; $current_file++){
    $myTokenizer->TokenizeFile($myEmailFileManager->NextFile());		
    $EmailClassifications = json_decode(LoadClassFile($myJSONFileManager->NextFile()), true);
    // Loop Through Tokens
    foreach($myTokenizer->tokens as $word=>$count){
      $myDatabaseManager->AddOrUpdateWord($word, $count, $EmailClassifications);
    }
  }
}

echo PHP_EOL . 'Training complete! You can now run Test.php' . PHP_EOL;

 

Save Train.php in the root project folder:


[EmailRelationshipClassifier]
│
├── CreateClassificationFiles.php
├── DatasetSplitAdviser.php
├── database.sql
├── Train.php 
│
├── [Classes]
│   │
│   ├── DatabaseManager.Class.php
│   ├── FileManager.Class.php
│   └── Tokenizer.Class.php
│
└── [DataCorpus]
    │
    ├── [TestData]
    │
    ├── [TestDataClassifications]
    │
    ├── [TestResults]
    │
    ├── [TrainingData]
    │
    └── [TrainingDataClassifications]

 

Of course the complexity does exist inside the Objects; it’s just advantageous to hide it behind the Object methods here so that we can focus on the task of training rather than the details of moving the data around.

Once all the classes have been included and the objects instantiated & configured there is a check to confirm the .txt & JSON files exist and that the number is the same.
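The check in Train.php only compares the two counts, so if you want a stricter test you could also confirm that every email has a JSON file with the same name. Below is a sketch of that idea. FindUnpairedFiles() is a hypothetical helper and it assumes you have the two lists of file paths as plain arrays (how you get them out of the FileManager Object is up to you).

```php
<?php
// Hypothetical helper: reports emails without a JSON file and vice versa.
function FindUnpairedFiles(array $email_files, array $json_files): array {
  // Reduce each path to its name without folder or extension
  $email_names = array_map(function ($file) {
    return basename($file, '.txt');
  }, $email_files);
  $json_names = array_map(function ($file) {
    return basename($file, '.json');
  }, $json_files);

  return array(
    'missing_json'  => array_values(array_diff($email_names, $json_names)),
    'missing_email' => array_values(array_diff($json_names, $email_names)),
  );
}

$unpaired = FindUnpairedFiles(
  array('DataCorpus/TrainingData/a.txt', 'DataCorpus/TrainingData/b.txt'),
  array('DataCorpus/TrainingDataClassifications/a.json')
);
print_r($unpaired);
```

This matters because the training loop pairs the next email with the next JSON file, so a stray or misnamed file could quietly pair an email with the wrong classifications.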

If none of the fail conditions trigger the die() function then, for each training file (.txt email), the $myTokenizer Object will ask the $myEmailFileManager Object for the next file in its list, which it will load and tokenize. This means that it builds a “bag of words model” of the email, specifically “unigrams”.

Then the JSON relationship class file will be loaded and decoded into an array of “key & value pairs” where the key is the relationship class name and the value is either a zero or one (0/1) where one denotes relationship class membership and zero denotes a lack of class membership.
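Here is a tiny sketch of what that decoded array looks like and how you might list only the classes an email belongs to. The JSON string is a shortened, made-up example and $member_classes is just an illustrative variable name.

```php
<?php
// Shortened example of a hand classified JSON class file
$class_json = '{"Child-Sender":"1","Son-Sender":"1","Daughter-Sender":"0","Parent-Recipient":"1"}';

// Passing true gives us an associative array instead of an object
$EmailClassifications = json_decode($class_json, true);

// Keep only the classes marked with a "1"
$member_classes = array_keys(array_filter($EmailClassifications, function ($value) {
  return $value === '1';
}));

print_r($member_classes);
```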

Then for each unigram word token the $myDatabaseManager Object will perform its AddOrUpdateWord() method.

The AddOrUpdateWord() method accepts the unigram word token as the first argument, the number of times it appears in the training file as the second argument and the relationship class memberships array as the third argument. The word is then either added to the Words table in the database or updated.
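To illustrate the “add or update” idea without a database, here is an in-memory stand-in. To be clear, this is NOT the real DatabaseManager method: AddOrUpdateWordInMemory() is hypothetical, and the array layout (a total count plus a per-class count) is my assumption for the example, not the actual Words table schema.

```php
<?php
// Hypothetical in-memory stand-in for an "add or update" operation.
// Assumption: we track a total count plus a count per member class.
function AddOrUpdateWordInMemory(array &$words, string $word, int $count, array $classifications): void {
  // Add the word if we have never seen it before
  if (!isset($words[$word])) {
    $words[$word] = array('count' => 0, 'classes' => array());
  }
  // Update the total count
  $words[$word]['count'] += $count;
  // Update the count for every class this email is a member of
  foreach ($classifications as $class => $is_member) {
    if ((string) $is_member === '1') {
      $words[$word]['classes'][$class] = ($words[$word]['classes'][$class] ?? 0) + $count;
    }
  }
}

$words = array();
$classes = array('Son-Sender' => '1', 'Daughter-Sender' => '0');
AddOrUpdateWordInMemory($words, 'LOVE', 1, $classes); // added
AddOrUpdateWordInMemory($words, 'LOVE', 2, $classes); // updated
print_r($words);
```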

You can review the details of the database in my post Email Relationship Classifier Database.

After all the words in all the training emails have been processed the training is complete and we’re ready to test our bot which I’ll cover in an upcoming post.

If you enjoyed this post please support me on Patreon for as little as $1 a month, thank you.

 

 

Much Love,

~Joy

Email Relationship Classifier Classifying Emails

Welcome back, I hope you have been enjoying the previous posts in my Email Relationship Classifier series:

A Bag of Words
Relationship Classifier Class Files
Email Relationship Classifier Database

Today we are going to cover the process for hand classifying emails as was outlined in my post A Bag of Words.

 

Project Folder Structure

So, to get started let’s make sure that we set up the folders we will need.

Inside the root project folder [EmailRelationshipClassifier] create a [DataCorpus] folder and inside that folder we want to create five sub-folders but we will only work with four today: [TestData], [TestDataClassifications], [TrainingData], [TrainingDataClassifications]

The fifth folder in the [DataCorpus] folder needs to be [TestResults] but we won’t need it today.


[EmailRelationshipClassifier]
│
├── CreateClassificationFiles.php
├── DatasetSplitAdviser.php
├── database.sql
│
├── [Classes]
│   │
│   ├── DatabaseManager.Class.php
│   ├── FileManager.Class.php
│   └── Tokenizer.Class.php
│
└── [DataCorpus]
    │
    ├── [TestData]
    │
    ├── [TestDataClassifications]
    │
    ├── [TestResults]
    │
    ├── [TrainingData]
    │
    └── [TrainingDataClassifications]

Once your folders are set up, you can refer to A Bag of Words Steps 3 & 4 … I’ll wait…

Did you give it a quick review? Good!

 

Split Your Emails

So what we want to do is update DatasetSplitAdviser.php with the number of emails you have as well as the ratio you want for the TrainingData… you did read A Bag of Words Steps 3 & 4, right?

Run DatasetSplitAdviser.php which will tell you how to split your data.

DatasetSplitAdviser.php Output:

You chose to have 88% of your Data Corpus used as Training Data.

You have 10278 emails so using a ratio split of 0.88 : 0.12
You should split your emails like this:

Training Emails: 9045
Test Emails: 1233

Formula
(10278 x 0.88) = RoundUp(9044.64) = 9045
(10278 x 0.12) = RoundDown(1233.36) = 1233
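If you would like to reproduce those numbers yourself, the arithmetic can be sketched like this. SplitDataset() is a hypothetical function (it is not the actual DatasetSplitAdviser.php code) and it assumes the test set simply gets whatever is left after rounding the training count up.

```php
<?php
// Hypothetical helper: splits a corpus size into training and test counts.
function SplitDataset(int $total_emails, float $training_ratio): array {
  // Round the training share up to a whole email
  $training = (int) ceil($total_emails * $training_ratio);
  // Assumption: the test set is the remainder, so nothing is lost or doubled
  $test = $total_emails - $training;
  return array('training' => $training, 'test' => $test);
}

$split = SplitDataset(10278, 0.88);
echo 'Training Emails: ' . $split['training'] . PHP_EOL;
echo 'Test Emails: ' . $split['test'] . PHP_EOL;
```

Taking the remainder gives the same 1233 as rounding 1233.36 down, and it guarantees the two counts always add up to the total.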

 

Now place the correct numbers of emails in the [TrainingData] & [TestData] folders. The emails should be .txt files and should contain nothing but the subject and body of the email.

Now you need to run CreateClassificationFiles.php

CreateClassificationFiles.php

<?php

// We will pass our JSON to this function to save the classifications
// in a human friendly/editable format.
function CreateClassFile($file_name, $output_path, $class_json){
	// Write file contents
	$file_handle = fopen($output_path . basename($file_name, '.txt') . '.json', 'w');
	$file_data = fwrite($file_handle, $class_json);
	fclose($file_handle);
}


// Include Classes
function ClassAutoloader($class) {
    include 'Classes/' . $class . '.Class.php';
}
spl_autoload_register('ClassAutoloader');


// Instantiate Objects
$myTokenizer = new Tokenizer();
$myFileManager = new FileManager();
$myDatabaseManager = new DatabaseManager();


// Configure Tokenizer Object
// No Tokenizer config needed

// Configure FileManager Object for TrainingData
$myFileManager->Scan('DataCorpus/TrainingData');
$number_of_training_files = $myFileManager->NumberOfFiles();

// Configure DatabaseManager Object
$myDatabaseManager->SetCredentials($server = 'localhost', 
                                   $username = 'root', 
                                   $password = 'password', 
                                   $dbname = 'EmailRelationshipClassifier'
                                   );                                  

// This system bifurcates the class data twice into sender and recipient
// groups so below we pull the class list from the database using the
// $myDatabaseManager->GetKnownClasses() method.
// After which we create keys in the $classifications using the class 
// names and appending -Sender and -Recipient respectively.
$classifications = array();
$myDatabaseManager->GetKnownClasses(); 
foreach($myDatabaseManager->classifications as $class=>$value){
	$classifications["$class-Sender"] = '0';
}
foreach($myDatabaseManager->classifications as $class=>$value){
	$classifications["$class-Recipient"] = '0';
}
// Convert the $classifications array to JSON
$class_json = json_encode($classifications); // the 2nd argument is a flags bitmask, not a boolean
$class_json = str_replace('","', "\",\n\"", $class_json); // make easier for humans to read


// Now we generate a JSON class file for each text file in TrainingData
if($number_of_training_files > 0){
	// Loop Through Files
	for($current_file = 0; $current_file < $number_of_training_files; $current_file++){
		CreateClassFile($myFileManager->NextFile(), 'DataCorpus/TrainingDataClassifications/', $class_json);
	}
}

// reConfigure FileManager Object for TestData
$myFileManager->Scan('DataCorpus/TestData');
$number_of_test_files = $myFileManager->NumberOfFiles();

// Now we generate a JSON class file for each text file in TestData
if($number_of_test_files > 0){
	// Loop Through Files
	for($current_file = 0; $current_file < $number_of_test_files; $current_file++){
		CreateClassFile($myFileManager->NextFile(), 'DataCorpus/TestDataClassifications/', $class_json);
	}
}

echo PHP_EOL . 'Classification files have been created! You can now run Train.php' . PHP_EOL;

 

CreateClassificationFiles.php creates a JSON file for each .txt email in the [TrainingDataClassifications] & [TestDataClassifications] folders. Each JSON file is named after its email and lets you enter a 1 for every class that the email reflects.
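As an aside, if you want the generated files to be even easier on the eyes, PHP’s built-in JSON_PRETTY_PRINT flag is an alternative to the str_replace() trick. This is just a sketch with a shortened, made-up class list; the script above does not use it.

```php
<?php
// Shortened example class list
$classifications = array(
  'Child-Sender'     => '0',
  'Son-Sender'       => '0',
  'Parent-Recipient' => '0',
);

// JSON_PRETTY_PRINT puts each key on its own indented line
$class_json = json_encode($classifications, JSON_PRETTY_PRINT);

echo $class_json . PHP_EOL;
```

The layout differs slightly from the example JSON in this post (indented keys and a space after each colon) but it decodes to exactly the same array.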

Below is an example of the JSON relationship class file. Note that I updated some classes to 1, which means that the email this file is associated with reflects the selected classes. You should leave classes not present as 0.

Example JSON


{"Colleague-Sender":"0",
"Employee-Sender":"0",
"Manager-Sender":"0",
"Employer-Sender":"0",
"Spouse-Sender":"0",
"Husband-Sender":"0",
"Wife-Sender":"0",
"Parent-Sender":"0",
"Father-Sender":"0",
"Mother-Sender":"0",
"Child-Sender":"1",
"Son-Sender":"1",
"Daughter-Sender":"0",
"Sibling-Sender":"0",
"Brother-Sender":"0",
"Sister-Sender":"0",
"Grandparent-Sender":"0",
"Grandfather-Sender":"0",
"Grandmother-Sender":"0",
"Grandchild-Sender":"0",
"Grandson-Sender":"0",
"Granddaughter-Sender":"0",
"Uncle-Sender":"0",
"Aunt-Sender":"0",
"Cousin-Sender":"0",
"Nephew-Sender":"0",
"Niece-Sender":"0",
"Friend-Sender":"0",
"Colleague-Recipient":"0",
"Employee-Recipient":"0",
"Manager-Recipient":"0",
"Employer-Recipient":"0",
"Spouse-Recipient":"0",
"Husband-Recipient":"0",
"Wife-Recipient":"0",
"Parent-Recipient":"1",
"Father-Recipient":"1",
"Mother-Recipient":"1",
"Child-Recipient":"0",
"Son-Recipient":"0",
"Daughter-Recipient":"0",
"Sibling-Recipient":"0",
"Brother-Recipient":"0",
"Sister-Recipient":"0",
"Grandparent-Recipient":"0",
"Grandfather-Recipient":"0",
"Grandmother-Recipient":"0",
"Grandchild-Recipient":"0",
"Grandson-Recipient":"0",
"Granddaughter-Recipient":"0",
"Uncle-Recipient":"0",
"Aunt-Recipient":"0",
"Cousin-Recipient":"0",
"Nephew-Recipient":"0",
"Niece-Recipient":"0",
"Friend-Recipient":"0"}

 

You need to classify ALL the emails you have before proceeding to the next steps after this post.
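Since hand editing JSON makes it easy to drop a quote or a comma, you might want a quick sanity check before training. Here is a sketch of one. ValidateClassJson() is a hypothetical helper and it assumes every value should be the string "0" or "1", exactly as the generated files store them.

```php
<?php
// Hypothetical helper: returns a list of problems found in a class file.
function ValidateClassJson(string $json): array {
  $data = json_decode($json, true);
  // json_decode() returns null when the JSON is broken
  if (!is_array($data)) {
    return array('File does not contain a valid JSON object');
  }
  $errors = array();
  foreach ($data as $class => $value) {
    // Assumption: values are stored as the strings "0" or "1"
    if ($value !== '0' && $value !== '1') {
      $errors[] = 'Unexpected value for ' . $class;
    }
  }
  return $errors;
}

print_r(ValidateClassJson('{"Son-Sender":"1","Child-Sender":"yes"}'));
```

An empty array means the file looks fine. Note that this strict version would also flag an unquoted 1, so loosen the comparison if your files use numbers.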

I recognize there are better ways to do this (either cleaner with fewer files & folders or simpler like storing the data in a database) but since we’re building a prototype I am focusing on “function over form” in order to get this project “off the ground” as quickly as possible… I went with arguably the fastest method to implement, which is simple files and folders. I won’t worry about it at this time; however, you can definitely improve upon this “proof of concept” implementation quite easily.

Further, editing raw JSON files by hand (while better than nothing) isn’t my idea of a “good time” so I would advise you to build a second system to display the emails and classes together and have your analysts tag the emails from a web page, as that would simplify everything… I will leave that for you to implement as it is relatively trivial to build and not critical at this juncture, though if you guys really want it or if it bugs me enough I’ll build that system too. 😛

At this point (after all your emails are classified) all that is left to do is to build the bot then train and test it which I’ll cover in an upcoming post.

I hope you enjoyed this post and consider supporting me on Patreon.

 

 

Much Love,

~Joy

 
