
Geek Girl Joy

Artificial Intelligence, Simulations & Software

Bot Generated Stories II

In my last post, Bot Generated Stories, I left off describing how text-based story generation can give way to a sort of “holodeck” virtual reality where you don’t just read a story but can explore an entire simulated world built around giving you a narrative experience unique to your preferences and choices.

The first step is to build a “writer bot” that isn’t quite as good as a human writer (but capable nonetheless) so that it can work alongside a human and aid in the writing process. This would allow the bot to rely on the human to determine what is “interesting” while the bot offers suggestions whenever a sort of “say something” button is pushed, though my friend Oliver suggests the phrase “gimme some magic”. 😛

As described, this bot would act as a “digital muse” of sorts, offering suggestions along the way, with a human selecting and writing the details from a set of possibilities. The human author would remain free to throw out the bot’s suggestions and take the story in a completely different direction from what the bot generated.

In many ways my “writer bot” is far from this goal: it still fails to generate sentences that carry actual meaning and stay on the desired topic from clause to clause. I will talk about this in more depth in another article.

What my bot is good at is generating sentences that are better than random and I can illustrate this quite simply.

 

My First Attempt: Yule’s Heuristic’s

My first experiments used a bot with random word selection from a very large word list to produce content.

It’s important to note that I did not expect good results. I just needed a baseline to compare all future attempts against, and random selection seemed like the worst way to do it. If any of my bots along the way produced content even slightly better than random, it would be a step in the right direction.

My initial methodology was basically just to pull words at random from the built-in Linux dictionary and throw in the occasional period or comma (no commas shown in the example below) to create sentences and clauses. I then concatenated those pseudo-sentences and randomly added a break to create paragraphs.

Also, mostly for my own amusement I programmatically generated a “contents” section with chapter titles and page numbers that line up, though outside of those “rules” the following was pseudo randomly generated.
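
The random approach described above can be sketched in a few lines of JavaScript. This is a minimal illustration and not the original bot's code: the tiny word list stands in for the Linux dictionary, and the names makeRng, pick, makeSentence and makeParagraphs, along with the word counts and break chance, are assumptions made up for the example.

```javascript
// Hypothetical stand-in for the Linux dictionary word list.
const WORDS = ['harbinger', 'diodes', 'tutoring', 'repairman', 'slice',
               'posits', 'okra', 'wagons', 'glaze', 'rostrum'];

// Small seeded generator (a linear congruential generator) so runs are
// repeatable. Math.random would work just as well for real use.
function makeRng(seed) {
  let state = seed >>> 0;
  return function () {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Pick one element uniformly at random.
function pick(list, rng) {
  return list[Math.floor(rng() * list.length)];
}

// Build one pseudo-sentence: random words, capitalized, ending in a period.
function makeSentence(rng, minWords = 7, maxWords = 13) {
  const count = minWords + Math.floor(rng() * (maxWords - minWords + 1));
  const words = [];
  for (let i = 0; i < count; i++) {
    words.push(pick(WORDS, rng));
  }
  const text = words.join(' ');
  return text.charAt(0).toUpperCase() + text.slice(1) + '.';
}

// Concatenate sentences and randomly insert paragraph breaks.
function makeParagraphs(rng, sentenceCount, breakChance = 0.3) {
  const paragraphs = [[]];
  for (let i = 0; i < sentenceCount; i++) {
    paragraphs[paragraphs.length - 1].push(makeSentence(rng));
    const isLast = i === sentenceCount - 1;
    if (!isLast && rng() < breakChance) {
      paragraphs.push([]); // start a new paragraph
    }
  }
  return paragraphs.map(p => p.join(' ')).join('\n\n');
}

console.log(makeParagraphs(makeRng(42), 6));
```

Nothing in this sketch knows anything about grammar or meaning, which is exactly why it makes a useful "worst case" to measure later bots against.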

Note this is the first output I generated when I began working on creating a “writer bot” (it’s terrible, though some of it is amusing):

Yule's Heuristic's
GeekGirlJoy

Table of Contents

Chapter 1: Rebroadcast's Sulkies Borough Whitewashes Swim........................ 3
Chapter 2: Culottes Mutability's Corroborations Moet's Competent................ 29
Chapter 3: Guesting Unicycles Neckerchieves Studious Oviduct.................... 52
Chapter 4: Penes Unknown Mileposts.............................................. 64
Chapter 5: Cupboard Exult Tower's............................................... 78
Chapter 6: Letha Bookmarking Kmart's Concentrate................................ 92
Chapter 7: Defensiveness Fielder Input Kilometers.............................. 112
Chapter 8: Bugging Outperforms Assault's....................................... 126
Chapter 9: Meany Conviviality's Unintelligent Plods Yards...................... 146
Chapter 10: Coal's Euphemism Union's Heterosexuality's......................... 166
Chapter 11: Ill Atrocious Inputting Moderator.................................. 180
Chapter 12: Marrieds Weissmuller Surrendering.................................. 200
Chapter 13: Convene Asylum's Dustiness's Permeated............................. 211
Chapter 14: Methodist's Prosecuted Jewelers.................................... 222
Chapter 15: Remoteness's Goblin Freeholder's Sixth's........................... 231
Chapter 16: Provo Peafowls Offensiveness's Bonsai's............................ 244
Chapter 17: Personal's Diastolic Questioning................................... 256
Chapter 18: Agitates Contingency's Gastronomy's Lineup's Gallic................ 266
Chapter 19: Garbling Poked Pithiest Depp Specialists........................... 289
Chapter 20: Lit Condolences Webb Levying Laurel................................ 302


Chapter 1: Rebroadcast's Sulkies Borough Whitewashes Swim

Delawarean Paraná harbinger diodes tutoring repairman slice posits blamer. Classicist boor's betting Markham chunky Monroe's wasting authorize abductors glance's vatting. Installments skateboarded stein lining's goodbye interstice critiqued onslaught's mute's failing's.

Hell's Egyptian's Battle compensates handsomest rookeries droves taxidermist's spaciousness's expunged majors standstill culpability. Viscus's absorbents mutability's Whitsunday's Matthew Socrates nitrate's dwarfism opulence's diffuse budgerigars silenter perversity's. Quart Nescafe newspaperwoman's guest sidestroke outdistancing scald workingmen waggle overlapping.

Flagons crochet's compunction duties Elysée objecting mace headrest's chlorinating enraptured softly enmeshed Bessemer's. Prays bracelets reamed Lagos's moisture particular's foisting. Okra Virginia's granddaughter's kronor individualism's sightseers haziest wagons sandiest Appaloosa's overcasting Fisher Minos's.

Cogwheel's nutritionist Ares erogenous inconsistent gummiest sachem lien connection skivvies successor secretion. Eisenhower supernatural Freemasonry's rostrum Rudolph's causeway's ocean's. Exhortations quibbles recounted innocent intermissions academician's hardwoods lard hindsight's austerest dabbled scalawags.

Despicable topography's narration's glaze's homograph's molehill's doyens zoo malnutrition's neutralization's. Hunts Schultz footnote Kroc's proton processioning inadvertence's Mars dialed Noemi prithee Sheena Parrish. Rightist's departure's padre Joule mangled roughneck's jazziest affiliate spinoffs cops scrubbing deter.

Consenting Kevin's budgies Balinese's neat warthogs plumb scapegoating bombardiers Burke hoppers. Storyteller's rationals lethargy jitterbug's poorhouse threats pipeline extolled jogs grandee. Vastness alluvial's bloodstain experimentation cigaret Karenina's Orval's thereto banishment abattoir's Chrysler abducts Yolanda's.

...

And it goes on like that for 312 pages of mind numbing random goodness. 😛

That bot had a huge vocabulary but clearly using randomly selected words is terrible!

Not a single sentence was coherent! That is actually what I expected, so Yule’s Heuristic’s was a success in that it failed as planned, though I had to go beyond random text if I wanted anything close to resembling a story.

How then did I get my bot to generate A Halloween Tale?

We’ll talk about that and more in my next post Rule Based Story Generation.

Please remember to like, share & follow!

Your financial support allows me to dedicate the time & effort required to develop all the projects & posts I create and publish here on my blog.

If you would also like to financially support my work and add your name on my Sponsors page then visit my Patreon page and pledge $1 or more a month.

As always, feel free to suggest a project you would like to see built or a topic you would like to hear me discuss in the comments and if it sounds interesting it might just get featured here on my blog for everyone to enjoy.

 

 

Much Love,

~Joy


Bot Generated Stories

Many of my readers know that last year for Halloween I published A Halloween Tale, where I used a self-built (from scratch & in PHP no less 😎 ) “writer bot” to write the entire story I published in that article.

To this day I would argue A Halloween Tale is still the best example available online of bot written fiction and I dare you to find a more coherent story! 😛

I trained my bot on Jules Verne’s 20,000 Leagues Under the Sea, Bram Stoker’s Dracula & Mary Shelley’s Frankenstein, in the hopes of generating a sort of Adventure Horror story because I wanted something kinda “spooky” to publish for Halloween… but I’m getting ahead of myself.

I will discuss my “writer bot” more in future posts. Today I’d like to start the discussion with some of my thoughts on generative writer bots in general.

Why Build a Generative Writer Bot?

You see, I believe that generative robots like my “writer bot”, though more advanced, will completely change the way people produce & consume media in the not too distant future.

Consider that Amazon.com is a book store (the largest), and it sells digital ebooks in ever growing numbers. Consider too that every Movie / TV Show and Netflix series has a written script.

Millions of magazines and newspapers are printed and sold around the world each and every day, not to mention all the blog posts that are published.

Just about every product you can think of has some form of written communication involved with the buying, selling, transporting and or the use of that product.

Estimates say that there is a good chance a bot will write a “best seller” novel within the next 10 to 15 years, and it’s important to note that isn’t time to completion. That’s time until the bot is so good that it will do better than most human writers ever will!

A bot that can write “coherently” is much closer than 10 years!

The so-called “best seller” robot is easily worth a trillion dollars to its creator due to the capacity of the robot to disrupt the entire writing industry!

 

A Vision of Things to Come

This type of bot offers push button custom content that can be tailor made to the exact preferences of the reader… or company that rents it from you… yeah “rents” because it’s the kind of thing you sell as a service for sure!

Imagine having a long trip home on a train, jet or self driving Uber… and having anything from a short story all the way up to a novel written just for you!

But it doesn’t stop there… as I droned on above, EVERYTHING is written and anything written has a cost associated with it.

For the writer the cost is time: the longer any given work takes to finish, the less the work is worth overall, because fewer hours remain for other paid projects.

This type of bot would also benefit companies who employ people to write for them because their writers will be more efficient which means they can pay fewer writers to handle their content generation needs.

Ultimately, there is the possibility a writer bot could get so good that it might supplant the need for human writers entirely outside of specific areas of expertise.

While that may horrify many authors, if that does happen it promises to usher in the ability for everyone to have content generated that tells the story they want to read, see or hear at the push of a button.

 

Generative Stories Are More Than Just Words

This is all more than just words though.

If a generative writer bot is capable of generating a “best seller” novel then it is not hard to see how the same process of narrative arc generation and management as well as character creation… not to mention the conversations the characters have with (or about) each other as well as the objects they interact with and the environments they exist in… can be applied to other uses such as writing scripts for movies, shows… podcasts? 😛

At the core of this bot is a system that can manage complex environments and interactions verbally and describe consistently and coherently what is occurring.

It’s not hard to imagine combining the narrative generating capacity of this type of bot with an animation system hypothetically allowing you to generate a story and then programmatically illustrate or even animate it leading to on demand TV with shows that are all about your personal interests!

Further, if you can animate it… it can be made interactive so it’s not much of a stretch to extend the system so that you could play a VR Novel (like the Holodeck on Star Trek but with VR goggles) where the story is written on the fly and the world is generated so that you can have any experience you want.

I believe this is coming and I will share more of my thoughts on the subject in upcoming posts.

I hope you enjoyed today’s post,  please like, share & follow!

Read Part 2: Bot Generated Stories II


 

 

Much Love,

~Joy

My Supporters

Today I would like to thank you all for reading my blog.

Your time is valuable to me and knowing that everyone out there enjoys my work is very gratifying but also quite motivating to me to constantly try to bring you something new and interesting with each post I publish.

As a result I have seen my daily & weekly readership numbers continue to increase and I would like to take this opportunity to welcome everyone new.

Obviously I can’t do this without you guys and I would also like to take this opportunity to thank a few very special readers of mine who not only enjoy my work but also have pledged to financially support my content over on Patreon.

PATREON SPONSORS

Gabriel Kunkel   &   Shanon Garcia

Your financial support allows me to dedicate the time & effort required to develop all the projects & posts I create and publish here on my blog. Your patronage allows people all around the world to learn about technology and computer science.

If you would also like to financially support my work and add your name on my Sponsors page then visit my Patreon page and pledge $1 or more a month.

As always, feel free to suggest a project you would like to see built in the comments and if it sounds interesting it might just get built and featured here on my blog for everyone to enjoy.

 

 

Much Love,

~Joy

SVG Platformer

Can you go from art to program in one or two steps? Well, that’s what today’s post is about.

One of the cool things I remember about web development from years ago was Adobe Flash.

Before you boo me, hear me out!

I’m not saying Flash is better technology than HTML5, The Name of Your Favorite JavaScript Framework, CSS, Web Assembly etc… However, one place Flash excelled was visual design & layout… an important part of the web!

The problem with many modern tools isn’t that they can’t convey design, it’s that they decouple the design and the development processes!

In practice this means writing code to describe the elements of your software like HTML and later writing more code to style the elements (like CSS, SASS or LESS), none of which is actually visual, though you definitely can get some great results!

Flash Builder (or whatever it was called) was half art studio and half IDE (Integrated Development Environment) where you could draw anything and it was an “object” and you could write code (ActionScript) to control its behavior. It wasn’t a mockup or illustration, it was the actual program!

As I recall, once the switch to ActionScript 3 was made the ability to store your code on the objects themselves was deprecated in favor of using references and listeners stored in the main keyframe timeline… I preferred keeping my code on the objects themselves but I digress.

Even with the change to where you stored your code you could still accomplish anything you wanted with the centralized keyframe code and some developers even found this easier to maintain than storing the code on the components.

You would set up your scripts on layered keyframes that extended to the last keyframe used in the project, or the last frame that needed that code, and by using a sort of “goto keyframe name or id” method you could actually build complex applications quite easily, and more importantly… visually!

That’s why all the games used to be made with Flash, you could basically draw a picture and then turn it into an animation or even a full program in a couple of hours. This meant you were free to experiment & push boundaries.

Now, yes of course there are visual workflows you can use today. There are WYSIWYG editors and CMS App Platforms like WordPress, Drupal & Joomla, not to mention the full featured layout capabilities of site builder tools like Wix.

Fundamentally though these tools facilitate laying out HTML elements and applying CSS and maybe some JavaScript via a drag and drop interface. Which is significantly faster than doing visual development via code in my opinion, though I am not arguing it is inherently “better”.

Unlike the aforementioned tools which specialize in “page based” HTML applications, Flash was an element or object that you embed into your page that used Vector Graphics to create lossless re-sizable images, animations and applications.

Inside the Flash movie/app you could draw anything and you were not constrained to HTML elements but you were also not required to code the visual elements.

This made for a wonderfully rapid prototyping experience that I was unable to reproduce until I tried working with Unity 3D which describes itself as “the ultimate game development platform” though I’d go so far as to describe it as “the ultimate app development platform”.

Think about it, at the time of writing this Unity supports 25 Platforms including Desktops (Win/Mac/Linux), the mobile OS’s, and all the major gaming consoles, not to mention the Smart TV’s, Watches etc…. Any platform you want your app on, including the web, well… Unity pretty much supports it right out of the box. Oh, and it’s free until you make $100K a year with it, not too shabby!

The catch? Well, it’s highly optimized but leans in the 3D gaming direction (though I’ve built 2D apps with Unity), so the applications it produces tend to be larger (as far as my tests go) than if you used PhoneGap/Cordova or went native. My guess is this is due to the embedded physics engine and graphics rendering code that gets packaged with the app, but I’m only guessing, and there are a few options that let you exclude unnecessary things from the compiled app.

Then again, you may be able to make use of those features in your app so it need not be a negative either.

In any case, the problem as I see it with Unity is that the builder isn’t readily available on Linux, but it will build for Linux ❓ Maybe they should build Unity with Unity so that it can Unity being Unity… 😛

I am aware they kinda released a limited version for Linux… but I could never seem to make it work right, and the truth is that it takes some decent (but not outrageous) resources to run the Unity builder application, so most micro computers are out. Sadly it won’t run on the ARM architecture, so using a Raspberry Pi to do Unity development is just not happening.

Is there another way?

Well, there is a modern Vector Graphic format available for the web called SVG that is basically XML code that can be written in a text editor or it can be drawn using a program like Inkscape (free and what I use) or Adobe Illustrator if you prefer a commercial paid tool.

Since SVG is code, if you place the code inside your HTML (sadly not link to or embed) you don’t just get a static vector image but instead you get elements that are accessible via the DOM (Document Object Model) that you can manipulate using CSS and JavaScript.

That last part should really interest you if you enjoy rapid application prototyping!

That is the origin of this project: I wanted to know… could I rapidly prototype an application by just drawing a picture and writing some code?

Understand that I am not talking about drawing a picture, slicing it then building an app from sliced components or using the sliced images as placeholders or writing code to draw the slices onto a canvas context.

I challenged myself to see if I could only write code that was part of the core functionality and not basic graphic asset creation and certainly not the code to display it, just manipulate it.

I set about creating a very simple “proof of concept” a while back that is basically a “Die Rolling App” that you can view a live example of here: SVG Roller though I never wrote about it. Roller is half image and half app but very basic.

Roller consists mainly of showing and hiding elements in the SVG based on a button click and a random number… good but not all that flashy!
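
The show/hide idea behind Roller can be sketched as follows. This is not the actual Roller source: the element ids (face1 through face6) and the function names rollDie and showFace are assumptions, and plain objects with a style property stand in for the SVG elements so the snippet also runs outside a browser.

```javascript
// Roll a die: an integer from 1 to sides (inclusive).
function rollDie(sides = 6, rng = Math.random) {
  return 1 + Math.floor(rng() * sides);
}

// Show only the face matching the roll and hide the rest.
// `faces` maps a face number to an element-like object with a `style` property.
function showFace(faces, roll) {
  Object.keys(faces).forEach(key => {
    faces[key].style.display = Number(key) === roll ? 'inline' : 'none';
  });
}

// Stand-ins for elements such as svg.getElementById('face1') (hypothetical ids).
const faces = {};
for (let n = 1; n <= 6; n++) {
  faces[n] = { id: 'face' + n, style: { display: 'none' } };
}

const roll = rollDie();
showFace(faces, roll);
console.log('Rolled', roll);
```

In a real page the stand-in objects would be replaced by references to the grouped SVG elements, and the roll would be triggered from a click handler on the button.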

Recently I have been wanting a better SVG application that would be more visual and expand on what I have already done but retain the simplicity of “Draw It then Code it”.

So, I opened up Inkscape and drew this image:


I grouped all the associated assets and gave them id’s like “cloud1”, “player”, “coin2” etc… then saved the image as demo.svg and closed Inkscape.

Why a game? Well, it’s more visual than my SVG Roller and I think it illustrates more of what is possible with an SVG app.

After that I opened demo.svg with a text editor and copied the SVG code into the body tag of my HTML file (remember you can’t link to it you have to include the code in the HTML).

I then wrote a little CSS that helps position the SVG on the page, applied a background color, disabled text highlighting and changed the cursor to the hand icon when the mouse is over a button, minimal CSS.

After that I wrote the JavaScript that turns the image into a playable application.

Game.js

Here is all the code that makes the SVG Platformer game demo work:

var keyEvents = {}; // keyboard state object

// Listen to keyboard outside of game loop to be less "blocky"
var onkeydown = onkeyup = function(key){
  key = key || event; // IE Fix 😦

  if(key.type == 'keydown'){
    keyEvents[key.keyCode] = true;
  }
  else{
    keyEvents[key.keyCode] = false; 
  }
  //console.log(keyEvents);
}


var game = document.getElementById('game'); // A reference to the SVG
if(game){
    game.addEventListener("load",function(){
    ///////////
    // Functions
        
    // Clear Instructions    
    // removes the instructions element
    function ClearInstructions() {
      instructions.remove();
    }

    // Get Position
    // This function gets the current (x,y) coordinates of the given object
    function GetPosition(object){
      var transformlist = object.transform.baseVal;
      var group = transformlist.getItem(0);
      var X = 0;
      var Y = 0;
      if (group.type == SVGTransform.SVG_TRANSFORM_TRANSLATE){
        X = group.matrix.e;
        Y = group.matrix.f;
      }
      return [X, Y];
    }
    
    // Collide
    // A basic box collision detector
    function Collide(element1, element2) {
      var collisionBox1 = element1.getBoundingClientRect();
      var collisionBox2 = element2.getBoundingClientRect();

      return !(collisionBox1.top > collisionBox2.bottom ||
        collisionBox1.right < collisionBox2.left ||
        collisionBox1.bottom < collisionBox2.top ||
        collisionBox1.left > collisionBox2.right);
    }

    // Inside
    // A basic inside box collision detector
    // (note: as written, this test is logically equivalent to Collide above)
    function Inside(element1, element2) {
      var collisionBox1 = element1.getBoundingClientRect();
      var collisionBox2 = element2.getBoundingClientRect();

      return (collisionBox1.top <= collisionBox2.bottom && 
        collisionBox1.bottom >= collisionBox2.top && 
        collisionBox1.left <= collisionBox2.right && 
        collisionBox1.right >= collisionBox2.left);
    }
      
    // Get Bank Total
    // Get the number of diamonds or coins the player has
    function GetBankTotal(element){
      var currentValue = element.textContent;
      return parseInt(currentValue, 10); // always parse as base 10
    }
    
    // Collect 
    // Increment the Coin or a Diamond "player bank"
    function Collect(element){
      element.textContent = GetBankTotal(element) + 1;
    }
      
    
            
    ///////////
    // Game play
    
    // Set the "constants"
    var step = 1;
    var jump = 20;
    var gravity = 1.5;

    // Setup references to the "named" SVG XML elements
    var gameOver = game.getElementById("gameover"); // A hidden "eater/detector" element below the play area to detect player death
    var instructions = game.getElementById('instructions');  // Instructions element
    var gameOverMenu = game.getElementById("gameovermenu");  // Game over screen element
    var player = game.getElementById("player");              // The player element
    var playerCoins = game.getElementById("playercoins");    // The "bank" element showing how many coins the player has collected
    var playerDiamonds = game.getElementById("playerdiamonds");// The "bank" element showing how many diamonds the player has collected

    //  Setup references to the "named" SVG XML coin elements
    var coinPieces = ['coin1', 'coin2'];
    var coins = [];
    coinPieces.forEach(element => {
      coins.push(document.getElementById(element));
    });
        
    // Setup references to the "named" SVG XML diamond elements
    var diamondPieces = ['diamond1'];
    var diamond = [];
    diamondPieces.forEach(element => {
      diamond.push(document.getElementById(element));
    });

    // Setup references to the "named" SVG XML ground elements
    var terrainPieces = ['ground1', 'ground2', 'ground3', 'ground4'];
    var terrain = [];
    terrainPieces.forEach(element => {
      terrain.push(document.getElementById(element));
    });

    var winningBankTotal = diamondPieces.length + coinPieces.length;


    // Clear the instructions after 3 seconds
    setTimeout(ClearInstructions, 3000);


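    // Redraw Game Loop
    var redrawRate = 30; // milliseconds (setInterval delays are in ms)
    var gameLoop = setInterval(function(){
      // Declared with var so they stay local instead of leaking as globals
      var fall = true; // always try to fall
      var allowedToJump = false;// disallow jumping because player might be falling
      var allowedToMove = true; // allow moving until the player is dead
      var position; // player [x, y], refreshed before each use below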

      // Check for collisions with ground elements
      terrain.forEach(ground => {
        // If there is a collision with the ground
        if(Collide(player, ground)){
          fall = false; // Stop falling
          allowedToJump = true; // Allow jumping
        }
      });

      // if player fell below the ground
      if(Inside(player, gameOver)){
        fall = false; // stop falling
        allowedToJump = false; // dont allow jumping
        allowedToMove = false; // player is dead stop player movment
        ClearInstructions(); // just in case
        gameOverMenu.style.display = "inline"; // show game over menu
      }


      // if there was no collision between a ground element and
      // the player
      if(fall === true){
        position = GetPosition(player); // get updated player position
        allowedToJump = false; // dont allow jumping
        player.transform.baseVal.getItem(0).setTranslate(position[0], position[1] + gravity); // player falls
      }


      if(allowedToMove === true){
        position = GetPosition(player); // get updated player position
        // keyboard movment
        // left || a
        if (keyEvents[37] === true || keyEvents[65] === true) {
          if(position[0] > -10){
            player.transform.baseVal.getItem(0).setTranslate(position[0] - step, position[1]);
          }
        }
        // right || d
        if (keyEvents[39] === true || keyEvents[68] === true) {
          if(position[0] < 140){
            player.transform.baseVal.getItem(0).setTranslate(position[0] + step, position[1]);
          }
        }
        // up || w || space
        if ((keyEvents[38] === true || keyEvents[87] === true || keyEvents[32] === true) && allowedToJump === true) {
          player.transform.baseVal.getItem(0).setTranslate(position[0], position[1] - jump);    
        }
        // down || s
        if (keyEvents[40] === true || keyEvents[83] === true) {
          //console.log("Down");
        }
      }


      // Item Collection        
      // Collect coins
      coins.forEach(coin => {
        if(Inside(player, coin)){
          coin.remove();
          Collect(playerCoins);
        }
      });

      // Collect diamond
      diamond.forEach(diamond => {
        if(Inside(player, diamond)){
          diamond.remove();
          Collect(playerDiamonds);
        }
      }); 

      // Check for Win
      if((GetBankTotal(playerDiamonds) + GetBankTotal(playerCoins)) == winningBankTotal){
        clearInterval(gameLoop); // stop the game
        fall = false; // stop falling
        allowedToJump = false; // dont allow jumping
        allowedToMove = false; // player won stop player movment
        gameOverMenu.style.display = "inline"; // show game over menu
        // Change Game Over text to You Win
        game.getElementById("gameovermessage").textContent = '  You Win!';
      }

    }, redrawRate); // Game Loop - redraw every 30 milliseconds
  }); // game load event listener
} // if game

 

As you can see from the code, it supports WASD as well as the arrow keys and spacebar for movement. There is a win condition if you collect the two coins and the single diamond. You lose if you fall into one of the two spike pits.

Overall I am pleased with what I accomplished. However, the collision detection could be improved and there is quite a bit of room for improving how items are collected. Enemies would be nice too, as would a larger/longer level and maybe even a parallax scrolling effect on some background elements as the player moves. Again though, I am happy with how it turned out.
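
On the collision detection point: both helpers in Game.js reduce to an axis-aligned box overlap test, which can be tried outside the browser on plain rectangles. The sketch below assumes rectangles shaped like the result of getBoundingClientRect() (top, right, bottom, left). The names boxesOverlap and boxContains are hypothetical and not part of Game.js, and the stricter containment check is just one possible option if a true "inside" test is ever wanted.

```javascript
// True when two axis-aligned boxes overlap or touch at an edge.
function boxesOverlap(a, b) {
  return !(a.top > b.bottom ||
    a.right < b.left ||
    a.bottom < b.top ||
    a.left > b.right);
}

// True only when `inner` lies entirely within `outer`.
function boxContains(outer, inner) {
  return inner.top >= outer.top &&
    inner.bottom <= outer.bottom &&
    inner.left >= outer.left &&
    inner.right <= outer.right;
}

// Made-up rectangles for illustration (screen coordinates, y grows downward).
const player = { top: 10, right: 30, bottom: 40, left: 10 };
const coin = { top: 35, right: 50, bottom: 55, left: 25 };
const pit = { top: 0, right: 100, bottom: 100, left: 0 };

console.log(boxesOverlap(player, coin)); // true
console.log(boxContains(player, coin));  // false
console.log(boxContains(pit, player));   // true
```

An overlap test is forgiving, which suits coin pickup. A containment test is stricter and might suit something like the spike pits, where brushing the edge arguably should not count.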

You can Play a Live Demo: Here

You can Get the Code: Here

You can find a list of all my other posts on my Topics and Posts page.

I hope you enjoyed reading about and playing this SVG game prototype.

Your financial support allows me to dedicate time to developing projects like this and while I am publishing them without cost, that isn’t to say they are free. It takes me a lot of time and effort to build and publish projects like this for your enjoyment.

So I ask that if you like my content, please support me on Patreon for as little as $1 or more a month.

Feel free to suggest a project you would like to see built in the comments and if it sounds interesting it might just get built and featured here on my blog for you to enjoy.

 

 

Much Love,

~Joy

Button CSS Generator

Today we’re going to talk about a limited “generative system” that “writes” CSS code though I wouldn’t call this a ‘bot’.

What this generator does is “explore” the existing possibilities of a predefined problem space using known ranges and random selection. 😉

I wrote some of my thoughts regarding “generative systems” in relation to programmers and software developers in my article The Death of the Programmer and I also implemented a generative “writer bot” that I trained and used to write my article A Halloween Tale.

However, in those cases I was mainly referring to semi-intelligent (in regard to the problem space) systems (“bots”) that are able to make choices on their own and/or are capable of learning, evolving or growing in some capacity based on input or state. That is not what we are doing today.

This simple generative tool (the Button CSS generator I provide below) is useful, but it has no knowledge of what looks good (or bad) and no way to learn it either!

It can only provide you with options but it is far from replacing you.
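
To make "known ranges and random selection" concrete, here is a minimal sketch of the idea. It is not the generator's actual code: the property list, the ranges and the names randomInt, randomColor, randomButtonStyle and toCss are assumptions chosen for illustration.

```javascript
// Integer in [min, max], using an injectable random source.
function randomInt(min, max, rng = Math.random) {
  return min + Math.floor(rng() * (max - min + 1));
}

// Random rgb() color string; each channel is drawn from the known range 0-255.
function randomColor(rng = Math.random) {
  const r = randomInt(0, 255, rng);
  const g = randomInt(0, 255, rng);
  const b = randomInt(0, 255, rng);
  return `rgb(${r}, ${g}, ${b})`;
}

// Assumed option list; these are all valid CSS border-style keywords.
const BORDER_STYLES = ['solid', 'dotted', 'dashed', 'double', 'groove', 'ridge'];

// Pick one point in the "problem space" of button styles.
function randomButtonStyle(rng = Math.random) {
  return {
    'background-color': randomColor(rng),
    'color': randomColor(rng),
    'border-color': randomColor(rng),
    'border-width': randomInt(0, 12, rng) + 'px',
    'border-style': BORDER_STYLES[randomInt(0, BORDER_STYLES.length - 1, rng)]
  };
}

// Turn a style object into CSS text for a given selector.
function toCss(selector, style) {
  const lines = Object.keys(style).map(prop => `    ${prop}: ${style[prop]};`);
  return `${selector} {\n${lines.join('\n')}\n}`;
}

console.log(toCss('button', randomButtonStyle()));
```

Every call lands on a different combination, but nothing here judges whether the result looks good. That judgment stays with the person looking at the buttons.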

You can view a Live Preview: Here 

Though in case you wanted it, here’s a screenshot:

There are only 3 files used to create the Button CSS generator: style.css, functions.js & index.html

Style.css

These are the CSS styles used by the index.html file. You will notice that there is a base button CSS template that all the buttons share when the page is first loaded.

The “hardcoded” styles are: text-align, text-decoration, display, padding, font-size.

Once the page is loaded however I use JavaScript to generate and apply additional CSS attributes which are responsible for the variations in each button that you see.

The style.css file is an external style sheet and is linked to using <link rel="stylesheet" type="text/css" href="style.css"> in the HTML file.

 

h1{
    width:100%;
    text-align: center;
}
    
button {
    /* hardcoded styles*/
    text-align: center;
    text-decoration: none;
    display: inline-block;
    padding: 15px 32px;
    font-size: 1.5em;
}

table{
    text-align: center;
    width:100%;
}

caption{
    font-size:2em;
}

#randomize{
    background-color: rgb(62, 176, 239); 
    color: rgb(138, 219, 215);
    border-color: rgb(28, 65, 39);
    border-width: 11px;
    border-style: dotted;
}

#show-css-wrapper{
    width: 100%;
    text-align: center;
    margin-left: auto;
    margin-right: auto;
}

#show-css-title{
    width: 100%;
    text-align: center;
    margin-left: auto;
    margin-right: auto;
    background-color: #999999;
}

#show-css{
    width: 40%;
    min-width: 420px;
    text-align: left;
    margin-left: auto;
    margin-right: auto;
    background-color: #eeeeee;
}

textarea{
    width: 100%;
    height: 200px
}

 

Functions.js

This file contains all the functions that the generator will use to make new button CSS styles.

There is some inline JavaScript in the HTML file that makes use of these functions and I will discuss that in the Index.html section below.

There is a lot of similarity between most of the functions and although much of their functionality can be wrapped up into one function I intentionally implemented the prototype generator this way.

I could say the reason is that I wanted to avoid premature optimization, but the real reason is that this way is just easier to read for someone who didn’t write the software and/or is just learning to code.

Though in truth, SetBackgroundColor(element) or SetBorderWidth(element) is not significantly more readable than a generic function like SetCSS(element, cssProperty,  optional [ value ]). If I were going to spend more time developing this system I would probably go that route, simply because one generic function is far easier to maintain in the long run than many near-identical ones.
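To make the idea of a generic setter a little more concrete, here is a minimal sketch of what it might look like. It is not part of the actual functions.js: the names SetCSS, generators and randomRGB are my own placeholders, and the random ranges are simply assumed to mirror the individual functions.

```javascript
// Hypothetical generic setter (NOT part of the original functions.js).
// `generators` maps a camelCase style property to a function that returns
// a random value for it; the ranges are assumed to mirror the article's code.
var generators = {
    backgroundColor: randomRGB,
    color: randomRGB,
    borderColor: randomRGB,
    borderWidth: function () {
        return Math.floor(Math.random() * 14) + 'px'; // 0px to 13px
    },
    borderStyle: function () {
        var styles = ['none', 'dotted', 'dashed', 'solid', 'double'];
        return styles[Math.floor(Math.random() * styles.length)];
    }
};

// Returns a random color string such as "rgb(12, 200, 7)"
function randomRGB() {
    function channel() { return Math.floor(Math.random() * 256); }
    return 'rgb(' + channel() + ', ' + channel() + ', ' + channel() + ')';
}

// Set one style property; when no value is given a random one is generated
function SetCSS(element, cssProperty, value) {
    if (value === undefined || value === null) {
        if (!generators.hasOwnProperty(cssProperty)) {
            throw new Error('No random generator for ' + cssProperty);
        }
        value = generators[cssProperty]();
    }
    element.style[cssProperty] = value;
    return value;
}
```

With something like this, SetCSS(button, 'borderWidth') would pick a random width while SetCSS(button, 'borderWidth', '5px') would apply a specific one, and supporting a new property would only mean adding one entry to the generators object.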

The functions.js file is an external script and is linked to the HTML file using the “src” (source) attribute on a script tag.

////////////////////////////////////
// Random Button CSS Generator Functions

// Will return 0 or 1
function CoinFlip(){
    return Math.floor(Math.random() * 2); //0/1
}
    

// Return a random string of RGB values e.g. "255, 255, 255"
function RandomRGBColor(){
    var r, g, b;
    
    r = Math.floor(Math.random() * 256);
    g = Math.floor(Math.random() * 256);
    b = Math.floor(Math.random() * 256);
    
    var color = r.toString() + ', ' + g.toString() + ', ' + b.toString();

    return color;
}

// Set SetBackgroundColor()
function SetBackgroundColor(element, color = null){
    // No color given so set to random
    if(color === null){
        element.style.backgroundColor = 'rgb(' + RandomRGBColor() + ')';
    }else{ // Set to provided color
        element.style.backgroundColor = 'rgb(' + color + ')';
    }
}

// Set SetFontColor()
function SetFontColor(element, color = null){
    // No color given so set to random
    if(color === null){
        element.style.color = 'rgb(' + RandomRGBColor() + ')';
    }else{ // Set to provided color
        element.style.color = 'rgb(' + color + ')';
    }
}

// Set SetBorderColor()
function SetBorderColor(element, color = null){
    // No color given so set to random
    if(color === null){
        element.style.borderColor = 'rgb(' + RandomRGBColor() + ')';
    }else{ // Set to provided color
        element.style.borderColor = 'rgb(' + color + ')';
    }
}

// Set SetBorderWidth()
function SetBorderWidth(element, width = null){
    // No width given so set to random
    if(width === null){
        element.style.borderWidth = Math.floor(Math.random() * 14).toString() +'px';
    }else{ // Set to provided width
        element.style.borderWidth = width.toString() + 'px';
    }
}

// Set SetBorderStyle()
function SetBorderStyle(element, style = null){
    // No style given so set to random
    if(style === null){
        
        var styles = ['none', 'hidden', 'dotted', 'dashed', 'solid', 'double', 'groove', 'ridge', 'inset', 'outset', 'initial', 'inherit'];
        
        element.style.borderStyle = styles[Math.floor(Math.random()*styles.length)];
            
    }else{ // Set to provided style
        element.style.borderStyle = style;
    }
}

// Get the CSS for the button that was clicked and show it on the page
function ShowCSS(element){
    var style = window.getComputedStyle(element);
    var backgroundColor = style.getPropertyValue('background-color');
    var color = style.getPropertyValue('color');
    var borderColor = style.getPropertyValue('border-color');
    var borderWidth = style.getPropertyValue('border-width');
    var borderStyle = style.getPropertyValue('border-style');
    var padding = style.getPropertyValue('padding');
    var textDecoration = style.getPropertyValue('text-decoration');
    var display = style.getPropertyValue('display');
    var fontSize = style.getPropertyValue('font-size');
    var css = '';
    css += 'button{\n';
    css += '    background-color: ' + backgroundColor.toString() + ';\n';
    css += '    color: ' + color.toString() + ';\n';
    css += '    border-color: ' + borderColor.toString() + ';\n';
    css += '    border-width: ' + borderWidth.toString() + ';\n';
    css += '    border-style: ' + borderStyle.toString() + ';\n';
    css += '    padding: ' + padding.toString() + ';\n';
    css += '    text-decoration: ' + textDecoration.toString() + ';\n';
    css += '    display: ' + display.toString() + ';\n';
    css += '    font-size: ' + fontSize.toString() + ';\n';
    css += '}\n';

    // Use .value so the textarea still updates after it has been edited by hand
    document.getElementById('css-styles').value = css;
}

Index.html

Index.html is the core that brings all the pieces of this software together!

You will notice that I used divisional elements (div tags) and a textarea to create the section at the top of the page where the CSS code is shown when you click one of the buttons.

Beneath that is the Randomize button, wrapped in a link with an empty href (which points back at the current page, so following it reloads the page and re-runs the generator), followed by a table. Why a table and not a responsive grid? Eh… it was faster for the prototype mainly.

Inside each cell of the table is a button with an onclick attribute that passes a reference to the element itself (‘this’) to the function ShowCSS().

Beyond this the only thing of consequence in the HTML file is the inline JavaScript that uses the code from the functions.js script to make everything work.

When the page loads I use an array of element ID strings (one per button) to locate and establish references to the DOM objects for each button, and I store those references in the buttons array.

Then, for each of the element references in the buttons array I set a background color as well as a font color randomly. After that I “flip a coin” to decide if a border should be applied to the button. If yes, I apply a random border color, width and style.

Once the code is done running all the buttons will be randomly styled and are ready for you to click them to get the CSS if you like them or click Randomize to get a new set of random buttons.

<html>
<head>
  <title>Random Button</title>
  <link rel="stylesheet" type="text/css" href="style.css"> 
</head>
<body>
  <!-- Title -->
  <h1>Random Button CSS Generator</h1>
     
    <!-- Show CSS -->
    <div id="show-css-wrapper">
        <div id="show-css">
            <h2 id="show-css-title">Button CSS</h2>
            <p>Select a button to view its CSS styles</p>
            <textarea id="css-styles">CSS will show here</textarea>        
        </div>
    </div>

    <!-- Table -->
    <table> 
      <tr>
          <td></td>
          <td>
              <h3>
                  <a href="">
                      <button id="randomize">
                          Randomize
                      </button>
                  </a>
              </h3>
          </td>
          <td></td>
      </tr>
         <tr>
          <td>&nbsp;</td>
          <td>&nbsp;</td>
          <td>&nbsp;</td>
      </tr>
      <tr>
        <td><button id="button-1" onclick="ShowCSS(this)">Button 1</button></td>
        <td><button id="button-2" onclick="ShowCSS(this)">Button 2</button></td>
        <td><button id="button-3" onclick="ShowCSS(this)">Button 3</button></td>
      </tr>
      <tr>
        <td><button id="button-4" onclick="ShowCSS(this)">Button 4</button></td>
        <td><button id="button-5" onclick="ShowCSS(this)">Button 5</button></td>
        <td><button id="button-6" onclick="ShowCSS(this)">Button 6</button></td>
      </tr>
        <tr>
        <td><button id="button-7" onclick="ShowCSS(this)">Button 7</button></td>
        <td><button id="button-8" onclick="ShowCSS(this)">Button 8</button></td>
        <td><button id="button-9" onclick="ShowCSS(this)">Button 9</button></td>
      </tr>
    </table>

 
    <!-- Include the Functions -->
    <script src="functions.js"></script>    
    <script>
    // Button Element ID's
    var elementsIDs = ['button-1', 'button-2', 'button-3', 'button-4', 'button-5', 'button-6', 'button-7', 'button-8', 'button-9'];

    var buttons = []; // Array to Store Button References

    // Setup Element References to the Buttons
    elementsIDs.forEach(element => {
      buttons.push(document.getElementById(element));
    });

    // Apply Random CSS to each Button
    buttons.forEach(element => {
      SetBackgroundColor(element); // Set Initial Background Colors
      SetFontColor(element);       // Set Initial Font Colors

      if(CoinFlip() == 1){ // Has border?
        SetBorderColor(element);
        SetBorderWidth(element);
        SetBorderStyle(element);
      }
       
    });

    </script>
</body>
</html>

 

Where to go from here

As a proof of concept I think I accomplished my goal of creating a functional Button CSS Generator. However, a more complete prototype would include features like letting you edit the CSS of the selected button and have the changes reflected on the button in real time or via an “update” button.

Also, perhaps the ability to “favorite” or save a button’s CSS style so that you can reload it again later or some kind of “history” feature would be nice so that you can keep the button styles you like.

Additionally, the CSS attributes the generator uses are hard coded and the “min/max” ranges for each property are fixed as well… A nice feature would be the option to add, edit or remove the CSS fields included in the generator and to modify the ranges.

It would also be nice for the prototype to allow for the generation of vertical and horizontal menus using the button CSS and include the ability to edit the links or “onclick” actions and the text of the buttons.

The “ultimate” implementation of a generator like this would allow you to generate styles for all or some subset of the HTML elements not just buttons.

I will leave all these features for you to add to your own implementations at this time however if you like this project I may just add some or all of these features in the future.

You can view a Live Preview: Here

You can get the code on GitHub: Here

You can find a list of all my other posts on my Topics and Posts page.

I hope you enjoyed reading this article.


Feel free to suggest a project you would like to see built in the comments and if it sounds interesting it might just get built and featured here on my blog for you to enjoy.

 

 

Much Love,

~Joy

 

Brute Force Password Breaking

Welcome, today we’re going to talk about “Brute Force” Password Breaking.

I know, it’s a controversial topic… though they say you should write about controversy if you want to get read, right? 😛 But to ensure it’s as controversial as possible I’m going to give you an actual working prototype that you can use to try a Brute Force attack on your own passwords! 😉

However, before you call me irresponsible consider the following.

There are plenty of methods for cracking passwords that are far more efficient than a Brute Force attack, and if you have to resort to Brute Force (trying all possible combinations) against an unknown and likely long password hashed with a modern hashing algorithm, then you are essentially screwed!

If the creator of the password chose a “simple”,  “common” or “guessable” password… like for example “123456789”, “Cat” or “Password”, Brute Force isn’t even necessary!

 

Rainbow Tables

A hacker can simply use a “Rainbow Table” (so colorful), which is basically a database of pre-computed hashes, to look up the hash of your password and obtain the unhashed “Plaintext” of your password.

In most cases where a hacker can use a Rainbow Table they will save themselves significant time and effort simply because they don’t have to do any hashing (which cumulatively can take a lot of time), it’s just a matter of traversing a table and retrieving the associated data of the index that matches the hash provided, assuming of course that the hash was pre-computed and exists in the Rainbow Table.

For example, the hash output of the SHA1 algorithm for the word ‘Cat’ is cebe54c7626cb1cefaca5f7f5ea6c96b4a7a2882 and if a hacker was able to break into a database containing this hashed credential then they could reverse lookup the hashed password in seconds.

Clearly what makes a Rainbow Table so useful to a hacker is that it can take the insurmountable challenge of Brute Forcing a password and change it into something that is at the very least, manageable.

There are techniques however, called Password ‘Salting‘ & ‘Peppering‘ which expose a severe weakness with Rainbow Tables… namely, if you cannot pre-compute a big database of all possible hashes then you are forced to resort to some other technique if not a Brute Force attack.

A Salt is some unique (and long) string value that can be added to a password before it is hashed to make building a Rainbow Table difficult if not impossible.

Here’s an example of how Salting works: let’s take the insecure password we used above, ‘Cat’, and look at what happens when we add a 30 character Salt to it prior to hashing the value.

Password: Cat

Salt: LPdjlEfrMhGkENHf3e4Lp7VZgXd77f

Hash(Password + Salt) = d73b50b3d80762f55a28a44e49568be064ee8208

Note: To ensure you get the maximum benefit from Salting your passwords you should use a different Salt for every password credential that you store in your database. If you don’t, it will be much easier for a hacker to steal your passwords.

As you can see, by including a Salt with the password when it is hashed the result produced is different than the word by itself. This different hash is what is stored in the database.

The benefit of doing this is simply that it is now extremely unlikely (improbable but not impossible) that the combination of the word Cat and this long randomly generated Salt string will have been pre-computed anywhere. So it doesn’t even matter if a hacker gets both the hash and the Salt, because a Rainbow Table will have to be generated from scratch using that Salt. That can take horrendous amounts of time, potentially on the order of a human lifetime or even longer, depending on the hashing algorithm, the password length and of course how much raw computation an attacker can field.
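If you would like to see the effect of a Salt for yourself, here is a small sketch. It assumes you are running Node.js and uses its built-in crypto module; makeSalt and hashPassword are names I made up for the illustration, and SHA-1 is used only to mirror the example above (a real system would normally use a purpose-built, deliberately slow password hashing function).

```javascript
// Sketch only: assumes Node.js and its built-in 'crypto' module.
var crypto = require('crypto');

// Hypothetical helper: 15 random bytes encoded as 30 hexadecimal characters
function makeSalt() {
    return crypto.randomBytes(15).toString('hex');
}

// Hash(Password + Salt), using SHA-1 only to mirror the article's example
function hashPassword(password, salt) {
    return crypto.createHash('sha1').update(password + salt).digest('hex');
}

var saltA = makeSalt();
var saltB = makeSalt();

console.log(hashPassword('Cat', ''));    // unsalted hash of 'Cat'
console.log(hashPassword('Cat', saltA)); // differs from the unsalted hash
console.log(hashPassword('Cat', saltB)); // differs again: one Salt per credential
```

The same password produces a different stored hash for every Salt, which is why a table pre-computed for the bare word ‘Cat’ is of no help against either salted value.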

 

Dictionary Attack

A Dictionary Attack is similar to the concept of a Rainbow Table in that it also utilizes a database. Where they differ is that a Rainbow Table’s purpose is to store pre-computed hashes so you can just look up a password, whereas a Dictionary Attack still requires the attacker to break your password through hashing.

So from a hacker’s perspective a Rainbow Table is preferable to a Dictionary.

The purpose of a Dictionary, then, is to contain all the most likely passwords. Each one is combined with your Salt (if the attacker has it), or with a generated guess in the case of a Pepper, and then hashed, in effect generating a new Rainbow Table that is unique to that Salt or Pepper.

This is a form of Brute Force attack and takes time to generate though it is still better than “true” brute force because it relies on the idea that words mean things and we all share the same words and meanings.

Anytime there is a massive breach of user credentials where the passwords are compromised (i.e. the passwords were kept as plain text, or were hashed using a single unchanging Salt for every password, or with a Salt that was simply too short), all the compromised passwords get added to Dictionaries (and Rainbow Tables). Because those passwords were actually used by someone they are “known to be good” and are therefore more likely to be used by someone else.

Think of a Dictionary as a list of thousands to millions of “probable” common and known passwords.

The benefit of a Dictionary is that a hacker can focus on all the most likely passwords because people tend to think alike and of all the possible words that COULD be used for a password only a small subset will ever actually be selected by people.

Further, if the Dictionary Attack fails and the hacker must resort to True Brute Force, they can exclude the passwords in the Dictionary that they already tried and focus the Brute Force Attack on generating new, previously untried passwords.

 

True Brute Force

The thing is, if Rainbow Tables and Dictionary Attacks failed to break your credentials then most hackers will give up and even the skilled professionals are forced to question the real value of their target because in most cases it’s not worth the hassle!

Having to resort to Brute force means that they tried EVERYTHING else and failed!

Your servers proved to be secure, your protocols are working, your “wetware” er… IT staff isn’t giving out credentials over the phone… and that “dumpster dive” the hackers took at 3 AM to see if your staff is throwing out documents with “sensitive” information, proved useless…

True Brute Force means that you take all typable characters:

Upper Case letters:  ABCDEFGHIJKLMNOPQRSTUVWXYZ

Lower Case letters: abcdefghijklmnopqrstuvwxyz

Symbols: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

And while we’re at it why not include Numbers too: 0123456789

That gives a total of 94 possible characters. Then, starting with a length of only 1 character, you generate and iterate through all possible permutations until you give up or a solution is found!

This sounds easier to accomplish than it actually is!

Sure, generating the data is easy enough (I show you how and provide code below) but due to the sheer numbers involved it’s essentially an impossible task when considering the hash time is multiplied by all the combinations you have to try and a longer password means more combinations are required to break the hash.

This is why it is recommended by some technologists that your password include Upper and Lower case letters along with numbers and symbols and be longer than 15 characters!

Let’s do the math!

If there are 94 possible characters and a password is only 1 character long we would only need to try a maximum of 94^1 = 94 characters before we can guarantee we have the password.

In the case of a 3 character long password (94^3 = 830,584) we would have to try a little less than 1 million combinations!

This is simple exponentiation, with the base being the number of possible symbols and the exponent being the length of the password.

Here’s a table:

Password Length | Combinations | Calculation
1 | 94 | 94^1
2 | 8,836 | 94^2
3 | 830,584 | 94^3
4 | 78,074,896 | 94^4
5 | 7,339,040,224 | 94^5
6 | 689,869,781,056 | 94^6
7 | 64,847,759,419,264 | 94^7
8 | 6,095,689,385,410,816 | 94^8
9 | 572,994,802,228,617,000 | 94^9
10 | 53,861,511,409,490,000,000 | 94^10
11 | 5,062,982,072,492,060,000,000 | 94^11
12 | 475,920,314,814,253,000,000,000 | 94^12
13 | 44,736,509,592,539,800,000,000,000 | 94^13
14 | 4,205,231,901,698,740,000,000,000,000 | 94^14
15 | 395,291,798,759,682,000,000,000,000,000 | 94^15

Note: from length 9 upward the values shown are rounded to roughly 15 significant digits.
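If you want to reproduce the numbers in the table exactly, a few lines of JavaScript will do it. This is just a sketch: combinations and worstCaseSeconds are names I picked for the example, and the guesses-per-second rate is a made-up figure you would replace with whatever an attacker’s hardware can actually manage.

```javascript
// Sketch only: BigInt keeps every digit exact (ordinary JavaScript numbers
// lose precision above 2^53, which 94^9 already exceeds).
var ALPHABET_SIZE = 94n; // 26 upper + 26 lower + 32 symbols + 10 digits

// Number of candidate passwords that are exactly `length` characters long
function combinations(length) {
    return ALPHABET_SIZE ** BigInt(length);
}

// Worst-case whole seconds needed to try every candidate of that length
// at a given (hypothetical) number of guesses per second
function worstCaseSeconds(length, guessesPerSecond) {
    return combinations(length) / BigInt(guessesPerSecond);
}

for (var length = 1; length <= 15; length++) {
    console.log(length + ': ' + combinations(length).toString());
}
```

At an assumed one billion guesses per second, worstCaseSeconds(8, 1000000000) works out to 6,095,689 seconds, or roughly 70 days, and every extra character multiplies that by 94.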

 

Clearly a nice long password that contains upper and lower case letters along with numbers and symbols is definitely going to give your hacker a bad day though I personally prefer the idea of Passphrases which are a series of words in a phrase rather than a single word.

If the words you use in your passphrase are nice and long and include upper and lower case letters along with numbers and symbols, and the phrase itself isn’t well known (so that it’s not in a phrase dictionary), then you can be fairly confident that your account is secure for the foreseeable future.

Breaker Class

So as you can see, I am not helping anyone break anything by giving out this code, your passwords are safe! 😉

Regular readers will notice in the code below I used my AppTimer Class that I released over on my Benchmarking PHP article.

Breaker has no properties and only 3 methods (not including PHP’s Magic Methods).

Methods: GetSymbols, IncrementValues & Match

GetSymbols()

GetSymbols converts the numbers in an array to the char the number represents.

For example: The number 0 represents the exclamation symbol ! and the number 33 represents uppercase B

IncrementValues()

IncrementValues takes an array of numbers and advances it like an odometer: the first value is incremented by 1, and when a value would exceed the allowable max it is reset to 0 and the value to its right is created or incremented by 1.

Match()

At this time, match just does a comparison though feel free to hash the values that are passed to this method to complete the Brute Force Attack program.

As is, this will only brute force plain text against plain text.
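The three methods described above boil down to odometer-style counting, which can be sketched in a few lines of JavaScript. This is not a port of Breaker.php, just an illustration of the idea: the names SYMBOLS, increment, candidates and findPlainText are my own, and like Match() it only compares plain text against plain text.

```javascript
// Sketch only, not a port of Breaker.php: a generator that yields every
// candidate in order, shortest first, using odometer-style counting.
var SYMBOLS = [];
for (var code = 33; code <= 126; code++) {
    SYMBOLS.push(String.fromCharCode(code)); // '!' through '~', 94 symbols
}

// Advance the index array in place; index 0 is the fastest-moving digit
function increment(values, base) {
    for (var i = 0; i < values.length; i++) {
        values[i]++;
        if (values[i] < base) {
            return;        // no carry needed so we are done
        }
        values[i] = 0;     // roll over and carry into the next digit
    }
    values.push(0);        // every digit rolled over so grow by one
}

function* candidates(symbols, maxLength) {
    var values = [0];
    while (values.length <= maxLength) {
        // Reverse so the fastest digit is the last character, matching
        // the order of the sample output (Ca!, Ca", Ca#, ...)
        yield values.map(function (v) { return symbols[v]; }).reverse().join('');
        increment(values, symbols.length);
    }
}

// Plain text against plain text; returns null if nothing matched
function findPlainText(target, maxLength) {
    var tried = 0;
    for (var guess of candidates(SYMBOLS, maxLength)) {
        tried++;
        if (guess === target) {
            return { password: guess, tried: tried };
        }
    }
    return null;
}
```

With a tiny two-symbol alphabet, Array.from(candidates(['a', 'b'], 2)) yields a, b, aa, ab, ba, bb, which makes the carry behaviour easy to follow before scaling up to all 94 symbols.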

Breaker.php

<?php
set_time_limit(0); // Disable the time limit on script execution

// Create Breaker Class 
//
// This tool is a demonstration of a "brute force" password breaker.
// This prototype is provided AS IS and for informational & educational
// purposes only! 
//
// Modern Password Hashing should have little fear of this code though
// for a minimum level of rationality I have excluded the parts that 
// would handle hashing the passwords to slow down the "Script Kiddies" 
// however any reasonably skilled PHP developer would have little trouble 
// adding their own hashing function to complete this prototype.
// 
// DO NOT USE THIS SOFTWARE TO VIOLATE THE LAW! COMPLY WITH ALL DIRECTION
// GIVEN TO YOU BY LAW ENFORCEMENT! ANY ILLEGAL OR MALICIOUS ACTIONS YOU 
// CHOOSE TO ENGAGE IN OUTSIDE OF AN EDUCATIONAL SETTING ARE YOUR OWN! 
class Breaker{

    function GetSymbols($values, $symbols){
        foreach($values as &$value){
            if(isset($symbols[$value])){
                $value = $symbols[$value];
            }
        }
        return $values;
    }

    function IncrementValues($values, $number_of_valid_symbols){
        // Treat $values like an odometer: index 0 is the fastest-moving
        // digit and each digit runs from 0 to $number_of_valid_symbols - 1
        foreach($values as $key=>$value){
            $values[$key] = (int)$value + 1; // Increment this value
            if($values[$key] < $number_of_valid_symbols){
                return $values; // No carry needed so we are done
            }
            $values[$key] = 0; // Maxed, so reset it and carry to the next value
        }
        // Every value rolled over so add a new digit
        $values[] = 0;
        return $values;
    }

    function Match($hash, $test_password){
        // Strict comparison so numeric-looking strings such as "0e1" and
        // "0e2" are never treated as equal
        return $test_password === $hash;
    }

}



include('AppTimer.Class.php');     // Include AppTimer class file

$Timer = new AppTimer();           // Create Timer
$Timer->Start();                   // Start() Timer


$password_to_break = 'Cat'; 

// Concatenate all symbols explicitly
//$valid_symbols = "!\"#$%&'()*+,-./0123456789:;<=>?@"; // note the escaped double quote
//$valid_symbols .= "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`";
//$valid_symbols .= "abcdefghijklmnopqrstuvwxyz{|}~";
//$valid_symbols = str_split($valid_symbols); // split string into array

// Cleaner way to Create array of ASCII char 33 - 126
$valid_symbols = range(chr(33), chr(126)); // Shorter version of above
$number_of_valid_symbols = count($valid_symbols); // 94 chars

$length = 1; // Start at 1 digit length to try all possible combinations
             // This assumes the password length is unknown.
// If the length of the password is known then use the correct length i.e:
// $length = strlen($password_to_break); 
                                       
// Generate first plain text password to try
$values = str_split(strrev(str_repeat('0', $length)));
$PlainTextPasswordBreaker = new Breaker();
// implode() so the first candidate is a string like every later candidate
$test_password = implode('', $PlainTextPasswordBreaker->GetSymbols($values, $valid_symbols));

while(!$PlainTextPasswordBreaker->Match($password_to_break, $test_password)){
    // We have not found the correct password so keep trying to generate it
    $values = $temp = $PlainTextPasswordBreaker->IncrementValues($values, $number_of_valid_symbols);
    $temp = $PlainTextPasswordBreaker->GetSymbols($values, $valid_symbols);
    $test_password = strrev(implode('', $temp));
    
    //echo $test_password . PHP_EOL; // Uncomment to watch breaker
                                   // Will make Breaker much slower
}


$Timer->Stop();             // Stop() Timer
$time = $Timer->Report();   // Report()


echo "Password: $test_password \nFound in: $time" . PHP_EOL;

As presented the output of the code will look something like this:

Password: Cat 
Found in: 5.8302 Seconds

If you uncomment the echo inside the while loop you can watch each permutation get generated; however, echoing every attempt slows the breaker down, so it takes longer to find the password.

Here is what that would look like (note that I shortened the output to just the last few permutations before the solution was found):

...
Ca!
Ca"
Ca#
Ca$
Ca%
Ca&
Ca'
Ca(
Ca)
Ca*
Ca+
Ca,
Ca-
Ca.
Ca/
Ca0
Ca1
Ca2
Ca3
Ca4
Ca5
Ca6
Ca7
Ca8
Ca9
Ca:
Ca;
Ca<
Ca=
Ca>
Ca?
Ca@
CaA
CaB
CaC
CaD
CaE
CaF
CaG
CaH
CaI
CaJ
CaK
CaL
CaM
CaN
CaO
CaP
CaQ
CaR
CaS
CaT
CaU
CaV
CaW
CaX
CaY
CaZ
Ca[
Ca\
Ca]
Ca^
Ca_
Ca`
Caa
Cab
Cac
Cad
Cae
Caf
Cag
Cah
Cai
Caj
Cak
Cal
Cam
Can
Cao
Cap
Caq
Car
Cas
Cat
Password: Cat 
Found in: 14.9678 Seconds

 

USE BREAKER RESPONSIBLY FOR EDUCATIONAL PURPOSES ONLY!

You can find Breaker on my GitHub profile here.

You can find a list of all my other posts on my Topics and Posts page & I hope you enjoyed reading this article, if so please support me on Patreon for as little as $1 a month.

Your financial support allows me to dedicate time to developing awesome projects like this and while I am publishing them without cost, that isn’t to say they are free. I am doing this all by myself and it takes me a lot of time and effort to build and publish these projects for your enjoyment.


Feel free to suggest a project you would like to see built in the comments and if it sounds interesting it might just get built and featured here on my blog.

 

 

Much Love,

~Joy

Mysterious Warehouse

My best friend is writing a series of books that are a mix of Sin City meets The Crow, though I’m not at liberty to reveal too much about the project at this time.

Anyway, he asked me to design some cover art samples for the books and instantly I was on board!

As part of the design process I am generating a lot of “extra” art work that won’t be used for anything… so I thought I would share a few of the images here.

Please enjoy!

 


 

I hope you enjoy my posts; if so, I’d love your support on Patreon. For as little as $1 a month your financial support allows me to dedicate time to developing awesome projects and art like this, and while I publish without cost, that isn’t to say my work is free. I am doing this all by myself and it takes me a lot of time and effort.


 

 

Much Love,

~Joy

Email Relationship Classifier Testing The Bot

Welcome back, today is the last post in this series and one that I know many of you have been eagerly awaiting… we’re finally going to test the bot!

So I think what I’m going to do is first give you the code to review and then after I will walk you through it and explain what’s going on.

Test.php

I have labeled the subheadings in this post after the section comments in the code to make it easier to review so you should refer back to this code while reading the article to aid in understanding.

<?php
// This function will load the human scored JSON class files
function LoadClassFile($file_name){
  // Get file contents
  $file_handle = fopen($file_name, 'r');
  $file_data = fread($file_handle, filesize($file_name));
  fclose($file_handle);
  return $file_data;
}

// We will pass our Results to this function to save so it can be reviewed later
function CreateResultsFile($file_name, $output_path, $results){
  
  // Write file contents
  $file_handle = fopen($output_path . basename($file_name), 'w');
  $file_data = fwrite($file_handle, $results);
  fclose($file_handle);
}



// Include Classes
function ClassAutoloader($class) {
  include 'Classes/' . $class . '.Class.php';
}
spl_autoload_register('ClassAutoloader');


// Instantiate Objects
$myTokenizer = new Tokenizer();
$myEmailFileManager = new FileManager();
$myJSONFileManager = new FileManager();
$myDatabaseManager = new DatabaseManager();


// No Configuration needed for the Tokenizer Object

// Configure FileManager Objects
$myEmailFileManager->Scan('DataCorpus/TestData');
$myJSONFileManager->Scan('DataCorpus/TestDataClassifications');
$number_of_testing_files = $myEmailFileManager->NumberOfFiles();
$number_of_JSON_files = $myJSONFileManager->NumberOfFiles();

// Configure DatabaseManager Object
$myDatabaseManager->SetCredentials(
  $server = 'localhost',
  $username = 'root',
  $password = 'password',
  $dbname = 'EmailRelationshipClassifier'
);


// Make sure the files are there and the number of files are the same
if(($number_of_testing_files != $number_of_JSON_files) 
   || ($number_of_testing_files == 0 || $number_of_JSON_files == 0) 
  ){
  die(PHP_EOL . 'ERROR! the number of testing files and classification files are not the same or are zero! Run CreateClassificationFiles.php first.');
}
else{
  // Loop Through Files
  for($current_file = 0; $current_file < $number_of_testing_files; $current_file++){
 
  
  $report_data = '';
  
  /////////////////////////
  // Bot Predict Classification
  /////////////////////////
  
  $file = $myEmailFileManager->NextFile();
  
  $myTokenizer->TokenizeFile($file);
    
  $report_data .= "Found Tokens:". PHP_EOL;
  // Loop Through Tokens
  foreach($myTokenizer->tokens as $word=>$count){
    $report_data .= "$word $count" . PHP_EOL;
    // Get word classification scores
    $myTokenizer->tokens[$word] = $myDatabaseManager->ScoreWord($word, $count);
    
    // Remove unknown word tokens
    if($myTokenizer->tokens[$word] == NULL){
    unset($myTokenizer->tokens[$word]);
    }
  }
  
  $report_data .= PHP_EOL . "Known Words:". PHP_EOL;
  $known_words = array_keys($myTokenizer->tokens);
  foreach($known_words as $word){
    $report_data .= $word . PHP_EOL;
  }
  
  $weights = array();
  // Sum tokens
  foreach($myTokenizer->tokens as $word=>$word_data){
    foreach($word_data as $class_name=>$class_count){
    @$weights[$class_name] += $class_count;
    }
  }
  $weights = array_diff($weights, array(0)); // remove 0 value classes

  // Sort into sender recipient groups
  foreach($weights as $class=>$count){
    // if key name contains -Sender add to the Sender key
    if(strstr($class, '-Sender')){
    $weights['Sender'][strstr($class, '-Sender', true)] = $count;
    }
    else{// if key name contains -Recipient add to the Recipient key
    $weights['Recipient'][strstr($class, '-Recipient', true)] = $count;
    }
    unset($weights[$class]); // remove the unsorted element
  }
  // sort arrays from more likely to less likely
  array_multisort($weights['Sender'], SORT_DESC);
  array_multisort($weights['Recipient'], SORT_DESC);



  /////////////////////////
  // Human Classified Data
  /////////////////////////
  $EmailClassifications = json_decode(LoadClassFile($myJSONFileManager->NextFile()), true);
  $EmailClassifications = array_diff($EmailClassifications, array(0)); // remove 0 value classes
  // sort into sender recipient groups
  foreach($EmailClassifications as $class=>$count){
    // if key name contains -Sender add to the Sender key
    if(strstr($class, '-Sender')){
    $EmailClassifications['Sender'][strstr($class, '-Sender', true)] = $count;
    }
    else{// if key name contains -Recipient add to the Recipient key
    $EmailClassifications['Recipient'][strstr($class, '-Recipient', true)] = $count;
    }
    unset($EmailClassifications[$class]); // remove the unsorted element
  }
  // sort arrays
  array_multisort($EmailClassifications['Sender'], SORT_DESC);
  array_multisort($EmailClassifications['Recipient'], SORT_DESC);


  $report_data .= PHP_EOL;
  
  
  
  
  /////////////////////////
  // Report - Sender
  /////////////////////////

  $report_data .= PHP_EOL . "Predicted Sender Class & Score: " . PHP_EOL;
  $sum = array_sum($weights['Sender']); // sum the total of Sender weights
  foreach($weights['Sender'] as $class=>$count){
     $report_data .= "$class:  " . round(($count / $sum) * 100) . '%' . PHP_EOL;
  }
    

  $report_data .= PHP_EOL . "Human Scored Sender Class: " . PHP_EOL;
  $sum = array_sum($EmailClassifications['Sender']); // sum the total of Sender EmailClassifications
  foreach($EmailClassifications['Sender'] as $class=>$count){
     $report_data .= "$class:  " . round(($count / $sum) * 100) . '%' . PHP_EOL;
  }
  
  /////////////////////////
  // Report - Sender Mistakes
  /////////////////////////

  $report_data .= PHP_EOL . "Incorrect Predicted Sender Classes: " . PHP_EOL;
  $IPSC = array_keys(array_diff_key($weights['Sender'], $EmailClassifications['Sender']));
  if(count($IPSC) > 0){
    foreach($IPSC as $class){
     $report_data .= $class . PHP_EOL;
    }
  }else{
    $report_data .= 'None' . PHP_EOL;
  }
    
  $report_data .= PHP_EOL . "Missing Predicted Sender Classes: " . PHP_EOL;
  $MPSC = array_keys(array_diff_key($EmailClassifications['Sender'], $weights['Sender']));
  if(count($MPSC) > 0){
    foreach($MPSC as $class){
     $report_data .= $class . PHP_EOL;
    }
  }else{
    $report_data .= 'None' . PHP_EOL;
  }
  

  /////////////////////////
  // Report - Recipients
  /////////////////////////
  
  $sum = array_sum($weights['Recipient']); // sum the total of Recipient weights
  $report_data .= PHP_EOL . "Predicted Recipient Class & Score: " . PHP_EOL; 
  foreach($weights['Recipient'] as $class=>$count){
     $report_data .= "$class:  " . round(($count / $sum) * 100) . '%' . PHP_EOL;
  }
  

  $report_data .= PHP_EOL . "Human Scored Recipient Class: " . PHP_EOL; 
  $sum = array_sum($EmailClassifications['Recipient']); // sum the total of Recipient EmailClassifications
  foreach($EmailClassifications['Recipient'] as $class=>$count){
     $report_data .= "$class:  " . round(($count / $sum) * 100) . '%' . PHP_EOL;
  }
  
  
  /////////////////////////
  // Report - Recipient Mistakes
  /////////////////////////
  
  $report_data .= PHP_EOL . "Incorrect Predicted Recipient Classes: " . PHP_EOL;
  $IPRC = array_keys(array_diff_key($weights['Recipient'], $EmailClassifications['Recipient']));
  if(count($IPRC) > 0){
    foreach($IPRC as $class){
     $report_data .= $class . PHP_EOL;
    }
  }else{
    $report_data .= 'None' . PHP_EOL;
  }
  
  $report_data .= PHP_EOL . "Missing Predicted Recipient Classes: " . PHP_EOL;
  $MPRC = array_keys(array_diff_key($EmailClassifications['Recipient'], $weights['Recipient']));
  if(count($MPRC) > 0){
    foreach($MPRC as $class){
     $report_data .= $class . PHP_EOL;
    }
  }else{
    $report_data .= 'None' . PHP_EOL;
  }
  
  /////////////////////////
  // Report - Overall
  /////////////////////////
  
  // Compute Results
  $sum_prediction = count($weights['Sender']) + count($weights['Recipient']);
  $sum_prediction -= count($IPSC); // Penalize Incorrect Predicted Sender Classes
  $sum_prediction -= count($MPSC) / 2; // Penalize Missing Sender Classes at half a point each
  $sum_prediction -= count($IPRC); // Penalize Incorrect Predicted Recipient Classes
  $sum_prediction -= count($MPRC) / 2; // Penalize Missing Recipient Classes at half a point each
  $sum_actual = count($EmailClassifications['Sender']) + count($EmailClassifications['Recipient']);
  
  $report_data .= PHP_EOL . "Overall Accuracy: " . PHP_EOL;
  $report_data .= ($sum_prediction / $sum_actual) * 100 . '%' . PHP_EOL;
  
  CreateResultsFile($file, 'DataCorpus/TestResults/', $report_data);
  echo $report_data;
  }
}

echo PHP_EOL . 'Testing Complete!' . PHP_EOL;

 

You will probably recognize the first portion of this code from Train.php and in fact there are really only two differences in initializing the environment between the two scripts.

The first difference is that Test.php includes a function called CreateResultsFile(), which we’ll use to save the report that the bot generates so we can review it later. The second is that the paths we provide to $myEmailFileManager & $myJSONFileManager are different from the ones used in Train.php.

Once the fail conditions in the die() check pass, the bot will step through all testing data in the “Loop Through Files” for loop.

The first order of business is to generate the bot’s “prediction” of what relationship classes are present in the email.

Bot Predict Classification

The bot starts by tokenizing the file, which means building a bag of words model for the email. The found tokens are then passed to the $myDatabaseManager Object, which uses its ScoreWord() method to scale the word class values using the information obtained during training. Unknown words are ignored and have no bearing on classifying the email in my implementation.
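To make the “bag of words” idea concrete, here is a minimal sketch of what tokenizing might look like. BuildBagOfWords() is a hypothetical name and this is not the actual Tokenizer class code. It assumes words are uppercased and that apostrophes, hyphens and anything else that isn’t a letter get dropped.

```php
<?php
// Hypothetical helper (not the Tokenizer class from this series):
// builds a unigram "bag of words" as WORD => count.
function BuildBagOfWords(string $text): array {
  // Uppercase, then drop apostrophes and hyphens so "don't" becomes "DONT"
  $text = strtoupper($text);
  $text = str_replace(array("'", "’", "-"), '', $text);
  // Split on anything that is not a letter A-Z (assumption: digits are ignored)
  $words = preg_split('/[^A-Z]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
  // Count how many times each word appears
  return array_count_values($words);
}

$tokens = BuildBagOfWords("Mr. Coach chased the team off the field. Love, Bobby");
print_r($tokens);
```

Each key is a word and each value is the number of times it appeared, which is the same shape as the $myTokenizer->tokens array the loop above walks through.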

$myDatabaseManager->ScoreWord() method

For reference here is the ScoreWord() method for your review.

public function ScoreWord(&$word, &$count){
  
  if(count($this->classifications) == 0){
    $this->GetKnownClasses();
    $classifications = array();
    foreach($this->classifications as $class=>$value){
      $classifications["$class-Sender"] =  $value;
    }
    foreach($this->classifications as $class=>$value){
      $classifications["$class-Recipient"] =  $value;
    }
    $this->classifications = $classifications;
  }
  

  if($this->KnownWord($word)){
    $this->Connect();
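    // Escape the word before placing it in the query (assumes $this->conn is a mysqli connection)
    $safe_word = $this->conn->real_escape_string($word);
    $sql = "SELECT * FROM `Words` WHERE `Word` LIKE '$safe_word'";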
    $result = $this->conn->query($sql);

    if ($result->num_rows > 0) {
    $word_data = $result->fetch_assoc();
    foreach($word_data as $key=>$value){
       if($key == 'ID'){
         unset($word_data["$key"]);
       }
       elseif($key == 'Word'){
         unset($word_data["$key"]);
       }
       else{
         $word_data[$key] *= ($count * $this->classifications[$key]);
       }
    }
    return $word_data;
    }
  }else{
    // unknown word... add it or ignore it
  }
}

 

Note that you could easily add new words found during testing to the bot’s knowledge base with zero relationship class affiliations. You could then manually update the word classes later or do additional training to improve the bot’s “familiarity” with the word.

 

Then the $weights array is created to hold the prediction (the bot generated classifications) which is all the class counts summed and unnecessary elements removed.

Why $weights and not $prediction? I don’t know, maybe I was being $pretentious. 😛

The array is then sorted into sender and recipient groups, and each group is ordered from the highest scoring class to the lowest.
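The summing and grouping steps can be sketched on their own with made-up scores. GroupClassWeights() and the sample numbers are hypothetical, and I’m using arsort() here rather than array_multisort() simply because it makes the “highest first” ordering obvious while keeping the class names as keys.

```php
<?php
// Hypothetical sketch: sum per-word class scores, drop zero classes,
// then split into Sender / Recipient groups sorted highest first.
function GroupClassWeights(array $scored_tokens): array {
  $weights = array();
  foreach ($scored_tokens as $word => $word_data) {
    foreach ($word_data as $class_name => $class_score) {
      $weights[$class_name] = ($weights[$class_name] ?? 0) + $class_score;
    }
  }

  $grouped = array('Sender' => array(), 'Recipient' => array());
  foreach ($weights as $class => $score) {
    if ($score == 0) {
      continue; // ignore classes with no evidence
    }
    if (substr($class, -7) === '-Sender') {
      $grouped['Sender'][substr($class, 0, -7)] = $score;
    } elseif (substr($class, -10) === '-Recipient') {
      $grouped['Recipient'][substr($class, 0, -10)] = $score;
    }
  }
  arsort($grouped['Sender']);    // most likely class first
  arsort($grouped['Recipient']);
  return $grouped;
}

// Made-up scores for two known words
$scored = array(
  'YOU'  => array('Child-Sender' => 2, 'Son-Sender' => 0, 'Parent-Recipient' => 3),
  'LOVE' => array('Child-Sender' => 1, 'Son-Sender' => 0, 'Parent-Recipient' => 1, 'Mother-Recipient' => 5),
);
print_r(GroupClassWeights($scored));
```

Starting both groups as empty arrays also means an email with no known sender words won’t trip up the sorting step.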

 

Human Classified Data

Next the human generated classifications stored in JSON are loaded into the $EmailClassifications array and the values are sorted into sender and recipient groups as well.

At this point we have extracted enough information to begin generating the statistical portion of the report ($report_data).

 

Report – Sender

In the “Report - Sender” block we evaluate the Sender data, starting with the bot prediction. We add up the $sum “total count” of all the predicted weights, then determine what percentage each individual weight contributes to the overall prediction by dividing the weight value by the $sum, multiplying the result by 100 and rounding it to a whole number.

This same process is repeated for the human classified data.
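Here is that percentage step pulled out into a small helper so you can try it in isolation. WeightsToPercentages() is a hypothetical name and the weights are made up. The zero check is my own addition to avoid dividing by zero when nothing was predicted.

```php
<?php
// Hypothetical helper: convert raw class weights into rounded percentages.
function WeightsToPercentages(array $weights): array {
  $sum = array_sum($weights);
  $percentages = array();
  if ($sum == 0) {
    return $percentages; // nothing to report, and avoids dividing by zero
  }
  foreach ($weights as $class => $count) {
    $percentages[$class] = round(($count / $sum) * 100);
  }
  return $percentages;
}

// Made-up weights for illustration
foreach (WeightsToPercentages(array('Child' => 8, 'Daughter' => 2)) as $class => $percent) {
  echo "$class:  $percent%" . PHP_EOL;
}
```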

 

Report – Sender Mistakes

We then evaluate the bot predicted sender data for mistakes by comparing the bot’s predicted classification $weights against the known human generated $EmailClassifications using the array_diff_key & array_keys PHP language functions to extract and store the “Incorrect Predicted Sender Classes” as the $IPSC array, so we can use them later during the final evaluation.

We then do the same but in reverse, comparing $EmailClassifications against $weights for the “Missing Predicted Sender Classes” and save them as the $MPSC array.
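If array_diff_key is new to you, this small standalone example shows both comparisons. The class names come from the sample report shown later in this post, but the numeric values are placeholders because only the keys are compared.

```php
<?php
// Class names taken from the sample report in this post;
// the numeric values are placeholders because only the keys are compared.
$predicted = array('Child' => 53, 'Daughter' => 47);
$human     = array('Son' => 1, 'Child' => 1);

// Predicted by the bot but not marked by the human
$incorrect = array_keys(array_diff_key($predicted, $human));
// Marked by the human but not predicted by the bot
$missing   = array_keys(array_diff_key($human, $predicted));

echo 'Incorrect: ' . implode(', ', $incorrect) . PHP_EOL; // Incorrect: Daughter
echo 'Missing: ' . implode(', ', $missing) . PHP_EOL;     // Missing: Son
```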

 

Report – Recipients & Recipients Mistakes

We repeat the same process we used for the Sender data for the Recipient data in the “Report - Recipients” block, followed by processing any mistakes in the “Report - Recipient Mistakes” block, which results in the $IPRC (Incorrect Predicted Recipient Classes) & $MPRC (Missing Predicted Recipient Classes) arrays.

 

Report – Overall

The last portion of the report is to evaluate the “overall” accuracy using the data that we collected and generated while working on the report.

We start by creating a $sum_prediction variable and setting its value to the total count of weights present in the $weights array.

We then proceed to subtract “points” from this number for every incorrect or missing relationship class.

Incorrect predictions receive a full point penalty whereas missing predictions are penalized half a point.

My thought process being that it’s better (but not perfect) for the bot to miss a class and exclude it than to include an incorrect class.

You may wish to use a different scoring rubric depending on what the repercussions of incorrect or missing data are in your model. This method is provided as a simple example.

We then create a variable called $sum_actual and set its value to the total count of classes present in the email as classified by a human.

The final “Overall Accuracy” is computed by taking the $sum_prediction and dividing it by the $sum_actual and then multiplying against 100 to get a percent.
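The scoring rubric can be sketched as a single function. OverallAccuracy() is a hypothetical helper (Test.php does this inline), and the guard against zero human classes is my own assumption about what you would want in that case.

```php
<?php
// Hypothetical helper mirroring the scoring rubric described above:
// incorrect classes cost a full point, missing classes half a point.
function OverallAccuracy(int $predicted, int $incorrect, int $missing, int $actual): float {
  if ($actual === 0) {
    return 0.0; // no human classes to compare against
  }
  $score = $predicted - $incorrect - ($missing / 2);
  return ($score / $actual) * 100;
}

// Counts from the sample report: 5 predicted classes (2 sender + 3 recipient),
// 1 incorrect (Daughter), 1 missing (Son), 5 human classes.
echo OverallAccuracy(5, 1, 1, 5) . '%' . PHP_EOL; // 70%
```

Working it through by hand: 5 - 1 - 0.5 = 3.5 points, and 3.5 / 5 = 0.7, which is the 70% in the sample report.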

We then save the report ($report_data) using the CreateResultsFile() function and echo the report to the screen as well.

Ideally the results would be captured in a way that facilitates programmatic evaluation of the overall accuracy of the model, like in a CSV file or a database, so that you can compare the results of all the test data. However, as this is only a prototype, I went with a .txt dump of the individual report that the bot generates.
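If you want to go the CSV route, something along these lines might be a starting point. AppendResultRow(), the column names and the file names are all hypothetical and are not part of the project.

```php
<?php
// Hypothetical sketch: append one row per tested email to a CSV file
// so results can be compared across the whole test set.
function AppendResultRow(string $csv_path, string $file_name, float $accuracy,
                         int $incorrect, int $missing): void {
  $is_new = !file_exists($csv_path);
  $handle = fopen($csv_path, 'a');
  if ($handle === false) {
    throw new RuntimeException("Unable to open $csv_path");
  }
  if ($is_new) {
    // Write the header once, when the file is first created
    fputcsv($handle, array('file', 'accuracy', 'incorrect', 'missing'), ',', '"', '\\');
  }
  fputcsv($handle, array($file_name, $accuracy, $incorrect, $missing), ',', '"', '\\');
  fclose($handle);
}

// Example usage with a temporary file and made-up numbers
$csv = sys_get_temp_dir() . '/test_results_' . uniqid() . '.csv';
AppendResultRow($csv, 'email1.txt', 70.0, 1, 1);
AppendResultRow($csv, 'email2.txt', 100.0, 0, 0);
echo file_get_contents($csv);
```

With one row per email you could then open the file in a spreadsheet or average the accuracy column to get a single score for the whole test set.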

The output of this bot should look something like this:

 


Found Tokens:
YOU 1
WONT 1
BELIEVE 1
THIS 1
ITS 1
UNBELIEVABLE 1
DURING 1
THE 3
POSTGAME 1
CELEBRATION 1
MR 2
COACH 2
GOT 1
A 1
WHOLE 1
WATER 1
COOLER 1
DUMPED 1
ON 1
HIS 1
HEAD 1
EVERYONE 1
LAUGHED 1
AS 1
CHASED 1
TEAM 1
OFF 1
FIELD 1
LOVE 1
BOBBY 1

Known Words:
YOU
THE
LOVE


Predicted Sender Class & Score: 
Child:  53%
Daughter:  47%

Human Scored Sender Class: 
Son:  50%
Child:  50%

Incorrect Predicted Sender Classes: 
Daughter

Missing Predicted Sender Classes: 
Son

Predicted Recipient Class & Score: 
Parent:  36%
Mother:  32%
Father:  32%

Human Scored Recipient Class: 
Mother:  33%
Father:  33%
Parent:  33%

Incorrect Predicted Recipient Classes: 
None

Missing Predicted Recipient Classes: 
None

Overall Accuracy: 
70%

Testing Complete!

 

As it stands this bot is quite rough however you can improve it by modeling word bi-grams to account for the context the words are used in rather than just noting which words are present.
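As a rough idea of what bi-grams look like, here is a minimal sketch that counts adjacent word pairs. BuildBigrams() is a hypothetical helper and isn’t part of the Tokenizer class, and it assumes you already have the words in their original order.

```php
<?php
// Hypothetical helper: count adjacent word pairs (bi-grams) in a word list.
function BuildBigrams(array $words): array {
  $words = array_values($words); // make sure the list is indexed 0..n-1
  $bigrams = array();
  for ($i = 0; $i < count($words) - 1; $i++) {
    $pair = $words[$i] . ' ' . $words[$i + 1];
    $bigrams[$pair] = ($bigrams[$pair] ?? 0) + 1;
  }
  return $bigrams;
}

print_r(BuildBigrams(array('MR', 'COACH', 'CHASED', 'THE', 'TEAM')));
```

A pair like “MR COACH” carries more context than MR and COACH counted separately, at the cost of many more rows to store.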

Additionally, I capitalize words and strip hyphens and apostrophes out of them, which reduces the number of words the bot learns (i.e. dont vs don’t vs Don’t vs Dont vs DoNt… all become DONT). This simplifies some things and reduces database storage requirements a bit. However, it does fail to properly model language, because people can express meaning in ways that get removed by this processing, which can lower the accuracy of your model in the long run.

You can find this bot and all its files on GitHub – emails not included.

I hope you enjoyed reading about building this bot. If so, please support me on Patreon for as little as $1 a month.

Your financial support allows me to dedicate time to developing awesome projects like this and while I am publishing them without cost, that isn’t to say they are free. I am doing this all by myself and it takes me a lot of time and effort to build and publish these projects for your enjoyment.

Your financial support means a lot to me and allows me to be able to afford to spend the time necessary to make great content for you.

So I ask again, please support me on Patreon for as little as $1 a month.

Feel free to suggest a project you would like to see built in the comments and if it sounds interesting it might just get built and featured here on my blog.

 

 

Much Love,

~Joy

 

Email Relationship Classifier Training The Bot

We’re in the “home stretch” and quickly approaching our goal of having a working Email Relationship Classifier Bot prototype.

Today we will cover building the training portion of the bot. This system implements “Supervised Learning”, so you will need to have “hand classified” your “Data Corpus” as outlined in my posts A Bag of Words and Classifying Emails. If you’ve read the other posts in this series then you are ready to proceed.

 

Train.php

What you are going to love about this code is its simplicity!

It’s intentionally short and “high level”, which was achieved by using our Class files (DatabaseManager.Class.php, FileManager.Class.php, Tokenizer.Class.php), which we covered in my post Class Files, to create Objects that “act” upon the data encapsulated inside them. This means we can just ask our Objects to do complex work in just a few lines of code.

Code

<?php

// This function will load the human scored JSON class files
function LoadClassFile($file_name){
  // Get file contents
  $file_handle = fopen($file_name, 'r');
  $file_data = fread($file_handle, filesize($file_name));
  fclose($file_handle);
  return $file_data;
}


// Include Classes
function ClassAutoloader($class) {
    include 'Classes/' . $class . '.Class.php';
}
spl_autoload_register('ClassAutoloader');


// Instantiate Objects
$myTokenizer = new Tokenizer();
$myEmailFileManager = new FileManager();
$myJSONFileManager = new FileManager();
$myDatabaseManager = new DatabaseManager();


// No Configuration needed for the Tokenizer Object

// Configure FileManager Objects
$myEmailFileManager->Scan('DataCorpus/TrainingData');
$myJSONFileManager->Scan('DataCorpus/TrainingDataClassifications');
$number_of_training_files = $myEmailFileManager->NumberOfFiles();
$number_of_JSON_files = $myJSONFileManager->NumberOfFiles();

// Configure DatabaseManager Object
$myDatabaseManager->SetCredentials(
  $server = 'localhost', 
  $username = 'root', 
  $password = 'password', 
  $dbname = 'EmailRelationshipClassifier'
);


// Make sure the files are there and the number of training files is
// the same as the number of JSON Class files.
if(($number_of_training_files != $number_of_JSON_files) 
   || ($number_of_training_files == 0 || $number_of_JSON_files == 0) ){
  die(PHP_EOL . 'ERROR! the number of training files and classification files are not the same or are zero! Run CreateClassificationFiles.php first.');
}
else{
  // Loop Through Files
  for($current_file = 0; $current_file < $number_of_training_files; $current_file++){
    $myTokenizer->TokenizeFile($myEmailFileManager->NextFile());		
    $EmailClassifications = json_decode(LoadClassFile($myJSONFileManager->NextFile()), true);
    // Loop Through Tokens
    foreach($myTokenizer->tokens as $word=>$count){
      $myDatabaseManager->AddOrUpdateWord($word, $count, $EmailClassifications);
    }
  }
}

echo PHP_EOL . 'Training complete! You can now run Test.php' . PHP_EOL;

 

Save Train.php in the root project folder:


[EmailRelationshipClassifier]
│
├── CreateClassificationFiles.php
├── DatasetSplitAdviser.php
├── database.sql
├── Train.php 
│
├── [Classes]
│   │
│   ├── DatabaseManager.Class.php
│   ├── FileManager.Class.php
│   └── Tokenizer.Class.php
│
└── [DataCorpus]
    │
    ├── [TestData]
    │
    ├── [TestDataClassifications]
    │
    ├── [TestResults]
    │
    ├── [TrainingData]
    │
    └── [TrainingDataClassifications]

 

Of course the complexity does exist inside the Objects, it’s just advantageous to hide it here behind the Object methods so that we can focus on the task of training rather than the details of moving the data around.

Once all the classes have been included and the objects instantiated & configured there is a check to confirm the .txt & JSON files exist and that the number is the same.

If none of the fail conditions trigger the die() function then, for each of the training files (.txt emails), the $myTokenizer Object will ask the $myEmailFileManager Object for the next file in its list, which it will load and tokenize. This means that it builds a “bag of words model” of the email, specifically “unigrams”.

Then the JSON relationship class file will be loaded and decoded into an array of “key & value pairs” where the key is the relationship class name and the value is either a zero or one (0/1) where one denotes relationship class membership and zero denotes a lack of class membership.
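As a quick illustration, here is what decoding one of these files might look like. The JSON string is made up. The “-Sender” and “-Recipient” key suffixes match what Test.php looks for, but the exact set of classes in your files will depend on your own data.

```php
<?php
// Made-up JSON in the shape described above: class name => 0 or 1.
// The class names are borrowed from the sample report; your files
// will contain whatever classes you defined.
$json = '{"Child-Sender": 1, "Son-Sender": 1, "Daughter-Sender": 0, "Mother-Recipient": 1}';

$EmailClassifications = json_decode($json, true);
if (!is_array($EmailClassifications)) {
  die('Invalid JSON: ' . json_last_error_msg());
}

// Keep only the classes the human marked as present (value of 1)
$present = array_keys(array_filter($EmailClassifications));
print_r($present);
```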

Then for each unigram word token the $myDatabaseManager Object will perform its AddOrUpdateWord() method.

The AddOrUpdateWord() method accepts the unigram word token as the first argument, the number of times it appears in the training file as the second argument and the relationship class memberships array as the third argument. The word is then either added to the Words table in the database or updated.
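To illustrate the “add or update” idea without a database, here is an in-memory sketch. AddOrUpdateWordInMemory() is hypothetical and is not the real method, which lives in DatabaseManager.Class.php and writes to MySQL. It also assumes that a word’s count is added to every class the email belongs to, which may differ from the real implementation.

```php
<?php
// In-memory stand-in for the Words table (hypothetical; the real
// AddOrUpdateWord() method writes to the database instead).
function AddOrUpdateWordInMemory(array &$words, string $word, int $count, array $classes): void {
  if (!isset($words[$word])) {
    // New word: start every class column at zero
    $words[$word] = array_fill_keys(array_keys($classes), 0);
  }
  foreach ($classes as $class => $is_member) {
    if ($is_member) {
      // Assumption: add the word count to each class the email belongs to
      $words[$word][$class] = ($words[$word][$class] ?? 0) + $count;
    }
  }
}

$words = array();
$classes = array('Child-Sender' => 1, 'Daughter-Sender' => 0);
AddOrUpdateWordInMemory($words, 'LOVE', 2, $classes); // adds the word
AddOrUpdateWordInMemory($words, 'LOVE', 1, $classes); // updates it
print_r($words);
```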

You can review the details of the database in my post Email Relationship Classifier Database.

After all the words in all the training emails have been processed the training is complete and we’re ready to test our bot which I’ll cover in an upcoming post.

If you enjoyed this post please support me on Patreon for as little as $1 a month, thank you.

 

 

Much Love,

~Joy
