Recently I wrote a post titled A Bag of Words, where I outlined an Email Relationship Classifier: a bot designed to determine the relationships between the sender and all of the recipients based only on the text available in the subject line and the body of the email.

Yesterday I uploaded the “class” files that I will use to implement the Relationship Classifier bot to my GitHub profile, and today we’re going to go over them.

If you plan to use the class files in your own implementation, you will also need a database and the code for the bot itself; we’ll discuss those in an upcoming post.

Today, however, let’s get right to it and review the class files used by my bot.

FileManager.Class.php

This class file manages the files (txt, json etc) that the bot interacts with.

There are three properties: $path, $files & $current_file

As well as three Standard Methods: Scan(), NextFile(), NumberOfFiles()

In addition there are two Magic Methods: __construct() & __destruct()

Properties

The FileManager Object is designed to work inside a single folder at a time and uses the $path property to remember where it’s working.

The $files property keeps a list of all the files in the folder that the bot is working in.

The $current_file property keeps track of the numeric index of the file name stored in the $files array.

Methods

The Scan() method will scan the path provided to it and store the names of the files it finds in the $files array.

The NextFile() method will return the name of the next file in the $files array after the $current_file integer value.

The NumberOfFiles() method simply counts how many files (keys) are in the $files array.
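To make the Scan() and NextFile() behaviour concrete, here is a minimal standalone sketch of the same two ideas: listing a folder without the '.' and '..' entries, and cycling back to the first file once the end of the list is reached. The temporary folder and the file names a.txt and b.txt are made up for the illustration and are not part of the bot.

```php
<?php
// Create a throwaway folder with two files (hypothetical names)
$dir = sys_get_temp_dir() . '/fm_demo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/a.txt", 'first');
file_put_contents("$dir/b.txt", 'second');

// List the folder, drop '.' and '..', and reindex the result from 0
$files = array_values(array_diff(scandir($dir), array('..', '.')));

// Ask for a "next file" three times to show the wrap-around
$current = 0;
$visited = array();
for ($i = 0; $i < 3; $i++) {
    // Reset the index once it reaches the end of the list
    if ($current >= count($files)) {
        $current = 0;
    }
    $visited[] = $files[$current];
    $current++;
}

print_r($visited);

// Clean up the throwaway folder
unlink("$dir/a.txt");
unlink("$dir/b.txt");
rmdir($dir);
```

With two files in the folder, the third request returns the first file again, which is the cycling behaviour NextFile() is meant to provide.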

Code:

<?php
class FileManager {
  // Data
  private $path = '';
  private $files = array();
  private $current_file = 0;
  
  function __construct(){
  }
  
  function __destruct(){
    // Do NOT call this function ie $object->__destruct();
    
    // Use unset($object); to let garbage collection properly
    // destroy the FileManager object in memory
    
    // No More FileManager after this point
  }
  
  public function Scan($path = ''){
    if(!empty($path)){
      $this->path = $path;
      $this->files = array_values(array_diff(scandir($path), array('..', '.')));
    }
    else{
      die('ERROR! FileManager->Scan(string $path) requires a directory path string.' . PHP_EOL);
    }
  }
  
  public function NextFile(){
    if(count($this->files) > 0){
      
      // reset count so we don't overrun the array
      // (valid indexes run from 0 to count - 1, so compare with >=)
      if($this->current_file >= count($this->files)){
        $this->current_file = 0;
      }

      $file = "{$this->path}/{$this->files[$this->current_file]}";
      $this->current_file++;
      return $file;
    }
    else{
      die('ERROR! FileManager->NextFile() requires you to run FileManager->Scan(string $path) first.' . PHP_EOL);
    }
  }
  
  public function NumberOfFiles(){
    return count($this->files);
  }
}

 

Tokenizer.Class.php

This class file facilitates Lexical Analysis through a method called tokenization. A token is a discrete piece of information or pattern like a word.

There is only a single property: $tokens

Further there are three Standard Methods: TokenizeFile(), ProcessTokens(), Tokenize()

In addition there are two Magic Methods: __construct() & __destruct()

Properties

The Tokenizer Object is designed to do the “slicing and dicing” of your data, so it doesn’t require many properties.

The $tokens property keeps a list of all the tokens that the bot is working with.

Methods

The TokenizeFile() method accepts a path to a file, which it will then load, read into a string and then pass to the Tokenize() method.

The ProcessTokens() method receives the matches found by the Tokenize() method and removes apostrophes and hyphens, e.g. pre-game becomes pregame and ain’t’not’gonna’ever-never becomes aintnotgonnaevernever. It then converts each token to UPPERCASE so that tokens that are otherwise the same can be merged into a single token and counted properly.

The Tokenize() method uses RegEx pattern matching and capture groups to match the pattern /(\w+)(\'?)(-?)/m and passes the matches to the ProcessTokens() method. Once ProcessTokens() is finished, Tokenize() counts the tokens, using each token as the key and its count as the value, e.g. “The blue sky is blue” would be represented like this: (‘THE’=>1, ‘SKY’=>1, ‘IS’=>1, ‘BLUE’=>2).
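As a rough standalone illustration of the counting step, the sketch below runs the same pattern over the example sentence and counts the uppercase words. It skips the apostrophe and hyphen merging that ProcessTokens() performs, so it only covers plain words; the variable names are mine and not taken from the class.

```php
<?php
$text = 'The blue sky is blue';

// PREG_SET_ORDER gives one array per match: [0] full match, [1] the word,
// [2] an optional trailing apostrophe, [3] an optional trailing hyphen
preg_match_all('/(\w+)(\'?)(-?)/m', $text, $matches, PREG_SET_ORDER);

// Keep only the word part of each match, in uppercase
$words = array();
foreach ($matches as $set) {
    $words[] = strtoupper($set[1]);
}

// Use the words as keys and their counts as values
$counts = array_count_values($words);
print_r($counts);
```

Because both occurrences of "blue" are uppercased before counting, they end up under the single key BLUE with a count of 2.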

Code:

<?php
class Tokenizer {

    // Data
    public $tokens = array();


    function __construct(){
    }

    function __destruct(){
        // Do NOT call this function ie $object->__destruct();

        // Use unset($object); to let garbage collection properly
        // destroy the Tokenizer object in memory

        // No More Tokenizer after this point
    }

    public function TokenizeFile($file_name){
		// Get file contents
		$file_name = trim($file_name); // use the same trimmed path for every call
		$file_handle = fopen($file_name, 'r');
		$file_data = fread($file_handle, filesize($file_name));
		fclose($file_handle);

		// do any preprocessing to $file_data here

		// Pass file data to Tokenize() method
		$this->Tokenize($file_data);
    }


	private function ProcessTokens(&$matches){
		
		foreach($matches as $key=>&$tokenset){
			// $tokenset[2] == ' 
			// $tokenset[3] == - 
			// Handle apostrophe and hyphen word merges
			// e.g. pre-game = PREGAME
			// & don't = DONT
			if(!empty($tokenset[2]) || !empty($tokenset[3])){

				$n = 1;
				$tokenset[0] = str_replace(array('\'', '-'), '', $tokenset[0]); // remove apostrophe and hyphen

				// Merge the following tokens while the chain continues
				// e.g. pre-game-celebration  = PREGAMECELEBRATION
				// & ain't'not'gonna'ever-never  = AINTNOTGONNAEVERNEVER
				// isset() guards against a trailing ' or - on the very last token
				while(isset($matches[$key + $n])){
					$next = $matches[$key + $n][0];
					$tokenset[0] .= str_replace(array('\'', '-'), '', $next); // merge with next captured token
					unset($matches[$key + $n]); // unset next token
					if(strpos($next, '-') === false && strpos($next, '\'') === false){
						break; // the chain ends at the first token without a trailing ' or -
					}
					$n++;
				}
			}

			$tokenset = strtoupper(trim($tokenset[0])); // convert to uppercase and string
		}
		unset($tokenset); // break the reference left over from the foreach
	}
	

    private function Tokenize($string){
		if(!empty($string)){
			// Get Word Tokens using RegEx
			preg_match_all('/(\w+)(\'?)(-?)/m', $string, $this->tokens, PREG_SET_ORDER, 0);
			
			$this->ProcessTokens($this->tokens);

			// use words as keys in array and values are the counts
			$this->tokens = array_count_values($this->tokens);
		}
    }
}
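For comparison, here is a simplified alternative to the merge loop in ProcessTokens(). This is not the approach the class uses: instead of matching the pieces and stitching them back together, it matches each compound word whole with a different pattern and then strips the joining characters. The function name compoundTokens() is hypothetical, and like the class it only recognises the straight apostrophe character.

```php
<?php
// Hypothetical helper (not part of Tokenizer.Class.php): match each compound
// word whole, then strip the joining characters and uppercase the result.
function compoundTokens(string $text): array {
    // A word, followed by any number of apostrophe- or hyphen-joined words
    preg_match_all("/\w+(?:['-]\w+)*/", $text, $matches);

    $tokens = array();
    foreach ($matches[0] as $word) {
        $tokens[] = strtoupper(str_replace(array("'", '-'), '', $word));
    }
    return $tokens;
}

print_r(compoundTokens("pre-game ain't'not'gonna'ever-never"));
```

The trade-off is that a word ending in an apostrophe or hyphen (with nothing attached after it) is simply left as its own token here, rather than being merged with whatever word comes next.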


DatabaseManager.Class.php

This class file allows the bot to connect to its database. Since the DatabaseManager Object is designed to handle the communication for the bot, some of the bot’s functionality has been merged into this class directly rather than passing data between classes.

There are six properties: $server, $username, $password, $dbname, $conn, $classifications

As well as seven Standard Methods: SetCredentials(), Connect(), Disconnect(), GetKnownClasses(), KnownWord(), ScoreWord(), AddOrUpdateWord()

In addition there are two Magic Methods: __construct() & __destruct()

Properties

Other than the $classifications property this is probably what you would expect to see on a database manager.

The $server property is the DNS name or IP address of the server hosting the database for the bot.

The $username property is the username the bot uses to access the database.

The $password property is the password the bot uses to access the database.

The $dbname property is the name of the database the bot is using.

The $conn property stores the connection object once initialized.

The $classifications property is a key/value array of the classifications and the “weight” the bot uses for each one (see A Bag of Words) to determine the weighted score for a relationship class rather than simply relying on a raw count.
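To show what a weighted score looks like in practice, here is a small arithmetic sketch that mirrors the multiplication ScoreWord() performs: the count stored for a word under each class, times the number of times the word appears in the email, times the weight of that class. The class names, weights and counts below are invented for the example and do not come from my database.

```php
<?php
// Hypothetical class weights, in the shape of the $classifications array
$weights = array('Friend-Sender' => 1.5, 'Coworker-Sender' => 1.0);

// Hypothetical training counts stored in the database for one word
$stored = array('Friend-Sender' => 4, 'Coworker-Sender' => 10);

// The word appears twice in the email being scored
$count = 2;

// Weighted score per class = stored count * occurrences * class weight
$scores = array();
foreach ($stored as $class => $trained) {
    $scores[$class] = $trained * $count * $weights[$class];
}

print_r($scores);
```

With these numbers the Friend-Sender score is 4 × 2 × 1.5 = 12 and the Coworker-Sender score is 10 × 2 × 1.0 = 20, so changing a weight shifts the balance between classes without touching the stored counts.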

Methods

The SetCredentials() method accepts and sets the $server, $username, $password and $dbname properties.

The Connect() method establishes a connection with the server and retains it as the $conn property.

The Disconnect() method severs the connection with the database.

The GetKnownClasses() method queries the database for the known “Relationship Classifications” and their weights, then retains the information as the $classifications array property.

The KnownWord() method returns true if the word is known and false otherwise.

The ScoreWord() method is used during testing to obtain the class scores for a word in the database.

The AddOrUpdateWord() method is used during training to add new words or update known words.
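The update half of AddOrUpdateWord() boils down to building one UPDATE statement that adds the word count to every class column that applies to the current email. The sketch below shows that string building on its own, without a database connection. The function name buildUpdateSql() and the class names are hypothetical, and I am assuming the classifications array maps class column names to 0/1 flags.

```php
<?php
// Hypothetical helper: build the UPDATE statement for a known word.
// Assumes $word is an already-cleaned token (uppercase letters, digits, underscores)
// and that $emailClassifications maps class column names to 0/1 flags.
function buildUpdateSql(string $word, int $count, array $emailClassifications): string {
    // Keep only the classes flagged for this email
    $classes = array_keys(array_filter($emailClassifications));

    // One "`Column` = `Column` + count" assignment per remaining class
    $assignments = array_map(function ($class) use ($count) {
        return "`$class` = `$class` + $count";
    }, $classes);

    return 'UPDATE `Words` SET ' . implode(', ', $assignments) . " WHERE `Word`='$word'";
}

$sql = buildUpdateSql('BLUE', 2, array(
    'Friend-Sender'    => 1,
    'Boss-Sender'      => 0,
    'Friend-Recipient' => 1,
));
echo $sql . PHP_EOL;
```

Building SQL by string concatenation is only reasonable here because the tokens have already been reduced to word characters; for anything that might contain arbitrary text, a prepared statement would be the safer choice.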

Code:

<?php
class DatabaseManager {
  // Data
  private $server = '';
  private $username = '';
  private $password = '';
  private $dbname = '';
  public $conn;     // The DB connection

  public $classifications = array();
      
  function __construct($server = NULL, $username = NULL, $password = NULL, $dbname = NULL){
    if(!empty($server) && !empty($username) && !empty($password) && !empty($dbname)){
      $this->SetCredentials($server, $username, $password, $dbname);
    }
  }
  
  
  function __destruct(){
    // Do NOT call this function ie $object->__destruct();
    
    // Use unset($object); to let garbage collection properly
    // destroy the DatabaseManager object in memory
    
    // No More DatabaseManager after this point
  }
  
  
  public function SetCredentials($server, $username, $password, $dbname){
     $this->server = $server;
     $this->username = $username;
     $this->password = $password;
     $this->dbname = $dbname;
  }
  
  
  public function Connect(){
    // Create connection
    $this->conn = new mysqli($this->server, $this->username, $this->password, $this->dbname);
    
    // Check connection
    if ($this->conn->connect_error) {
      die("MYSQL DB Connection failed: " . $this->conn->connect_error);
    }

    return true;
  }
    
    
  public function Disconnect(){
    $this->conn->close(); // Close connection
  }


  public function GetKnownClasses(){
    $this->Connect();
    $sql = "SELECT * FROM `Classifications`";
    $result = $this->conn->query($sql);

    if ($result->num_rows > 0) {
      $classifications = array();
      // Obtain the Classifications
      while($row = $result->fetch_assoc()) {
        $classifications[$row['Classification']] = $row['Weight'];
      }
      $this->classifications = $classifications;
    }
    else {
      die('ERROR! No Known Classifications in Database.' . PHP_EOL);
    }
    $this->Disconnect();
  }

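  public function KnownWord(&$word){
    $this->Connect();
    // Escape the word before embedding it in the query
    $safe_word = $this->conn->real_escape_string($word);
    $sql = "SELECT * FROM `Words` WHERE `Word`='$safe_word' LIMIT 1;";
    $result = $this->conn->query($sql);
    $known = ($result && $result->num_rows > 0);
    $this->Disconnect(); // close the connection opened above

    return $known;
  }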
  
  public function ScoreWord(&$word, &$count){
	
	if(count($this->classifications) == 0){
	    $this->GetKnownClasses();
	    $classifications = array();
	    foreach($this->classifications as $class=>$value){
			$classifications["$class-Sender"] =	$value;
		}
		foreach($this->classifications as $class=>$value){
			$classifications["$class-Recipient"] =	$value;
		}
		$this->classifications = $classifications;
	}
	

	if($this->KnownWord($word)){
        $this->Connect();
		$sql = "SELECT * FROM `Words` WHERE `Word` LIKE '$word'";
		$result = $this->conn->query($sql);

	  if ($result->num_rows > 0) {
		$word_data = $result->fetch_assoc();
		foreach($word_data as $key=>$value){
			 if($key == 'ID'){
				 unset($word_data["$key"]);
			 }
			 elseif($key == 'Word'){
				 unset($word_data["$key"]);
			 }
			 else{
				 $word_data[$key] *= ($count * $this->classifications[$key]);
			 }
	    }
		return $word_data;
	  }
    }else{
	    // unknown word... add it or ignore it
	}
  }
  

  public function AddOrUpdateWord(&$word, &$count, &$EmailClassifications){

    if(count($this->classifications) < 1){
      $this->GetKnownClasses();
    }
        
    $sql = "";
  
    if($this->KnownWord($word) == false){
      // Add Word
      // Build Insert SQL
      $sql .= "INSERT INTO `Words` ";
      $sql .= "(`ID`, `Word`, `" . implode('`, `', array_keys($EmailClassifications)) . '`) ';
      $sql .= "VALUES (NULL, '$word', '" . implode("', '", array_values($EmailClassifications)) . "')";
    }else{
      // Update Word
      // Build Update SQL
      $sql .= "UPDATE `Words` SET ";  
      $EmailClassifications = array_diff($EmailClassifications, array('0')); // remove any classes with a value of 0
      $classes = array_keys($EmailClassifications);
      for($i = 0; $i < count($classes); $i++){
        $sql .= "`{$classes[$i]}` = `{$classes[$i]}` + $count";
        
        if( $i < count($classes) - 1){
           $sql .= ', ';
        }
      }
      $sql .= " WHERE `Word`='$word'";      
    }

      // DO QUERY
      $this->Connect();
      $result = $this->conn->query($sql);
      $this->Disconnect();    

    if ($result > 0){
      echo substr($sql, 0, 7) . " $word" . PHP_EOL;
    }else{
      die("FAIL");
    }
  }
}

 

We will cover the database and building the actual bot in an upcoming post.

I’d like to thank all of my supporters whose generous contributions make these posts possible!

I hope you enjoyed reading this post and will consider supporting me on Patreon.

 

 

Much Love,

~Joy
