I once had a promising Padawan who came to me to learn the ways of The Code and was on his way to becoming a master of the light sided languages, when he was seduced by The Dark Side of The Code and switched languages to attended one of The Emperor’s “coding-camps”.

That’s only tangential to this post, none the less I’ll continue telling the story anyway. 😛

See, he went on a job interview recently and since there is the “noob” factor the employer used one of it’s new automated testing droids to gauge his expertise.

He later explained to me that the proctor droid did challenge his abilities and even though he completed most of the trials successfully, he was unable to complete the final coding challenge within the time available.

He said:

“The challenge was to ‘compress’ strings by taking contiguous repeating letters and group them with a number representing how many of that char that letter represented.”

He went on to describe his proposed solution to me and it sounded a lot like a “Suffix-Tree” which is nifty for sure but I don’t think that is the best way to solve this kind problem as he described it.

We don’t need a list of all possible permutations we just need to group reoccurring chars.

Now here’s the thing, I like building things that have a reason to be built and if you review my code and prototypes you will see that none of it is purely to test/practice an algorithm, every prototype I build does something, even if the prototype is just a proof of concept, it sill does something and had a reason to be created.

This is because I believe you don’t develop the ability to CODE (or most things really) by practicing them mindlessly by rote repetition.

Sure, repetition is valuable practice and you can learn to “reverse a string and hash it” by practicing it a thousand times, the ability will sink in!

You will know the HOW but your understanding of the WHY is likely to be lacking if all you focus on is “learning to code” rather than “learning how to do something with code”.

It’s a subtle but signification distinction I am making which is that you have to have a reason, some purpose behind why you want or need to reverse and hash a string because then you are forced to start weighing the pros and cons to different methodologies.

You absolutely SHOULD learn all the different possible ways you can do anything you set out to do but unless you employ them DOING something other than a textbook example you are libel to have worse outcomes when faced with real world challenges that have complex trade offs.

I’m NOT saying regular practice isn’t needed.

I AM saying that more important than regular practice is “mindful practice”.

Which is facilitated (in my opinion) by having a genuine interest in the subject beyond an academic need to know it.

Take learning a natural spoken language for example, who do you think would learn a language better?

  • Someone who needed to pass a class/test in that language…

OR

  • Someone who has a friend or family member who only spoke another language and wants to have meaningful conversations?

I believe long term the person who has a personal real world motivation (i.e. wanting to speak to the other person) will succeed better at learning another language.

Obviously I’m being very general and this is of course subjective to a large extent but having some kind of personal interest in whatever you do seems crucial to long term success.

If you are musically inclined, is rote arpeggio practice as interesting and motivating as learning to play a few of your favorite songs so you can show your friends?

Is it more engaging to learn to draw your favorite cartoons for an hour, or… learning to draw using n-point perspective vanishing points for 60 minutes (Dekeract anyone? 😛 )?

It’s much easier to #LearnToCode and or #TeachOthersToCode when you are “Building a Speaking Robot” rather than “Learning to Code a Cross-Platform Wrapper in PHP for Common TTS API”, which is equally better still than “reverse a string and hash it” don’t you think?

That’s just my opinion.

Anyway, why it’s relevant here is as I said, “I like building things that have a reason to be built”, and “algo challenges” that lack a real world reason to be built are full of stupid logical problems!

I suppose this is intended to make you think the problem through but in reality it just makes noobs feel good about writing bad code that serves no purpose!

In any case, despite my dislike for writing pointless code, I wanted to offer my solution here in the hopes that if you encounter a similar pointless algo barrier between you and a paycheck… maybe you will remember my solution and solve the employment bot’s challenge!

There Is No Spoon

The rules for the algorithm challenge are simple (at least as he described them to me):

“The challenge was to ‘compress’ strings by taking contiguous repeating letters and group them with a number representing how many of that char that letter represented.”

This challenge is deceptively simple and full of potential bugs just waiting for a coder to create them!

For example, if the goal is string compression, merging single non-repeating chars with a number won’t compress the string!

Consider the string: ABCDEFG

“Compressed” by adding a number and shorting to a shingle char is actually making the string longer.

New String (is double the length) : 1A1B1C1D1E1F1G

This string is 100% longer!

Also, it doesn’t really make much sense other than consistency to compress repeating character sub-strings unless there are more than two chars because otherwise you don’t actually get any compression and we’ve already established that single chars require different processing rules anyway so considering and accounting for this too seems reasonable.

Consider the string: AABBCC

New String (is the same length) : 2A2B2C

So the algorithm must be flexible enough not to make the string longer and able to use rules to handle these situations differently without breaking.

Also, we can assume (Can we? The rules are too vague! What is this “compression” algorithm really for?) that numbers will not naturally exist in the strings because we are using them for compression, but is it possible that symbols (non-number/letter chars) can appear in the strings?

Unless stipulated it seems fair to reason symbols could appear and should probably be compressed similar to letters.

To further make my point of how useless non-real-world coding challenges are, there is no mention of “uncompressing” / “decompressing” the compressed string back into the original and if you compress something you are going to want to decompress it so it makes sense that should be accounted for as well.

My Solution

The simplest way I can think of is to:

Compression

  1. Step through the string in whatever direction the “endianness” of the data or language requires (left to right demonstrated) grouping the repeating chars by checking if the next char (the one on the right) is the same a the current char.
    • If the next char is the same, keep stepping to the right until you reach the end or a different char.
    • if/when we encounter a different char, start a new “group” of chars and repeat growing that group until we find a different char again or we reach the end of the string.
  1. Once all the chars are grouped, step through the groups and count the number of chars in each group.
    • If the number is greater than or equal to a pre-defined “group threshold” (i.e. 3) emit the count and a single instance of that char i.e. AAAA = 4A.
    • Otherwise just emit the the char e.i. AA = AA.

Decompression (from compressed string)

  1. Looks at each char to the left of this one (as we move from left to right) and if it isn’t a number and the current char isn’t a number than this is a single char representing itself, echo the current char and proceed.
    • If the char to  is a number then we’ve already used it to decompress a char.
  2. If the current char is a number than keep checking the chars to the right of this char for numbers until we don’t find another number or we hit the end of the string, repeat the next char non numeric char for that number.
    • This will find repeating chars > than 9 repetitions.
The Code

Here’s the working code.

<?php

/*

With a $group_threshold of 3 or more process these strings to output the example results:

Given this (case sensitive) string: AAaaBaCGGGGSTDSSSDDGCMM
Return this result: AAaaBaC4GSTD3SDDGCMM

Given this (case insensitive) string: AAaaBaCGGGGSTDSSSDDGCMM
Return this result: 4ABAC4GSTD3SDDGCMM

Given this (case sensitive) string: wzzzZZZZZWXXXXyYyW
Return this result: w3z5ZW4XyYyW

Given this (case insensitive) string: wzzzZZZZZWXXXXyYyW
Return this result: W8ZW4X3YW



Actual (case sensitive) output:

Original: wzzzZZZZZWXXXXyYyW
Compressed: w3z5ZW4XyYyW
Uncompressed (from group data): wzzzZZZZZWXXXXyYyW
Uncompressed (from compressed string): wzzzZZZZZWXXXXyYyW

String compressed and decompressed successfully.



*/


//String Test examples:
$chars = 'AAaaBaCGGGGSTDSSSDDGCMM'; // chars is a string
$chars = 'wzzzZZZZZWXXXXyYyW';

$group_threshold = 3; // Less than 3 is unwise
                      // 2 i.e. AA would be encoded as 2A so you save nothing
                      // and a threshold of 1 is going to grow rather than
                      // compress the string i.e. A would be encoded as 1A 
                      // So a string of ABCDEFG would result in a string
                      // that is 100% longer (double length) : 1A1B1C1D1E1F1G

// String Case Sensitivity Options
define("NO_SENSITIVITY_UPPERCASE", 0); // No Case Sensitivity Convert to Upper Case
define("NO_SENSITIVITY_LOWERCASE", 1); // No Case Sensitivity Convert to Lower Case
define("CASE_SENSITIVE", 2); // Preserve Case as it exists in the string

// Pick a string processing case sensitivity
$case_sensitivity = CASE_SENSITIVE;

if($case_sensitivity === NO_SENSITIVITY_UPPERCASE){
    $chars = strtoupper($chars); // Convert to Upper Case
}
elseif($case_sensitivity === NO_SENSITIVITY_LOWERCASE){
    $chars = strtolower($chars); // Convert to Lower Case
}
elseif($case_sensitivity === CASE_SENSITIVE){
    // Do any case sensitive pre-processing here
}
else{
    die('You selected an invalid string case sensitivity.' . PHP_EOL);
}

echo 'Original: ' . $chars . PHP_EOL;
$chars_original = $chars;

// Split the $chars string
$chars = str_split($chars); // $chars is now an array containing 
                            // each char as an element

$groups = array(); // Combined chars go here as an array of arrays of chars
$group = 0; // The index position of the current char group

// Count how long the $chars array is
$number_of_chars = count($chars);

// Group the chars
foreach($chars as $key=>$char){ // For all the Chars
    
    // Add this char to the current group
    $groups[$group][] = $char; 
    
    // If this is not the last char
    if($key < $number_of_chars){
        
        // Check if the current char 
        // Is not same as the next char
        if($char != @$chars[$key+1]){
            // The char to the right of this char
            // is not the same as this char
            // so when the foreach proceeds
            // we want the next char to be added
            // to a different group
            $group++; // next group
        }
    }
}

// Count and echo the "compressed" char groups
echo 'Compressed: ';
$compressed_chars = '';
foreach($groups as $key=>$group){
    
    $number_of_times_repeated = count($group);
    
    // If the number of times this char group repeated 
    // is greater or equal to the $group_threshold value
    if($number_of_times_repeated >= $group_threshold){
        // Example ($group_threshold < 4) AAAA = 4A
        $compressed_chars .= $number_of_times_repeated . $group[0];
    }
    else{
        // Example ($group_threshold > 2) AA = AA
        $compressed_chars .= str_repeat($group[0], $number_of_times_repeated); 
    }
}
echo $compressed_chars . PHP_EOL;


// Rebuild "uncompress" the "compressed" char groups
// back to the original string
echo 'Uncompressed (from group data): '; 
$uncompressed_chars_from_groups = '';
foreach($groups as $key=>$group){
    $uncompressed_chars_from_groups .= str_repeat($group[0], count($group)); 
}
echo $uncompressed_chars_from_groups . PHP_EOL;


// Rebuild "uncompress" the "compressed" char string
// back to the original string
echo 'Uncompressed (from compressed string): ';
$compressed_chars = str_split($compressed_chars);
$compressed_chars_length = count($compressed_chars);
$uncompressed_chars_from_string = '';
foreach($compressed_chars as $key=>$char){
    
    // If the char to the left is not a number
    if(!is_numeric(@$compressed_chars[$key-1])){
        // If this char is a number
        if(is_numeric($char)){
            
            // keep checking to the right in case number is > 9
            $i = 0;
            $number = '';
            
            while(is_numeric(@$compressed_chars[$key+$i]) && ($key+$i) <= $compressed_chars_length){
                $number .= $compressed_chars[$key+$i]; // Concatenate the char onto a number string
                $i++;
            }
            $uncompressed_chars_from_string .= str_repeat($compressed_chars[$key+$i], (int) $number);
            
        }
        else{ 
            // This char is a symbol or letter representing itself only
            $uncompressed_chars_from_string .= $char;
        }
    }
}
echo $uncompressed_chars_from_string . PHP_EOL;

// If the backup of the original string matches the
// string generated by decompressing the group data held in memory
// and the string generated decompressing the compressed string
if($chars_original === $uncompressed_chars_from_string
    && $chars_original === $uncompressed_chars_from_groups){
    // Everything is working correctly
    echo PHP_EOL . "String compressed and decompressed successfully.". PHP_EOL;
}
else{
    // Something isn't working
    echo "You broke it you fix it! :-P". PHP_EOL; 
}

Without knowing more about the “use case” (what the code is actually supposed to do) and how it will be used it becomes impossible to improve further it in any meaningful way, which is why I don’t enjoy algorithm practice and challenges… you aren’t building  anything real and useful!

Featured Image

So, lately I’ve been doing this thing where I put in a little effort to make the featured image a wallpaper and some of you really seem to enjoy this so I’ve done that again with this post.

It illustrates the idea of compressing repeating elements in a group into smaller representations of the original.

Please enjoy!

Look Forward Char Group String Compression Wallpaper
Look Forward Char Group String Compression Wallpaper

If you enjoy my code and and content then please like, share, comment and subscribe!

And if you want to contribute financially to my efforts, I have a Patreon where you can pledge $1 or more a month for one or more months, cancel any time.

Much Love,

~Joy