Run Length Encoding in PHP with RegEx

Run Length Encoding (RLE) is a simple form of data compression where sequences of the same data value are stored as a single value and count. For example, the string AAAABBBCCDAA can be compressed to 4A3B2C1D2A.

Example from Rosetta Code:

function encode($str)
{
    return preg_replace_callback('/(.)\1*/', function ($match) {
        return strlen($match[0]) . $match[1];
    }, $str);
}

function decode($str)
{
    return preg_replace_callback('/(\d+)(\D)/', function($match) {
        return str_repeat($match[2], $match[1]);
    }, $str);
}

echo encode('WWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWW'), PHP_EOL;
echo decode('11W1B12W3B24W1B10W'), PHP_EOL;

About the Regular Expression

The regular expression /(.)\1*/ is used to match sequences of the same character. Here's how it works:

  1. (.)

    • . means "match any single character" (except newline)
    • The parentheses ( ) create a "capturing group", so this captures any single character and remembers it
  2. \1 - This is a backreference:

    • \1 refers back to whatever was matched in the first capturing group
    • In regex, \1 means "match the exact same character that was captured in group 1"
  3. * - This is a quantifier:

    • Means "zero or more occurrences" of the previous element
    • In this case, it means "zero or more occurrences of the same character"

Tags

 PHP  RegEx  Algorithms