Run Length Encoding (RLE) is a simple form of data compression where sequences of the same data value are stored as a single value and count. For example, the string AAAABBBCCDAA
can be compressed to 4A3B2C1D2A
.
Example from Rosetta Code:
function encode($str)
{
return preg_replace_callback('/(.)\1*/', function ($match) {
return strlen($match[0]) . $match[1];
}, $str);
}
function decode($str)
{
return preg_replace_callback('/(\d+)(\D)/', function($match) {
return str_repeat($match[2], $match[1]);
}, $str);
}
echo encode('WWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWW'), PHP_EOL;
echo decode('11W1B12W3B24W1B10W'), PHP_EOL;
About the Regular Expression
The regular expression /(.)\1*/
is used to match sequences of the same character. Here's how it works:
(.)
.
means "match any single character" (except newline)- The parentheses
( )
create a "capturing group", so this captures any single character and remembers it
\1
- This is a backreference:\1
refers back to whatever was matched in the first capturing group- In regex,
\1
means "match the exact same character that was captured in group 1"
*
- This is a quantifier:- Means "zero or more occurrences" of the previous element
- In this case, it means "zero or more occurrences of the same character"