Fixing PHP’s Explode Function


When you perform a PHP explode() on a string, the returned array can contain empty items you probably did not expect. Follow along as we look at this subtle glitch in the function and how to fix it.

The explode() function is a handy tool when you need to split a string on a delimiter. It is commonly used to determine things such as:

  1. The number of words in a string
  2. The parts of a URL
  3. The tokens in comma-delimited text
  4. and others
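Here is a rough sketch of the first two uses listed above. The sentence and the example.com URL are made up for illustration. The URL path already starts and ends with a slash, and that is exactly the case examined below.

```php
<?php
// 1. Number of words in a string (assumes single spaces between words)
$sentence = "the quick brown fox";
$words = explode(" ", $sentence);
echo count($words) . "\n";      // 4

// 2. Parts of a URL path (the URL is a placeholder)
$path = parse_url("http://example.com/blog/2011/explode/", PHP_URL_PATH);
$segments = explode("/", $path);
echo count($segments) . "\n";   // 5: includes "" for the leading and trailing "/"
```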

For example, in the string “a,b,c,d” we want to pull out each of the tokens. We can do this by performing the following:

$delimiter = ",";
$stringToParse = "a,b,c,d";
$aTokens = explode($delimiter, $stringToParse); // array("a", "b", "c", "d")

But that is an ideal string. explode() has some flaws, as we can see below. Study this code and then run it. It calls explode() on a series of short strings, with and without leading and trailing delimiters, as a test of the function.

echo "Operating System is: " . PHP_OS . "<br>";
echo "HTTP Server is: " . $_SERVER['SERVER_SOFTWARE'] . "<br>";

// Test strings: a lone delimiter, no delimiter, and one or two tokens
// with and without leading and trailing delimiters
$tests = array("/", "a", "/a", "a/", "/a/", "a/b", "/a/b", "a/b/", "/a/b/");

foreach ($tests as $str) {
  echo "<hr/>";
  $parts = explode("/", $str);
  $numParts = count($parts);
  echo "String: " . $str . " Count: " . $numParts . "<br>";
  var_dump($parts);
}

When I run this on Windows 7 Home and Apache I get:

Operating System is: WINNT
HTTP Server is: Apache/2.2.17 (Win32) PHP/5.3.4
 
String: / Count: 2
array
  0 => string '' (length=0)
  1 => string '' (length=0)
 
String: a Count: 1
array
  0 => string 'a' (length=1)
 
String: /a Count: 2
array
  0 => string '' (length=0)
  1 => string 'a' (length=1)
 
String: a/ Count: 2
array
  0 => string 'a' (length=1)
  1 => string '' (length=0)
 
String: /a/ Count: 3
array
  0 => string '' (length=0)
  1 => string 'a' (length=1)
  2 => string '' (length=0)
 
String: a/b Count: 2
array
  0 => string 'a' (length=1)
  1 => string 'b' (length=1)
 
String: /a/b Count: 3
array
  0 => string '' (length=0)
  1 => string 'a' (length=1)
  2 => string 'b' (length=1)
 
String: a/b/ Count: 3
array
  0 => string 'a' (length=1)
  1 => string 'b' (length=1)
  2 => string '' (length=0)
 
String: /a/b/ Count: 4
array
  0 => string '' (length=0)
  1 => string 'a' (length=1)
  2 => string 'b' (length=1)
  3 => string '' (length=0)

If you look closely, explode() is consistent, but not in a way that helps when you want tokens: every leading or trailing delimiter yields an empty string. For example, for just “/”, explode() returns an array with a count of 2 and two empty slots. For “a/b/”, it returns a count of 3: a, b, and an empty slot.
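Put another way, with a non-empty delimiter and no limit argument, explode() returns one more element than the number of delimiters in the string. This short sketch checks that rule against the same test strings using substr_count():

```php
<?php
// Compare explode()'s element count with (number of delimiters + 1)
$tests = array("/", "a", "/a", "a/", "/a/", "a/b", "/a/b", "a/b/", "/a/b/");
$allMatch = true;
foreach ($tests as $str) {
  $numParts = count(explode("/", $str));
  $numDelimiters = substr_count($str, "/");
  echo $str . ": " . $numParts . " parts, " . $numDelimiters . " delimiters\n";
  if ($numParts !== $numDelimiters + 1) {
    $allMatch = false;
  }
}
echo $allMatch ? "Rule holds for every test string\n" : "Rule broken\n";
```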

For extracting tokens, this is close to useless. Why would I care where the delimiters occurred? The purpose of using explode() is to obtain the delimited tokens, and the empty slots force you into extra processing: iterating through each item in the array and filtering out the empty ones.

My first thought was that this might be an issue with the Windows build of PHP only, but I ran the same code on my Linux server and got the same results.

PHP Explode Statistics

Changing explode() in the PHP core could break existing programs. Popular applications call it in well over a hundred places each, and any of those calls might break if the core function changed:

WordPress 3.2.1 226 occurrences
Drupal 7.20 171 occurrences
Joomla 1.7.3 263 occurrences
phpBB 3.0.10 234 occurrences
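The article does not say how these counts were gathered. One way to produce similar numbers is PHP's own tokenizer, which ignores "explode" when it appears in comments or string literals. The helper names below are hypothetical, and the numbers this sketch reports may differ from the list above.

```php
<?php
// Hypothetical helper: count calls to explode() in a string of PHP source.
// Skips comments, string literals, method/static calls named explode, and
// function declarations. On PHP 8+, fully qualified \explode() calls
// tokenize differently and are not counted by this sketch.
function count_explode_calls($source) {
  $skip = array(T_WHITESPACE, T_COMMENT, T_DOC_COMMENT);
  $tokens = token_get_all($source);
  $n = count($tokens);
  $count = 0;
  $prev = null; // previous significant token
  for ($i = 0; $i < $n; $i++) {
    $t = $tokens[$i];
    if (is_array($t) && in_array($t[0], $skip, true)) {
      continue;
    }
    if (is_array($t) && $t[0] === T_STRING && strtolower($t[1]) === 'explode') {
      // ->explode, ::explode (T_PAAMAYIM_NEKUDOTAYIM) and "function explode" are not calls to the built-in
      $isOther = is_array($prev) && in_array($prev[0],
        array(T_OBJECT_OPERATOR, T_PAAMAYIM_NEKUDOTAYIM, T_FUNCTION), true);
      // Look ahead past whitespace/comments for an opening parenthesis
      $j = $i + 1;
      while ($j < $n && is_array($tokens[$j]) && in_array($tokens[$j][0], $skip, true)) {
        $j++;
      }
      if (!$isOther && $j < $n && $tokens[$j] === '(') {
        $count++;
      }
    }
    $prev = $t;
  }
  return $count;
}

// Hypothetical helper: total explode() calls in every .php file under $dir
function count_explode_calls_in_dir($dir) {
  $total = 0;
  $it = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS));
  foreach ($it as $file) {
    if ($file->isFile() && strtolower($file->getExtension()) === 'php') {
      $total += count_explode_calls(file_get_contents($file->getPathname()));
    }
  }
  return $total;
}
```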

Fixing PHP explode()

So what do we do to fix this?

Here’s my solution: create another function called explode_filtered(). It calls explode() and then passes the array through array_filter() with a callback, explode_filtered_empty(), that checks whether each item is empty. We toss the empty ones out and return only the tokens.

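// Callback for array_filter(): keep every token except the empty string.
// The strict comparison keeps a token such as "0".
function explode_filtered_empty($var) {
  return $var !== "";
}

// explode() the string, then drop the empty strings produced by leading,
// trailing, or repeated delimiters. Note: array_filter() preserves keys.
function explode_filtered($delimiter, $str) {
  $parts = explode($delimiter, $str);
  return array_filter($parts, "explode_filtered_empty");
}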

Our new test will now look like this:

// explode_filtered_empty() and explode_filtered() as defined above

echo "Operating System is: " . PHP_OS . "<br>";
echo "HTTP Server is: " . $_SERVER['SERVER_SOFTWARE'] . "<br>";

// Same test strings as before
$tests = array("/", "a", "/a", "a/", "/a/", "a/b", "/a/b", "a/b/", "/a/b/");

foreach ($tests as $str) {
  echo "<hr/>";
  $parts = explode_filtered("/", $str);
  $numParts = count($parts);
  echo "String: " . $str . " Count: " . $numParts . "<br>";
  var_dump($parts);
}

Notice that the arrays are now cleaned up and contain only the tokens.

Operating System is: WINNT
HTTP Server is: Apache/2.2.17 (Win32) PHP/5.3.4
 
String: / Count: 0
array
  empty
 
String: a Count: 1
array
  0 => string 'a' (length=1)
 
String: /a Count: 1
array
  1 => string 'a' (length=1)
 
String: a/ Count: 1
array
  0 => string 'a' (length=1)
 
String: /a/ Count: 1
array
  1 => string 'a' (length=1)
 
String: a/b Count: 2
array
  0 => string 'a' (length=1)
  1 => string 'b' (length=1)
 
String: /a/b Count: 2
array
  1 => string 'a' (length=1)
  2 => string 'b' (length=1)
 
String: a/b/ Count: 2
array
  0 => string 'a' (length=1)
  1 => string 'b' (length=1)
 
String: /a/b/ Count: 2
array
  1 => string 'a' (length=1)
  2 => string 'b' (length=1)
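There is one detail in the output above. array_filter() preserves keys, so for "/a/b" the tokens sit at indexes 1 and 2, and $parts[0] is undefined. If you want a plain list, array_values() renumbers the result. Below is a sketch of such a variant; the name explode_filtered_list is hypothetical. It also shows why an explicit empty-string check matters: array_filter() with no callback would also discard a "0" token, because "0" is falsy in PHP.

```php
<?php
// Hypothetical variant of explode_filtered() that re-indexes the result,
// so the first token is always at index 0.
function explode_filtered_list($delimiter, $str) {
  $parts = explode($delimiter, $str);
  // Keep every piece that is not the empty string ("0" is kept)
  $tokens = array_filter($parts, function ($part) {
    return $part !== "";
  });
  // array_filter() preserves keys; array_values() renumbers them 0..n-1
  return array_values($tokens);
}

var_dump(explode_filtered_list("/", "/a/b/"));

// For comparison: array_filter() without a callback drops "0" too
var_dump(array_values(array_filter(explode(",", "1,0,2"))));
```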

So when you want a cleaner, filtered version of explode(), use explode_filtered() with its filter callback and you’ll be all set.
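If you would rather not maintain a callback, preg_split() with the PREG_SPLIT_NO_EMPTY flag drops the empty pieces itself and returns a list indexed from 0. Here is a minimal sketch; explode_no_empty is a hypothetical name. preg_quote() escapes the delimiter so that characters such as "/" or "." are matched literally.

```php
<?php
// Hypothetical alternative to explode_filtered(): split on a literal
// delimiter and let preg_split() discard the empty pieces.
function explode_no_empty($delimiter, $str) {
  // Escape regex metacharacters, including "/" (the pattern delimiter here)
  $pattern = "/" . preg_quote($delimiter, "/") . "/";
  // Limit -1 means "no limit"; PREG_SPLIT_NO_EMPTY removes "" entries
  return preg_split($pattern, $str, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(explode_no_empty("/", "/a/b/"));
```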

Of Interest