
Trim Not Working With Array From Mysql Fetched String

What I'm trying to do is take a block of html, strip out all the html tags, and put each line of text into a PHP array. I'm just trying it with one block to test (hence the WHERE I

Solution 1:

Here's your requested explanation on the regex pattern that solved your issue:

/[\s]+/ (more simply expressed as /\s+/) says "look for one or more white-space characters" (this includes ' ', '\r', '\n', '\t', '\f' and '\v'). The m (multi-line) modifier is not necessary because you are not using the ^ or $ anchors in your pattern. The u (unicode) modifier is absolutely critical in your case because your string of html text contains many little devils called...

"NO-BREAK SPACE" (U+00A0, written as \x{00A0} in a pattern). It is a single Unicode character, but UTF-8 stores it as two bytes: 194 and 160 (0xC2 0xA0).

Without the u flag, the NO-BREAK SPACE characters remain and additional filtering will be required to remove them.
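Both points can be checked in a few lines. This is a minimal sketch assuming PHP 7.0+ (for the "\u{00A0}" string escape); the sample text is made up rather than taken from the question's data. It first shows that U+00A0 occupies two bytes, then that /\s+/ leaves those bytes in place while /\s+/u folds them into one ordinary space.

```php
<?php
// U+00A0 NO-BREAK SPACE, written with PHP 7's "\u{...}" escape.
$nbsp = "\u{00A0}";

// One character, but two bytes in UTF-8: 194 (0xC2) and 160 (0xA0).
$bytes = array_map('ord', str_split($nbsp));
$charCount = preg_match_all('/./u', $nbsp);

echo strlen($nbsp), PHP_EOL;        // 2
echo implode(',', $bytes), PHP_EOL; // 194,160
echo $charCount, PHP_EOL;           // 1

// Made-up sample: a word followed by two NO-BREAK SPACEs and an ordinary space.
$text = "Description{$nbsp}{$nbsp} ";

// Without /u, \s only covers ASCII white space, so the C2 A0 byte pairs survive.
$withoutU = preg_replace('/\s+/', ' ', $text);

// With /u, PHP's \s is Unicode-aware and also matches U+00A0.
$withU = preg_replace('/\s+/u', ' ', $text);

var_dump(trim($withoutU) === 'Description'); // bool(false)
var_dump(trim($withU) === 'Description');    // bool(true)
```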


While you eventually got your code to produce the right output, I'm happy to offer a leaner, single-step pattern that gets you there faster using only preg_split():

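while ($row = $results->fetch_array()) {
    // Split on every tag, swallowing any white space (NO-BREAK SPACE included, thanks to /u)
    // on either side of it; PREG_SPLIT_NO_EMPTY drops the empty pieces between adjacent tags.
    $texts = preg_split('/\s*<[^>]+>\s*/u', $row['post_content'], 0, PREG_SPLIT_NO_EMPTY);
    var_export($texts);
}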
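To try the pattern without a database, the loop body can be run on a literal string. The HTML below is a hypothetical stand-in for $row['post_content'], not the asker's actual post; it only shows which pieces survive the split.

```php
<?php
// Hypothetical stand-in for $row['post_content']; real WordPress content will differ.
$postContent = "<h2>\u{00A0}Description\u{00A0}</h2>\n"
    . "<p>Donec Rem</p>\n"
    . "<ul>\n  <li>Animam Urgebat</li>\n</ul>";

// Same call as in the loop above.
$texts = preg_split('/\s*<[^>]+>\s*/u', $postContent, 0, PREG_SPLIT_NO_EMPTY);

var_export($texts);
```

The result is the three fragments 'Description', 'Donec Rem' and 'Animam Urgebat', with no stray spaces (ordinary or no-break) at either end.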


This new splitting pattern still looks for your tags, but it is more efficient because between the < and >, I merely ask to match all characters that are "not >" by using [^>]+. The engine can consume those characters in one greedy run and then match the closing >, whereas the lazy .+? has to stop and check for a closing > after every single character it consumes.
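One practical difference between the two tag patterns shows up when a tag is broken across lines. In this sketch (hypothetical input), the lazy <.+?> pattern from the question misses the wrapped <a> tag because . does not match a newline unless the s modifier is added, while [^>]+ has no such restriction.

```php
<?php
// Hypothetical input: a tag whose attributes wrap onto a second line.
$html = "Text<a\nhref=\"#\">Link</a>";

// '.' does not match "\n" without the s flag, so the lazy pattern cannot match the <a ...> tag.
$lazy = preg_split('/<.+?>/', $html);

// [^>] matches any character except '>', newlines included, so both tags are found.
$negated = preg_split('/<[^>]+>/', $html);

var_export($lazy);
var_export($negated);
```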

Furthermore, I included matching for your Unicode white-space characters: \s* matches zero or more white-space characters (NO-BREAK SPACE included, thanks to the u flag) before AND after each tag.

Finally, I should explain the additional parameters on preg_split(). The 0 is the limit and means "no limit". That is the default behavior, but the argument still has to be written out (as 0 or -1) to hold its place so that the fourth parameter, the flags, can be supplied. PREG_SPLIT_NO_EMPTY spares you the extra array_filter() step later: it omits any empty elements generated by the split, so you only get the good stuff.
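The effect of the flag, and of the placeholder for the limit argument, can be checked on a tiny hypothetical string. The named-argument call assumes PHP 8.0 or later; on older versions keep the positional -1 (or 0). The sketch also shows one subtle difference from array_filter(): without a callback, array_filter() removes every value that is loosely false, which includes the string "0".

```php
<?php
// Hypothetical input with adjacent tags and a cell containing "0".
$html = '<p>One</p><p>0</p><p>Two</p>';

// Without the flag, empty strings appear before, between and after adjacent tags.
$raw = preg_split('/<[^>]+>/', $html);

// -1 (or 0) holds the $limit slot so that the flags argument can be passed.
$noEmpty = preg_split('/<[^>]+>/', $html, -1, PREG_SPLIT_NO_EMPTY);

// PHP 8.0+ only: a named argument skips $limit entirely.
$named = preg_split('/<[^>]+>/', $html, flags: PREG_SPLIT_NO_EMPTY);

// array_filter() without a callback also drops the string "0".
$filtered = array_values(array_filter($raw));

var_export($noEmpty);  // keeps 'One', '0', 'Two'
var_export($filtered); // keeps only 'One', 'Two'
```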

Solution 2:

Trim doesn't work in place. You want this:

$arrayvalue = trim($arrayvalue);

That's really it. Trim returns the trimmed string: it doesn't modify the variable in place.
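In practice this usually bites inside a foreach loop, where the loop variable is only a copy of each element. A sketch with made-up sample strings shows two ways to keep the trimmed values, plus a caveat relevant to this question: trim()'s default character list (" \t\n\r\0\x0B") does not include NO-BREAK SPACE, so trimming alone cannot remove it.

```php
<?php
// Made-up sample values with ordinary leading/trailing white space.
$lines = ["  Description ", "\tDonec Rem\n"];

// Pitfall: $value is a copy, and trim()'s return value is thrown away.
$unchanged = $lines;
foreach ($unchanged as $value) {
    trim($value);
}

// Fix 1: loop by reference and assign the result back.
$byRef = $lines;
foreach ($byRef as &$item) {
    $item = trim($item);
}
unset($item); // break the reference so a later write to $item can't modify $byRef

// Fix 2: build a new array of trimmed values.
$mapped = array_map('trim', $lines);

// Caveat: the default character list leaves U+00A0 untouched.
$nbspTrimmed = trim("\u{00A0}Description\u{00A0}");
```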

Solution 3:

I found a solution.

Not exactly sure how it works. I'm quite unfamiliar with regex.

But the solution that I found (and maybe someone can explain it?) was:

$clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray);

The entire script (excluding the MySQL stuff) that worked was:

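// Decode entities such as &nbsp; (which becomes the UTF-8 bytes C2 A0).
$converted = html_entity_decode($row['post_content'], ENT_QUOTES);
// Strip any 0xC2 / 0xA0 bytes from both ends of the whole string (byte by byte, not as a pair).
$converted = trim($converted, chr(0xC2).chr(0xA0));

// Split the HTML on every tag.
$htmlarray = preg_split('/<.+?>/', $converted);

// Collapse each run of white space (NO-BREAK SPACE included, thanks to /u) into one space.
$clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray);

// Trim every piece, then drop the empty ones.
$htmlarray2 = array_filter(array_map('trim', $clean_htmlarray));

// Re-index the keys from 0.
$clean_htmlarray2 = array_values($htmlarray2);

echo '<pre>';
print_r($clean_htmlarray2);
echo '</pre>';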
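Two details of this script are worth seeing in isolation. First, html_entity_decode() turns &nbsp; into U+00A0, the C2 A0 byte pair, assuming PHP's default UTF-8 charset. Second, trim()'s second argument is a set of single bytes, not a sequence, so trim($converted, chr(0xC2).chr(0xA0)) can also strip a byte that belongs to another character at either end of the string. The sample text and the unicode_trim() helper below are hypothetical illustrations, not part of the original code.

```php
<?php
// Entities decode to UTF-8: &agrave; -> U+00E0 (bytes C3 A0), &nbsp; -> U+00A0 (bytes C2 A0).
$decoded = html_entity_decode('Voil&agrave;&nbsp;', ENT_QUOTES);

// trim() reads its second argument as a set of single bytes. Working from the right it
// removes A0, C2, then the A0 that belongs to the accented letter, leaving a lone C3 byte.
$broken = trim($decoded, chr(0xC2) . chr(0xA0));
$brokenIsValid = preg_match('//u', $broken) === 1; // false for invalid UTF-8

// Hypothetical Unicode-aware alternative: strip leading/trailing \s with the u flag.
function unicode_trim(string $s): string
{
    return preg_replace('/^\s+|\s+$/u', '', $s);
}
$fixed = unicode_trim($decoded);

var_dump($brokenIsValid);            // bool(false)
var_dump($fixed === "Voil\u{00E0}"); // bool(true)
```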

The output:

Array
(
    [0] => SaepeEncomia2.aDNECMirumPopuloSoluniIis8679-1370StatusErrorSed9.9
    [1] => Description
    [2] => DonecRem
    [3] => AnimamUrgebat
    [4] => RerumSed8613-3669 8358&6699
    [5] => 1.mE(magNA)QUOAdNominumStatumMassa
    [6] => abSEMAutemReddetHabituSit
    [7] => PRAEDAMACCUMSANPERSONARUMDENEGAREACDUORUM
    [8] => Liustypisitnecquoadversiscrasministrioppressa,versusclasshicremquoscolubrosullocommune!economy!
    [9] => adQuisqueModeste
    [10] => acRemWisi
    [11] => exHacConguemusLeo
    [12] => ab7/92"Alias
    [13] => ad2/73"Adverso&Erat
    [14] => mePersonomEget
    [15] => adViribusFugaFuga
    [16] => abLouor-SitMolles
    [17] => 3xBlock-OffPlates
    [18] => adFacunda
    [19] => abPersonasDiam
    [20] => NUNC
    [21] => exTeniettePalmamEaque
    [22] => meTenietinVersusUrna
    [23] => **CONDEMNENDUSREMCUMMAGNORUM**
)

A completely trimmed array.

This also works in my while loop for all rows, i.e.:

$results = $mysqli->query("SELECT ID, post_content
FROM wp_posts
LIMIT 50;");

In this case I get all 50 rows with completely trimmed strings.

So finally... this was a challenge to figure out!

I just wish I understood it more. I don't really feel like I deserve to have this accepted as the answer, as all I really did was try a BUNCH of different things until one finally worked.

If someone wants to chime in and explain why $clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray); or rather /[\s]+/mu was what I needed in this instance, I'll gladly award the answer to them :)

As for now just glad it's working properly. Thanks everyone for all the help and input with this!
