Trim Not Working With Array From Mysql Fetched String
Solution 1:
Here's the explanation you asked for of the regex pattern that solved your issue:
/[\s]+/ (more simply expressed as /\s+/) says "look for one or more white-space characters" (this includes ' ', '\r', '\n', '\t', '\f', '\v'). The multi-line modifier/flag (m) is not necessary because you are not using anchors (^ or $) in your pattern. The unicode modifier/flag (u) is absolutely critical in your case because your string of HTML text contains many little devils called "NO-BREAK SPACE". Each one is the single Unicode code point U+00A0 (written \x{00A0} in a pattern), which UTF-8 stores as the two bytes 194 and 160 (0xC2 0xA0).
Without the u flag, the NO-BREAK SPACE characters remain and additional filtering will be required to remove them.
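To see the effect of the u flag in isolation, here is a minimal sketch. The sample string is made up for illustration; the only thing it borrows from the question is the NO-BREAK SPACE. The result without the flag is printed as hex rather than asserted, because byte-level matching of \s can vary with the environment.

```php
<?php
// Hypothetical sample: two words separated by a NO-BREAK SPACE (U+00A0),
// written here as its UTF-8 bytes 0xC2 0xA0.
$text = "Donec\xC2\xA0Rem";

// Without the u flag the subject is matched byte by byte.
$withoutU = preg_replace('/\s+/', ' ', $text);

// With the u flag the subject is read as UTF-8 and \s also covers
// Unicode white space such as U+00A0.
$withU = preg_replace('/\s+/u', ' ', $text);

// Hex dumps make the invisible characters visible.
echo bin2hex($withoutU), PHP_EOL;
echo bin2hex($withU), PHP_EOL;
var_dump($withU === 'Donec Rem'); // bool(true)
```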
While you eventually got your code to the right output, I'm happy to offer a leaner single-step pattern that will get you there faster using only preg_split().
while ($row = $results->fetch_array()) {
    $texts = preg_split('/\s*<[^>]+>\s*/u', $row['post_content'], 0, PREG_SPLIT_NO_EMPTY);
    var_export($texts);
}
Here is a working regex101 demo.
This new splitting pattern still looks for your tags, but it is more efficient because between the < and >, I merely ask to match all characters that are "not >" by using [^>]+. This is much simpler for the engine than asking it to match from the long list of characters that . represents.
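Apart from efficiency, the two patterns can also behave differently. The sketch below is an aside rather than part of the original answer, and the HTML fragment is hypothetical: it puts a line break inside a tag, which the dot does not match by default but the negated character class does.

```php
<?php
// Hypothetical fragment with a line break inside the opening tag.
$html = "<p\nclass=\"x\">Hello</p>";

// '.' does not match "\n" by default, so the opening tag is never matched
// and stays glued to the text.
$lazyDot = preg_split('/<.+?>/', $html, -1, PREG_SPLIT_NO_EMPTY);

// [^>] matches any character except '>', line breaks included.
$negated = preg_split('/<[^>]+>/', $html, -1, PREG_SPLIT_NO_EMPTY);

var_export($lazyDot);
echo PHP_EOL;
var_export($negated);
```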
Furthermore, I included matching for your unicode-extended white-space characters. \s* will match zero or more white-space characters before AND after each tag.
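As a rough illustration of how \s* on both sides of the tag absorbs the surrounding white space, here is the same pattern applied to a made-up fragment (not the real post_content) that mixes ordinary spaces, a newline and NO-BREAK SPACE characters.

```php
<?php
// Hypothetical fragment; "\xC2\xA0" is a UTF-8 encoded NO-BREAK SPACE.
$html = "<p>\xC2\xA0Description </p>\n<p> Donec Rem\xC2\xA0</p>";

// White space touching a tag is consumed as part of the delimiter,
// so the remaining pieces need no trimming afterwards.
$texts = preg_split('/\s*<[^>]+>\s*/u', $html, 0, PREG_SPLIT_NO_EMPTY);

var_export($texts);
```

The space inside "Donec Rem" survives because it does not touch a tag.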
Finally, I should explain the additional parameters on preg_split(). The 0 says "find unlimited matches". This is the default behavior, but I must pass 0 or -1 to hold its place so that the final parameter can be used. PREG_SPLIT_NO_EMPTY spares you the extra step of using array_filter() later. It omits any empty elements generated from the split, so you only get the good stuff.
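The difference the flag makes can be sketched on a small, invented list (the markup below is an assumption for illustration). The separate clean-up step uses 'strlen' as the array_filter() callback so that it drops only empty strings, which is what PREG_SPLIT_NO_EMPTY does.

```php
<?php
// Hypothetical markup with tags sitting directly next to each other.
$html = '<ul><li>Alpha</li><li>Beta</li></ul>';
$pattern = '/\s*<[^>]+>\s*/u';

// Default flags: every pair of adjacent tags leaves an empty string behind.
$raw = preg_split($pattern, $html);

// 0 (or -1) fills the limit position so the flag can be passed fourth.
$clean = preg_split($pattern, $html, 0, PREG_SPLIT_NO_EMPTY);

// The same clean-up done afterwards as an extra step.
$filtered = array_values(array_filter($raw, 'strlen'));

var_export($raw);
echo PHP_EOL;
var_export($clean);
```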
Solution 2:
Trim doesn't work in place. You want this:
$arrayvalue = trim($arrayvalue);
That's really it. Trim returns the trimmed string: it doesn't modify the variable in place.
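A short sketch of the point, using invented values: calling trim() without keeping its return value changes nothing, and for an array the usual approach is array_map().

```php
<?php
// Hypothetical value with leading and trailing white space.
$arrayvalue = "  Description \n";

trim($arrayvalue);                 // return value discarded
$unchanged = $arrayvalue;          // still has the white space

$arrayvalue = trim($arrayvalue);   // reassign to keep the trimmed string

// For a whole array, map trim() over the elements and reassign.
$items = [" Donec Rem ", "\tNUNC\n"];
$items = array_map('trim', $items);

var_export([$unchanged, $arrayvalue, $items]);
```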
Solution 3:
I found a solution.
I'm not exactly sure how it works, as I'm quite unfamiliar with regex.
But the solution that I found (and maybe someone can explain it?) was:
$clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray);
The entire script (excluding the MySQL stuff) that worked was
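// Decode HTML entities such as &nbsp; into the characters they stand for
$converted = html_entity_decode($row['post_content'], ENT_QUOTES);
// Strip leading/trailing 0xC2 and 0xA0 bytes (the UTF-8 form of NO-BREAK SPACE)
$converted = trim($converted, chr(0xC2) . chr(0xA0));
// Split the string on HTML tags
$htmlarray = preg_split('/<.+?>/', $converted);
// Collapse each run of white space (NO-BREAK SPACE included, thanks to u) into one space
$clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray);
// Trim every element, drop the empty ones, then reindex from 0
$htmlarray2 = array_filter(array_map('trim', $clean_htmlarray));
$clean_htmlarray2 = array_values($htmlarray2);
echo '<pre>';
print_r($clean_htmlarray2);
echo '</pre>';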
Output being
Array(
[0] =>SaepeEncomia2.aDNECMirumPopuloSoluniIis8679-1370StatusErrorSed9.9
[1] =>Description
[2] =>DonecRem
[3] =>AnimamUrgebat
[4] =>RerumSed8613-3669 8358&6699
[5] =>1.mE(magNA)QUOAdNominumStatumMassa
[6] =>abSEMAutemReddetHabituSit
[7] =>PRAEDAMACCUMSANPERSONARUMDENEGAREACDUORUM
[8] =>Liustypisitnecquoadversiscrasministrioppressa,versusclasshicremquoscolubrosullocommune!economy!
[9] =>adQuisqueModeste
[10] =>acRemWisi
[11] =>exHacConguemusLeo
[12] =>ab7/92"Alias
[13] =>ad2/73"Adverso&Erat
[14] =>mePersonomEget
[15] =>adViribusFugaFuga
[16] =>abLouor-SitMolles
[17] =>3xBlock-OffPlates
[18] =>adFacunda
[19] =>abPersonasDiam
[20] =>NUNC
[21] =>exTeniettePalmamEaque
[22] =>meTenietinVersusUrna
[23] =>**CONDEMNENDUSREMCUMMAGNORUM**
)
A completely trimmed array.
This also works in my while loop for all rows, ie:
$results = $mysqli->query("SELECT ID, post_content
FROM wp_posts
LIMIT 50;");
In this case I get all 50 rows with completely trimmed strings.
So finally... this was a challenge to figure out!
I just wish I understood it more. I don't really feel like I deserve to be confirmed as the answer to this question, as all I really did was try a BUNCH of different things and finally this worked.
If someone wants to chime in and explain why $clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray); or rather /[\s]+/mu was what I needed in this instance, I'll gladly award the answer to them :)
As for now just glad it's working properly. Thanks everyone for all the help and input with this!