The PCRE section in the PHP manual says this about \w (any word character):
A “word” character is any letter or digit or the underscore character, that is, any character which can be part of a Perl “word”. The definition of letters and digits is controlled by PCRE’s character tables, and may vary if locale-specific matching is taking place. For example, in the “fr” (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.
So PCRE’s escape sequence \w should actually include locale-specific characters (like Ä, Ö, Ü in German) if the right locale is set. I tried calling setlocale(LC_ALL, 'de') before using this ‘match’ validator:
But it still doesn’t accept German umlauts. Any idea how to enable locale-specific character classes?
This validator uses preg_match
Check the last comment maybe it will help you -
Regarding the locale, try this (from
/* try different possible locale names for german as of PHP 4.3.0 */
$loc_de = setlocale(LC_ALL, 'de_DE@euro', 'de_DE', 'de', 'ge');
echo "Preferred locale for german on this system is '$loc_de'";
For me it outputs
So this way you can obviously make sure the correct locale was selected.
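A minimal sketch of that check, assuming nothing about which locales are installed: setlocale() returns the name it actually selected, or false if none of the candidates is available. The candidate list here is only an assumption; the names that exist differ from system to system.

```php
<?php
// Candidate names are assumptions; available locales vary by system
// (on most Unix-like systems `locale -a` lists them).
$candidates = ['de_DE.UTF-8', 'de_DE.utf8', 'de_DE@euro', 'de_DE', 'de', 'ge'];

// setlocale() accepts an array of names and returns the first one that
// could be set, or false if none of them is available.
$selected = setlocale(LC_ALL, $candidates);

if ($selected === false) {
    echo "None of the German locale names is available on this system\n";
} else {
    echo "Selected locale: $selected\n";
}
```

Checking for false matters because a failed setlocale() call leaves the previous locale in place without raising an error.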
Regarding the regex, I think you have to add the u modifier:
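A minimal sketch of what the u modifier changes, not the pattern from the original post. It assumes the script file is saved as UTF-8 and that PHP's PCRE was built with Unicode property support, which is the usual case on current builds. The variable names are made up for illustration.

```php
<?php
// Assumes this file is saved as UTF-8, so 'Müller' is a UTF-8 string.
$name = 'Müller';

// With the u modifier, pattern and subject are treated as UTF-8.
// On PHP builds whose PCRE has Unicode property support, \w then also
// matches non-ASCII letters such as ü, independent of setlocale().
$matches = preg_match('/^\w+$/u', $name);

var_dump($matches);
```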
If there are still any problems, try putting this in the entry script (not sure if it helps, though):
Your tips have put me on the right track: the right locale is 'de_DE.utf8' on my system (Gentoo).
Before finding this out, I wondered if we should fix this in CRegularExpressionValidator. But it’s too system-specific, so I won’t rely on it either.
So I’ll omit \w and use custom character classes instead:
works pretty fine.
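For illustration, a custom character class along these lines could look like the sketch below. The pattern is hypothetical and not the one from the post; it assumes the script file is saved as UTF-8 and only covers the German letters ä, ö, ü, Ä, Ö, Ü and ß.

```php
<?php
// Hypothetical pattern: ASCII letters, digits, underscore and the German
// letters ä, ö, ü, Ä, Ö, Ü, ß. Not the original poster's pattern.
// The u modifier makes PCRE read the class as UTF-8 characters
// instead of individual bytes (the file must be saved as UTF-8).
$pattern = '/^[A-Za-z0-9_äöüÄÖÜß]+$/u';

$ok  = preg_match($pattern, 'Größe_1'); // only characters from the class
$bad = preg_match($pattern, 'señor');   // ñ is not in the class

var_dump($ok, $bad);
```

Unlike \w with a locale, this does not depend on which locales are installed, at the cost of having to list every accepted character explicitly.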
Coming back to this old topic:
An alternative is Unicode character properties. They are locale-independent, though. Still, they are very useful if you want to match, e.g., any possible letter character in any language.
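A short sketch of that idea, assuming UTF-8 input and a PCRE build with Unicode property support: \p{L} matches any letter in any script. The variable names are invented for the example.

```php
<?php
// \p{L} matches any Unicode letter. The u modifier makes PCRE treat
// pattern and subject as UTF-8 (this file must be saved as UTF-8).
$lettersOnly = '/^\p{L}+$/u';

$german   = preg_match($lettersOnly, 'Müller'); // Latin letters with umlaut
$cyrillic = preg_match($lettersOnly, 'Москва'); // Cyrillic letters
$mixed    = preg_match($lettersOnly, 'abc123'); // digits are not letters

var_dump($german, $cyrillic, $mixed);
```

To also accept digits and the underscore, the class can be widened to something like [\p{L}\p{N}_].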