mb_split

(PHP 4 >= 4.2.0, PHP 5, PHP 7, PHP 8)

mb_split - Split a multibyte string using a regular expression

Description

mb_split(string $pattern, string $string, int $limit = -1): array|false

Splits the multibyte string using the regular expression pattern and returns the result as an array.

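Parameters

pattern: The regular expression pattern.
string: The string being split.
limit: If the optional parameter limit is specified, the string is split into at most limit elements.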
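
The effect of limit can be sketched as follows. The sample string and the explicit mb_regex_encoding('UTF-8') call are assumptions made for illustration; they are not part of the original page.

```php
<?php
// Assumption: this file and the input string are encoded as UTF-8.
mb_regex_encoding('UTF-8');

$string = '日、に、本、ほん';

// Default limit (-1): split at every separator.
$all = mb_split('、', $string);

// limit = 2: at most two elements; the unsplit remainder stays in the last one.
$two = mb_split('、', $string, 2);

print_r($all);
print_r($two);
```

With a limit, the separators that are left unsplit remain part of the final element.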

Return Values

The result as an array, or false on failure.

Notes

Note:
By default, this function uses the character encoding set by mb_regex_encoding().
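
A minimal sketch of setting the regex encoding before splitting, assuming UTF-8 input. The sample string, the character-class pattern, and the variable names are illustrative only.

```php
<?php
// Remember the current regex encoding so it can be restored afterwards.
$previous = mb_regex_encoding();

// Assumption: the pattern and the subject string are both UTF-8.
mb_regex_encoding('UTF-8');

// A character class containing two multibyte separators.
$parts = mb_split('[、，]', '火车票，日、本');

// Restore whatever encoding was active before.
mb_regex_encoding($previous);

print_r($parts);
```

Saving and restoring the previous value keeps the change local, which may matter when other code relies on a different regex encoding.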

See Also

mb_regex_encoding() - Set/Get character encoding for multibyte regex
mb_ereg() - Regular expression match with multibyte support


User Contributed Notes (8 notes)

Stas Trefilov, Vertilia

9 years ago

A (simpler) way to extract all characters from a UTF-8 string into an array with a single call to a built-in function:

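<?php
 // The string deliberately contains a hyphen followed by a line break.
 $str = "Ма-\nруся";
 // -1 means "no limit"; passing null for $limit is deprecated as of PHP 8.1.
 print_r(preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY));
?>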

Output:

Array
(
    [0] => М
    [1] => а
    [2] => -
    [3] => 

    [4] => р
    [5] => у
    [6] => с
    [7] => я
)

boukeversteegh at gmail dot com

13 years ago

Unlike other regular expression functions such as preg_match, the $pattern parameter does not take /pattern/ delimiters.

<?php
 # Works: no slashes around the pattern
 print_r( mb_split('\s', 'hello world') );
 # Array (
 #     [0] => hello
 #     [1] => world
 # )

 # Does not work: the slashes are treated as literal characters
 print_r( mb_split('/\s/', 'hello world') );
 # Array (
 #     [0] => hello world
 # )
?>

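adjwilli at yahoo dot com

16 years ago

I figure most people will want a simple way to break a multibyte string into its individual characters. Here is a function I'm using to do that. Change UTF-8 to your chosen encoding method.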


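<?php
function mbStringToArray ($string) {
 $array = array(); // initialize so an empty string returns an empty array
 $strlen = mb_strlen($string, "UTF-8");
 while ($strlen) {
  // take the first character, then drop it from the string
  $array[] = mb_substr($string, 0, 1, "UTF-8");
  $string = mb_substr($string, 1, $strlen, "UTF-8");
  $strlen = mb_strlen($string, "UTF-8");
 }
 return $array;
}
?>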

gunkan at terra dot es

12 years ago

To split a string like "日、に、本、ほん、語、ご" using the "、" delimiter, I used:

$v = mb_split('、',"日、に、本、ほん、語、ご");

but it didn't work.

The solution was to set this beforehand:

mb_regex_encoding('UTF-8');
mb_internal_encoding("UTF-8");
$v = mb_split('、',"日、に、本、ほん、語、ご");

and now it works:

Array
(
    [0] => 日
    [1] => に
    [2] => 本
    [3] => ほん
    [4] => 語
    [5] => ご
)

boukeversteegh at gmail dot com

14 years ago

In addition to Sezer Yalcin's tip.

This function splits a multibyte string into an array of characters. Comparable to str_split().


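<?php
# PHP 7.4+ ships a native mb_str_split(), so only define this fallback when it is missing.
if (!function_exists('mb_str_split')) {
 function mb_str_split( $string ) {
  # Split at all positions, not after the start: ^
  # and not before the end: $
  return preg_split('/(?<!^)(?!$)/u', $string );
 }
}

$string = '火车票';
$charlist = mb_str_split( $string );

print_r( $charlist );
?>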

# Output
Array
(
    [0] => 火
    [1] => 车
    [2] => 票
)

thflori at gmail

7 years ago

I agree that some people might want an mb_explode('', $string);

This is my solution for it:

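<?php

$string = 'Hallöle';

$array = array_map(function ($i) use ($string) {
 return mb_substr($string, $i, 1);
}, range(0, mb_strlen($string) - 1));

// Plain PHP check; the original used a test-framework expect() call,
// which is undefined outside a test runner.
assert($array === ['H', 'a', 'l', 'l', 'ö', 'l', 'e']);

?>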

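gert dot matern at web dot de

15 years ago

We are talking about multibyte (e.g. UTF-8) strings here, so preg_split will fail for the following string:

'Weiße Rosen sind nicht grün!'

And because I didn't find a regex to simulate str_split, I optimized the first solution from adjwilli a bit: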


<?php
$string = 'Weiße Rosen sind nicht grün!';
$stop = mb_strlen( $string);
$result = array();

for( $idx = 0; $idx < $stop; $idx++)
{
 $result[] = mb_substr( $string, $idx, 1);
}
?>

Here is an example using adjwilli's function:


<?php
mb_internal_encoding( 'UTF-8');
mb_regex_encoding( 'UTF-8');

function mbStringToArray
( $string
)
{
 $stop = mb_strlen( $string);
 $result = array();

 for( $idx = 0; $idx < $stop; $idx++)
 {
  $result[] = mb_substr( $string, $idx, 1);
 }

 return $result;
}

// true is the second argument of print_r(), so the dump is returned as a string for echo
echo '<pre>', PHP_EOL,
print_r( mbStringToArray( 'Weiße Rosen sind nicht grün!'), true), PHP_EOL,
'</pre>';
?>

If someone finds a regex that simulates str_split with mb_split, please tell me (by personal mail).

qdb at kukmara dot ru

14 years ago

Another way to split a multibyte string into an array:
<?php
$s='әӘөүҗңһ';

// UTF-32BE stores every code point in exactly 4 bytes, so str_split($temp_s,4)
// yields one character per chunk (in UTF-16 a 4-byte chunk would hold two of these characters).
//$temp_s=iconv('UTF-8','UTF-32BE',$s);
$temp_s=mb_convert_encoding($s,'UTF-32BE','UTF-8');
$temp_a=str_split($temp_s,4);
$temp_a_len=count($temp_a);
for($i=0;$i<$temp_a_len;$i++){
 //$temp_a[$i]=iconv('UTF-32BE','UTF-8',$temp_a[$i]);
 $temp_a[$i]=mb_convert_encoding($temp_a[$i],'UTF-8','UTF-32BE');
}

echo('<pre>');
print_r($temp_a);
echo('</pre>');

// It is also possible to work directly in UTF-32BE:
define('SLS',mb_convert_encoding('/','UTF-32BE','UTF-8'));
$temp_s=mb_convert_encoding($s,'UTF-32BE','UTF-8');
$temp_a=str_split($temp_s,4);
$temp_s=implode(SLS,$temp_a);
$temp_s=mb_convert_encoding($temp_s,'UTF-8','UTF-32BE');
echo($temp_s);
?>
