soundex

(PHP 4, PHP 5, PHP 7, PHP 8)

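soundex — Calculate the soundex key of a string

Description

soundex(string $string): string

Calculates the soundex key of string.

Soundex keys have the property that words pronounced similarly produce the same soundex key. They can therefore be used to simplify searches in databases where you know the pronunciation but not the spelling.

This particular soundex function is the one described by Donald Knuth in "The Art Of Computer Programming, vol. 3: Sorting And Searching", Addison-Wesley (1973), pp. 391-392.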
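The procedure can be sketched in plain PHP. The following is a minimal illustration, assuming ASCII input; the function name soundex_sketch() and its lookup string are made up for this example and are not part of PHP. It is meant to mirror the behaviour of soundex() for ordinary words, not to replace it.

```php
<?php
// Hypothetical helper for illustration only; not part of PHP.
function soundex_sketch(string $input): string
{
    // Digit for each letter A..Z: '0' marks vowels and Y, '-' marks H and W.
    $table = '0123012-02245501262301-202';
    $key = '';
    $last = '';

    foreach (str_split(strtoupper($input)) as $char) {
        $offset = ord($char) - ord('A');
        if ($offset < 0 || $offset > 25) {
            continue; // anything that is not an ASCII letter is skipped
        }
        $code = $table[$offset];

        if ($key === '') {
            // The first letter is kept literally and only seeds $last.
            $key = $char;
            $last = $code;
            continue;
        }
        if ($code !== $last) {
            if ($code !== '-') {
                $last = $code; // H and W do not separate equal digits
            }
            if ($code !== '0' && $code !== '-') {
                $key .= $code;
            }
        }
        if (strlen($key) === 4) {
            break;
        }
    }

    // Short keys are padded with zeros to four characters.
    return str_pad($key, 4, '0');
}

var_dump(soundex_sketch('Knuth') === soundex('Knuth'));
var_dump(soundex_sketch('Lukasiewicz') === soundex('Lissajous'));
```

The table string encodes one character per letter, so a lookup is a single string offset; the separate '-' marker is what lets H and W be ignored without resetting the previous digit.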

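Parameters

string: The input string.

Return Values

Returns the soundex key as a string of four characters. If string contains at least one letter, the returned string starts with a letter. Otherwise "0000" is returned.

Changelog

Version	Description
8.0.0	Prior to this version, calling the function with an empty string returned `false` for no particular reason.

Examples

Example #1 Soundex Examples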

<?php
soundex("Euler") == soundex("Ellery"); // E460
soundex("Gauss") == soundex("Ghosh"); // G200
soundex("Hilbert") == soundex("Heilbronn"); // H416
soundex("Knuth") == soundex("Kant"); // K530
soundex("Lloyd") == soundex("Ladd"); // L300
soundex("Lukasiewicz") == soundex("Lissajous"); // L222
?>

See Also

levenshtein() - Calculate Levenshtein distance between two strings
metaphone() - Calculate the metaphone key of a string
similar_text() - Calculate the similarity between two strings


User Contributed Notes 20 notes

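nicolas dot zimmer at einfachmarke dot de

16 years ago

Since soundex() does not produce optimal results for German,
we have written a function that implements the so-called Kölner Phonetik
(Cologne Phonetic).

Please find the code below, in the hope that it is useful to you.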


<?php
/**
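 * A function for retrieving the Kölner Phonetik value of a string
 * 
 * As described at http://de.wikipedia.org/wiki/Kölner_Phonetik
 * Based on Hans Joachim Postel: Die Kölner Phonetik.
 * A method for identifying personal names on the basis of gestalt analysis.
 * In: IBM-Nachrichten, vol. 19, 1969, pp. 925-931
 * 
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 * See the GNU General Public License for more details.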
 *
 * @package phonetics
 * @version 1.0
 * @link http://www.einfachmarke.de
 * @license GPL 3.0 <https://gnu.ac.cn/licenses/>
 * @copyright 2008 by einfachmarke.de
 * @author Nicolas Zimmer <nicolas dot zimmer at einfachmarke.de>
 */

function cologne_phon($word){
 
 /**
 * @param string $word the string to be analyzed
 * @return string $value the Kölner Phonetik value
 * @access public
 */
 
 //prepare for processing
 //note: strtolower() may leave multibyte uppercase letters such as "Ä" untouched
 $word=strtolower($word);
 $substitution=array(
 "ä"=>"a",
 "ö"=>"o",
 "ü"=>"u",
 "ß"=>"ss",
 "ph"=>"f"
 );

 foreach ($substitution as $letter=>$replacement) {
 $word=str_replace($letter,$replacement,$word);
 }
 
 $len=strlen($word);
 
 //rules for exceptions
 $exceptionsLeading=array(
 4=>array("ca","ch","ck","cl","co","cq","cu","cx"),
 8=>array("dc","ds","dz","tc","ts","tz")
 );
 
 $exceptionsFollowing=array("sc","zc","cx","kx","qx");
 
 //coding table
 $codingTable=array(
 0=>array("a","e","i","j","o","u","y"),
 1=>array("b","p"),
 2=>array("d","t"),
 3=>array("f","v","w"),
 4=>array("c","g","k","q"),
 48=>array("x"),
 5=>array("l"),
 6=>array("m","n"),
 7=>array("r"),
 8=>array("c","s","z"),
 );
 
 $value=array();

 for ($i=0;$i<$len;$i++){
 $value[$i]="";
 //current letter plus the following one; substr() is safe on the last letter
 $pair=substr($word,$i,2);
 
 //exceptions
 if ($i==0 AND $pair=="cr") $value[$i]=4;
 
 foreach ($exceptionsLeading as $code=>$letters) {
 if (in_array($pair,$letters)){
 $value[$i]=$code;
 }
 }
 
 if ($i!=0 AND (in_array($word[$i-1].$word[$i],$exceptionsFollowing))) {
 $value[$i]=8;
 }
 
 //normal encoding
 if ($value[$i]===""){
 foreach ($codingTable as $code=>$letters) {
 if (in_array($word[$i],$letters))$value[$i]=$code;
 }
 }
 }
 
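 //delete duplicate values
 $len=count($value);
 
 for ($i=1;$i<$len;$i++){
 if ($value[$i]===$value[$i-1]) $value[$i]="";
 }
 
 //delete vowels
 for ($i=1;$i<$len;$i++){//omitting the first character code and h
 if ($value[$i]===0) $value[$i]="";
 }
 
 //drop the emptied entries; a plain array_filter() would also drop a leading 0
 $value=array_filter($value,function($code){return $code!=="";});
 $value=implode("",$value);
 
 return $value;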
 
}

?>

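fie at myrealbox dot com

21 years ago

administrator at zinious dot com

I'm sorry, but your code does not comply with the Soundex standard.
Here are the results from using your code, my code, and the default code.

string: rest
R620 executing administrator's function: 0.009452
R230 executing cg's function: 0.001779
R230 executing default soundex function: 9.4999999999956E-005

string: reset
R620 executing administrator's function: 0.0055900000000001
R230 executing cg's function: 0.00091799999999997
R230 executing default soundex function: 0.00010600000000005

I don't know why the default occasionally comes out as 9.xxx; I think that is strange.
My code is at the bottom. These tests were run before the Soundex modification I describe below.
By the way, all the original specifications of the Soundex algorithm can be found at
http://www.star-shine.net/~functionifelse/GFD/?word=soundex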

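dalibor dot toth at podravka dot hr

Yes, it is perhaps a pity that it gives you the same code,
and even Metaphone has this problem.
But people may not want that much precision. If someone
searches a search engine (let's call it shmoogle)
for "php array reset" but types "php array rest",
shmoogle might return something about beds...
(if it is dumb and does not treat the earlier words
as more important), so shmoogle may well want
to lower its accuracy in a case like this... but nonetheless...
my workaround is to append the syllable count to the end of the string, making it 5 characters long...
It works as follows...

The code is at: http://star-shine.net/~functionifelse/cg_soundex.php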

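Or, if you just want to use the default soundex function:

$str = soundex($str).cg_sylc($str);

More or less revolutionary... probably less...
This function only works on single words, though... I would like to see whether someone
can modify it to use split and loop through to get the cg_soundex of each word;
that would be interesting ;)
I would also like to suggest to the PHP, Zend and Apache people who make PHP
that they add an optional extra argument that the user could specify as follows:

soundex("string",SYL);

It would return the syllable count at the end of the string.
A highly accurate sound test, fantastic! VOW could also be added for vowels
and CONS for consonants, or whatever else anybody wants...
but I really think the syllable count would be efficient enough.
Well... if this helps anyone, you are welcome to it... um... good luck in all your
PHP adventures... oh... and the final results: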

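syllables
1 rest
2 reset
metaphone
RST rest
RST reset
soundex
R230 rest
R230 reset

string: rest
R2301 executing cg's function: 0.00211
R230 executing default soundex function: 0.00011299999999997

string: reset
R2302 executing cg's function: 0.001691
R230 executing default soundex function: 0.00010399999999999

The default function is a little faster...
so maybe they will add the option, and we will get both speed and accuracy.

Silent winds of doom howl!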

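Dirk Hoeschen - Feenders de

10 years ago

I made some improvements to Nicolas Zimmer's "Cologne Phonetic" function. The keys and values of the arrays are flipped so that simple arrays can be used instead of multidimensional ones. As a result, the loops and iterations for finding the matching value of a character are no longer needed.
I put the function into a static class and moved the array declarations outside the function.

The result is more reliable and five times faster than the original version.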

<?php 
class CologneHash {

 static $eLeading = array("ca" => 4, "ch" => 4, "ck" => 4, "cl" => 4, "co" => 4, "cq" => 4, "cu" => 4, "cx" => 4, "dc" => 8, "ds" => 8, "dz" => 8, "tc" => 8, "ts" => 8, "tz" => 8); 

 static $eFollow = array("sc", "zc", "cx", "kx", "qx");

 // "c" is listed once: it defaults to 8 unless a leading exception above assigns 4
 static $codingTable = array("a" => 0, "e" => 0, "i" => 0, "j" => 0, "o" => 0, "u" => 0, "y" => 0,
 "b" => 1, "p" => 1, "d" => 2, "t" => 2, "f" => 3, "v" => 3, "w" => 3, "g" => 4, "k" => 4, "q" => 4,
 "x" => 48, "l" => 5, "m" => 6, "n" => 6, "r" => 7, "c" => 8, "s" => 8, "z" => 8);

 public static function getCologneHash($word)
 {
 if (empty($word)) return false;
 $len = strlen($word);
 
 for ($i = 0; $i < $len; $i++) {
 $value[$i] = "";
 
 //Exceptions
 if ($i == 0 && substr($word, 0, 2) == "cr") {
 $value[$i] = 4;
 }
 
 if (isset($word[$i + 1]) && isset(self::$eLeading[$word[$i] . $word[$i + 1]])) {
 $value[$i] = self::$eLeading[$word[$i] . $word[$i + 1]];
 }

 if ($i != 0 && (in_array($word[$i - 1] . $word[$i], self::$eFollow))) {
 $value[$i] = 8;
 }
 
 // normal encoding
 if ($value[$i]=="") {
 if (isset(self::$codingTable[$word[$i]])) {
 $value[$i] = self::$codingTable[$word[$i]];
 }
 }
 }

 // delete double values
 $len = count($value);
 
 for ($i = 1; $i < $len; $i++) {
 if ($value[$i] == $value[$i - 1]) {
 $value[$i] = "";
 }
 }
 
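 // delete vowels
 for ($i = 1; $i < $len; $i++) {
 // omitting first character code and h
 if ($value[$i] === 0) {
 $value[$i] = "";
 }
 }
 
 // drop the emptied entries; a plain array_filter() would also drop a leading 0
 $value = array_filter($value, function ($code) { return $code !== ""; });
 $value = implode("", $value);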
 
 return $value;
 }
 
}
?>

synnus at gmail dot com

9 years ago

<?php
// https://github.com/Fruneau/Fruneau.github.io/blob/master/assets/soundex_fr.php
// http://blog.mymind.fr/blog/2007/03/15/soundex-francais/
function soundex_fr($sIn){
 static $convVIn, $convVOut, $convGuIn, $convGuOut, $accents;
 if (!isset($convGuIn)) {
 $accents = array('É' => 'E', 'È' => 'E', 'Ë' => 'E', 'Ê' => 'E',
 'Á' => 'A', 'À' => 'A', 'Ä' => 'A', 'Â' => 'A', 'Å' => 'A', 'Ã' => 'A',
 'Ï' => 'I', 'Î' => 'I', 'Ì' => 'I', 'Í' => 'I',
 'Ô' => 'O', 'Ö' => 'O', 'Ò' => 'O', 'Ó' => 'O', 'Õ' => 'O', 'Ø' => 'O',
 'Ú' => 'U', 'Ù' => 'U', 'Û' => 'U', 'Ü' => 'U',
 'Ç' => 'C', 'Ñ' => 'N', 'Ç' => 'S', '¿' => 'E',
 'é' => 'e', 'è' => 'e', 'ë' => 'e', 'ê' => 'e',
 'á' => 'a', 'à' => 'a', 'ä' => 'a', 'â' => 'a', 'å' => 'a', 'ã' => 'a',
 'ï' => 'i', 'î' => 'i', 'ì' => 'i', 'í' => 'i',
 'ô' => 'o', 'ö' => 'o', 'ò' => 'o', 'ó' => 'o', 'õ' => 'o', 'ø' => 'o',
 'ú' => 'u', 'ù' => 'u', 'û' => 'u', 'ü' => 'u',
 'ç' => 'c', 'ñ' => 'n');
 $convGuIn = array( 'GUI', 'GUE', 'GA', 'GO', 'GU', 'SCI', 'SCE', 'SC', 'CA', 'CO',
 'CU', 'QU', 'Q', 'CC', 'CK', 'G', 'ST', 'PH');
 $convGuOut = array( 'KI', 'KE', 'KA', 'KO', 'K', 'SI', 'SE', 'SK', 'KA', 'KO',
 'KU', 'K', 'K', 'K', 'K', 'J', 'T', 'F');
 $convVIn = array( '/E?(AU)/', '/([EA])?[UI]([NM])([^EAIOUY]|$)/', '/[AE]O?[NM]([^AEIOUY]|$)/',
 '/[EA][IY]([NM]?[^NM]|$)/', '/(^|[^OEUIA])(OEU|OE|EU)([^OEUIA]|$)/', '/OI/',
 '/(ILLE?|I)/', '/O(U|W)/', '/O[NM]($|[^EAOUIY])/', '/(SC|S|C)H/',
 '/([^AEIOUY1])[^AEIOUYLKTPNR]([UAO])([^AEIOUY])/', '/([^AEIOUY]|^)([AUO])[^AEIOUYLKTP]([^AEIOUY1])/', '/^KN/',
 '/^PF/', '/C([^AEIOUY]|$)/', '/E(Z|R)$/',
 '/C/', '/Z$/', '/(?<!^)Z+/', '/H/', '/W/');
 $convVOut = array( 'O', '1\3', 'A\1',
 'E\1', '\1E\3', 'O',
 'Y', 'U', 'O\1', '9', 
 '\1\2\3', '\1\2\3', 'N',
 'F', 'K\1', 'E',
 'S', 'SE', 'S', '', 'V');
 }

 if ( $sIn === '' ) return '    ';
 $sIn = strtr( $sIn, $accents);
 $sIn = strtoupper( $sIn );
 $sIn = preg_replace( '`[^A-Z]`', '', $sIn );
 if ( strlen( $sIn ) === 1 ) return $sIn . '   ';
 $sIn = str_replace( $convGuIn, $convGuOut, $sIn );
 $sIn = preg_replace( '`(.)\1`', '$1', $sIn );
 $sIn = preg_replace( $convVIn, $convVOut, $sIn);
 $sIn = preg_replace( '`L?[TDX]?S?$`', '', $sIn );
 $sIn = preg_replace( '`(?!^)Y([^AEOU]|$)`', '\1', $sIn);
 $sIn = preg_replace( '`(?!^)[EA]`', '', $sIn);
 return substr( $sIn . '    ', 0, 4);
}
?>

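cap at capsi dot cx

24 years ago

Unfortunately, soundex() is very sensitive to the first character. It is not possible to use it and have both Clansy and Klansy return the same value. If you want to do a phonetic search on names like these, you will still need to write a routine that evaluates C452 as being similar to K452.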
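One way to approach such a routine is to compare only the digits of the two keys and ignore the leading letter. A minimal sketch, with same_soundex_digits() being a made-up helper name; note that this makes matching looser, since any two words with the same digit pattern now match regardless of their first letter.

```php
<?php
// Hypothetical helper: compares the three digits and ignores the first letter.
function same_soundex_digits(string $a, string $b): bool
{
    return substr(soundex($a), 1) === substr(soundex($b), 1);
}

var_dump(soundex('Clansy'));                       // string(4) "C452"
var_dump(soundex('Klansy'));                       // string(4) "K452"
var_dump(same_soundex_digits('Clansy', 'Klansy')); // bool(true)
```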

synnus at gmail dot com

4 years ago

<?php
/* SOUNDEX FRENCH
Frederic Bouchery, 26 Sep 2003
http://www.php-help.net/sources-php/a.french.adapted.soundex.289.html
*/

function soundex2( $sIn ) {
 // If there is no word, exit immediately
 if ( $sIn === '' ) return '    ';
 // Convert everything to uppercase
 $sIn = strtoupper( $sIn );
 // Remove accents (array form of strtr(), which is safe for multibyte characters)
 $sIn = strtr( $sIn, array(
 'Â' => 'A', 'Ä' => 'A', 'À' => 'A', 'Ç' => 'S',
 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Œ' => 'E',
 'Î' => 'I', 'Ï' => 'I', 'Ô' => 'O', 'Ö' => 'O',
 'Ù' => 'U', 'Û' => 'U', 'Ü' => 'U' ) );
 // Remove everything that is not a letter
 $sIn = preg_replace( '`[^A-Z]`', '', $sIn );
 // If the string has only one character, exit.
 if ( strlen( $sIn ) === 1 ) return $sIn . '   ';
 // Replace the primary consonants
 $convIn = array( 'GUI', 'GUE', 'GA', 'GO', 'GU', 'CA', 'CO', 'CU',
'Q', 'CC', 'CK' );
 $convOut = array( 'KI', 'KE', 'KA', 'KO', 'K', 'KA', 'KO', 'KU', 'K',
'K', 'K' );
 $sIn = str_replace( $convIn, $convOut, $sIn );
 // Replace vowels with A, except for Y and except for the first letter
 $sIn = preg_replace( '`(?<!^)[EIOU]`', 'A', $sIn );
 // Replace prefixes, then keep the first letter
 // and do the complementary replacements
 $convIn = array( '`^KN`', '`^(PH|PF)`', '`^MAC`', '`^SCH`', '`^ASA`',
'`(?<!^)KN`', '`(?<!^)(PH|PF)`', '`(?<!^)MAC`', '`(?<!^)SCH`',
'`(?<!^)ASA`' );
 $convOut = array( 'NN', 'FF', 'MCC', 'SSS', 'AZA', 'NN', 'FF', 'MCC',
'SSS', 'AZA' );
 $sIn = preg_replace( $convIn, $convOut, $sIn );
 // Remove H, except in CH or SH
 $sIn = preg_replace( '`(?<![CS])H`', '', $sIn );
 // Remove Y, except when preceded by A
 $sIn = preg_replace( '`(?<!A)Y`', '', $sIn );
 // Remove a trailing A, T, D or S
 $sIn = preg_replace( '`[ATDS]$`', '', $sIn );
 // Remove every A, except at the start
 $sIn = preg_replace( '`(?!^)A`', '', $sIn );
 // Remove repeated letters
 $sIn = preg_replace( '`(.)\1`', '$1', $sIn );
 // Keep only 4 characters, or pad with spaces
 return substr( $sIn . '    ', 0, 4);
}
?>

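dcallaghan at linuxmail dot org

22 years ago

While the standard soundex string is 4 characters long, and that is what the PHP function returns, some database programs return a string of arbitrary length. MySQL is one example.

The MySQL documentation covers this and suggests that you may wish to use substring to output the standard 4 characters. Let us take 'Dostoyevski' as an example.

select soundex("Dostoyevski")
returns D2312
select substring(soundex("Dostoyevski"), 1, 4);
returns D231

PHP will return the value as 'D231'.

So, to use the soundex function to generate a WHERE parameter in a MySQL SELECT statement, you might try this:
$s = soundex('Dostoyevski');
$sql = 'SELECT * FROM authors WHERE substring(soundex(lastname), 1, 4) = "' . $s . '"';

Or, if you want to bypass the PHP function:
// mysql_query() and mysql_result() belong to the old mysql extension, which was removed in PHP 7.0
$result = mysql_query("select soundex('Dostoyevski')");
$s = mysql_result($result, 0, 0);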
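Building the SQL string by concatenation works for a fixed literal, but with user input a bound parameter is the safer choice. The sketch below only prepares the statement text and the parameter list; the authors table and lastname column are taken from the note, while the :key placeholder and the $pdo handle mentioned in the comment are assumptions for illustration.

```php
<?php
// Compute the four-character key in PHP ...
$key = soundex('Dostoyevski');

// ... and let MySQL truncate its own, possibly longer, key to four characters.
$sql = 'SELECT * FROM authors WHERE SUBSTRING(SOUNDEX(lastname), 1, 4) = :key';
$params = [':key' => $key];

// With an existing PDO connection (assumed here as $pdo), this could be run as:
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);

var_dump($params);
```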

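administrator at zinious dot com

22 years ago

I wrote this function long ago in CGI-perl and then translated it (if you can call it that) into PHP. It is a little clumsy, to say the least, but it should handle the true soundex specification 100%.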

// --- code start ---

function MakeSoundEx($stringtomakesoundexof)
{
    // Letter groups of the Soundex table; the array index is the digit.
    $SoundKeys = array(
        1 => "BPFV",
        2 => "CSKGJQXZ",
        3 => "DT",
        4 => "L",
        5 => "MN",
        6 => "R",
    );
    // Letters without a digit; they reset the memory of the last digit.
    $SoundKey7 = "AEHIOUWY";

    $temp_Name = strtoupper($stringtomakesoundexof);
    $temp_Last = "";
    $temp_Soundex = substr($temp_Name, 0, 1);

    for ($n = 0; $n < strlen($temp_Name) && strlen($temp_Soundex) < 4; $n++) {
        $letter = substr($temp_Name, $n, 1);
        if (strpos($SoundKey7, $letter) !== false) {
            $temp_Last = "";
            continue;
        }
        foreach ($SoundKeys as $digit => $letters) {
            if (strpos($letters, $letter) !== false) {
                // The first letter is kept literally; it only seeds $temp_Last.
                if ($n > 0 && $temp_Last != $digit) {
                    $temp_Soundex = $temp_Soundex . $digit;
                }
                $temp_Last = $digit;
            }
        }
    }

    // Pad short keys with zeros.
    while (strlen($temp_Soundex) < 4) {
        $temp_Soundex = $temp_Soundex . "0";
    }

    return $temp_Soundex;
}

// --- code end ---

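witold4249 at rogers dot com

22 years ago

An easier way to check for similarity between words, and to avoid the problem that comes up with Klancy/Clancy, is to simply add any letter to the front of the string.

e.g. OKlancy/OClancy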
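The effect of the prefix can be checked directly. A small sketch, using "O" as an arbitrary prefix; with a prefix only three digits remain for the word itself, and the original first letter now contributes a digit like any other consonant.

```php
<?php
var_dump(soundex('Klancy'));  // string(4) "K452"
var_dump(soundex('Clancy'));  // string(4) "C452"
var_dump(soundex('OKlancy')); // string(4) "O245"
var_dump(soundex('OClancy')); // string(4) "O245"
```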

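mail at gettheeawayspam dot iaindooley dot com

21 years ago

The "different first letter" problem of soundex can be solved by comparing soundex codes with levenshtein(). In my application, which searches a database of album names for entries matching a particular user-supplied string, I do the following:

1. Search the database for an exact match of the name
2. Search the database for entries in which the name occurs in any form as a string
3. Search the database for entries in which any word of the name (if the user entered more than one word) occurs, except for small words (and, the, of, etc.)
4. Then, if all of that fails, I move on to plan B:

- Calculate the Levenshtein distance (levenshtein()) between the user's search term and every entry in the database, as a percentage of the length of the search term

- Calculate the Levenshtein distance between the metaphone code of the user's search term and that of every field in the database, as a percentage of the length of the metaphone code of the search term

- Calculate the Levenshtein distance between the soundex code of the user's search term and that of every field in the database, as a percentage of the length of the soundex code of the original search term

If any of these percentages is less than 50 (which means that two soundex codes with different first letters will be accepted!!), the entry is accepted as a possible match.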
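The last of these steps might look roughly like the following. This is only a sketch under stated assumptions: soundex_distance_percent() is a made-up name, an empty search term is simply treated as a non-match, and the 50 percent threshold is the one mentioned in the note.

```php
<?php
// Hypothetical helper: Levenshtein distance between two soundex keys,
// expressed as a percentage of the length of the search term's key.
function soundex_distance_percent(string $search, string $candidate): float
{
    if ($search === '') {
        return 100.0; // nothing to compare against
    }
    $searchKey = soundex($search);
    $candidateKey = soundex($candidate);

    return levenshtein($searchKey, $candidateKey) / strlen($searchKey) * 100;
}

$percent = soundex_distance_percent('Klancy', 'Clancy');
var_dump($percent);      // float(25): one of four characters differs
var_dump($percent < 50); // bool(true), so the entry counts as a possible match
```

Because a soundex key is only four characters long, a single differing character already costs 25 percent, so a threshold of 50 tolerates exactly one mismatch.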

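justin at NO dot blukrew dot SPAM dot com

20 years ago

I originally looked at soundex() because I wanted to compare how individual letters sound, so that when reading generated strings of characters aloud, they could easily be told apart from each other (e.g. TGDE is hard to distinguish, while RFQA is easier to understand). The goal was to generate IDs that can be understood easily and with high accuracy over radios of varying quality. I quickly found that Soundex and Metaphone cannot do this (they work on words), so I wrote the following to help out. The ID generation function calls chrSoundAlike() iteratively to compare each new character with the preceding ones. I would welcome any feedback on this. Thanks.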

<?php
function chrSoundAlike($char1, $char2, $opts = FALSE) {
 $char1 = strtoupper($char1);
 $char2 = strtoupper($char2);
 $opts = strtoupper($opts);

 // Set up the sets of characters that sound alike.
 // (Options: include numbers, include W, include both, or, by default, include neither.)
 switch ($opts) {
 case 'NUMBERS':
 $sets = array(0 => array('A', 'J', 'K'),
 1 => array('B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z', '3'),
 2 => array('F', 'S', 'X'),
 3 => array('I', 'Y'),
 4 => array('M', 'N'),
 5 => array('Q', 'U'));
 break;

 case 'STRICT':
 $sets = array(0 => array('A', 'J', 'K'),
 1 => array('B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z'),
 2 => array('F', 'S', 'X'),
 3 => array('I', 'Y'),
 4 => array('M', 'N'),
 5 => array('Q', 'U', 'W'));
 break;
 
 case 'BOTH':
 $sets = array(0 => array('A', 'J', 'K'),
 1 => array('B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z', '3'),
 2 => array('F', 'S', 'X'),
 3 => array('I', 'Y'),
 4 => array('M', 'N'),
 5 => array('Q', 'U', 'W'));
 break;

 default:
 $sets = array(0 => array('A', 'J', 'K'),
 1 => array('B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z'),
 2 => array('F', 'S', 'X'),
 3 => array('I', 'Y'),
 4 => array('M', 'N'),
 5 => array('Q', 'U'));
 break;
 }
 
 // See whether $char1 is in one of the sets.
 $matchset = array();
 for ($i = 0; $i < count($sets); $i++) {
 if (in_array($char1, $sets[$i])) {
 $matchset = $sets[$i];
 }
 }

 // Return TRUE if $char2 is in the same set as $char1, or if $char1 and $char2 are identical.
 if (in_array($char2, $matchset) OR $char1 == $char2) {
 return TRUE;
 } else {
 return FALSE;
 }
}
?>

fie at myrealbox dot com

21 years ago

Oops... the host that server was on went down... here is the earlier code

function cg_sylc($nos){
$nos = strtoupper($nos);
$syllables = 0;

// every pair of adjacent vowels counts as one syllable
$before = strlen($nos);
$nos = str_replace(array('AA','AE','AI','AO','AU',
'EA','EE','EI','EO','EU','IA','IE','II','IO',
'IU','OA','OE','OI','OO','OU','UA','UE',
'UI','UO','UU'), "", $nos);
$after = strlen($nos);
$difference = $before - $after;
if($before != $after) $syllables += $difference / 2;

// substr() also copes with a string that is empty by now
if(substr($nos, -1) == "E") $syllables --;
if(substr($nos, -1) == "Y") $syllables ++;

// every remaining single vowel counts as one syllable
$before = $after;
$nos = str_replace(array('A','E','I','O','U'),"",$nos);
$after = strlen($nos);
$syllables += ($before - $after);

return $syllables;
}

function cg_SoundEx($SExStr){
$syl = cg_sylc($SExStr);
$SExStr = strtoupper($SExStr);
$len = strlen($SExStr);
$tsstr = "";
$ttsstr = "";

// the first letter is printed as-is
print substr($SExStr, 0, 1);

// walk the string from the third character on and append every
// character that differs from the one at position $i
for($i = 1, $ii = 2; $ii < $len; $ii++){
if($SExStr[$i] != $SExStr[$ii]){
$tsstr .= $SExStr[$ii];
$i ++;
      }
    }

$tsstr = str_replace(array('A', 'E', 'H', 'I', 'O', 'U', 'W', 'Y'), "", $tsstr);
$tsstr = str_replace(array('B', 'F', 'P', 'V'), "1", $tsstr);
$tsstr = str_replace(array('C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z', '?'), "2", $tsstr);
$tsstr = str_replace(array('D', 'T'), "3", $tsstr);
$tsstr = str_replace(array('L'), "4", $tsstr);
$tsstr = str_replace(array('M', 'N', '?'), "5", $tsstr);
$tsstr = str_replace(array('R'), "6", $tsstr);

// cut or zero-pad the digits to three characters
for($iii = 0; $iii < 3; $iii++){
if(isset($tsstr[$iii])){
$ttsstr .= $tsstr[$iii];
} else {
$ttsstr .= "0";
    }
  }
// append the syllable count as the fifth character
$ttsstr .= $syl;
print $ttsstr; 

}

Anonymous

22 years ago

An easier way to do the search above is to simply add any letter to the front of the strings and then compare.

e.g. Klancy => LKlancy
Clancy => LClancy

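Anonymous

19 years ago

Since the first letter is included in the phonetic representation that is output, it is worth pointing out that if you want Soundex keys to get around the Klancy/Clancy mismatch, you can take the substring that follows the first letter: the first letter is the leading character of the word, and the digits are the numeric representation of the word's phonetic structure.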

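pee whitt at dental dot ufl dor edu

21 years ago

fie at myrealbox dot com -

Regarding your request for syllables in Soundex: I think counting the number of vowel clusters in a word would give an accurate syllable count. So no Soundex feature is needed; simply walk through the characters of the word and increment the syllable count each time you move from a vowel to a consonant.

Using this logic, this sentence is classified as follows.
2 1 2 1 1 (3) (0) (4) (0) 2

where (#) marks a misclassified word. I am sure that with a little thought one could work out the logic needed to get an accurate count in those cases. Counting vowel-to-consonant changes gives -
(1) 1 2 1 2 1 (4) 1 2

Taking the average of the two types, and then the ceiling, fixes most of the errors.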
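The vowel-cluster idea can be sketched with a regular expression. In this illustration, count_vowel_clusters() is a made-up name and only a, e, i, o and u are treated as vowels, which is an assumption; silent letters and similar cases are not handled, so the count is a rough estimate.

```php
<?php
// Hypothetical helper: counts runs of consecutive vowels in a single word.
function count_vowel_clusters(string $word): int
{
    // preg_match_all() returns the number of matches (or false on error).
    return (int) preg_match_all('/[aeiou]+/i', $word);
}

var_dump(count_vowel_clusters('rest'));  // int(1)
var_dump(count_vowel_clusters('reset')); // int(2)

// Appended to the soundex key, as suggested earlier in the thread:
var_dump(soundex('reset') . count_vowel_clusters('reset')); // string(5) "R2302"
```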

crchafer-php at c2se dot com

19 years ago

This could be rewritten, but the algorithm has some obvious
optimisations that can be made, such as...

function text__soundex( $text ) {
$k = ' 123 12  22455 12623 1 2 2';
$nl = strlen( $tN = strtoupper( $text ) );
$p = trim( $k[ ord( $tS = $tN[0] ) - 65 ] );
for( $n = 1; $n < $nl; ++$n )
if( ( $l = trim( $k[ ord( $tN[ $n ] ) - 65 ] ) ) != $p )
$tS .= ( $p = $l );
return substr( $tS . '000', 0, 4 );
        }

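// Notes
// $k is the $key, essentially the inverse of $SoundKey
// $tN is the uppercased form of the input text
// $tS is the partially built output
// $l is the current letter's code, $p is the previous one
// $n and $nl are the iteration index and its limit
// 65 is ord('A'), precalculated for speed
// non-ASCII letters are not supported
// mind the brackets, there is a lot of mixing going on here

(The code has only had basic testing, though it appears
to match the output of PHP's soundex(). Speed is untested,
although, with most of the loops and comparisons removed, it should be a lot faster than
a4_perfect's rewrite.)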

C
2005-09-13

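Marc Quinton.

19 years ago

A French soundex version; it could be useful for other foreign languages where Soundex falls short. Perhaps a class with the specifics of each language could be written.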

http://www.php-help.net/sources-php/a.french.adapted.soundex.289.html

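shortcut

18 years ago

The answer to getting Soundex to work regardless of the first letter, as with Klancy versus Clancy, is to always prefix the words with the same letter.

aklancy will match aclancy
bklancy will match bclancy

Soundex seems to check only the first 2 syllables.??
e.g. spectacular matches spectacle

Just a thought, if you rely on Soundex.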

k-
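The match in this example follows from the key length rather than from syllables: soundex() keeps the first letter plus at most three digits, so anything after the third coded consonant is ignored. A quick check:

```php
<?php
var_dump(soundex('spectacular')); // string(4) "S123"
var_dump(soundex('spectacle'));   // string(4) "S123"
```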

jr

21 years ago

A workaround for the differences between the Soundex implementations in MySQL and PHP is to do the Soundex comparison entirely in MySQL.

For example:
$sql = "SELECT * FROM table WHERE substring(soundex(field), 1, 4) = substring(soundex('".$wordsearch."'), 1, 4)";

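info at nederlandsch dot net

21 years ago

MySQL Soundex (3.23.49) does not check the first character at all to see whether it should be skipped. As a result, the name of The Hague in the Netherlands (the seat of the country's government), "'s-Gravenhage", has the Soundex value "261" in MySQL and S615 in PHP.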
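On the PHP side this is easy to reproduce: soundex() skips the apostrophe and the hyphen and takes the first letter it finds as the start of the key.

```php
<?php
var_dump(soundex("'s-Gravenhage")); // string(4) "S615"
```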
