Skip to content

General Levenshtein algorithm and k-bounded levenshtein distance in linear time and constant space. Implementation in C as a MySQL UDF

License

Notifications You must be signed in to change notification settings

PKiman/Levenshtein-MySQL-UDF

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

45 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MySQL UDF functions implemented in C for

  • General Levenshtein algorithm
  • k-bounded Levenshtein distance algorithm (linear time, constant space),
  • Levenshtein ratio (syntactic sugar for: levenshtein_ratio(s, t) = 1 - levenshtein(s, t) / max(s.length, t.length))
  • k-bounded Levenshtein ratio
  • Fuzzy search with levensthein case sensitive
  • Fuzzy search with levensthein case insensitive
  • Fuzzy search with damerau-levensthein case sensitive
  • Fuzzy search with damerau-levensthein case insensitive
  • native C unit testing

How to compile?

make && make test

Compile with another location of mysql include folder use:

MYSQL_CFLAGS="-I/opt/local/include/mysql57/mysql/" && make && make test

How to install?

Find out the plugin_dir of your MySQL server:

mysql> SHOW GLOBAL VARIABLES LIKE 'plugin_dir';
+---------------+--------------------------------+
| Variable_name | Value                          |
+---------------+--------------------------------+
| plugin_dir    | /opt/local/lib/mysql56/plugin/ |
+---------------+--------------------------------+
1 row in set (0.00 sec)

Copy the compiled similarities.so (linux/mac) or similarities.dll (windows) to the plugin_dir

Register the udf functions in MySQL which you need:

CREATE FUNCTION levenshtein RETURNS INT SONAME 'similarities.so';
CREATE FUNCTION levenshtein_k RETURNS INT SONAME 'similarities.so';
CREATE FUNCTION levenshtein_ratio RETURNS REAL SONAME 'similarities.so';
CREATE FUNCTION levenshtein_k_ratio RETURNS REAL SONAME 'similarities.so';
CREATE FUNCTION levenshtein_substring_k RETURNS INT SONAME 'similarities.so';
CREATE FUNCTION levenshtein_substring_ci_k RETURNS INT SONAME 'similarities.so';
CREATE FUNCTION damerau RETURNS INT SONAME 'similarities.so';
CREATE FUNCTION damerau_substring RETURNS INT SONAME 'similarities.so';
CREATE FUNCTION damerau_substring_ci RETURNS INT SONAME 'similarities.so';

How to uninstall?

DROP FUNCTION levenshtein;
DROP FUNCTION levenshtein_k';
DROP FUNCTION levenshtein_ratio;
DROP FUNCTION levenshtein_k_ratio;
DROP FUNCTION levenshtein_substring_k;
DROP FUNCTION levenshtein_substring_ci_k;
DROP FUNCTION damerau;
DROP FUNCTION damerau_substring;
DROP FUNCTION damerau_substring_ci;

How to use?

Levenshtein Distance

mysql> SELECT LEVENSHTEIN("Levenhstein", "Levenshtein") AS distance;
+----------+
| distance |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

Levenshtein-Damerau Distance

mysql> SELECT DAMERAU("Levenhstein", "Levenshtein") AS distance;
+----------+
| distance |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

Levenshtein Fuzzy-Search Distance

mysql> SELECT LEVENSHTEIN_SUBSTRING_K("Levenhstein", "This is a long string Levenshtein", 255) AS distance;
+----------+
| distance |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

Levenshtein Fuzzy-Search Distance Case Insensitive

mysql> SELECT LEVENSHTEIN_SUBSTRING_CI_K("levenhstein", "This is a long string Levenshtein", 255) AS distance;
+----------+
| distance |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

Levenshtein-Damerau Fuzzy-Search Distance

mysql> SELECT DAMERAU_SUBSTRING("Levenhstein", "This is a long string Levenshtein") AS distance;
+----------+
| distance |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

Levenshtein-Damerau Fuzzy-Search Distance Case Insensitive

mysql> SELECT DAMERAU_SUBSTRING_CI("levenhstein", "This is a long string Levenshtein") AS distance;
+----------+
| distance |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

About

General Levenshtein algorithm and k-bounded levenshtein distance in linear time and constant space. Implementation in C as a MySQL UDF

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C 98.1%
  • Makefile 1.9%