This repository has been archived by the owner on Nov 17, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 3
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
dfdb8bd
commit 0f934c1
Showing
2 changed files
with
87 additions
and
5 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,86 @@ | ||
# IPA Validator | ||
Composer package for validating and normalizing IPA | ||
# Composer package for validating and normalizing IPA | ||
[![PHP](https://github.com/theresnotime/php-ipa-validator/actions/workflows/PHP.yml/badge.svg)](https://github.com/theresnotime/php-ipa-validator/actions/workflows/PHP.yml) | ||
|
||
|
||
## Installing | ||
Go grab it on [packagist](https://packagist.org/packages/theresnotime/ipa-validator) via | ||
``` | ||
composer require theresnotime/ipa-validator | ||
``` | ||
## Basic usage | ||
```php | ||
// Load composer's autoloader | ||
require_once __DIR__ . '/vendor/autoload.php'; | ||
|
||
// Load the validator | ||
use TheresNoTime\IPAValidator\Validator; | ||
|
||
/* | ||
* Create a new validator with the options: | ||
* - Remove delimiters (defaults to true) | ||
* - Normalize IPA (defaults to false) | ||
* - Normalize to Google TTS standard (defaults to false) | ||
*/ | ||
$validator = new Validator( '/pʰə̥ˈkj̊uːliɚ/', true, true, true ); | ||
|
||
// Check if the IPA is valid | ||
echo $validator->valid; # true | ||
|
||
// Get the normalized IPA | ||
echo $validator->normalizedIPA; # phəˈkjuːliɚ | ||
|
||
// Get the original IPA | ||
echo $validator->originalIPA; # /pʰə̥ˈkj̊uːliɚ/ | ||
``` | ||
|
||
## Options | ||
When constructing a new `Validator`, you can set the following options: | ||
```php | ||
/** | ||
* Constructor | ||
* | ||
* @param string $ipa IPA to validate | ||
* @param bool $strip Remove delimiters | ||
* @param bool $normalize Normalize IPA | ||
* @param bool $google Normalize IPA for Google TTS | ||
* @return void | ||
*/ | ||
public function __construct( $ipa, $strip = true, $normalize = false, $google = false ) | ||
``` | ||
|
||
### Remove delimiters | ||
This option will remove *some* delimiters from the IPA — currently `/.../` and `[...]` | ||
|
||
|
||
### Normalize IPA | ||
When `$google` is `false`, this option will normalize the IPA and remove commonly mistaken unicode characters (for example, using `:` instead of `ː` in a word such as `tenoːt͡ʃˈtit͡ɬan`). | ||
|
||
### Normalize IPA for Google TTS | ||
As part of a work project, we're feeding IPA to Google's TTS engine — Google is a little opinionated about things like diacritics. | ||
For example, the IPA `ˈɔːfɫ̩` would not render correctly in Google TTS. A custom charmap is used to normalize certain characters: | ||
```php | ||
$charmap = [ | ||
[ "'", 'ˈ' ], | ||
[ ':', 'ː' ], | ||
[ ',', 'ˌ' ], | ||
[ 'ⁿ', 'n' ], // 207F | ||
[ 'ʰ', 'h' ], // 02B0 | ||
[ 'ɫ', 'l' ], // 026B | ||
[ 'ˡ', 'l' ], // 02E1 | ||
[ 'ʲ', 'j' ], // 02B2 | ||
]; | ||
``` | ||
Setting `$google` to `true` also removes all diacritics from the IPA string. | ||
|
||
## Developing | ||
1. [Fork n' clone](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) this repo | ||
2. Run `composer install` | ||
3. Run `composer test` because who knows, maybe its already broken | ||
4. Hack! | ||
|
||
## The Regex | ||
``` | ||
^[().a-z|æçðøħŋœǀ-ǃɐ-ɻɽɾʀ-ʄʈ-ʒʔʕʘʙʛ-ʝʟʡʢʰʲʷʼˀˈˌːˑ˞ˠˡˤ-˩̴̘̙̜̝̞̟̠̤̥̩̪̬̯̰̹̺̻̼̀́̂̃̄̆̈̊̋̌̏̽̚͜͡βθχ᷄᷅᷈‖‿ⁿⱱ]+$ | ||
``` | ||
|
||
I've also placed it at https://regex101.com/r/f2Qhuk if you think you can improve it... (**please do**!) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters