This directory contains recipes (use cases) that supplement The Unicode Cookbook for Linguists. Each recipe is in its own subdirectory:
- Basics: basics of grapheme segmentation and text tokenization in the Python and R programming languages
- ASJP: tokenize ASJP wordlists with R
- Dogon: tokenize the Dogon comparative wordlist and create an orthography profile in Python with Pandas
- Dutch: create an orthography profile for tokenizing Dutch orthography with R
- JIPA: tokenize text in the International Phonetic Alphabet (IPA) with Python or R
To install the Python segments package from the Python Package Index (PyPI), run:
pip install segments
on the command line. This gives you access to the command-line interface (CLI) as well as the programmatic functionality available when you import the segments library in Python scripts.
You can also install the segments package from the GitHub repository:
git clone https://github.com/cldf/segments.git
cd segments
python setup.py develop
To install the qlcData R package and its accompanying data, run:
install.packages("devtools")
devtools::install_github("cysouw/qlcData", build_vignettes = T)
and then load the library:
library(qlcData)
To access help, call:
help(qlcData)
To access the vignette, call:
vignette("orthography_processing")
Each recipe contains a short use case with accompanying code. The directory structure is typically as follows:
└── Recipe name
    ├── recipe files
    ├── data
    │   └── orthography profiles
    ├── sources
    │   └── input data
    └── sandbox
        └── where the output is written