The Open Data Sample project aims to provide sample data in an organized way. It consists of plain-text (text/plain) files with data in many languages. To ensure overall quality, the project follows a few rules:
- the file name must be lowercase;
- the file must be a plain text file with a .txt extension;
- rows in the file must be separated by a newline;
- the file must not contain blank lines;
- the folder name must follow the RFC 4647 and RFC 5646 specifications, using the following notation:
language-region
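As an illustration only, the file and folder rules above could be checked with a short Python sketch like the one below. It is not the project's own `bin/check`; the name `check_file` and the pattern `FOLDER_PATTERN` are hypothetical, and the pattern is a deliberate simplification that accepts only a two- or three-letter language subtag followed by a two-letter region subtag (for example `en-US`), whereas RFC 5646 permits many more forms.

```python
import re
from pathlib import Path

# Assumed simplification: language subtag (2-3 letters), hyphen,
# region subtag (2 letters). RFC 5646 allows far more than this.
FOLDER_PATTERN = re.compile(r"[A-Za-z]{2,3}-[A-Za-z]{2}")


def check_file(path: Path) -> list[str]:
    """Return the rule violations found for one data file (empty if none)."""
    problems = []
    if path.name != path.name.lower():
        problems.append("file name must be lowercase")
    if path.suffix != ".txt":
        problems.append("file must have a .txt extension")
    if not FOLDER_PATTERN.fullmatch(path.parent.name):
        problems.append("folder name must look like language-region")

    rows = path.read_text(encoding="utf-8").split("\n")
    # A single trailing newline at the end of the file is not a blank row.
    if rows and rows[-1] == "":
        rows.pop()
    if any(row.strip() == "" for row in rows):
        problems.append("file must not contain blank lines")
    return problems
```

Calling `check_file(Path("en-US/cities.txt"))` on a well-formed file would return an empty list; each violated rule adds one message.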
The project is released under the Open Data Commons Open Database License v1.0.
Contributions are very welcome, but there are some rules:
- Don't submit data that can easily be generated in most programming languages (e.g. days of the week, numbers, dates, ...).
- Make one pull request per data type. For instance, if you want to add cities and animals, please make two pull requests.
- Keep one word per row.
- Valid characters are letters of the alphabet and newlines.
- Don't use synonyms or words that are almost equivalent (e.g. cat, kitten).
- Submit as much data as you can. A submission with fewer than 20 rows could be declined.
- Avoid the use of headers in the files. For instance, the first row of a city file cannot be "cities".
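Some of the contribution rules above lend themselves to a quick self-check before opening a pull request. The following Python sketch is one possible approach, not part of the project: `check_rows` and `MIN_ROWS` are hypothetical names, and `str.isalpha()` is only an approximation of "letters of the alphabet" (for instance, it may reject words in scripts that rely on combining marks). Synonyms and header rows still need a human review.

```python
MIN_ROWS = 20  # submissions with fewer rows could be declined


def check_rows(rows: list[str]) -> list[str]:
    """Return the problems found in a list of rows (empty if none)."""
    problems = []
    for number, row in enumerate(rows, start=1):
        # str.isalpha() is True only for non-empty strings made of letters,
        # so spaces, digits, punctuation and empty rows are all rejected.
        if not row.isalpha():
            problems.append(f"row {number}: {row!r} is not a single word of letters")
    if len(rows) < MIN_ROWS:
        problems.append(f"{len(rows)} row(s) found; fewer than {MIN_ROWS} could be declined")
    return problems
```

For example, `check_rows(["new york"])` reports both the space in the row and the short row count.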
The project includes some tools to simplify file creation:
- check (bin/check dataFolder) validates a data folder against the rules above.
- sort (bin/sort filename.txt) outputs an alphabetically sorted list without duplicates.
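To make the behaviour of the sort tool concrete, here is a minimal Python sketch of the same idea. It is an assumption about the intended result rather than the actual implementation of `bin/sort`: the function name `sort_unique` is hypothetical, and it orders rows by Unicode code point, which may differ from the locale-aware alphabetical order a given language expects.

```python
def sort_unique(rows: list[str]) -> list[str]:
    """Return the rows sorted, with duplicates and empty rows removed."""
    # A set drops duplicates; sorted() then orders by Unicode code point.
    return sorted({row for row in rows if row})
```

For instance, `sort_unique(["pear", "apple", "pear", "banana"])` returns `["apple", "banana", "pear"]`.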