LVL Software Development Guidelines

This repository contains the software development guidelines for LVL.
Center for Analysis and Design of Intelligent Agents, Language and Voice Lab
Reykjavik University

1. About the guide
2. Project Structure
- 2.1. Template README.md
3. License
4. Documentation
5. Version control
6. Testing
7. Supporting tools
- 7.1. Linters
- 7.2. Style
  - 7.2.1. Python
8. Continuous integration
- 8.1. GitHub Actions
  - 8.1.1. Further information
9. Packaging / Releasing
10. Examples
11. Other resources
12. License

Table of contents generated with markdown-toc

1. About the guide

1.1. Goals

These guidelines should help us achieve the following goals:

Make our lives easier by not having to reinvent the wheel in each of our individual projects, and by sharing good resources and guides that we are all aware of.
Help us write quality software by following good software development procedures.
Enable collaboration by making it part of our workflow.

1.2. TLDR

The guide can be summarized as follows:

Have a good README.md, see our template.
Provide a License.
Write documentation within the code so that you can automatically generate it later (if you need to).
Use git for version control, with semantic version tags.
Test your code, preferably automatically.
Set up GitHub Actions to run testing and linting.
Check out our examples.

1.3. Other resources

Compute for information about the cluster (called Terra) at LVL.
SÍM guidelines for writing and delivering software.
Icelandic NLP resources to get an overview of existing Icelandic resources.
Máltækni fyrir íslensku (in Icelandic): an overview of language resources from the Language Technology for Icelandic 2019-2023 programme.

1.4. Deliverables

The SÍM guidelines define the deliverable types APP, MOD, ADD-ON, WEB and RES.

These types are quite abstract, and LVL does not deliver all of them. We therefore break these deliverables down further to offer more concrete guidelines:

UI (webpage, mobile, browser-plugin) = WEB, APP or ADD-ON
Server (restful webserver) = WEB
Library (python package) = MOD
Model (trained model, runnable trained model) = RES, APP or MOD
Command line client (python package, bash) = APP or MOD

Mapping the SÍM requirements to these deliverables needs to be discussed with a SÍM project manager to answer the question "what do they expect?".

1.5. Contributors

To contribute to this project, submit a pull request to the master branch and request a review from someone on the software guidelines team. If you have many suggestions, feel free to split them into multiple pull requests.

1.5.1. Notes for writing the guide

This guide should be short and simple and state what is required and what is optional.
Offer examples
Add links to good resources

1.5.2. Maintainers

Gunnar Thor Örnólfsson
Judy Yum Fong
Safa Jemai
Staffan J. S. Hedström
Smári Freyr Guðmundsson
Þorsteinn Daði Gunnarsson

1.6. Dictionary

Word	Meaning
ADD-ON	a plugin to a larger framework
APP	stand-alone application
CD	Continuous Deployment / Continuous Delivery
CI	Continuous Integration
MOD	a module which can be embedded into other applications
RES	language resource
IDE	Integrated development environment

back to TOC

2. Project Structure

Try to maintain a standardized project structure across your projects, since they always need to contain certain files. We suggest the following:

README.md  # See more information below
LICENSE    # See license section
docs/      # Contains automatically generated documentation in HTML.

Other scripts such as .py and .sh files should be in the root folder.

2.1. Template README.md

A README.md template can be found here.

back to TOC

3. License

All of the projects we work on need to have licenses. A single LICENSE file, mentioned and linked from the README, suffices. Adding a license header in every single file is optional.

If a project does not contain a license, default copyright applies, which means that no one else is allowed to create derivative works from your code.

We want our code to be freely available to everyone, so we prefer permissive licenses such as:

Apache 2.0 (quite permissive)
MIT License (very permissive)
CC BY 4.0 (for resources)

For more help choosing a license, see Choosing a license. Keep in mind that if you are working through SÍM, you have agreed to use open licenses such as these. Furthermore, according to RU, the university is the copyright owner.

When a license has been chosen, copy the license text, fill in any additional information required (such as the copyright owner and year) and save it to a LICENSE file in the repository.

back to TOC

4. Documentation

Documentation is key if software is to be usable. Writing documentation can be hard, so we offer some suggestions to make it easier. Keep in mind that a good #Running section in the README.md complements the documentation, i.e. you need to write both.

The #Running section covers common use cases. It can be seen as a small user guide.
Write the documentation for functions and classes in your code using your language's conventions, since all common languages have tools that can extract this documentation. This documentation is more detailed than the #Running section but can contain the same examples.

4.1. Code

For existing repositories, follow the coding conventions that are already in the repository.
Use easy-to-understand variable and function names, i.e. avoid ambiguous names.
Settle on a single documentation format. This makes automatic documentation generation possible.
When you have settled on a format, find a tool that supports it and can generate HTML, and place the output in the docs/ folder.
Write documentation in English, unless you have a good reason not to do so.
Be sure to define accepted inputs and return values.

4.1.1. Python

We suggest the following format and tool for Python

Format: Google. It is easy to read in code, widely supported and not verbose.
Tool: Sphinx with napoleon
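As an illustration, here is a minimal sketch of a Google-style docstring with the Args, Returns, Raises and Example sections that Sphinx's napoleon extension understands (napoleon is enabled by adding "sphinx.ext.napoleon" to the extensions list in Sphinx's conf.py). The function count_words is hypothetical and only serves to show the format, including the accepted inputs and return value.

```python
from collections import Counter


def count_words(text: str, lowercase: bool = True) -> Counter:
    """Count how often each whitespace-separated word occurs in a text.

    Args:
        text: The input text. Words are split on whitespace.
        lowercase: If True, words are lowercased before counting so that
            "Hús" and "hús" are counted as the same word.

    Returns:
        A Counter mapping each word to its number of occurrences.

    Raises:
        TypeError: If ``text`` is not a string.

    Example:
        >>> count_words("Hús og hús")["hús"]
        2
    """
    if not isinstance(text, str):
        raise TypeError(f"text must be a str, got {type(text).__name__}")
    words = text.split()
    if lowercase:
        words = [word.lower() for word in words]
    return Counter(words)
```

The same docstring is readable in the source code and, through napoleon, rendered as structured HTML in the generated documentation.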

4.2. Publishing documentation

Once you have generated HTML pages with your documentation tool, the documentation can be hosted on GitHub using GitHub Pages. This lets us host a website directly from our code repositories, so we do not have to host the documentation on a separate server.

4.3. Developer documentation

For larger projects, a developer guide can be helpful for newcomers. Such a guide contains installation instructions, contribution guidelines and any other information that helps a developer use and modify the codebase faster.

4.4. User guide

User guides are generally less technical than documentation and are intended for the end user. For larger projects that require a user guide, the guide should be written in the user's language. These guides should contain many examples and lead the user through common use cases. For example:

This could be implemented as a tutorial within a webpage with popups that guide the user through the UI.
This could be a video guide that goes through the same steps as above.
This could be implemented as a Wiki on GitHub.
This could be implemented as a long #Running section.
If your program is used from the command line, it should support the --help / -h flags, which should explain what the script does, give examples and describe its parameters. In Python, argparse and click make this functionality easy to get.
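A minimal sketch of a command-line interface built with argparse, which generates the -h / --help output automatically from the description, the epilog and each argument's help string. The program name transcribe.py and its arguments are hypothetical.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser; argparse adds -h/--help automatically."""
    parser = argparse.ArgumentParser(
        prog="transcribe.py",
        description="Transcribe an audio file to text (hypothetical example).",
        epilog="Example: python transcribe.py recording.wav --output out.txt",
    )
    parser.add_argument("audio", help="path to the input audio file")
    parser.add_argument(
        "-o", "--output", default="-",
        help="file to write the transcript to (default: standard output)",
    )
    parser.add_argument(
        "--beam-size", type=int, default=10,
        help="beam size used during decoding (default: %(default)s)",
    )
    return parser


def main(argv: Optional[List[str]] = None) -> None:
    # parse_args(None) falls back to the real command-line arguments.
    args = build_parser().parse_args(argv)
    print(f"Would transcribe {args.audio} with beam size {args.beam_size}")


if __name__ == "__main__":
    main()
```

Running python transcribe.py --help prints the usage line, the description, the help text for every argument and the example in the epilog, without any extra code.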

back to TOC

5. Version control

Within Cadia-LVL, using git for version control is required. A clean repository with descriptive commit messages is a good representation of your project and makes it easier for new developers to join. The following describes how to maintain a repository like that.

5.1. Workflow

We use the GitHub flow workflow. We further clarify how we use this workflow in the next sections.

All merges to the master branch should be tagged with a semantic version, as described below.

Here is a short guide.

5.1.1. Branching

In short, the master branch should always contain production-ready code. To achieve this, create feature branches from the master branch and develop your changes there. When the work is done, open a pull request on GitHub.

5.1.2. Pull requests

When you have finished working on the feature branch, create a pull request (PR) to the master branch. Assign someone other than yourself to review the pull request (the code). If you request more than one reviewer, the default at Cadia-LVL is that only one of them needs to review and approve the PR. The reviewer is responsible for making sure that everything is (within reason) tested and documented. If the reviewer has issues with the pull request (which is very common), they request changes, and this process repeats until they are satisfied. When there are no more issues, the reviewer approves the pull request and merges it into the master branch.

If enforced, this process has the following benefits:

No one works in isolation, so more people understand the project.
Code quality generally increases.
More tests are written.
The project is well documented.

For more information about pull requests, see here.

5.2. Commits

Every commit message should include a short description of the work being committed. Pull request descriptions and comments should be more detailed.

5.3. Semantic versioning

Version tags can be informative, especially to current users. Given a version number v[major].[minor].[patch], increments represent the following:

Major: Incompatible API changes, when you break something
Minor: Added backwards compatible functionality, when you add features
Patch: Backwards compatible fixes, when you fix bugs
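A minimal sketch of how a v[major].[minor].[patch] tag is incremented under these rules. Note that incrementing one part resets every part to its right to zero. The bump_version helper is hypothetical and only handles plain tags (no pre-release or build suffixes).

```python
import re

# Matches plain tags such as "v1.4.2".
TAG_PATTERN = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def bump_version(tag: str, part: str) -> str:
    """Return the tag that follows ``tag`` when ``part`` is incremented.

    Args:
        tag: Current tag, e.g. "v1.4.2".
        part: Which part to increment: "major", "minor" or "patch".

    Returns:
        The new tag, with lower-order parts reset to zero.
    """
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a v[major].[minor].[patch] tag: {tag!r}")
    major, minor, patch = (int(group) for group in match.groups())
    if part == "major":  # incompatible API change
        return f"v{major + 1}.0.0"
    if part == "minor":  # backwards compatible new functionality
        return f"v{major}.{minor + 1}.0"
    if part == "patch":  # backwards compatible bug fix
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump_version("v1.4.2", "major"))  # v2.0.0
print(bump_version("v1.4.2", "minor"))  # v1.5.0
print(bump_version("v1.4.2", "patch"))  # v1.4.3
```

The resulting tag can then be created with git tag v1.5.0 and published with git push origin v1.5.0.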

5.4. Further information

Read more about semantic versioning and how to use tags to apply semantic versioning.
Developers and maintainers of a project should always Watch their repositories to be notified of all new issues and pull requests.

back to TOC

6. Testing

How do you know that your code works? No one writes bug-free code, and even if you could, testing the code might provide useful feedback (on design, usability, architecture, etc.). We suggest testing your code often and thoroughly. Since doing that manually can be dull and cumbersome, we further suggest automating these tests.

6.1. Unit tests

Unit tests are conceptually the smallest tests possible; they can be thought of as "function tests", i.e. testing whether a function works. All projects should aim for as much unit testing as possible. At a minimum, write tests for mission-critical algorithms and functions. Most programming languages have a unit testing framework; look up the standard one for yours.

Python: pytest
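A minimal sketch of pytest-style unit tests: pytest collects functions whose names start with test_ from files named test_*.py and reports every failing assert. The function under test, normalize_whitespace, is hypothetical; in a real project it would live in your package and be imported into the test file.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and strip both ends."""
    return " ".join(text.split())


def test_collapses_internal_whitespace():
    assert normalize_whitespace("góðan   daginn\n\theim") == "góðan daginn heim"


def test_strips_leading_and_trailing_whitespace():
    assert normalize_whitespace("  halló ") == "halló"


def test_blank_input_gives_empty_string():
    # Edge cases are cheap to test and often where bugs hide.
    for text in ["", "   ", "\n\t"]:
        assert normalize_whitespace(text) == ""
```

Saved as, for example, test_text.py, running pytest from the project root discovers and runs these tests; plain assert statements are enough, since pytest reports the compared values when one fails.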

6.2. Web

A huge part of the web is portability and accessibility, so making sure your website works on all major devices and platforms is very important.

6.2.1. Browsers

When testing your website, keep in mind the demographic that will be using the site, which devices they use (mobile vs. desktop) and which browsers. Preferably, test each deployment on these devices and browsers. You can see browser usage statistics here to decide which browsers to test.

Major browsers on desktop are:

Chrome
Firefox
Safari
Edge

Major browsers for mobile are:

Chrome
Safari
Samsung Internet

Caniuse.com is helpful if you are unsure about browser support for specific features.

If a particular browser is required for a specific and clear reason, make sure this is clearly stated on the page. You can use the User-Agent HTTP header to display a warning to users of unsupported browsers.
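As a rough sketch, assume a site that for some reason only supports Firefox. The server could then inspect the User-Agent header and decide whether to show a warning. The helpers get_user_agent and needs_browser_warning, and the substring check they use, are hypothetical simplifications: User-Agent strings are notoriously inconsistent, so a real project would typically rely on a maintained parsing library.

```python
from typing import Mapping

# Hypothetical requirement for this sketch: the site only supports Firefox.
REQUIRED_BROWSER_TOKEN = "Firefox/"


def get_user_agent(headers: Mapping[str, str]) -> str:
    """Return the User-Agent value; HTTP header names are case-insensitive."""
    for name, value in headers.items():
        if name.lower() == "user-agent":
            return value
    return ""


def needs_browser_warning(headers: Mapping[str, str]) -> bool:
    """Return True if the page should warn that the browser is unsupported.

    This is a naive substring check: a missing User-Agent header counts as
    unsupported, and browsers that identify themselves differently (for
    example Firefox on iOS, which reports "FxiOS") also trigger the warning.
    """
    return REQUIRED_BROWSER_TOKEN not in get_user_agent(headers)


firefox = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) "
                         "Gecko/20100101 Firefox/126.0"}
chrome = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                        "AppleWebKit/537.36 (KHTML, like Gecko) "
                        "Chrome/125.0.0.0 Safari/537.36"}
print(needs_browser_warning(firefox))  # False
print(needs_browser_warning(chrome))   # True
```

Showing a warning rather than blocking the page keeps the site usable for visitors whose browser is misdetected.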

6.2.2. Deployment strategy

Keep two entirely separate deployments up and running: staging and production. Deploy all changes to the staging environment first (this can be done automatically).

Preferably, have multiple users test all major changes on the staging deployment before deploying to production. This part could be automated, but automated tests of this kind are often time-consuming to write and difficult to keep up to date. Suggested tools for automated testing: Selenium with Python, and Cucumber.

6.3. User testing

User testing is a great way to see how the users interact with your product. However, user testing is difficult to do well. When doing user tests keep the following points in mind.

User tests should only be used if you have planned time to make adjustments
- There is no reason to spend time user testing if the results won't be used to make adjustments and improve the product.
User testing should focus on a specific task
- Focusing on a specific scope or task helps keep the user focused on what is important. This does not mean that the same user cannot test different tasks, only that each task you give users should be clearly defined.
Use up to 4 people per test
- Asking too many users to do the same test will only produce more of the same responses. Every user's experience matters and should be taken into account. If you have more users available, consider testing different things with them, or run another round once adjustments have been made after the first one.
Testing prototypes often gives no useful information
- Users will most likely point out things that you already know are missing and are less likely to give the feedback you are looking for. This can be mitigated somewhat by targeting very specific tasks early in development; see the point about specific tasks above.

back to TOC

7. Supporting tools

When writing code, additional tools can help you find bugs, make your code look consistent and generally help you write better code.

7.1. Linters

Linters perform static analysis on code without running it. They alert you to possible errors, missing documentation or overly complex code without you having to write tests. They are easily incorporated into the continuous integration system and should be run on every push to report errors. Using (some) linters is required in all projects. Adding linters to your IDE is usually easy, and they can be configured to run on file save.

7.1.1. Python

For Python we suggest the following tools for all projects.

Common errors: flake8
Type checks: mypy
Documentation checker: pydocstyle
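One way to run all three linters locally before pushing is a small script like the sketch below. It assumes flake8, mypy and pydocstyle are installed and that the code lives in a hypothetical src/ directory; pydocstyle's --convention=google option matches the Google docstring format suggested in the documentation section. In CI, the same commands can simply be run as separate steps.

```python
import subprocess
import sys
from typing import List, Sequence

# Assumed layout for this sketch: the package source lives in src/.
LINT_COMMANDS: List[List[str]] = [
    ["flake8", "src"],                             # common errors
    ["mypy", "src"],                               # type checks
    ["pydocstyle", "--convention=google", "src"],  # docstring checks
]


def run_linters(commands: Sequence[Sequence[str]]) -> int:
    """Run every command and return how many of them failed.

    All commands run even if an earlier one fails, so every problem is
    reported in a single pass. A tool that is not installed counts as a
    failure.
    """
    failures = 0
    for command in commands:
        print("$", " ".join(command))
        try:
            result = subprocess.run(list(command), check=False)
        except FileNotFoundError:
            print(f"{command[0]} is not installed", file=sys.stderr)
            failures += 1
            continue
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # A non-zero exit code lets hooks and CI treat lint errors as failures.
    sys.exit(1 if run_linters(LINT_COMMANDS) > 0 else 0)
```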

7.1.2. JavaScript

Best practices and errors: eslint

7.1.3. Bash

Best practices and errors: shellcheck

7.1.4. CSS

Best practices and errors: stylelint

7.2. Style

Maintaining the same style across a project makes the code more readable. We suggest using a tool that automatically formats your code. This eliminates inconsistent styles in the code and lets you focus on the content rather than the format.

7.2.1. Python

We highly recommend black.

8. Continuous integration

Continuous integration (CI) is a way to automatically build (compile) software, run tests, lint, check documentation and alert developers when something is wrong. We suggest running all of these steps automatically in the CI system, and we suggest using GitHub Actions as the CI system for all your projects.

8.1. GitHub Actions

Navigate to your repository and click the Actions tab, to the right of the Pull requests tab. Here you can choose a workflow made by others or click set up a workflow yourself to create your own. The workflow defines the steps you would like to run when you push changes.

Your workflow should, at minimum, run each time a push or a pull request is made to the master branch. It is also recommended to run a scheduled workflow once a day, in case an update to one of your project's dependencies breaks your code.

An example workflow can be found here

8.1.1. Further information

back to TOC

9. Packaging / Releasing

When a project is to be released, try to:

Package it so that it can be easily used by other people.
Make sure that it includes all the main parts mentioned in the 1.2. TLDR section.
Make sure that all changes have been committed and a proper README file is in place so that the release process can be repeated by someone else.

9.1. Versioning

All deliverables should be referenced with a git tag (e.g. a version). This makes it easy to review the code of a deliverable and to recreate it.

For example, for milestone 4, add the m4 tag. Here is an example from the samromur-asr repo.

9.2. Python

We suggest using poetry for dependency management, building and packaging.

9.3. Docker

For more complex deliverables (which have many dependencies), we also recommend packaging the deliverable with Docker.

back to TOC

10. Examples

An example of good documentation is provided: kaldi-asr

11. Other resources

Web Technologies reference
bash help usage example
Better Programming has a short list of Bash Best Practices
Explains the Kaldi folder structure: kaldi-for-dummies

12. License

CC BY 4.0

back to TOC
