Parser release (#55)
* data parser

* try to release the data-parser

* add working dir to gh actions

* fix idiotic mistake

* last fix

* update gitignore

* label multiple choice options for enabling data parsing of cross-language courses.

* parser parses options with separate labels correctly

* trying to fix actions uploads

* try asterisk for path

* this cannot be the fix

* clean up the workflow file

* update system tests

* user manual for parser binary

* wrapping up the docs

* trying to sign the macos version in gh actions

* fixes to workflow file

* fix mistakes in yml

* try if the mac version will fix it

* try with one more typo fix

* change mac version to latest after all
anadis504 committed May 23, 2023
1 parent 3d91ff5 commit 1533431
Showing 25 changed files with 1,127 additions and 80 deletions.
82 changes: 82 additions & 0 deletions .github/workflows/data-parser-release.yml
@@ -0,0 +1,82 @@
name: Parser-Release

on:
  push:
    branches: ["parser-release"]
  pull_request:
    branches: ["parser-release"]

jobs:
  build:
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10"]
        poetry-version: ["1.4.2"]
        os: [macos-latest, ubuntu-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    defaults:
      run:
        working-directory: ./data-parser
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Run image
        uses: abatilo/actions-poetry@v2
        with:
          poetry-version: ${{ matrix.poetry-version }}
      - name: View poetry --help
        run: poetry --help

      - name: Install dependencies
        run: poetry install

      - name: Install pyinstaller
        run: poetry run pip install -U pyinstaller
      - name: Build execution file
        run: poetry run pyinstaller main.py --onefile

      - name: Sign the macos-version build
        if: ${{ matrix.os == 'macos-latest' }}
        run: codesign --force -s - ./dist/main

      - name: Rename built binary
        run: poetry run mv ./dist/main ./dist/main-${{ matrix.os }}

      - name: Store built binary
        uses: actions/upload-artifact@v3
        with:
          name: parser-binary
          path: data-parser/dist
          retention-days: 5

  release:
    needs: build
    runs-on: ubuntu-latest
    permissions:
      contents: write
    defaults:
      run:
        working-directory: ./data-parser
    steps:
      - name: Download built binary
        uses: actions/download-artifact@v3
        with:
          name: parser-binary
          path: ./dist/

      - name: Release
        uses: softprops/action-gh-release@v1
        with:
          files: ./dist/*
          name: parser-binary
          tag_name: release

# TODO:
# codesign --force -s - target/release/tmc-langs-cli for the mac release
# add this mac sign to gh actions and run the compilation
16 changes: 14 additions & 2 deletions README.md
@@ -1,4 +1,4 @@
The factor-analysis-exercise-service is used for creating survey exercises.

## Using the service for creating survey exercises

@@ -10,4 +10,16 @@ Run `npm ci` in the repo root

Run the development server with `npm run dev`; the server runs on `localhost:3008`

###
## Expose the service to locally running secret-project minikube cluster

This uses [ktunnel](https://github.com/omrikiei/ktunnel) and allows you to test exercises that actually need the database, such as those using `global variables`.

Run `bin/ktunnel` from repo root.

The address to use in minikube is:

`http://factorial-analysis.default.svc.cluster.local:80/api/service-info`

When stopping the cluster, it is best to stop the ktunnel exposure before killing the minikube cluster.

If the service is still active when minikube is stopped, remember to either delete the service from the cluster (see the commands in the bin/ktunnel [file](./bin/ktunnel)) or update the name of the service (`factorial-analysis2`, for example), which will in turn change the local cluster address.
165 changes: 165 additions & 0 deletions data-parser/.gitignore
@@ -0,0 +1,165 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# Never want to upload the data being used to the VC
data
# Never want to upload the data been parsed either
*outputs
78 changes: 78 additions & 0 deletions data-parser/README.md
@@ -0,0 +1,78 @@
# Parsing the collected submissions on courses.mooc.fi

The output of the data-parser is a .csv file containing only answers to the `DOGS FACTORIAL ANALYSIS SURVEY` exercise types. Due to the latest submission format, the file only contains answers submitted **after** 22.05.2023. The separator used in the .csv file is the semicolon `;`.

## Dataset layout

The file contains the columns `user_id, name, email`, followed by one column per `questionLabel` existing in the course. Unanswered questions are left as empty entries.
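As a purely hypothetical illustration (the question labels `q1` and `q2` and all user data below are made up), the resulting file could look like:

```
user_id;name;email;q1;q2
123;Maija Example;maija@example.com;4;
456;Matti Example;matti@example.com;2;5
```

where the empty `q2` field on the first row marks an unanswered question.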

## Multiple-choice questions

An exception to the above format is the multiple-choice question. These questions are represented in the dataset with one `"questionLabel option"` column per selectable option. The user's answer is then represented as 1 for a chosen option and 0 for an option that was not chosen. If the user has not answered the given question at all, the fields are empty (null).
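The encoding described above can be sketched as follows; this is only an illustration, and the option names used here are invented:

```python
# Sketch of encoding one multiple-choice answer into per-option columns.
# The option names are invented for illustration.
OPTIONS = ["barks", "growls", "howls"]

def encode_choice(selected, answered=True):
    """Return one field per option: 1/0 when answered, empty when not."""
    if not answered:
        return {opt: "" for opt in OPTIONS}
    return {opt: 1 if opt in selected else 0 for opt in OPTIONS}

print(encode_choice({"barks", "howls"}))       # chosen -> 1, not chosen -> 0
print(encode_choice(set(), answered=False))    # unanswered -> all fields empty
```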

For submissions collected across different _language versions_ it is advised to `label` the multiple-choice options in the same manner as the questions. This makes it easier to combine datasets from the different language courses, since they share the same column headers. The format is `label ; option text`, where the text on the left-hand side of the semicolon `;` is used as the column header in the resulting dataset, while the text on the right is what is shown to the survey user. Only the first semicolon is used as a separator, so the option text may contain any number of semicolons if needed. If no semicolon is found, the full option text is used as the column header.
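The labelling convention can be sketched like this (a minimal illustration, not the parser's actual implementation; the helper name and sample labels are made up):

```python
# Sketch of the `label ; option text` convention: only the first semicolon
# separates the column header from the text shown to the survey user.
def split_option(raw: str):
    label, sep, shown = raw.partition(";")
    if not sep:  # no semicolon at all: the full text becomes the header
        return raw.strip(), raw.strip()
    return label.strip(), shown.strip()

print(split_option("dog1 ; My dog barks; growls; or howls"))
print(split_option("Plain option text"))
```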

## Using the parser

In order to parse the collected submissions, you need to download the files from the main course management page on courses.mooc.fi. The links to download the files are shown at the bottom of the picture:

<img src="../docs/imgs/data-parser/Download-files.png" height=500>

The .csv file for course instances is not used in the process and may be skipped. The needed files are:

- submissions
- user details
- exercise-tasks

To download the data-parser, go to the github release page https://github.com/rage/factor-analysis-exercise-service/releases/tag/release and choose the executable file for your operating system.

The parser expects a folder named `data`, containing the downloaded .csv files, to be located in the same folder as the executable itself. This is the directory structure:

<img src="../docs/imgs/data-parser/dir-struct.png" width=500>

where the green `main` is the executable program in question (it will probably be called `main-[name of your os]-latest`).
If several versions of the .csv files are available in the `data` folder, as in the above example, the parser will use the latest ones.

Run the parser with

> `./name-of-executable`

from that directory. The parser will create a `parsed-outputs` folder with the resulting .csv file:

<img src="../docs/imgs/data-parser/dir-with-output-dir.png" width=500>

## Executing on a Cubbli machine using the VMware Horizon Client from your browser

Go to https://vdi.helsinki.fi/. Choose `VMware Horizon HTML Access`:

![vdi.helsinki](../docs/imgs/data-parser/access-vdi.png)

Sign in with your `University of Helsinki` credentials.

Choose the `Cubbli Linux` desktop:

![Cubbli Linux desktop](../docs/imgs/data-parser/choose-os.png)

Download the files and the executable as explained above.

> Open a browser in the VMware Client in your browser; remember that you are accessing your Helsinki Cubbli desktop through your browser. Your keyboard may also have a different layout than you are used to. Search for `Keyboard` in the menu and change the `Layout` to the desired one. (For the Finnish layout you may also just run the command `setxkbmap fi` in the Konsole.)

Choose the `main-ubuntu-latest` executable from the github release page:

![ubuntu executable](../docs/imgs/data-parser/binary-download.png)

Open up a `Konsole` (search for `Konsole` in the menu). Create a new folder where you are going to work with your files. Move the executable file to the folder. Additionally, create a subfolder named `data` and move all the downloaded .csv files there. In the `Konsole`, navigate to the folder with the executable file and the `data` folder using the `cd` (change directory) command:

![navigate to the given directory](../docs/imgs/data-parser/dir-navigate.png)

The folder in question is named `moocdata` here; you can see the name of the directory you are in as the last name before the $-sign.

Execute the binary file by running the command

> `./main-ubuntu-latest`

![command flow](../docs/imgs/data-parser/control-flow.png)

You may need to add execution rights to the executable program:

> `chmod +x main-ubuntu-latest`
