Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* data parser
* try to release the data-parser
* add working dir to gh actions
* fix idiotic misstake
* last fix
* update gitignore
* label multiple choice options for enablin data parsing of cross-language courses.
* parser parses options with separate labels correctly
* tryinh to fix actions uploads
* try asterix for path
* this cannot be the fix
* clean up the workflow file
* update system tests
* user manual for parser binary
* wrapping up the docs
* trying to sign the macos version in gh actions
* fixesw to workflow file
* fix mistakes in yml
* try if the mac version will fix it
* try with one more typo fix
* change mac version to latest after all
Showing 25 changed files with 1,127 additions and 80 deletions.
`@@ -0,0 +1,82 @@`

```yaml
name: Parser-Release

on:
  push:
    branches: [ "parser-release" ]
  pull_request:
    branches: [ "parser-release" ]


jobs:
  build:
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10"]
        poetry-version: ["1.4.2"]
        os: [macos-latest, ubuntu-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    defaults:
      run:
        working-directory: ./data-parser
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Run image
        uses: abatilo/actions-poetry@v2
        with:
          poetry-version: ${{ matrix.poetry-version }}
      - name: View poetry --help
        run: poetry --help

      - name: Install dependencies
        run: poetry install

      - name: install pyinstaller
        run: poetry run pip install -U pyinstaller
      - name: Build execution file
        run: poetry run pyinstaller main.py --onefile

      - name: Sign the macos-version build
        if: ${{ matrix.os == 'macos-latest' }}
        run: codesign --force -s - ./dist/main

      - name: Rename built binary
        run: poetry run mv ./dist/main ./dist/main-${{ matrix.os }}

      - name: Store built binary
        uses: actions/upload-artifact@v3
        with:
          name: parser-binary
          path: data-parser/dist
          retention-days: 5

  release:
    needs: build
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./data-parser
    steps:

      - name: Download built binary
        uses: actions/download-artifact@v3
        with:
          name: parser-binary
          path: ./dist/

      - name: Release
        uses: softprops/action-gh-release@v1
        with:
          files: ./dist/*
          name: parser-binary
          tag_name: release
    permissions:
      contents: write


# TODO:
# codesign --force -s - target/release/tmc-langs-cli for the mac release
# add this mac sign to gh actions and run the compilation
```
`@@ -0,0 +1,165 @@`

```gitignore
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# Never upload the data being used to version control
data
# Never upload the parsed data either
*outputs
```
`@@ -0,0 +1,78 @@`

# Parsing the collected submissions on courses.mooc.fi

The data-parser outputs a .csv file containing only the answers to exercises of the `DOGS FACTORIAL ANALYSIS SURVEY` type. Because of the latest submission format, the file only contains answers submitted **after** 22.05.2023. The fields in the .csv file are separated by a semicolon `;`.

## Dataset layout

The file contains the columns `user_id`, `name` and `email`, followed by one column for each `questionLabel` in the course. Unanswered questions are left as empty fields.
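As a minimal sketch of consuming the output, the snippet below reads a small made-up sample with Python's standard `csv` module. The semicolon delimiter and the empty-means-unanswered rule come from the description above; the question columns `q1` and `q2` and the sample rows are hypothetical.

```python
from __future__ import annotations

import csv
import io

# Hypothetical sample of the parser output: ';' separates the fields,
# and an empty field means the question was not answered.
SAMPLE = (
    "user_id;name;email;q1;q2\n"
    "u1;Alice;alice@example.com;3;\n"
    "u2;Bob;bob@example.com;5;2\n"
)


def read_answers(text: str) -> list[dict[str, str | None]]:
    """Parse the ';'-separated output, mapping empty fields to None."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [
        {column: (value if value != "" else None) for column, value in row.items()}
        for row in reader
    ]


rows = read_answers(SAMPLE)
print(rows[0]["q2"])  # None: Alice did not answer q2
```

For a real file one would pass the contents of the .csv in `parsed-outputs` instead of `SAMPLE`; the file's text encoding is not documented here, so UTF-8 would only be a guess.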

## Multiple-choice questions

Multiple-choice questions are an exception to the format above. Each option that can be selected gets its own column, with the header `"questionLabel option"`. A user's answer is encoded as 1 for a chosen option and 0 for an option that was not chosen. If the user has not answered the question at all, the fields are empty (null).
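The 0/1 encoding can be illustrated with a short sketch. It assumes, for illustration only, a question label `color` with three options and a single space between the label and the option in the column header; `None` stands for the empty (null) fields of an unanswered question. This is not the parser's actual code.

```python
from __future__ import annotations


def encode_multiple_choice(
    question_label: str,
    options: list[str],
    chosen: set[str] | None,
) -> dict[str, int | None]:
    """Return one '<questionLabel> <option>' column per selectable option.

    chosen is None when the user did not answer the question at all.
    """
    row: dict[str, int | None] = {}
    for option in options:
        column = f"{question_label} {option}"
        if chosen is None:
            row[column] = None  # unanswered: empty (null) field
        else:
            row[column] = 1 if option in chosen else 0
    return row


print(encode_multiple_choice("color", ["red", "green", "blue"], {"red", "blue"}))
# {'color red': 1, 'color green': 0, 'color blue': 1}
print(encode_multiple_choice("color", ["red", "green", "blue"], None))
# {'color red': None, 'color green': None, 'color blue': None}
```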

When collecting submissions across different _language versions_ of a course, it is advisable to `label` the multiple-choice options in the same way as the questions. The datasets from the different language courses then share the same column headers, which makes them easier to combine. The format is `label ; option text`: the text to the left of the semicolon `;` is used as the column header in the resulting dataset, while the text to the right is what the survey user sees. Only the first semicolon acts as a separator, so the option text may contain any number of semicolons. If no semicolon is found, the full option text is used as the column header.
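The labelling rule can be sketched with `str.partition`, which splits on the first occurrence of the separator only. Stripping the whitespace around the label and the text is an assumption, and the option strings are made up.

```python
def split_option(option: str) -> tuple[str, str]:
    """Split 'label ; option text' into (column header, displayed text).

    Only the first ';' separates the two parts, so the option text may
    itself contain semicolons. Without a ';' the full text is the header.
    """
    label, separator, text = option.partition(";")
    if not separator:
        return option.strip(), option.strip()
    # Assumption: surrounding whitespace is not part of the header or text.
    return label.strip(), text.strip()


print(split_option("likes_walks ; I enjoy long walks; rain or shine"))
# ('likes_walks', 'I enjoy long walks; rain or shine')
print(split_option("Other"))
# ('Other', 'Other')
```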

## Using the parser

To parse the collected submissions, first download the files from the main course management page on courses.mooc.fi. The download links are shown at the bottom of the picture below:

<img src="../docs/imgs/data-parser/Download-files.png" height=500>

The .csv file for course instances is not used in the process and can be skipped. The needed files are:

- submissions
- user details
- exercise-tasks

To download the data-parser, go to the GitHub release page https://github.com/rage/factor-analysis-exercise-service/releases/tag/release and choose the executable for your operating system.

The parser expects a folder named `data`, containing the downloaded .csv files, to be located in the same folder as the parser itself. The directory structure looks like this:

<img src="../docs/imgs/data-parser/dir-struct.png" width=500>

Here the green `main` is the executable in question (it will most likely be named `main-[name of your OS]-latest`).
If several versions of the .csv files are available in the `data` folder, as in the example above, the parser uses the latest ones.
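The README does not say how the parser decides which file is the latest. One plausible approach is sketched below, assuming (hypothetically) that each kind of export can be recognized by a keyword in its file name and that the most recently modified file is the latest; the real parser may instead rely on timestamps in the file names. The helper `latest_csv` and the demo file names are made up.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def latest_csv(data_dir: Path, keyword: str) -> Path | None:
    """Return the most recently modified .csv in data_dir whose name contains keyword.

    Hypothetical helper: the keyword matching and the use of the
    modification time are assumptions, not the parser's documented behavior.
    """
    candidates = [p for p in data_dir.glob("*.csv") if keyword in p.name]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Usage demo with a throwaway folder standing in for `data`.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)
    older = data / "submissions-a.csv"
    newer = data / "submissions-b.csv"
    older.write_text("old")
    newer.write_text("new")
    os.utime(older, (1_000_000, 1_000_000))  # set (atime, mtime) explicitly
    os.utime(newer, (2_000_000, 2_000_000))
    picked = latest_csv(data, "submissions")
    picked_name = picked.name if picked else None
    missing = latest_csv(data, "exercise-tasks")

print(picked_name)  # submissions-b.csv
```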

Run the parser from the directory with

> `./name-of-executable`

The parser creates a `parsed-outputs` folder containing the resulting .csv file:

<img src="../docs/imgs/data-parser/dir-with-output-dir.png" width=500>

## Running on a Cubbli machine using VMware Horizon Client in your browser

Go to https://vdi.helsinki.fi/ and choose `VMware Horizon HTML Access`:

![vdi.helsinki](../docs/imgs/data-parser/access-vdi.png)

Sign in with your `University of Helsinki` credentials.

Choose the `Cubbli Linux` desktop:

![Cubbli Linux desktop](../docs/imgs/data-parser/choose-os.png)

Download the .csv files and the executable as explained above.

> Open a browser inside the VMware client; remember that you are accessing your Helsinki Cubbli desktop through your own browser. The keyboard layout may also differ from the one you are used to: search for `Keyboard` in the menu and change the `Layout` to the one you want. (For the Finnish layout you can also just run the command `setxkbmap fi` in the Konsole.)

Choose the `main-ubuntu-latest` executable from the GitHub release page:

![ubuntu executable](../docs/imgs/data-parser/binary-download.png)

Open a `Konsole` (search for `Konsole` in the menu). Create a new folder to work in and move the executable file into it. Then create a subfolder named `data` and move all the downloaded .csv files there. In the `Konsole`, navigate to the folder containing the executable and the `data` folder using the `cd` (change directory) command:

![navigate to the given directory](../docs/imgs/data-parser/dir-navigate.png)

Here the folder is named `moocdata`; the name of the directory you are in is shown as the last name before the `$` sign.
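The manual folder setup above could also be scripted. This is a minimal sketch, assuming the executable and the .csv files were downloaded into a single folder; the `downloads` and `moocdata` folders, the helper `prepare_workdir`, and the file names in the demo are placeholders created in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_workdir(downloads: Path, workdir: Path, executable: str) -> None:
    """Create workdir/data, move the executable into workdir and the .csv files into data."""
    data = workdir / "data"
    data.mkdir(parents=True, exist_ok=True)
    shutil.move(str(downloads / executable), str(workdir / executable))
    # list() first, so the directory is not modified while it is being iterated.
    for csv_file in list(downloads.glob("*.csv")):
        shutil.move(str(csv_file), str(data / csv_file.name))


# Demo in a temporary directory; the file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    downloads = Path(tmp) / "downloads"
    downloads.mkdir()
    for name in ["main-ubuntu-latest", "submissions.csv", "user-details.csv"]:
        (downloads / name).write_text("")
    workdir = Path(tmp) / "moocdata"
    prepare_workdir(downloads, workdir, "main-ubuntu-latest")
    layout = sorted(p.relative_to(workdir).as_posix() for p in workdir.rglob("*"))

print(layout)
```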

Run the executable with the command

> `./main-ubuntu-latest`

![command flow](../docs/imgs/data-parser/control-flow.png)

If this fails with a `Permission denied` error, you need to add execution rights to the executable first:

> `chmod +x main-ubuntu-latest`
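For reference, `chmod +x` adds the execute bits to the file's mode. A rough Python equivalent with `os.chmod` and the `stat` module is sketched below; it is POSIX-only and, unlike `chmod +x`, ignores the umask. The helper `make_executable` is hypothetical, and the temporary file stands in for `main-ubuntu-latest`.

```python
import os
import stat
import tempfile


def make_executable(path: str) -> None:
    """Add execute permission for user, group and others (like `chmod +x`, ignoring umask)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# Demo on a temporary file standing in for main-ubuntu-latest.
with tempfile.NamedTemporaryFile(delete=False) as f:
    target = f.name
os.chmod(target, 0o644)
make_executable(target)
final_mode = stat.S_IMODE(os.stat(target).st_mode)
is_executable = os.access(target, os.X_OK)
os.remove(target)

print(oct(final_mode))  # 0o755
```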