diff --git a/README.md b/README.md
index daa9ee9..e677fa0 100644
--- a/README.md
+++ b/README.md
@@ -1,18 +1,77 @@
-# parse-hamnosys
-Sign language HamNoSys notation parsing tool.
+![CI workflow badge](https://github.com/hearai/parse-hamnosys/workflows/CI-pipeline/badge.svg) ![Visits Badge](https://badges.pufler.dev/visits/hearai/parse-hamnosys)
+
+

+ +

+
+# HamNoSys Parser
+
+![parser schema](./imgs/schemat.JPG)
+
+The main goal of the parser is to translate a label representing a sign language gloss,
+written in HamNoSys format, into a form that can be used for deep learning (DL) classification.
+We used a decision tree to decompose the notation into numerical multilabels
+for the defined classes.
+The parser analyzes a series of symbols and either matches each symbol with the class
+that describes it (assigning it the appropriate number) or removes it.
+
+## Hamburg Sign Language Notation System
+
+The Hamburg Sign Language Notation System (HamNoSys) is a _phonetic_
+transcription system that has been in widespread use for more than 20 years.
+HamNoSys does not rely on any national finger alphabet and
+can therefore be used internationally.
+A HamNoSys transcription can be divided into six basic blocks, as presented
+in the figure below (upper panel).
+The first two of the six blocks - symmetry operator and
+non-manual features - are optional. The remaining four components - handshape,
+hand position, hand location, and movement - are mandatory.
+A more detailed structure is presented in the lower panel of the figure.
+
+![HamNoSys structure](./imgs/HamNoSys_structure_detailed.JPG)
+
+### Font
+
+The HamNoSys font must be installed to display the provided examples properly.
+It can be downloaded directly from the
+[DGS Corpus](https://www.sign-lang.uni-hamburg.de/dgs-korpus/index.php/hamnosys-97.html)
+website.
+
+## Parser classes
+
+In our implementation, four blocks (symmetry operator, location left/right, location top/bottom, and distance from the body) refer to the overall human posture, while five blocks (handshape base form, handshape thumb position, handshape bending, hand position extended finger direction, and hand position palm orientation) relate to a single hand.
+
+List of blocks and their ranges:
+* __Symmetry operator__ class consists of 9 symbols (0 to 8).
+* __Handshape - Baseform__ class consists of 12 symbols (0 to 11).
+* __Handshape - Thumb position__ class consists of 4 symbols (0 to 3).
+* __Handshape - Bending__ class consists of 6 symbols (0 to 5).
+* __Handposition - Extended finger direction__ class consists of 18 symbols (0 to 17).
+* __Handposition - Palm orientation__ class consists of 8 symbols (0 to 7).
+* __Handposition - Left/Right__ class consists of 5 symbols (0 to 4).
+* __Handposition - Top/Bottom__ class consists of 37 symbols (0 to 36).
+* __Handposition - Distance__ class consists of 6 symbols (0 to 5).
+
+The figure below presents the numerical values and the HamNoSys symbols assigned to them.
+![multilabel classes](./imgs/Classes_all_some_frames.JPG)
 
 ## Usage
+
+In its most basic form, parsing a new annotation file boils down to:
+
 ```
-$ python parse-hamnosys.py -sf -df
+$ python parse-hamnosys.py -sf -df -ef
 ```
 
-## Usage example
+The repository includes example files, which were produced by running:
+
 ```
-$ python parse-hamnosys.py -sf hamnosys_example.txt -df hamnosys_parsed.txt
+$ python parse-hamnosys.py -sf hamnosys_example.txt -df hamnosys_parsed.txt -ef error.txt
 ```
 
-## Source file format
-Default input file format is defined by the [HearAI](https://github.com/hearai/hearai) project requirements. As in[hamnosys_example.txt](hamnosys_example.txt) parser requires a file that has 6 columns separated with a space sign " ". Due to this, if any of the columns contains space, it must be removed or replaced (for example with "_" sign) before passing to the parser. Parsers operates only on HamNoSys notation, that is stored in the last (6th) column. It shall start with HamNoSys sign (No quote nor apostrophe sign is allowed).
+### Source file format
+
+The default input file format is defined by the [HearAI](https://github.com/hearai/hearai) project requirements. As in [hamnosys_example.txt](hamnosys_example.txt), the parser requires a file with 6 columns separated by a space character " ". Consequently, if any column contains a space, the space must be removed or replaced (for example with the "_" sign) before the file is passed to the parser. The parser operates only on the HamNoSys notation, which is stored in the last (6th) column. The notation must start with a HamNoSys sign (no quotes or apostrophes are allowed).
 Input file columns and description:
 * Name - name of a video file that given notation refers to
 * Start - sign start time (on a video)
@@ -21,162 +80,60 @@ Input file columns and description:
 * Word - translation to a spoken language
 * Hamnosys - Notation
 
-## Destination File format
-Destination file consists of following columns separated by the space " " sign:
+### Destination file format
+
+The default destination file consists of the following columns, separated by the space " " sign:
 * Name - name of a video file that given notation refers to, directly copied from source file
 * Start - sign start time (on a video), directly copied from source file
 * End - sign end time (on a vide), directly copied from source file
 * Symmetry operator - Number that represents one of the classes (please refer to hamnosys_dicts.txt), parsed from notation
+* NonDom first - Used when the notation starts with the  sign
 * Dominant - Handshape - Baseform - Number that represents one of the classes (please refer to hamnosys_dicts.txt), parsed from notation
 * Dominant - Handshape - Thumb position - Number that represents one of the classes (please refer to hamnosys_dicts.txt), parsed from notation
 * Dominant - Handshape - bending - Number that represents one of the classes (please refer to hamnosys_dicts.txt), parsed from notation
 * Dominant - Handposition - extended finger direction - Number that represents one of the classes (please refer to hamnosys_dicts.txt), parsed from notation
 * Dominant - Handposition - palm orientation - Number that represents one of the classes (please refer to hamnosys_dicts.txt), parsed from notation
-* Dominant - Handposition - LR - Number that represents one of the classes (please refer to hamnosys_dicts.txt), parsed from notation
-* Dominant - Handposition - TB - Number that represents one of the classes (please refer to hamnosys_dicts.txt), parsed from notation
-* Dominant - Handposition - Distance - Number that represents one of the classes (please refer to hamnosys_dicts.txt), parsed from notation
+* Dominant - Handshape - Baseform2
+* Dominant - Handshape - Thumb position2
+* Dominant - Handshape - Bending2
+* Dominant - Handposition - Extended finger direction2
+* Dominant - Handposition - Palm orientation2
+* NONDominant - Handshape - Baseform
+* NONDominant - Handshape - Thumb position
+* NONDominant - Handshape - Bending
+* NONDominant - Handposition - Extended finger direction
+* NONDominant - Handposition - Palm orientation
+* NONDominant - Handshape - Baseform2
+* NONDominant - Handshape - Thumb position2
+* NONDominant - Handshape - Bending2
+* NONDominant - Handposition - Extended finger direction2
+* NONDominant - Handposition - Palm orientation2
+* Handposition - LR - Number that represents one of the classes (please refer to hamnosys_dicts.txt), parsed from notation
+* Handposition - TB - Number that represents one of the classes (please refer to hamnosys_dicts.txt), parsed from notation
+* Handposition - Distance - Number that represents one of the classes (please refer to hamnosys_dicts.txt), parsed from notation
 
-## hamnosys_example.txt
-File created using [Korpusowy słownik polskiego języka migowego](https://www.slownikpjm.uw.edu.pl/)
-Joanna Łacheta, Małgorzata Czajkowska-Kisil, Jadwiga Linde-Usiekniewicz, Paweł Rutkowski (red.), 2016, Korpusowy słownik polskiego języka migowego, Warszawa: Wydział Polonistyki Uniwersytetu Warszawskiego, ISBN: 978-83-64111-49-5 (publikacja online).
+## Citation
+
+If you find this code useful in your research, please consider [citing](https://arxiv.org/abs/2204.06924):
 
-## Font
-HamNoSys font is required to be installed to properly undestand subcesction [Classes](#classes).
-It can be downloaded directly from [DSG Corpus](https://www.sign-lang.uni-hamburg.de/dgs-korpus/index.php/hamnosys-97.html) website. - -## Classes -### Symmetry operator -Symmetry operator class consists of 9 symbols (0 to 8). Following list represents class number to symbol mapping: -0. None -1.  -2.  -3.  -4.  -5.  -6.  -7.  -8.  - -### Handshape - Baseform -Handshape - baseform class consists of 12 symbols (0 to 11). Following list represents class number to symbol mapping: -1.  -2.  -3.  -4.  -5.  -6.  -7.  -8.  -9.  -10.  -11.  -12.  - -### Handshape - Thumb position -Handshape - thumb position class consists of 4 symbols (0 to 3). Following list represents class number to symbol mapping ([Handshape - base form sign](#handshape---baseform) sign is used only as a reference): -0. None -1.  -2.  -3.  - -### Handshape - Bending -Handshape - bending class consists of 6 symbols (0 to 5). Following list represents class number to symbol mapping ([Handshape - base form sign](#handshape---baseform) sign is used only as a reference): -0.  (none) -1.  -2.  -3.  -4.  -5.  - -### Handposition - Extended finger direction -Handposition - extended finger direction class consists of 18 symbols (0 to 17). Following list represents class number to symbol mapping: -0.  -1.  -2.  -3.  -4.  -5.  -6.  -7.  -8.  -9.  -10.  -11.  -12.  -13.  -14.  -15.  -16.  -17.  -### Handposition - Palm orientation -Handshape - palm orientation class consists of 8 symbols (0 to 7). Following list represents class number to symbol mapping: -0.  -1.  -2.  -3.  -4.  -5.  -6.  -7.  - -### Handposition - Left/Right -Handposition - left/right class consists of 5 symbols (0 to 4). Following list represents class number to symbol mapping ([Handposition - Top/Bottom sign](#handposition---topbottom) is used only as a reference): -0.  - center -1.  - left to the left -2.  - left -3.  - right -4. 
 - right to the right - -### Handposition - Top/Bottom -Handposition - top/bottom class consists of 37 symbols (0 to 36). Following list represents class number to symbol mapping: -0. None -1.  -2.  -3.  -4.  -5.  -6.  -7.  -8.  -9.  -10.  -11.  -12.  -13.  -14.  -15.  -16.  -17.  -18.  -19.  -20.  -21.  -22.  -23.  -24.  -25.  -26.  -27.  -28.  -29.  -30.  -31.  -32.  -33.  -34.  -35.  -36.  - -### Handposition - Distance -Handposition - top/bottom class consists of 37 symbols (0 to 36). Following list represents class number to symbol mapping: -0. None -1.  -2.  -3.  -4.  -5.  -6.  +``` +@misc{majchrowska2022hamnosys, + doi = {10.48550/ARXIV.2204.06924}, + url = {https://arxiv.org/abs/2204.06924}, + author = {Majchrowska, Sylwia and Plantykow, Marta and Olech, Milena}, + title = {Open Source HamNoSys Parser for Multilingual Sign Language Encoding}, + publisher = {arXiv}, + year = {2022}, + copyright = {arXiv.org perpetual, non-exclusive license} +} +``` + +## Acknowledgement + +File `hamnosys_example.txt` was created using [Korpusowy słownik polskiego języka migowego](https://www.slownikpjm.uw.edu.pl/) + +Joanna Łacheta, Małgorzata Czajkowska-Kisil, Jadwiga Linde-Usiekniewicz, Paweł Rutkowski (red.), 2016, Korpusowy słownik polskiego języka migowego, Warszawa: Wydział Polonistyki Uniwersytetu Warszawskiego, ISBN: 978-83-64111-49-5 (publikacja online). 
 
 ## Read more
 * [Introduction to HamNoSys](https://www.hearai.pl/post/4-hamnosys/)
-* [Introduction to HamNoSys Part 2](https://www.hearai.pl/post/5-hamnosys2/)
\ No newline at end of file
+* [Introduction to HamNoSys Part 2](https://www.hearai.pl/post/5-hamnosys2/)
diff --git a/imgs/Classes_all_some_frames.JPG b/imgs/Classes_all_some_frames.JPG
new file mode 100644
index 0000000..7d6ab30
Binary files /dev/null and b/imgs/Classes_all_some_frames.JPG differ
diff --git a/imgs/HamNoSys_structure_detailed.JPG b/imgs/HamNoSys_structure_detailed.JPG
new file mode 100644
index 0000000..068676d
Binary files /dev/null and b/imgs/HamNoSys_structure_detailed.JPG differ
diff --git a/imgs/schemat.JPG b/imgs/schemat.JPG
new file mode 100644
index 0000000..2488328
Binary files /dev/null and b/imgs/schemat.JPG differ
diff --git a/parse-hamnosys.py b/parse-hamnosys.py
index b92e35c..ca23a06 100644
--- a/parse-hamnosys.py
+++ b/parse-hamnosys.py
@@ -149,7 +149,6 @@ def main(args):
     data["Handposition - Distance"] = 0
 
     for index, _ in data.iterrows():
-        print(index)
         # Search for symmetry operators that consists of from 3 to 1 symbol,
         # remove if found
         for i in range(1, 10):
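The source file rules in the README (six columns separated by a space, spaces inside a column removed or replaced with "_", HamNoSys notation in the 6th column, no leading quote or apostrophe) lend themselves to a quick pre-flight check before running the parser. The Python sketch below is illustrative only: `sanitize_field`, `make_source_line`, `check_source_line` and `check_source_file` are hypothetical helpers that are not part of parse-hamnosys.py, `col4` is a placeholder for the fourth column whose label the diff does not show, UTF-8 file encoding is assumed, and the check does not verify that the notation really consists of HamNoSys symbols.

```python
# Hypothetical helpers (NOT part of parse-hamnosys.py) for preparing and
# checking parser source-file lines, assuming only the README rules:
# 6 columns separated by a space, no spaces inside a column,
# HamNoSys notation in the 6th column, no leading quote or apostrophe.

N_COLUMNS = 6
FORBIDDEN_START = ("'", '"')


def sanitize_field(value) -> str:
    """Replace every space inside a field with "_" so it stays one column."""
    return str(value).replace(" ", "_")


def make_source_line(name, start, end, col4, word, notation: str) -> str:
    """Build one source-file line.

    col4 is a placeholder for the 4th column, whose label the diff elides.
    Spaces in the notation are removed rather than replaced (an assumption),
    so that no "_" sign ends up inside the HamNoSys string.
    """
    fields = [sanitize_field(v) for v in (name, start, end, col4, word)]
    fields.append(notation.replace(" ", ""))
    return " ".join(fields)


def check_source_line(line: str) -> list[str]:
    """Return the problems found in one source-file line (empty list if none)."""
    columns = line.rstrip("\n").split(" ")
    if len(columns) != N_COLUMNS:
        return [f"expected {N_COLUMNS} columns, got {len(columns)}"]
    notation = columns[-1]
    if not notation:
        return ["empty HamNoSys notation"]
    if notation.startswith(FORBIDDEN_START):
        return ["notation starts with a quote or apostrophe"]
    return []


def check_source_file(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to problems for every faulty line in a file."""
    report = {}
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            problems = check_source_line(line)
            if problems:
                report[number] = problems
    return report


if __name__ == "__main__":
    # "\ue001\ue00c" is only a stand-in for real HamNoSys font characters.
    line = make_source_line("video 01.mp4", 0.5, 1.75, "x", "good morning", "\ue001\ue00c")
    print(check_source_line(line))  # []
    print(check_source_line("a b c d e 'x"))  # ['notation starts with a quote or apostrophe']
```

Running `check_source_file` on a prepared file before calling `parse-hamnosys.py` would list offending lines by number; an empty dictionary means every line passed these basic checks.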