[FIX] Fix main (#59)
* fix port for default 9000

* reset loading function of levels

* snomed api process

* add = in env to avoid warning

* changed port to 9000

* nice error message

* removed empty dict as argument

* also scores should appear in initial levels section

* adapted pydantic model

* adjusted test output

* added test function

* update readme

* add readme images
barbarastrasser authored Aug 19, 2024
1 parent 825ffe8 commit 152d70b
Showing 12 changed files with 255 additions and 152 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -15,7 +15,7 @@ COPY ./entrypoint.sh ./
EXPOSE 9000

# Define environment variable
ENV PORT 9000
ENV PORT=9000

# Make the entrypoint script executable
RUN chmod +x ./entrypoint.sh
119 changes: 80 additions & 39 deletions Readme.md
@@ -44,7 +44,7 @@ To run the current version of the LLM-based Annotation Tool locally execute
the following command to start the uvicorn server locally:

```
python3 app/api.py --host 127.0.0.1 --port 8000
python3 app/api.py --host 127.0.0.1 --port 9000
```

- To access the API via the browser, please follow the instructions [here](#access-the-api-via-the-gui).
@@ -54,7 +54,7 @@ python3 app/api.py --host 127.0.0.1 --port 8000

Since the annotation tool uses ollama to run the LLM, ollama has to be provided by the Docker container.
This is done by extending the available [ollama container](https://hub.docker.com/r/ollama/ollama).
For this instructions it is assumed that [docker](https://www.docker.com/) is installed.
For these instructions it is assumed that [docker](https://www.docker.com/) is installed.

#### Build the image

@@ -71,56 +71,61 @@ Let's break down the command:

#### Run the container from the built image

- CPU only:

```bash
docker run -d -v ollama:/root/.ollama -v /some/local/path/output:/app/output/ --name instance_name -p 9000:9000 annotation-tool-ai
```

- Nvidia GPU ([Nvidia container toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installation) has to be installed first)

```bash
docker run -d
-v ollama:/root/.ollama
-v /some/local/path/output:/app/output/
--name instance_name
-p 9000:8000
annotation-tool-ai
docker run -d --gpus=all -v ollama:/root/.ollama -v /some/local/path/output:/app/output/ --name instance_name -p 9000:9000 annotation-tool-ai
```

Let's break down the command:
Let's break down the commands:

- `docker run -d`: The -d flag runs the container in the background without any output in the terminal.
- `--gpus=all`: GPUs should be used to run the model.
- `-v ollama:/root/.ollama`: The -v flag mounts external volumes into the container. Here the models used within the container are stored in a *Docker volume*; Docker volumes are created and managed by Docker itself and are not directly accessible via the local file system.
- `-v /path/to/some/local/folder/:/app/output/`: This is a bind mount (also indicated by the -v flag) and makes a local directory accessible to the container. The input and output files (i.e. the `.tsv` input and `.json` output files) are passed to the container via this folder, and since the directory is also mounted locally they remain accessible on the host. Within the container the files are located in `/app/output/`. For more information about Docker volumes vs. bind mounts see [here](https://www.geeksforgeeks.org/docker-volume-vs-bind-mount/).
- `--name instance_name`: Here you choose a (nice) name for your container, created from the image built in the step above.
- `-p 9000:8000`: Mount port for API requests inside the container
- `-p 9000:9000`: Map port 9000 inside the container to port 9000 on the host for API requests.
- `annotation-tool-ai`: Name of the image we create the instance of.

## Access the API via the GUI

Once the `docker run` command or the `app/api.py` script has been executed, the uvicorn server for the FastAPI application will be initiated. To access the GUI for the API, please enter the following in your browser and follow the instructions provided.
**NOTE**

- Docker
```
http://127.0.0.1:9000/docs
```
If you want to access the API only from outside the container (which is usually the case) it is not necessary to mount a directory when running the container. However, it has been kept in the command since it might be useful for debugging purposes.

- Locally
```
http://127.0.0.1:8000/docs
```
---

## Access the tool

### Explanation of parameters used
After successful deployment there are three ways to access the tool: directly via the API, via the UI, or from the command line. Regardless of the access mode, two parameters have to be set:

| param | value | info |
| Parameter | Value | Info |
|---|---|---|
|`code_system`| `cogatlas` | If assessment tools are identified within the provided `.tsv` file, the TermURLs and Labels from the [Cognitive Atlas](https://www.cognitiveatlas.org/) are assigned (if available). `cogatlas` is the default value. |
| | `snomed` | If assessment tools are identified within the provided `.tsv` file, the TermURLs and Labels from [SNOMED CT](https://www.snomed.org/) are assigned (if available). |
|`response_type`| `file` | After categorization and annotation the API provides a `.json` file ready to download. `file` is the default value. |
| | `json` | After categorization and annotation the API provides the raw JSON output. |


### Access the API directly

Once the `docker run` command or the `app/api.py` script has been executed, the uvicorn server for the FastAPI application will be initiated. To access the GUI for the API, please enter the following in your browser and follow the instructions provided.

```
http://127.0.0.1:9000/docs
```

![startAPI](docs/img/api-load.png)

#### Results API

### Results

If `file` is the chosen `response_type` a `.json` file will be provided for download:


![fileResponse](docs/img/fileResponse.png)

If `json` is the chosen `response_type` the direct JSON output will be provided by the API:
@@ -131,38 +136,74 @@ If `json` is the chosen `response_type` the direct JSON output will be provided
Well done - you have annotated your tabular file!
(This documentation is written so that you can simply follow the instructions and annotate your own tabular file.)

## Using the tool from the command line
### Access the Tool via the User Interface

If you don't want to access the tool directly through the API, but rather through a more user-friendly interface, you can set up the integrated UI locally on your machine.

The following command runs the script for the annotation process if you deployed it via docker:
First, since the UI is a React application, `nodejs` and `npm` (the Node package manager) need to be installed on the system:

```bash
sudo apt-get update
sudo apt-get install nodejs
sudo apt-get install npm
```

Second, to access the interface, the application must be started locally. This is done from the `ui-integration` directory of the repository.

```bash
cd annotation-tool-ai/ui-integration
npm start
```

If this was successful, the terminal shows:

![ui-start](docs/img/ui-start.png)

and the user interface is accessible via `http://localhost:3000`. Please follow the instructions there.

![ui-success](docs/img/ui-success.png)

#### Results UI

If `JSON` is the chosen response type, after running your data you should get something like:

![alt text](docs/img/ui-json.png)

If `File` is the chosen response type, a file will be automatically downloaded.

### Using the tool from the command line

The following command runs the script for the annotation process if you deployed it via docker (i.e. access is from INSIDE the docker container):

Please choose the `code_system` and `response_type`, and indicate the correct `instance_name` and file paths.
```
docker exec -it instance_name curl -X POST "http://127.0.0.1:8000/process/?code_system=<snomed | cogatlas>&response_type=<file | json>"
docker exec -it instance_name curl -X POST "http://127.0.0.1:9000/process/?code_system=<snomed | cogatlas>&response_type=<file | json>"
-F "file=@<filepath-to-tsv-inside-container>.tsv"
-o <filepath-to-output-file-inside-container>.json
```

If you chose the local deployment you can run the tool via this command:
If you chose the local deployment, or you want to access the container from outside it, you can run the tool via this command:

```
curl -X POST "http://127.0.0.1:8000/process/?code_system=<snomed | cogatlas>&response_type=<file | json>"
-F "file=@<filepath-to-tsv-inside-container>.tsv"
-o <filepath-to-output-file-inside-container>.json
curl -X POST "http://127.0.0.1:9000/process/?code_system=<snomed | cogatlas>&response_type=<file | json>"
-F "file=@<filepath-to-tsv-outside-container>.tsv"
-o <filepath-to-output-file-outside-container>.json
```

Let's break down this again (for non-docker deployment ignore the first 3 list items):

Let's break down this again (for local/outside docker deployment ignore the first 3 list items):
- `docker exec`: This command is used to execute a command in a running Docker container.
- `-it`: The `-i` and `-t` flags combined, which allow for an interactive terminal session. This is needed, for example, when you run commands that require input.
- `instance_name`: Name of the instance.
- `curl -X POST "http://127.0.0.1:9000/process/?code_system=<snomed | cogatlas>" -F "file=@<filepath-to-tsv-inside-container>.tsv" -o <filepath-to-output-file-inside-container>.json`: This is the command you want to execute in the interactive terminal session within the container. The input file is the to-be-annotated `.tsv` file and the output file is the `.json` file.
- `curl -X POST "http://127.0.0.1:9000/process/?code_system=<snomed | cogatlas>" -F "file=@<filepath-to-tsv-inside/outside-container>.tsv" -o <filepath-to-output-file-inside/outside-container>.json`: This is the command that makes a POST request to the API. The input file is the to-be-annotated `.tsv` file and the output file is the `.json` file.

---
**NOTE**

The `-o <filepath-to-output-file-inside-container>.json` is only necessary if `file` is chosen as `response_type` parameter.
The `-o <filepath-to-output-file-inside/outside-container>.json` is only necessary if `file` is chosen as `response_type` parameter.

---
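For readers who prefer Python over `curl`, the same request can be scripted with the `requests` library. This is a minimal sketch assuming the server is reachable on port 9000; the filenames are placeholders.

```python
import requests

# Placeholder filenames; substitute your own .tsv input and .json output.
params = {"code_system": "snomed", "response_type": "file"}
with open("participants.tsv", "rb") as tsv:
    response = requests.post(
        "http://127.0.0.1:9000/process/",
        params=params,
        files={"file": tsv},  # same form field as curl's -F "file=@..."
    )
response.raise_for_status()
with open("participants_annotated.json", "wb") as out:
    out.write(response.content)
```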


# Details of the codebase

Currently the development of the tool is divided into two aspects: Parsing and Categorization.
@@ -309,7 +350,7 @@ flowchart LR
subgraph TSV-Annotations
Description([Description:\n set for each entity])
Levels-Description([Levels-Description:\n used in Sex and Diagnosis, responded by \n the LLM, mapped to the pre-defined terms \nand used for annotation in Levels-Explanation])
subgraph Annotations
subgraph Annotations
subgraph Identifies
identifies([used for ParticipantID \nand SessionID])
end
@@ -321,11 +362,11 @@ Levels-Description([Levels-Description:\n used in Sex and Diagnosis, responded
end
subgraph IsPartOf
ispartof([used for AssessmentTool,\n provides TermURL and Label\n for the Assessment Tool.])
end
end
subgraph IsAbout
isabout([TermURL responded by \n the LLM categorization \n serves as controller \nfor further annotation])
end
end
end
end
style isabout fill:#f542bc
2 changes: 1 addition & 1 deletion app/api.py
@@ -81,7 +81,7 @@ async def process_files(
help="Host to run the server on",
)
parser.add_argument(
"--port", type=int, default=8000, help="Port to run the server on"
"--port", type=int, default=9000, help="Port to run the server on"
)

args = parser.parse_args()
4 changes: 1 addition & 3 deletions app/categorization/llm_categorization.py
@@ -23,9 +23,7 @@ def Diagnosis(
if "yes" in reply.lower():
output = {"TermURL": "nb:Diagnosis", "Levels": {}}
unique_entries=list_terms(key,value)
levels={} #the empty dictionary passed to the diagnosis_level function to be filled
level={} # the dictionary which will become the output
level = Diagnosis_Level(unique_entries, code_system,levels)
level = Diagnosis_Level(unique_entries, code_system)
output["Levels"] = level
print(json.dumps(output))
return output
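The dropped argument reflects a common Python cleanup: the callee now creates its own dictionary instead of mutating one supplied by the caller. A simplified contrast sketch, not the project's exact code:

```python
# Simplified contrast; the real functions live in app/categorization/llm_helper.py.
# code_system is kept only to mirror the real signature.
def diagnosis_level_old(unique_entries, code_system, levels):
    # Caller-supplied dict: easy to share and mutate accidentally.
    for entry in unique_entries:
        levels[entry] = "unknown"
    return levels

def diagnosis_level_new(unique_entries, code_system):
    levels = {}  # owned locally, fresh on every call
    for entry in unique_entries:
        levels[entry] = "unknown"
    return levels
```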
14 changes: 7 additions & 7 deletions app/categorization/llm_helper.py
@@ -146,7 +146,7 @@ def are_all_digits(input_list):
# Check if all elements in the list are digit strings
return all(element.isdigit() for element in input_list)

def Diagnosis_Level(unique_entries:dict,code_system: str,levels):
def Diagnosis_Level(unique_entries:dict,code_system: str):
# print(unique_entries)

def load_dictionary(file_path):
@@ -172,17 +172,20 @@ def get_label_for_abbreviation(abbreviation:str, abbreviation_to_label):


def Get_Level(unique_entries:list):
levels = {}
if are_all_digits(unique_entries):
print("scores")
levels = {str(level): "unknown" for level in unique_entries}
print("levels only numbers")
return levels
else:
for i in range (0,len(unique_entries)):
levelfield=get_label_for_abbreviation(unique_entries[i],data)
levels[unique_entries[i]] = levelfield


return levels


Get_Level(unique_entries)
levels = Get_Level(unique_entries)
print('''
helper return
@@ -192,9 +195,6 @@ def Get_Level(unique_entries:list):
print(levels)
return levels




def get_assessment_label(key: str, code_system: str) -> Union[str, List[str]]:
def load_dictionary(file_path: str) -> Any:
with open(file_path, "r") as file:
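The behaviour of the reworked `Get_Level` can be illustrated in isolation. A simplified sketch, with the abbreviation lookup stubbed by a plain dict:

```python
def are_all_digits(input_list):
    # True when every entry is a digit string, e.g. questionnaire scores.
    return all(element.isdigit() for element in input_list)

def get_level(unique_entries, abbreviation_to_label):
    levels = {}
    if are_all_digits(unique_entries):
        # Purely numeric columns are treated as scores with unknown labels,
        # so they still appear in the initial Levels section.
        return {str(level): "unknown" for level in unique_entries}
    for entry in unique_entries:
        levels[entry] = abbreviation_to_label.get(entry, "unknown")
    return levels

print(get_level(["1", "2", "3"], {}))
# {'1': 'unknown', '2': 'unknown', '3': 'unknown'}
print(get_level(["MDD", "HC"], {"MDD": "Major depressive disorder"}))
# {'MDD': 'Major depressive disorder', 'HC': 'unknown'}
```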
56 changes: 24 additions & 32 deletions app/parsing/json_parsing.py
@@ -51,13 +51,13 @@ class Annotations(BaseModel): # type:ignore
Identifies: Optional[str] = None

Levels: Optional[
Union[
Dict[str, List[Dict[str, str]]],
Dict[str, Dict[str, str]],
Dict[str, str],
Dict[str, List[str]],
# Add this to allow for lists of strings
]
Dict[str, Union[
List[Dict[str, str]], # Detailed items
List[str], # List of strings for simpler cases
Dict[str, str], # Dictionary of strings for simpler cases
Dict[str, Dict[str, str]], # Complex nested dictionaries
Dict[str, List[str]] # List of strings in dictionary format
]]
] = None
Transformation: Optional[Dict[str, str]] = None
IsPartOf: Optional[Union[List[Dict[str, str]], Dict[str, str], str]] = None
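The widened `Levels` union can be exercised on its own. The following sketch re-declares just this field; the values are illustrative:

```python
from typing import Dict, List, Optional, Union
from pydantic import BaseModel

class LevelsOnly(BaseModel):
    Levels: Optional[
        Dict[str, Union[
            List[Dict[str, str]],       # detailed items
            List[str],                  # simple string lists
            Dict[str, str],             # flat dictionaries
            Dict[str, Dict[str, str]],  # complex nested dictionaries
            Dict[str, List[str]],       # lists of strings in dictionary form
        ]]
    ] = None

# All of these shapes now validate against the same field:
LevelsOnly(Levels={"MDD": [{"TermURL": "snomed:370143000", "Label": "Major depressive disorder"}]})
LevelsOnly(Levels={"CTRL": {}})
LevelsOnly(Levels={"scores": ["1", "2", "3"]})
```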
@@ -148,6 +148,12 @@ def handle_categorical(
]
for key, value in parsed_output.get("Levels", {}).items()
}

# Convert lists with a single item into a single dictionary if only one value exists
for key in levels:
if len(levels[key]) == 1:
levels[key] = levels[key][0]

if termurl == "nb:Sex":
levels = {
key: (
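In isolation, the single-item collapse added above behaves as follows (a sketch with illustrative SNOMED codes):

```python
levels = {
    "F": [{"TermURL": "snomed:248152002", "Label": "Female"}],
    "M": [{"TermURL": "snomed:248153007", "Label": "Male"}],
}
# Unwrap one-element lists so a single match is stored as a plain dict.
for key in levels:
    if len(levels[key]) == 1:
        levels[key] = levels[key][0]

print(levels["F"])  # {'TermURL': 'snomed:248152002', 'Label': 'Female'}
```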
@@ -208,7 +214,8 @@ def handle_assessmentTool(
)

elif ispartof_key == "Not found":
annotations = Annotations(IsAbout=annotation_instance, IsPartOf=None)
empty_ispartof = {"TermURL": " ", "Label": " "}
annotations = Annotations(IsAbout=annotation_instance, IsPartOf=empty_ispartof)

else:
ispartof_key = ispartof_key.strip().lower()
@@ -230,27 +237,12 @@
def load_levels_mapping(mapping_file: str) -> Dict[str, Dict[str, str]]:
with open(mapping_file, "r") as file:
mappings = json.load(file)

levels_mapping = {}
for entry in mappings:
label_key = entry.get("label", "").strip().lower()
identifier_key = entry.get("identifier")

if not label_key:
print(f"Warning: Missing or empty 'label' in entry: {entry}")
continue

if not identifier_key:
# print(f"Warning: Missing 'identifier' for label '{label_key}' in entry: {entry}")
# Optionally, you can skip this entry or assign a default value
identifier_key = "default_identifier"

levels_mapping[label_key] = {
"TermURL": identifier_key,
"Label": entry["label"],
}

return levels_mapping
return {
entry["label"]
.strip()
.lower(): {"TermURL": entry["identifier"], "Label": entry["label"]}
for entry in mappings
}

# noqa: E501
def load_assessmenttool_mapping(
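The comprehension-based loader can be tried against an in-memory mapping. A sketch with a made-up entry:

```python
import json

# Made-up mapping entry; the real files live under app/parsing/.
raw = '[{"identifier": "snomed:363698007", "label": "Finding site"}]'
mappings = json.loads(raw)

levels_mapping = {
    entry["label"].strip().lower(): {"TermURL": entry["identifier"], "Label": entry["label"]}
    for entry in mappings
}
print(levels_mapping)
# {'finding site': {'TermURL': 'snomed:363698007', 'Label': 'Finding site'}}
```

Note that, unlike the removed loop, the comprehension assumes every entry carries both `label` and `identifier`; a malformed entry now raises a `KeyError` instead of being skipped or given a default identifier.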
@@ -286,7 +278,7 @@ def process_parsed_output(
elif code_system == "snomed":
print("Using SNOMED CT terms for assessment tool annotation.")
assessmenttool_mapping_file = (
"app/parsing/abbreviations_measurementTerms.json"
"app/parsing/measurementTerms.json"
)
assessmenttool_mapping = load_levels_mapping(
assessmenttool_mapping_file
@@ -339,8 +331,8 @@
)
else:
return "Error: TermURL is missing from the parsed output"
else:
return "Error: parsed_output is not a dictionary"
elif parsed_output is None:
return "The LLM does not find any suitable entity in the current Neurobagel data model. Please be patient as we are working on increasing the LLM performance and extending the data model :)"


def update_json_file(
Binary file added docs/img/ui-json.png
Binary file added docs/img/ui-start.png
Binary file added docs/img/ui-success.png
2 changes: 1 addition & 1 deletion entrypoint.sh
@@ -1,5 +1,5 @@
#!/bin/bash
python3 app/api.py --host 0.0.0.0 --port 8000 &
python3 app/api.py --host 0.0.0.0 --port 9000 &
ollama serve &&
ollama pull gemma &&
ollama run gemma
4 changes: 2 additions & 2 deletions tests/test_json_parsing.py
@@ -246,8 +246,8 @@ def test_diagnosis_variable(
"Label": "Primary dysthymia",
},
],
"CTRL": [{}],
"Group": [{}], # noqa: E501
"CTRL": {},
"Group": {}, # noqa: E501
},
),
)
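The new expectation follows from the single-item collapse introduced in `handle_categorical`; a sketch of why `[{}]` becomes `{}`:

```python
levels = {"CTRL": [{}], "Group": [{}]}
for key in levels:
    if len(levels[key]) == 1:
        levels[key] = levels[key][0]  # a one-element list unwraps to its dict

assert levels == {"CTRL": {}, "Group": {}}
```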