[FIX] Fix main (#59)
* fix port for default 9000

* reset loading function of levels

* snomed api process

* add = in env to avoid warning

* changed port to 9000

* nice error message

* removed empty dict as argument

* also scores should appear in initial levels section

* adapted pydantic model

* adjusted test output

* added test function

* update readme

* add readme images
barbarastrasser authored Aug 19, 2024
1 parent 825ffe8 commit 152d70b
Showing 12 changed files with 255 additions and 152 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -15,7 +15,7 @@ COPY ./entrypoint.sh ./
EXPOSE 9000

# Define environment variable
ENV PORT 9000
ENV PORT=9000

# Make the entrypoint script executable
RUN chmod +x ./entrypoint.sh
119 changes: 80 additions & 39 deletions Readme.md
@@ -44,7 +44,7 @@ To run the current version of the LLM-based Annotation Tool locally execute
the following command to start the uvicorn server locally:

```
python3 app/api.py --host 127.0.0.1 --port 8000
python3 app/api.py --host 127.0.0.1 --port 9000
```

- To access the API via the browser, please follow the instructions [here](#access-the-api-via-the-gui).
@@ -54,7 +54,7 @@ python3 app/api.py --host 127.0.0.1 --port 8000

Since the annotation tool uses ollama to run the LLM, ollama has to be provided by the Docker container.
This is done by extending the available [ollama container](https://hub.docker.com/r/ollama/ollama).
For this instructions it is assumed that [docker](https://www.docker.com/) is installed.
For these instructions it is assumed that [docker](https://www.docker.com/) is installed.

#### Build the image

@@ -71,56 +71,61 @@ Let's break down the command:

#### Run the container from the built image

- CPU only:

```bash
docker run -d -v ollama:/root/.ollama -v /some/local/path/output:/app/output/ --name instance_name -p 9000:9000 annotation-tool-ai
```

- Nvidia GPU ([Nvidia container toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installation) has to be installed first)

```bash
docker run -d
-v ollama:/root/.ollama
-v /some/local/path/output:/app/output/
--name instance_name
-p 9000:8000
annotation-tool-ai
docker run -d --gpus=all -v ollama:/root/.ollama -v /some/local/path/output:/app/output/ --name instance_name -p 9000:9000 annotation-tool-ai
```

Let's break down the command:
Let's break down the commands:

- `docker run -d`: The -d flag runs the container in the background without any output in the terminal.
- `--gpus=all`: GPUs should be used to run the model.
- `-v ollama:/root/.ollama`: The -v flag mounts external volumes into the container. Here the models used within the container are stored in a *Docker volume*; Docker volumes are created and managed by Docker itself and are not directly accessible via the local file system.
- `-v /path/to/some/local/folder/:/app/output/`: This is a bind mount (also indicated by the -v flag) and makes a local directory accessible to the container. The input and output files (i.e. the `.tsv` input and `.json` output files) are passed to the container via this folder, and since the directory is also mounted locally they remain accessible on the host. Within the container the files are located in `/app/output/`. For more information about Docker volumes vs. bind mounts see [here](https://www.geeksforgeeks.org/docker-volume-vs-bind-mount/).
- `--name instance_name`: Here you choose a (nice) name for your container, created from the image built in the step above.
- `-p 9000:8000`: Mount port for API requests inside the container
- `-p 9000:9000`: Map port 9000 inside the container to port 9000 on the host for API requests.
- `annotation-tool-ai`: Name of the image we create the instance of.

## Access the API via the GUI

Once the `docker run` command or the `app/api.py` script has been executed, the uvicorn server for the FastAPI application will be initiated. To access the GUI for the API, please enter the following in your browser and follow the instructions provided.
**NOTE**

- Docker
```
http://127.0.0.1:9000/docs
```
If you want to access the API only from outside the container (which is usually the case) it is not necessary to mount a directory when running the container. However, it has been kept in the command since it might be useful for debugging purposes.

- Locally
```
http://127.0.0.1:8000/docs
```
---

## Access the tool

### Explanation of parameters used
After successful deployment there are three ways to access the tool: directly via the API, via the UI, or from the command line. Regardless of the access mode, two parameters have to be set:

| param | value | info |
| Parameter | Value | Info |
|---|---|---|
|`code_system`| `cogatlas` | If assessment tools are identified within the provided `.tsv` file, the TermURLs and Labels from the [Cognitive Atlas](https://www.cognitiveatlas.org/) are assigned (if available). `cogatlas` is the default value. |
| | `snomed` | If assessment tools are identified within the provided `.tsv` file, the TermURLs and Labels from [SNOMED CT](https://www.snomed.org/) are assigned (if available). |
|`response_type`| `file` | After categorization and annotation the API provides a `.json` file ready to download. `file` is the default value. |
| | `json` | After categorization and annotation the API provides the raw JSON output. |


### Access the API directly

Once the `docker run` command or the `app/api.py` script has been executed, the uvicorn server for the FastAPI application will be initiated. To access the GUI for the API, please enter the following in your browser and follow the instructions provided.

```
http://127.0.0.1:9000/docs
```

![startAPI](docs/img/api-load.png)

#### Results API

### Results

If `file` is the chosen `response_type` a `.json` file will be provided for download:


![fileResponse](docs/img/fileResponse.png)

If `json` is the chosen `response_type` the direct JSON output will be provided by the API:
@@ -131,38 +136,74 @@ If `json` is the chosen `response_type` the direct JSON output will be provided
Well done - you have annotated your tabular file!
(This documentation is written so that you can simply follow the instructions and annotate your own tabular file.)

## Using the tool from the command line
### Access the Tool via the User Interface

If you don't want to access the tool directly through the API, but rather through a more user-friendly interface, you can set up the integrated UI locally on your machine.

The following command runs the script for the annotation process if you deployed it via docker:
First, since the UI is a React application, `nodejs` and `npm` (the Node package manager) need to be installed on the system:

```bash
sudo apt-get update
sudo apt-get install nodejs
sudo apt-get install npm
```

Second, to access the interface, the application must be started locally. This is done from the `ui-integration` directory of the repository.

```bash
cd annotation-tool-ai/ui-integration
npm start
```

If this was successful, the terminal shows:

![ui-start](docs/img/ui-start.png)

and the user interface is accessible via `http://localhost:3000`. Please follow the instructions there.

![ui-success](docs/img/ui-success.png)

#### Results UI

If `JSON` is the chosen response type, after running your data you should get something like:

![alt text](docs/img/ui-json.png)

If `File` is the chosen response type, a file will be automatically downloaded.

### Using the tool from the command line

The following command runs the script for the annotation process if you deployed it via docker (i.e. access is from INSIDE the docker container):

Please choose the `code_system` and `response_type`, and indicate the correct `instance_name` and file paths.
```
docker exec -it instance_name curl -X POST "http://127.0.0.1:8000/process/?code_system=<snomed | cogatlas>&response_type=<file | json>"
docker exec -it instance_name curl -X POST "http://127.0.0.1:9000/process/?code_system=<snomed | cogatlas>&response_type=<file | json>"
-F "file=@<filepath-to-tsv-inside-container>.tsv"
-o <filepath-to-output-file-inside-container>.json
```

If you chose the local deployment you can run the tool via this command:
If you chose the local deployment, or you want to access the container from outside it, you can run the tool via this command:

```
curl -X POST "http://127.0.0.1:8000/process/?code_system=<snomed | cogatlas>&response_type=<file | json>"
-F "file=@<filepath-to-tsv-inside-container>.tsv"
-o <filepath-to-output-file-inside-container>.json
curl -X POST "http://127.0.0.1:9000/process/?code_system=<snomed | cogatlas>&response_type=<file | json>"
-F "file=@<filepath-to-tsv-outside-container>.tsv"
-o <filepath-to-output-file-outside-container>.json
```

Let's break down this again (for non-docker deployment ignore the first 3 list items):

Let's break down this again (for local/outside docker deployment ignore the first 3 list items):
- `docker exec`: This command is used to execute a command in a running Docker container.
- `-it`: The `-i` and `-t` flags combined, which allow for an interactive terminal session. This is needed, for example, when you run commands that require input.
- `instance_name`: Name of the instance.
- `curl -X POST "http://127.0.0.1:9000/process/?code_system=<snomed | cogatlas>" -F "file=@<filepath-to-tsv-inside-container>.tsv" -o <filepath-to-output-file-inside-container>.json`: This is the command you want to execute in the interactive terminal session within the container. The input file is the to-be-annotated `.tsv` file and the output file is the `.json` file.
- `curl -X POST "http://127.0.0.1:9000/process/?code_system=<snomed | cogatlas>" -F "file=@<filepath-to-tsv-inside/outside-container>.tsv" -o <filepath-to-output-file-inside/outside-container>.json`: This is the command that makes a POST request to the API. The input file is the to-be-annotated `.tsv` file and the output file is the `.json` file.

---
**NOTE**

The `-o <filepath-to-output-file-inside-container>.json` is only necessary if `file` is chosen as `response_type` parameter.
The `-o <filepath-to-output-file-inside/outside-container>.json` is only necessary if `file` is chosen as `response_type` parameter.

---
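For readers who prefer Python over `curl`, the same request can be scripted with the `requests` library. This is a minimal sketch assuming the server is reachable on port 9000; the filenames are placeholders.

```python
import requests

# Placeholder filenames; substitute your own .tsv input and .json output.
params = {"code_system": "snomed", "response_type": "file"}
with open("participants.tsv", "rb") as tsv:
    response = requests.post(
        "http://127.0.0.1:9000/process/",
        params=params,
        files={"file": tsv},  # same form field as curl's -F "file=@..."
    )
response.raise_for_status()
with open("participants_annotated.json", "wb") as out:
    out.write(response.content)
```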


# Details of the codebase

Currently the development of the tool is divided into two aspects: Parsing and Categorization.
@@ -309,7 +350,7 @@ flowchart LR
subgraph TSV-Annotations
Description([Description:\n set for each entity])
Levels-Description([Levels-Description:\n used in Sex and Diagnosis, responded by \n the LLM, mapped to the pre-defined terms \nand used for annotation in Levels-Explanation])
subgraph Annotations
subgraph Annotations
subgraph Identifies
identifies([used for ParticipantID \nand SessionID])
end
@@ -321,11 +362,11 @@ Levels-Description([Levels-Description:\n used in Sex and Diagnosis, responded
end
subgraph IsPartOf
ispartof([used for AssessmentTool,\n provides TermURL and Label\n for the Assessment Tool.])
end
end
subgraph IsAbout
isabout([TermURL responded by \n the LLM categorization \n serves as controller \nfor further annotation])
end
end
end
end
style isabout fill:#f542bc
2 changes: 1 addition & 1 deletion app/api.py
@@ -81,7 +81,7 @@ async def process_files(
help="Host to run the server on",
)
parser.add_argument(
"--port", type=int, default=8000, help="Port to run the server on"
"--port", type=int, default=9000, help="Port to run the server on"
)

args = parser.parse_args()
4 changes: 1 addition & 3 deletions app/categorization/llm_categorization.py
@@ -23,9 +23,7 @@ def Diagnosis(
if "yes" in reply.lower():
output = {"TermURL": "nb:Diagnosis", "Levels": {}}
unique_entries=list_terms(key,value)
levels={} #the empty dictionary passed to the diagnosis_level function to be filled
level={} # the dictionary which will become the output
level = Diagnosis_Level(unique_entries, code_system,levels)
level = Diagnosis_Level(unique_entries, code_system)
output["Levels"] = level
print(json.dumps(output))
return output
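The dropped argument reflects a common Python cleanup: the callee now creates its own dictionary instead of mutating one supplied by the caller. A simplified contrast sketch, not the project's exact code:

```python
# Simplified contrast; the real functions live in app/categorization/llm_helper.py.
# code_system is kept only to mirror the real signature.
def diagnosis_level_old(unique_entries, code_system, levels):
    # Caller-supplied dict: easy to share and mutate accidentally.
    for entry in unique_entries:
        levels[entry] = "unknown"
    return levels

def diagnosis_level_new(unique_entries, code_system):
    levels = {}  # owned locally, fresh on every call
    for entry in unique_entries:
        levels[entry] = "unknown"
    return levels
```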
14 changes: 7 additions & 7 deletions app/categorization/llm_helper.py
@@ -146,7 +146,7 @@ def are_all_digits(input_list):
# Check if all elements in the list are digit strings
return all(element.isdigit() for element in input_list)

def Diagnosis_Level(unique_entries:dict,code_system: str,levels):
def Diagnosis_Level(unique_entries:dict,code_system: str):
# print(unique_entries)

def load_dictionary(file_path):
@@ -172,17 +172,20 @@ def get_label_for_abbreviation(abbreviation:str, abbreviation_to_label):


def Get_Level(unique_entries:list):
levels = {}
if are_all_digits(unique_entries):
print("scores")
levels = {str(level): "unknown" for level in unique_entries}
print("levels only numbers")
return levels
else:
for i in range (0,len(unique_entries)):
levelfield=get_label_for_abbreviation(unique_entries[i],data)
levels[unique_entries[i]] = levelfield


return levels


Get_Level(unique_entries)
levels = Get_Level(unique_entries)
print('''
helper return
@@ -192,9 +195,6 @@ def Get_Level(unique_entries:list):
print(levels)
return levels




def get_assessment_label(key: str, code_system: str) -> Union[str, List[str]]:
def load_dictionary(file_path: str) -> Any:
with open(file_path, "r") as file:
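The behaviour of the reworked `Get_Level` can be illustrated in isolation. A simplified sketch, with the abbreviation lookup stubbed by a plain dict:

```python
def are_all_digits(input_list):
    # True when every entry is a digit string, e.g. questionnaire scores.
    return all(element.isdigit() for element in input_list)

def get_level(unique_entries, abbreviation_to_label):
    levels = {}
    if are_all_digits(unique_entries):
        # Purely numeric columns are treated as scores with unknown labels,
        # so they still appear in the initial Levels section.
        return {str(level): "unknown" for level in unique_entries}
    for entry in unique_entries:
        levels[entry] = abbreviation_to_label.get(entry, "unknown")
    return levels

print(get_level(["1", "2", "3"], {}))
# {'1': 'unknown', '2': 'unknown', '3': 'unknown'}
print(get_level(["MDD", "HC"], {"MDD": "Major depressive disorder"}))
# {'MDD': 'Major depressive disorder', 'HC': 'unknown'}
```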
56 changes: 24 additions & 32 deletions app/parsing/json_parsing.py
@@ -51,13 +51,13 @@ class Annotations(BaseModel): # type:ignore
Identifies: Optional[str] = None

Levels: Optional[
Union[
Dict[str, List[Dict[str, str]]],
Dict[str, Dict[str, str]],
Dict[str, str],
Dict[str, List[str]],
# Add this to allow for lists of strings
]
Dict[str, Union[
List[Dict[str, str]], # Detailed items
List[str], # List of strings for simpler cases
Dict[str, str], # Dictionary of strings for simpler cases
Dict[str, Dict[str, str]], # Complex nested dictionaries
Dict[str, List[str]] # List of strings in dictionary format
]]
] = None
Transformation: Optional[Dict[str, str]] = None
IsPartOf: Optional[Union[List[Dict[str, str]], Dict[str, str], str]] = None
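The widened `Levels` union can be exercised on its own. The following sketch re-declares just this field; the values are illustrative:

```python
from typing import Dict, List, Optional, Union
from pydantic import BaseModel

class LevelsOnly(BaseModel):
    Levels: Optional[
        Dict[str, Union[
            List[Dict[str, str]],       # detailed items
            List[str],                  # simple string lists
            Dict[str, str],             # flat dictionaries
            Dict[str, Dict[str, str]],  # complex nested dictionaries
            Dict[str, List[str]],       # lists of strings in dictionary form
        ]]
    ] = None

# All of these shapes now validate against the same field:
LevelsOnly(Levels={"MDD": [{"TermURL": "snomed:370143000", "Label": "Major depressive disorder"}]})
LevelsOnly(Levels={"CTRL": {}})
LevelsOnly(Levels={"scores": ["1", "2", "3"]})
```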
@@ -148,6 +148,12 @@ def handle_categorical(
]
for key, value in parsed_output.get("Levels", {}).items()
}

# Convert lists with a single item into a single dictionary if only one value exists
for key in levels:
if len(levels[key]) == 1:
levels[key] = levels[key][0]

if termurl == "nb:Sex":
levels = {
key: (
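In isolation, the single-item collapse added above behaves as follows (a sketch with illustrative SNOMED codes):

```python
levels = {
    "F": [{"TermURL": "snomed:248152002", "Label": "Female"}],
    "M": [{"TermURL": "snomed:248153007", "Label": "Male"}],
}
# Unwrap one-element lists so a single match is stored as a plain dict.
for key in levels:
    if len(levels[key]) == 1:
        levels[key] = levels[key][0]

print(levels["F"])  # {'TermURL': 'snomed:248152002', 'Label': 'Female'}
```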
@@ -208,7 +214,8 @@ def handle_assessmentTool(
)

elif ispartof_key == "Not found":
annotations = Annotations(IsAbout=annotation_instance, IsPartOf=None)
empty_ispartof = {"TermURL": " ", "Label": " "}
annotations = Annotations(IsAbout=annotation_instance, IsPartOf=empty_ispartof)

else:
ispartof_key = ispartof_key.strip().lower()
@@ -230,27 +237,12 @@
def load_levels_mapping(mapping_file: str) -> Dict[str, Dict[str, str]]:
with open(mapping_file, "r") as file:
mappings = json.load(file)

levels_mapping = {}
for entry in mappings:
label_key = entry.get("label", "").strip().lower()
identifier_key = entry.get("identifier")

if not label_key:
print(f"Warning: Missing or empty 'label' in entry: {entry}")
continue

if not identifier_key:
# print(f"Warning: Missing 'identifier' for label '{label_key}' in entry: {entry}")
# Optionally, you can skip this entry or assign a default value
identifier_key = "default_identifier"

levels_mapping[label_key] = {
"TermURL": identifier_key,
"Label": entry["label"],
}

return levels_mapping
return {
entry["label"]
.strip()
.lower(): {"TermURL": entry["identifier"], "Label": entry["label"]}
for entry in mappings
}

# noqa: E501
def load_assessmenttool_mapping(
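The comprehension-based loader can be tried against an in-memory mapping. A sketch with a made-up entry:

```python
import json

# Made-up mapping entry; the real files live under app/parsing/.
raw = '[{"identifier": "snomed:363698007", "label": "Finding site"}]'
mappings = json.loads(raw)

levels_mapping = {
    entry["label"].strip().lower(): {"TermURL": entry["identifier"], "Label": entry["label"]}
    for entry in mappings
}
print(levels_mapping)
# {'finding site': {'TermURL': 'snomed:363698007', 'Label': 'Finding site'}}
```

Note that, unlike the removed loop, the comprehension assumes every entry carries both `label` and `identifier`; a malformed entry now raises a `KeyError` instead of being skipped or given a default identifier.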
@@ -286,7 +278,7 @@ def process_parsed_output(
elif code_system == "snomed":
print("Using SNOMED CT terms for assessment tool annotation.")
assessmenttool_mapping_file = (
"app/parsing/abbreviations_measurementTerms.json"
"app/parsing/measurementTerms.json"
)
assessmenttool_mapping = load_levels_mapping(
assessmenttool_mapping_file
@@ -339,8 +331,8 @@
)
else:
return "Error: TermURL is missing from the parsed output"
else:
return "Error: parsed_output is not a dictionary"
elif parsed_output is None:
return "The LLM does not find any suitable entity in the current Neurobagel data model. Please be patient as we are working on increasing the LLM performance and extending the data model :)"


def update_json_file(
Binary file added docs/img/ui-json.png
Binary file added docs/img/ui-start.png
Binary file added docs/img/ui-success.png
2 changes: 1 addition & 1 deletion entrypoint.sh
@@ -1,5 +1,5 @@
#!/bin/bash
python3 app/api.py --host 0.0.0.0 --port 8000 &
python3 app/api.py --host 0.0.0.0 --port 9000 &
ollama serve &&
ollama pull gemma &&
ollama run gemma
4 changes: 2 additions & 2 deletions tests/test_json_parsing.py
@@ -246,8 +246,8 @@ def test_diagnosis_variable(
"Label": "Primary dysthymia",
},
],
"CTRL": [{}],
"Group": [{}], # noqa: E501
"CTRL": {},
"Group": {}, # noqa: E501
},
),
)
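The new expectation follows from the single-item collapse introduced in `handle_categorical`; a sketch of why `[{}]` becomes `{}`:

```python
levels = {"CTRL": [{}], "Group": [{}]}
for key in levels:
    if len(levels[key]) == 1:
        levels[key] = levels[key][0]  # a one-element list unwraps to its dict

assert levels == {"CTRL": {}, "Group": {}}
```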