Fix Type Error in Nomic Logging (#174)
* updated nomic version in requirements.txt

* Updated Nomic in requirements.txt

* fix openai version to pre 1.0

* upgrade python from 3.8 to 3.10

* trying to fix tesseract // pdfminer requirements for image ingest

* adding strict versions to all requirements

* Bump pymupdf from 1.22.5 to 1.23.6 (#136)

Bumps [pymupdf](https://github.com/pymupdf/pymupdf) from 1.22.5 to 1.23.6.
- [Release notes](https://github.com/pymupdf/pymupdf/releases)
- [Changelog](https://github.com/pymupdf/PyMuPDF/blob/main/changes.txt)
- [Commits](pymupdf/PyMuPDF@1.22.5...1.23.6)

---
updated-dependencies:
- dependency-name: pymupdf
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* compatible wheel version

* upgrade pip during image startup

* properly upgrade pip

* Fully lock ALL requirements. Hopefully speed up build times, too

* Limit unstructured dependencies, image ballooned from 700MB to 6GB. Hopefully resolved

* Lock version of pip

* Lock (correct) version of pip

* add libgl1 for cv2 in Docker (for unstructured)

* adding proper error logging to image ingest

* Installing unstructured requirements individually to hopefully reduce bundle size by 5GB

* Reduce use of unstructured, hopefully the install is much smaller now

* Guarantee Unique S3 Upload paths (#137)

* should be fully working, in final testing

* trying to fix double nested kwargs

* fixing readable_filename in pdf ingest

* apt install tesseract-ocr, LAME

* remove stupid typo

* minor bug

* Finally fix **kwargs passing

* minor fix

* guarding against webscrape kwargs in pdf

* guarding against webscrape kwargs in pdf

* guarding against webscrape kwargs in pdf

* adding better error messages

* revert req changes

* simplify prints

* Bump typing-extensions from 4.7.1 to 4.8.0 (#90)

Bumps [typing-extensions](https://github.com/python/typing_extensions) from 4.7.1 to 4.8.0.
- [Release notes](https://github.com/python/typing_extensions/releases)
- [Changelog](https://github.com/python/typing_extensions/blob/main/CHANGELOG.md)
- [Commits](python/typing_extensions@4.7.1...4.8.0)

---
updated-dependencies:
- dependency-name: typing-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kastan Day <[email protected]>

* Bump flask from 2.3.3 to 3.0.0 (#101)

Bumps [flask](https://github.com/pallets/flask) from 2.3.3 to 3.0.0.
- [Release notes](https://github.com/pallets/flask/releases)
- [Changelog](https://github.com/pallets/flask/blob/main/CHANGES.rst)
- [Commits](pallets/flask@2.3.3...3.0.0)

---
updated-dependencies:
- dependency-name: flask
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kastan Day <[email protected]>

* Guard against kwargs failures during webscrape

* HOTFIX: kwargs in html and pdf ingest for /webscrape

* Export conversation history on /analysis page (#141)

* updated nomic version in requirements.txt

* initial commit to PR

* created API endpoint

* completed export function

* testing csv export on railway

* code to remove file from repo after download

* moved file storing out of docs folder

* added option for extending one URL out when on baseurl or to opt out of it

* Guarantee unique s3 upload paths, support file updates (e.g. duplicate file guard for Cron jobs) (#99)

* added the add_users() for Canvas

* added canvas course ingest

* updated requirements

* added .md ingest and fixed .py ingest

* deleted test ipynb file

* added nomic viz

* added canvas file update function

* completed update function

* updated course export to include all contents

* modified to handle diff file structures of downloaded content

* modified canvas update

* modified ingest function

* modified update_files() for file replacement

* removed the extra os.remove()

* fix underscore to dash for pip

* removed json import and added abort to canvas functions

* created separate PR for file update

* added file-update logic in ingest, WIP

* removed irrelevant text files

* modified pdf ingest function

* fixed PDF duplicate issue

* removed unwanted files

* updated nomic version in requirements.txt

* modified s3_paths

* testing unique filenames in aws upload

* added missing library to requirements.txt

* finished check_for_duplicates()

* fixed filename errors

* minor corrections

* added a uuid check in check_for_duplicates()

* regex depends on this being a dash

* regex depends on this being a dash

* Fix bug when no duplicate exists.

* cleaning up prints, testing looks good. ready to merge

* Further print and logging refinement

* Remove s3-based method for de-duplication, use Supabase only

* remove duplicate imports

* remove new requirement

* Final print cleanups

* remove pypdf import

---------

Co-authored-by: root <root@ASMITA>
Co-authored-by: Kastan Day <[email protected]>

* Add Trunk Superlinter on-commit hooks (#164)

* First attempt, should auto format on commit

* maybe fix my yapf github action? Just bad formatting.

* Finalized, excellent Trunk configs for my desired formatting

* Further fix yapf GH Action

* Full format of all files with Trunk

* Fix more linting errors

* Ignore .vscode folder

* Reduce max line size to 120 (from 140)

* Format code

* Delete GH Action & Revert formatting in favor of Trunk.

* Ignore the Readme

* Remove trufflehog -- failing too much, confusing to new devs

* Minor docstring update

* trivial commit for testing

* removing trivial commit for testing

* Merge main into branch, vector_database.py probably needs work

* Cleanup all Trunk lint errors that I can

---------

Co-authored-by: KastanDay <[email protected]>
Co-authored-by: Rohan Marwaha <[email protected]>

* Add example usage of our public API for chat calls

* Add timeout to request, best practice

* Add example usage notebook for our public API

* Improve usage example to return model's response for easy storage. Fix linter inf loop

* Final fix: Switch to https connections

* Enhance logging in getTopContexts(), improve usage example

* minor changes for postman testing

* minor changes for testing

* added print statements

* re-creating error

* added condition to check if content is a list

* added json handling needed to test with Postman

* exception handling for get-nomic-map

* json decoding for testing

* added prints for testing

* added prints for testing

* added prints for testing

* added prints for testing

* fix for string error in nomic log

* removed json debugging code

* Cleanup comments

* Enhance type checking, cleanup formatting

* formatting

* Fix type checks to isinstance()

* Revert vector_database.py to status on main

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Kastan Day <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: jkmin3 <[email protected]>
Co-authored-by: root <root@ASMITA>
Co-authored-by: KastanDay <[email protected]>
Co-authored-by: Rohan Marwaha <[email protected]>
7 people authored Dec 19, 2023
1 parent a619243 commit 90ec8b9
Showing 3 changed files with 64 additions and 25 deletions.
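Context for the fix: with OpenAI-style chat payloads, `message['content']` can be either a plain string or a list of content parts (e.g. `[{'type': 'text', 'text': '...'}]`). The old logging code concatenated `message['content']` straight into a log string, which raises a `TypeError` whenever the content arrives as a list. A minimal sketch of the failure and the normalization this diff applies (variable names here are illustrative, not the repo's code):

```python
# Illustrative sketch of the bug this commit fixes, not the repo's code.
message = {"role": "user", "content": [{"type": "text", "text": "hello"}]}

# Old behavior: concatenating a list onto a str raises
# TypeError: can only concatenate str (not "list") to str
# convo = ">>> " + message["role"] + ": " + message["content"] + "\n"

# The fix applied throughout nomic_logging.py: normalize to str first.
if isinstance(message["content"], list):
  text = message["content"][0]["text"]
else:
  text = message["content"]

convo = ">>> " + message["role"] + ": " + text + "\n"
print(convo)  # >>> user: hello
```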
1 change: 1 addition & 0 deletions ai_ta_backend/export_data.py
@@ -5,6 +5,7 @@
 import supabase
+import sentry_sdk
 
 
 def export_convo_history_csv(course_name: str, from_date='', to_date=''):
   """
   This function exports the conversation history to a csv file.
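The diff for this file only adds the `sentry_sdk` import; for context, a minimal sketch of what an export like `export_convo_history_csv` could look like, assuming a Supabase table named `llm-convo-monitor`, standard `supabase-py` query calls, and pandas for the CSV step (the table name, column names, and error handling are assumptions, not shown in this diff):

```python
import os

import pandas as pd
import sentry_sdk
import supabase

# assumes SUPABASE_URL / SUPABASE_API_KEY are set in the environment
client = supabase.create_client(os.environ['SUPABASE_URL'], os.environ['SUPABASE_API_KEY'])


def export_convo_history_csv(course_name: str, from_date='', to_date=''):
  """Sketch: dump a course's conversation rows to CSV; table/column names are assumed."""
  try:
    query = client.table('llm-convo-monitor').select('*').eq('course_name', course_name)
    if from_date:
      query = query.gte('created_at', from_date)
    if to_date:
      query = query.lte('created_at', to_date)
    rows = query.execute().data

    out_path = f"{course_name}_convo_history.csv"
    pd.DataFrame(rows).to_csv(out_path, index=False)
    return out_path
  except Exception as e:
    # the commit wires in sentry_sdk so failures are reported, not swallowed
    sentry_sdk.capture_exception(e)
    raise
```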
4 changes: 2 additions & 2 deletions ai_ta_backend/main.py
@@ -33,8 +33,7 @@
     # Set profiles_sample_rate to 1.0 to profile 100% of sampled transactions.
     # We recommend adjusting this value in production.
     profiles_sample_rate=1.0,
-    enable_tracing=True
-)
+    enable_tracing=True)
 
 app = Flask(__name__)
 CORS(app)
@@ -491,6 +490,7 @@ def logToNomic():
   data = request.get_json()
   course_name = data['course_name']
   conversation = data['conversation']
+
   if course_name == '' or conversation == '':
     # proper web error "400 Bad request"
     abort(
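The hunk above is truncated at the `abort(` call; a minimal sketch of the validation pattern it implements, using Flask's `abort` (the route path and the error message text are assumptions beyond what the hunk shows):

```python
from flask import Flask, abort, jsonify, request

app = Flask(__name__)


@app.route('/onResponseCompletion', methods=['POST'])  # route name assumed
def logToNomic():
  data = request.get_json()
  course_name = data['course_name']
  conversation = data['conversation']

  if course_name == '' or conversation == '':
    # proper web error "400 Bad request"
    abort(400, description="course_name and conversation are required.")

  # hand off to the logging helper (sketch; the real code calls log_convo_to_nomic)
  return jsonify({'status': 'ok'})
```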
84 changes: 61 additions & 23 deletions ai_ta_backend/nomic_logging.py
@@ -18,14 +18,15 @@ def log_convo_to_nomic(course_name: str, conversation) -> str:
   NOMIC_MAP_NAME_PREFIX = 'Conversation Map for '
   """
   Logs conversation to Nomic.
   1. Check if map exists for given course
   2. Check if conversation ID exists
     - if yes, delete and add new data point
     - if no, add new data point
   3. Keep current logic for map doesn't exist - update metadata
   """
-  print(f"in log_convo_to_nomic() for course: {course_name}")
+
+  print(f"in log_convo_to_nomic() for course: {course_name}")
   messages = conversation['conversation']['messages']
   user_email = conversation['conversation']['user_email']
   conversation_id = conversation['conversation']['id']
@@ -42,6 +43,7 @@ def log_convo_to_nomic(course_name: str, conversation) -> str:
   try:
     # fetch project metadata and embeddings
     project = AtlasProject(name=project_name, add_datums_if_exists=True)
+
     map_metadata_df = project.maps[1].data.df  # type: ignore
     map_embeddings_df = project.maps[1].embeddings.latent
     map_metadata_df['id'] = map_metadata_df['id'].astype(int)
@@ -70,7 +72,12 @@ def log_convo_to_nomic(course_name: str, conversation) -> str:
         else:
           emoji = "πŸ€– "
 
-        prev_convo += "\n>>> " + emoji + message['role'] + ": " + message['content'] + "\n"
+        if isinstance(message['content'], list):
+          text = message['content'][0]['text']
+        else:
+          text = message['content']
+
+        prev_convo += "\n>>> " + emoji + message['role'] + ": " + text + "\n"
 
       # modified timestamp
       current_time = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
@@ -92,15 +99,24 @@ def log_convo_to_nomic(course_name: str, conversation) -> str:
       # add new data point
       user_queries = []
       conversation_string = ""
+
       first_message = messages[0]['content']
+      if isinstance(first_message, list):
+        first_message = first_message[0]['text']
       user_queries.append(first_message)
+
       for message in messages:
         if message['role'] == 'user':
           emoji = "πŸ™‹ "
         else:
           emoji = "πŸ€– "
-        conversation_string += "\n>>> " + emoji + message['role'] + ": " + message['content'] + "\n"
+
+        if isinstance(message['content'], list):
+          text = message['content'][0]['text']
+        else:
+          text = message['content']
+
+        conversation_string += "\n>>> " + emoji + message['role'] + ": " + text + "\n"
 
       # modified timestamp
       current_time = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
@@ -163,18 +179,19 @@ def get_nomic_map(course_name: str):
 
   try:
     project = atlas.AtlasProject(name=project_name, add_datums_if_exists=True)
-  except Exception as e:
-    err = f"Nomic map does not exist yet, probably because you have less than 20 queries on your project: {e}"
+    map = project.get_map(project_name)
+
+    print(f"⏰ Nomic Full Map Retrieval: {(time.monotonic() - start_time):.2f} seconds")
+    return {"map_id": f"iframe{map.id}", "map_link": map.map_link}
+  except ValueError as ve:
+    # Error: ValueError: You must specify a unique_id_field when creating a new project.
+    err = f"Nomic map does not exist yet, probably because you have less than 20 queries on your project: {ve}"
     print(err)
     return {"map_id": None, "map_link": None}
+  except Exception as e:
+    sentry_sdk.capture_exception(e)
+    return {"map_id": None, "map_link": None}
-
-  map = project.get_map(project_name)
-
-  print(f"⏰ Nomic Full Map Retrieval: {(time.monotonic() - start_time):.2f} seconds")
-
-  return {"map_id": f"iframe{map.id}", "map_link": map.map_link}
 
 
 def create_nomic_map(course_name: str, log_data: list):
   """
@@ -216,28 +233,44 @@ def create_nomic_map(course_name: str, log_data: list):
       created_at = pd.to_datetime(row['created_at']).strftime('%Y-%m-%d %H:%M:%S')
       convo = row['convo']
       messages = convo['messages']
+
       first_message = messages[0]['content']
+      if isinstance(first_message, list):
+        first_message = first_message[0]['text']
+
       user_queries.append(first_message)
+
       # create metadata for multi-turn conversation
       conversation = ""
-      if message['role'] == 'user':  # type: ignore
-        emoji = "πŸ™‹ "
-      else:
-        emoji = "πŸ€– "
       for message in messages:
         # string of role: content, role: content, ...
-        conversation += "\n>>> " + emoji + message['role'] + ": " + message['content'] + "\n"
+        if message['role'] == 'user':  # type: ignore
+          emoji = "πŸ™‹ "
+        else:
+          emoji = "πŸ€– "
+
+        if isinstance(message['content'], list):
+          text = message['content'][0]['text']
+        else:
+          text = message['content']
+
+        conversation += "\n>>> " + emoji + message['role'] + ": " + text + "\n"
 
       # append current chat to previous chat if convo already exists
       if convo['id'] == log_conversation_id:
         conversation_exists = True
-        if m['role'] == 'user':  # type: ignore
-          emoji = "πŸ™‹ "
-        else:
-          emoji = "πŸ€– "
 
         for m in log_messages:
-          conversation += "\n>>> " + emoji + m['role'] + ": " + m['content'] + "\n"
+          if m['role'] == 'user':  # type: ignore
+            emoji = "πŸ™‹ "
+          else:
+            emoji = "πŸ€– "
+
+          if isinstance(m['content'], list):
+            text = m['content'][0]['text']
+          else:
+            text = m['content']
+          conversation += "\n>>> " + emoji + m['role'] + ": " + text + "\n"
 
       # adding modified timestamp
       current_time = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
@@ -265,7 +298,12 @@ def create_nomic_map(course_name: str, log_data: list):
           emoji = "πŸ™‹ "
         else:
           emoji = "πŸ€– "
-        conversation += "\n>>> " + emoji + message['role'] + ": " + message['content'] + "\n"
+
+        if isinstance(message['content'], list):
+          text = message['content'][0]['text']
+        else:
+          text = message['content']
+        conversation += "\n>>> " + emoji + message['role'] + ": " + text + "\n"
 
       # adding timestamp
       current_time = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
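The same `isinstance` check now appears in several places across nomic_logging.py. A possible follow-up (not part of this commit) would be a single helper that normalizes a message's content to plain text once:

```python
def _extract_text(content) -> str:
  """Normalize an OpenAI-style message content field to plain text.

  Hypothetical helper, not in the commit: content may be a plain string or a
  list of content parts like [{'type': 'text', 'text': '...'}].
  """
  if isinstance(content, list):
    return content[0]['text']
  return str(content)

# usage inside the logging loops:
# conversation += "\n>>> " + emoji + message['role'] + ": " + _extract_text(message['content']) + "\n"
```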
