Merge pull request #13 from beeinger/develop
Sunset
beeinger authored Oct 2, 2024
2 parents 8c5de55 + aee627f commit ab60952
Showing 191 changed files with 2,289,994 additions and 1,997 deletions.
1 change: 1 addition & 0 deletions .gitattributes
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
backend/dynamo-db-dump/dump-job-posts.json filter=lfs diff=lfs merge=lfs -text
49 changes: 25 additions & 24 deletions .github/workflows/backend.yml
Original file line number Diff line number Diff line change
@@ -50,27 +50,28 @@ jobs:
  # are configured in travis settings
  # see https://serverless.com/framework/docs/providers/aws/guide/credentials/
  # for more information
  deploy:
    if: github.ref == 'refs/heads/release'
    runs-on: ubuntu-latest
    needs: [test]
    steps:
      - name: Set up Rust
        uses: hecrj/setup-rust-action@v1
      - name: Checkout
        uses: actions/checkout@v2
      - name: Deploy
        if: env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DEFAULT_REGION: eu-west-2
        run: |
          cd backend
          echo ${{ secrets.ENV_FILE }} > .env
          sudo apt-get install musl-tools
          export CC_x86_64_unknown_linux_musl=musl-gcc
          export CARGO_TARGET_X86_64_UNKNOWN_LINUX_MUSL_LINKER=musl-gcc
          rustup target add x86_64-unknown-linux-musl
          yarn install --frozen-lockfile
          yarn sls deploy --conceal --stage prod
  # ! Uncomment the below to deploy to AWS
  # deploy:
  #   if: github.ref == 'refs/heads/release'
  #   runs-on: ubuntu-latest
  #   needs: [test]
  #   steps:
  #     - name: Set up Rust
  #       uses: hecrj/setup-rust-action@v1
  #     - name: Checkout
  #       uses: actions/checkout@v2
  #     - name: Deploy
  #       if: env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY
  #       env:
  #         AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  #         AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  #         AWS_DEFAULT_REGION: eu-west-2
  #       run: |
  #         cd backend
  #         echo ${{ secrets.ENV_FILE }} > .env
  #         sudo apt-get install musl-tools
  #         export CC_x86_64_unknown_linux_musl=musl-gcc
  #         export CARGO_TARGET_X86_64_UNKNOWN_LINUX_MUSL_LINKER=musl-gcc
  #         rustup target add x86_64-unknown-linux-musl
  #         yarn install --frozen-lockfile
  #         yarn sls deploy --conceal --stage prod
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
.DS_Store
34 changes: 34 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
# TTR-guide

### Project available at: [ttr.guide](https://ttr.guide)

### Recorded demo available at: [https://youtu.be/5jQCix0P_fE](https://youtu.be/5jQCix0P_fE)

## Sunset Statement

**Sorry to everyone who's been using it, but as of October 2024 this is now sunset.**

Anyone who wants to use this is free to set it up on their own!

I am sunsetting this due to the high costs of DynamoDB on AWS. I'd love to make another iteration of this project in the future, but for now it's not feasible.
It definitely needs lots of changes: first of all getting rid of DynamoDB, then an architecture redesign and a drastic improvement in code quality.

**_Thank you for understanding and sorry for the inconvenience!_**

#### DynamoDB Dump

As this project is sunset, I have made a dump of all the data collected in the DynamoDB database.

It is available in the [`backend/dynamo-db-dump`](https://github.com/beeinger/TTR-guide/tree/develop/backend/dynamo-db-dump) folder.
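
A minimal sketch of reading the dump, assuming it follows the standard DynamoDB export shape with a top-level `"Items"` list (the same key the `count_items.py` script in this commit checks for); the sample file name and field names below are made up for illustration:

```python
import json

def count_items(path):
    """Load a DynamoDB-style JSON dump and count the records under "Items"."""
    with open(path, "r") as f:
        data = json.load(f)
    return len(data.get("Items", []))

# Tiny illustrative dump in the same shape (field names are hypothetical)
sample = {"Items": [{"postId": {"S": "1"}}, {"postId": {"S": "2"}}]}
with open("sample-dump.json", "w") as f:
    json.dump(sample, f)

print(count_items("sample-dump.json"))  # → 2
```

The real `dump-job-posts.json` is stored via Git LFS, so it needs to be pulled before it can be loaded this way.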

## Abstract

The Tools and Technologies Research guide (TTR.guide) project aimed to provide a comprehensive guide to job market analysis and deliver valuable insights for both end-users and developers by leveraging natural language processing (NLP) techniques. This open-source project utilised the GPT-3.5 Turbo OpenAI API to extract tools and technologies from job postings. The project followed an Agile methodology, which allowed for continuous iteration and improvement, while ethical, legal, and social aspects related to data handling and user privacy were also carefully considered.

The TTR platform encompasses a backend with an API and a frontend with dashboard visualisations and API documentation. The data collection component has so far amassed over 60,000 job posts, primarily programming and engineering roles sourced from the reed.co.uk API. The GPT model was used for data processing, as it is capable of handling diverse and unstructured data. The system successfully processed three bursts of data, each containing about 10,000 job posts.

The TTR system's adaptability and modularity are key strengths, facilitating scalability and potential expansion. The frontend features a landing page with SEO optimisations, metadata tags, and branding, an interactive API documentation page, and a search and statistics page that delivers valuable insights to users. The TTR project's open-source future is secured by licensing it under the GPL-3.0 licence, encouraging contributions from other developers and researchers while fostering growth and contributions within the broader open-source community.

The TTR project's success was due to the effectiveness of GPT-based NLP for data processing, the Agile methodology's adaptability, effective data handling and processing, scalability, its open-source nature and best practices, and careful consideration of ethical, legal, and social aspects. The project provided valuable learning experiences, including NLP and GPT techniques, software development, system architecture, project management, and ethical, legal, and social considerations.

Future improvements to the TTR platform could include dataset expansion, enhancing the NLP and GPT techniques, additional features, routine maintenance and optimisation, and collaborations with job posting websites. In conclusion, the TTR project represents a significant achievement in providing a valuable and adaptable tool for job market analysis, with the experiences gained and lessons learned throughout its development serving as a strong foundation for future projects and endeavours.
5 changes: 5 additions & 0 deletions backend/dynamo-db-dump/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# DynamoDB Dump

As this project is sunset, I have made a dump of all the data collected in the DynamoDB database.

Feel free to use it for your own projects or research.
29 changes: 29 additions & 0 deletions backend/dynamo-db-dump/count_items.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
import json
import sys

# Check if the user provided a file path
if len(sys.argv) != 2:
    print("Usage: python3 count_items.py <file_path>")
    sys.exit(1)

# Get the file path from the command line argument
json_file_path = sys.argv[1]

# Load the JSON file
try:
    with open(json_file_path, 'r') as json_file:
        data = json.load(json_file)
except FileNotFoundError:
    print(f"File not found: {json_file_path}")
    sys.exit(1)
except json.JSONDecodeError:
    print(f"Error decoding JSON from file: {json_file_path}")
    sys.exit(1)

# Count the number of items in the JSON (assuming the 'Items' key holds the data)
if 'Items' in data:
    item_count = len(data['Items'])
    print(f'Total number of items in the JSON file: {item_count}')
else:
    print("The JSON file does not contain an 'Items' key.")
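
Records in a DynamoDB JSON dump are typically stored in attribute-value form, e.g. `{"S": "Rust Developer"}`. Whether this dump uses that encoding is an assumption, and the field names below (`position`, `salary`) are hypothetical; given that shape, a minimal sketch of unwrapping items into plain Python values:

```python
def unwrap(av):
    """Convert a DynamoDB attribute-value dict (e.g. {"S": "x"}) to plain Python.

    The type tags handled here (S, N, BOOL, NULL, L, M, SS) follow the standard
    DynamoDB JSON encoding; whether this dump uses that encoding is an assumption.
    """
    (tag, value), = av.items()  # each attribute value has exactly one type tag
    if tag == "S":
        return value
    if tag == "N":
        # DynamoDB serialises all numbers as strings
        return float(value) if "." in value else int(value)
    if tag == "BOOL":
        return value
    if tag == "NULL":
        return None
    if tag == "L":
        return [unwrap(v) for v in value]
    if tag == "M":
        return {k: unwrap(v) for k, v in value.items()}
    if tag == "SS":
        return list(value)
    raise ValueError(f"Unhandled attribute type: {tag}")

# Hypothetical job-post record in attribute-value form
item = {"position": {"S": "Rust Developer"}, "salary": {"N": "65000"}}
plain = {k: unwrap(v) for k, v in item.items()}
print(plain)  # → {'position': 'Rust Developer', 'salary': 65000}
```

Mapping this over every entry in the dump's `Items` list would yield plain dictionaries ready for analysis.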

3 changes: 3 additions & 0 deletions backend/dynamo-db-dump/dump-job-posts.json
Git LFS file not shown