Merge pull request #13 from beeinger/develop
Sunset
beeinger authored Oct 2, 2024
2 parents 8c5de55 + aee627f commit ab60952
Showing 191 changed files with 2,289,994 additions and 1,997 deletions.
1 change: 1 addition & 0 deletions .gitattributes
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
backend/dynamo-db-dump/dump-job-posts.json filter=lfs diff=lfs merge=lfs -text
49 changes: 25 additions & 24 deletions .github/workflows/backend.yml
Original file line number Diff line number Diff line change
@@ -50,27 +50,28 @@ jobs:
  # are configured in travis settings
  # see https://serverless.com/framework/docs/providers/aws/guide/credentials/
  # for more information
  deploy:
    if: github.ref == 'refs/heads/release'
    runs-on: ubuntu-latest
    needs: [test]
    steps:
      - name: Set up Rust
        uses: hecrj/setup-rust-action@v1
      - name: Checkout
        uses: actions/checkout@v2
      - name: Deploy
        if: env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DEFAULT_REGION: eu-west-2
        run: |
          cd backend
          echo ${{ secrets.ENV_FILE }} > .env
          sudo apt-get install musl-tools
          export CC_x86_64_unknown_linux_musl=musl-gcc
          export CARGO_TARGET_X86_64_UNKNOWN_LINUX_MUSL_LINKER=musl-gcc
          rustup target add x86_64-unknown-linux-musl
          yarn install --frozen-lockfile
          yarn sls deploy --conceal --stage prod
  # ! Uncomment the below to deploy to AWS
  # deploy:
  #   if: github.ref == 'refs/heads/release'
  #   runs-on: ubuntu-latest
  #   needs: [test]
  #   steps:
  #     - name: Set up Rust
  #       uses: hecrj/setup-rust-action@v1
  #     - name: Checkout
  #       uses: actions/checkout@v2
  #     - name: Deploy
  #       if: env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY
  #       env:
  #         AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  #         AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  #         AWS_DEFAULT_REGION: eu-west-2
  #       run: |
  #         cd backend
  #         echo ${{ secrets.ENV_FILE }} > .env
  #         sudo apt-get install musl-tools
  #         export CC_x86_64_unknown_linux_musl=musl-gcc
  #         export CARGO_TARGET_X86_64_UNKNOWN_LINUX_MUSL_LINKER=musl-gcc
  #         rustup target add x86_64-unknown-linux-musl
  #         yarn install --frozen-lockfile
  #         yarn sls deploy --conceal --stage prod
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
.DS_Store
34 changes: 34 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
# TTR-guide

### Project available at: [ttr.guide](https://ttr.guide)

### Recorded demo available at: [https://youtu.be/5jQCix0P_fE](https://youtu.be/5jQCix0P_fE)

## Sunset Statement

**Sorry to everyone who's been using it, but as of October 2024 this is now sunset.**

Anyone who wants to use this is free to set it up on their own!

I am sunsetting this due to the high costs of DynamoDB on AWS. I'd love to make another iteration of this project in the future, but for now it's not feasible.
It definitely needs lots of changes: first of all getting rid of DynamoDB, then an architecture redesign and a drastic improvement in code quality.

**_Thank you for understanding and sorry for the inconvenience!_**

#### DynamoDB Dump

As this project is sunset, I have made a dump of all the data collected in the DynamoDB database.

It is available in the [`backend/dynamo-db-dump`](https://github.com/beeinger/TTR-guide/tree/develop/backend/dynamo-db-dump) folder.
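
A minimal sketch of reading the dump, assuming it follows the standard DynamoDB export shape with a top-level `"Items"` list (the same key the `count_items.py` script in this commit checks for); the sample file name and field names below are made up for illustration:

```python
import json

def count_items(path):
    """Load a DynamoDB-style JSON dump and count the records under "Items"."""
    with open(path, "r") as f:
        data = json.load(f)
    return len(data.get("Items", []))

# Tiny illustrative dump in the same shape (field names are hypothetical)
sample = {"Items": [{"postId": {"S": "1"}}, {"postId": {"S": "2"}}]}
with open("sample-dump.json", "w") as f:
    json.dump(sample, f)

print(count_items("sample-dump.json"))  # → 2
```

The real `dump-job-posts.json` is stored via Git LFS, so it needs to be pulled before it can be loaded this way.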

## Abstract

The Tools and Technologies Research guide (TTR.guide) project aimed to provide a comprehensive guide to job market analysis and deliver valuable insights for both end-users and developers by leveraging natural language processing (NLP) techniques. This open-source project utilised the GPT-3.5 Turbo OpenAI API to extract tools and technologies from job postings. The project followed an Agile methodology, which allowed for continuous iteration and improvement, while ethical, legal, and social aspects related to data handling and user privacy were also carefully considered.

The TTR platform encompasses a backend with an API and a frontend with dashboard visualisations and API documentation. The data collection component has so far amassed over 60,000 job posts, primarily programming and engineering roles sourced from the reed.co.uk API. The GPT model was used for data processing, as it is capable of handling diverse and unstructured data. The system successfully processed three bursts of data, each containing about 10,000 job posts.

The TTR system's adaptability and modularity are key strengths, facilitating scalability and potential expansion. The frontend features a landing page with SEO optimisations, metadata tags, and branding, an interactive API documentation page, and a search and statistics page that delivers valuable insights to users. The TTR project's open-source future is secured by licensing it under the GPL-3.0 licence, encouraging contributions from other developers and researchers while fostering growth and contributions within the broader open-source community.

The TTR project's success was due to the effectiveness of GPT-based NLP for data processing, the Agile methodology's adaptability, effective data handling and processing, scalability, its open-source nature and best practices, and careful consideration of ethical, legal, and social aspects. The project provided valuable learning experiences, including NLP and GPT techniques, software development, system architecture, project management, and ethical, legal, and social considerations.

Future improvements to the TTR platform could include dataset expansion, enhancing the NLP and GPT techniques, additional features, routine maintenance and optimisation, and collaborations with job posting websites. In conclusion, the TTR project represents a significant achievement in providing a valuable and adaptable tool for job market analysis, with the experiences gained and lessons learned throughout its development serving as a strong foundation for future projects and endeavours.
5 changes: 5 additions & 0 deletions backend/dynamo-db-dump/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# DynamoDB Dump

As this project is sunset, I have made a dump of all the data collected in the DynamoDB database.

Feel free to use it for your own projects or research.
29 changes: 29 additions & 0 deletions backend/dynamo-db-dump/count_items.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
import json
import sys

# Check if the user provided a file path
if len(sys.argv) != 2:
    print("Usage: python3 count_items.py <file_path>")
    sys.exit(1)

# Get the file path from the command line argument
json_file_path = sys.argv[1]

# Load the JSON file
try:
    with open(json_file_path, 'r') as json_file:
        data = json.load(json_file)
except FileNotFoundError:
    print(f"File not found: {json_file_path}")
    sys.exit(1)
except json.JSONDecodeError:
    print(f"Error decoding JSON from file: {json_file_path}")
    sys.exit(1)

# Count the number of items in the JSON (assuming the 'Items' key holds the data)
if 'Items' in data:
    item_count = len(data['Items'])
    print(f'Total number of items in the JSON file: {item_count}')
else:
    print("The JSON file does not contain an 'Items' key.")
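
Records in a DynamoDB JSON dump are typically stored in attribute-value form, e.g. `{"S": "Rust Developer"}`. Whether this dump uses that encoding is an assumption, and the field names below (`position`, `salary`) are hypothetical; given that shape, a minimal sketch of unwrapping items into plain Python values:

```python
def unwrap(av):
    """Convert a DynamoDB attribute-value dict (e.g. {"S": "x"}) to plain Python.

    The type tags handled here (S, N, BOOL, NULL, L, M, SS) follow the standard
    DynamoDB JSON encoding; whether this dump uses that encoding is an assumption.
    """
    (tag, value), = av.items()  # each attribute value has exactly one type tag
    if tag == "S":
        return value
    if tag == "N":
        # DynamoDB serialises all numbers as strings
        return float(value) if "." in value else int(value)
    if tag == "BOOL":
        return value
    if tag == "NULL":
        return None
    if tag == "L":
        return [unwrap(v) for v in value]
    if tag == "M":
        return {k: unwrap(v) for k, v in value.items()}
    if tag == "SS":
        return list(value)
    raise ValueError(f"Unhandled attribute type: {tag}")

# Hypothetical job-post record in attribute-value form
item = {"position": {"S": "Rust Developer"}, "salary": {"N": "65000"}}
plain = {k: unwrap(v) for k, v in item.items()}
print(plain)  # → {'position': 'Rust Developer', 'salary': 65000}
```

Mapping this over every entry in the dump's `Items` list would yield plain dictionaries ready for analysis.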

3 changes: 3 additions & 0 deletions backend/dynamo-db-dump/dump-job-posts.json
Git LFS file not shown