A personal, educational tool that scrapes job vacancy data from the SSCASN API, which powers the SSCASN portal. The program fetches the data, processes it, and exports the results to an Excel (.xlsx) file stored in the data directory. It fetches concurrently using goroutines, allowing up to 10 requests per second for efficient data retrieval.
Note
This is my first project for learning Go and exploring its concurrency features.
- Concurrent Data Fetching: Utilizes goroutines to handle multiple requests simultaneously, improving performance.
- Rate Limiting: Ensures compliance with API request limits to prevent server overload.
- Excel Export: Converts and saves the fetched data into a structured Excel file for easy analysis.
- Customizable Filters: Allows filtering by location (province) to tailor the data to specific needs.
- Flexible API Querying: Supports additional parameters such as pengadaanKd (type of procurement) and instansiId (institution) for more targeted data retrieval.
- Enhanced Data Output: Includes additional fields such as jumlah_ms (the count that passed verification) and detailed information about each institution in the exported Excel file.
Before running the SSCASN Scraper, ensure you have Go installed and set up on your system.
- Clone the repository:
git clone https://github.com/rizkyilhampra/sscasn-scraper.git
- Navigate to the project directory:
cd sscasn-scraper
- Install the required dependencies:
go mod tidy
- Run the program with specific parameters. For example, to fetch data for "S1 Pendidikan Keagamaan Katolik":
go run main.go -kodeRefPend=5102656 -namaJurusan="S1 Pendidikan Keagamaan Katolik"
- Optionally, filter results by province using the -provinsi flag:
go run main.go -kodeRefPend=5102656 -namaJurusan="S1 Pendidikan Keagamaan Katolik" -provinsi="Jawa Tengah"
The program supports the following command-line arguments:
- -kodeRefPend: (Required) Education reference code.
- -namaJurusan: (Required) Name of the major or field of study that corresponds to kodeRefPend. It serves as the label used to generate the Excel filename.
- -provinsi: (Optional) Filter results by province. Example: -provinsi="Jawa Timur".
- -pengadaanKd: (Optional) Procurement code. Default is 2.
- -instansiId: (Optional) Institution ID. Example: -instansiId="A5EB03E23AFBF6A0E040640A040252AD" for "Kementerian Lingkungan Hidup dan Kehutanan".
Example usage with all parameters:
go run main.go -kodeRefPend=5102656 -namaJurusan="S1 Pendidikan Keagamaan Katolik" -provinsi="Jawa Tengah" -pengadaanKd=2 -instansiId="A5EB03E23AFBF6A0E040640A040252AD"
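For readers new to Go, the flags listed above could be declared with the standard flag package roughly as follows. This is a sketch under assumptions, not the code in main.go: the options struct, the parseOptions function, and the help strings are hypothetical, and only the flag names and the default of 2 for -pengadaanKd come from this README.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// options mirrors the documented flags; the struct itself is hypothetical.
type options struct {
	KodeRefPend string
	NamaJurusan string
	Provinsi    string
	PengadaanKd string
	InstansiID  string
}

// parseOptions parses command-line arguments and checks the required flags.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("sscasn-scraper", flag.ContinueOnError)
	fs.StringVar(&o.KodeRefPend, "kodeRefPend", "", "education reference code (required)")
	fs.StringVar(&o.NamaJurusan, "namaJurusan", "", "major name, used for the Excel filename (required)")
	fs.StringVar(&o.Provinsi, "provinsi", "", "filter results by province (optional)")
	fs.StringVar(&o.PengadaanKd, "pengadaanKd", "2", "procurement code (optional)")
	fs.StringVar(&o.InstansiID, "instansiId", "", "institution ID (optional)")

	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.KodeRefPend == "" || o.NamaJurusan == "" {
		return o, errors.New("-kodeRefPend and -namaJurusan are required")
	}
	return o, nil
}

func main() {
	// Fixed arguments for the example; a real program would pass os.Args[1:].
	opts, err := parseOptions([]string{
		"-kodeRefPend=5102656",
		"-namaJurusan=S1 Pendidikan Keagamaan Katolik",
		"-provinsi=Jawa Tengah",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", opts)
}
```

Using a dedicated FlagSet instead of the package-level flags makes the parsing function easy to call with any argument slice, which helps when testing.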
To use this scraper effectively, you'll need to obtain several parameters from the SSCASN website. Here's a general guide on how to find these parameters:
- Open your browser and navigate to https://sscasn.bkn.go.id/.
- Access the network tab in the browser's developer tools (usually accessible by pressing F12 or right-clicking and selecting "Inspect").
- On the SSCASN website, search for the desired major or use the filters available on the site.
- In the network tab, look for requests to api-sscasn.bkn.go.id. These requests contain the parameters we need.
- Examine the request URL and query parameters. You'll typically find:
  - kode_ref_pend: The education reference code
  - pengadaan_kd: The procurement code
  - instansi_id: The institution ID (if you're filtering by a specific institution)
- Copy these values for use in the program.
Example of what you might see in the network tab:
https://api-sscasn.bkn.go.id/2024/portal/spf?kode_ref_pend=4480271&instansi_id=A5EB03E23AFBF6A0E040640A040252AD&pengadaan_kd=2&offset=0
In this URL:
- kode_ref_pend=4480271
- instansi_id=A5EB03E23AFBF6A0E040640A040252AD
- pengadaan_kd=2
You can use these values with the corresponding flags when running the scraper.
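If you prefer not to pick the values out by hand, a small helper can extract them from a copied request URL. The sketch below uses Go's standard net/url package. The function name extractParams is hypothetical and not part of the scraper; the mapping from query parameters to flags follows the correspondence described above.

```go
package main

import (
	"fmt"
	"net/url"
)

// extractParams returns the query parameters the scraper needs from a
// request URL copied out of the browser's network tab.
// extractParams is a hypothetical helper, not part of the scraper.
func extractParams(rawURL string) (map[string]string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	q := u.Query()
	params := map[string]string{}
	for _, key := range []string{"kode_ref_pend", "pengadaan_kd", "instansi_id"} {
		if v := q.Get(key); v != "" {
			params[key] = v // keep only the parameters present in the URL
		}
	}
	return params, nil
}

func main() {
	raw := "https://api-sscasn.bkn.go.id/2024/portal/spf?kode_ref_pend=4480271&instansi_id=A5EB03E23AFBF6A0E040640A040252AD&pengadaan_kd=2&offset=0"
	params, err := extractParams(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print the values in the form of the scraper's flags.
	fmt.Printf("-kodeRefPend=%s -pengadaanKd=%s -instansiId=%s\n",
		params["kode_ref_pend"], params["pengadaan_kd"], params["instansi_id"])
	// Prints: -kodeRefPend=4480271 -pengadaanKd=2 -instansiId=A5EB03E23AFBF6A0E040640A040252AD
}
```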
Note
The exact process might vary slightly depending on how you interact with the SSCASN website. Always ensure you're using the most recent and relevant parameters for your search.