Downloader/Refresher Logic Overlap #290

Open
carlosparadis opened this issue Mar 28, 2024 · 3 comments

@carlosparadis
Member

I'd like you to give some thought to how much duplicated code you will end up with in your refresher logic across Bugzilla, JIRA, GitHub, and Mbox.

Could you, for example, create a "refresher_function" that takes a few parameters and handles the Bugzilla, JIRA, and GitHub logic all in one? How much code are you duplicating by creating bugzilla_refresh, jira_refresh, github_refresh, and mbox_refresh? I'd expect you would still need them, but part of their logic could possibly be reused.

Consider breaking the refresh in conceptual steps:

  • Read the list of files in a directory (check the HADOOP anomaly case on the issue to see if it won't break this without additional information)
  • Locate the filename with latest timestamp
  • Return filepath

I would imagine all your downloader refreshers could share one function that captures that logic, rather than each having its own copy. The unique behavior here lies in a) accessing the file to obtain the most current timestamp, and b) how to update the folder thereafter.
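To make the split concrete, here is a minimal Python sketch of the steps above. It is only an illustration, not the project's code: `latest_filepath`, `refresh`, and the three callables (`timestamp_of`, `read_latest_date`, `update_folder`) are hypothetical names. The shared part lists the directory and picks the file with the latest timestamp; the two source-specific behaviors, a) and b), are passed in as functions.

```python
from pathlib import Path


def latest_filepath(directory, timestamp_of):
    """Shared step: return the file in `directory` with the greatest timestamp.

    `timestamp_of` is a hypothetical source-specific callable that maps a
    filename to a comparable timestamp, or None if the file should be skipped.
    """
    best_path, best_ts = None, None
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue
        ts = timestamp_of(path.name)
        if ts is None:
            continue
        if best_ts is None or ts > best_ts:
            best_path, best_ts = path, ts
    return best_path


def refresh(directory, timestamp_of, read_latest_date, update_folder):
    """Generic refresher: only the three callables differ per source."""
    latest = latest_filepath(directory, timestamp_of)
    if latest is None:
        return None  # nothing downloaded yet; the caller decides what to do
    since = read_latest_date(latest)   # a) source-specific: read the newest date
    update_folder(directory, since)    # b) source-specific: download newer data
    return latest
```

Under this layout, a bugzilla_refresh or jira_refresh would shrink to a thin call to `refresh` with its own three callables.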

@anthonyjlau
Collaborator

As we discussed on the call, it is possible to create an overarching refresher function that covers Bugzilla, JIRA, GitHub, and Mbox. However, the timestamp formats that each endpoint uses are different. Mbox is specific to the month, JIRA is specific to the minute, and GitHub and Bugzilla are specific to the second. This makes them difficult to combine because of the different formats that are needed. Therefore, we agreed that we will not create a general refresh function for this milestone. This function may be addressed at a later time.
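If the general function is revisited later, one way the granularity differences might be isolated is a per-source format table. The sketch below is in Python and only illustrates the idea; the exact format strings are assumptions chosen to match the granularity described above (month, minute, second), not formats confirmed for each endpoint, and `format_timestamp` is a hypothetical name.

```python
from datetime import datetime

# Assumed format strings; only the granularity comes from the discussion above.
TIMESTAMP_FORMATS = {
    "mbox": "%Y%m",                    # month
    "jira": "%Y-%m-%d %H:%M",          # minute
    "github": "%Y-%m-%dT%H:%M:%SZ",    # second
    "bugzilla": "%Y-%m-%dT%H:%M:%SZ",  # second
}


def format_timestamp(source, moment):
    """Render `moment` at the granularity the given source expects."""
    return moment.strftime(TIMESTAMP_FORMATS[source])
```

With a table like this, the shared refresher would only need the source name to produce the right string.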

@Ssunoo2
Collaborator

Ssunoo2 commented Mar 30, 2024

Some of the overlap has already been factored into a function, notably parse_jira_latest_date(), which iterates through filenames and returns the filename that contains the latest date. Assuming that the naming convention is Project_..._[Unix time of earliest issue]_[Unix time of most recent issue].json, it can be used across refreshers.
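For illustration, a filename scan under that naming convention could look like the Python sketch below. This is not the implementation of parse_jira_latest_date(); `latest_file_by_name` is a hypothetical name, and the code assumes the filename ends in two underscore-separated Unix times followed by `.json`.

```python
import re

# Matches the assumed suffix: _[earliest Unix time]_[latest Unix time].json
_SUFFIX = re.compile(r"_(\d+)_(\d+)\.json$")


def latest_file_by_name(filenames):
    """Return the filename whose trailing 'most recent' Unix time is greatest.

    Filenames that do not follow the convention are ignored. Returns None
    when no filename matches.
    """
    best_name, best_ts = None, None
    for name in filenames:
        match = _SUFFIX.search(name)
        if match is None:
            continue
        latest_ts = int(match.group(2))
        if best_ts is None or latest_ts > best_ts:
            best_name, best_ts = name, latest_ts
    return best_name
```

Because it only looks at the filename suffix, the same helper would work for any downloader that adopts the convention.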

Beyond this, extracting the value of the latest date often differs between downloaders, because the field that holds the date differs from source to source. For instance, the JIRA downloader uses a field called 'created' and the GitHub downloader uses a field called 'created_at'. The nesting of these values may also differ.
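One way to absorb the field-name and nesting differences is to pass the key path and date format in as parameters. The Python sketch below is an illustration under assumptions: `get_nested` and `latest_date` are hypothetical names, and the record layouts and date formats in the usage example are made up for the example rather than taken from the downloaders.

```python
from datetime import datetime
from functools import reduce


def get_nested(record, path):
    """Follow a sequence of keys into nested dictionaries."""
    return reduce(lambda node, key: node[key], path, record)


def latest_date(records, path, date_format):
    """Return the most recent date found at `path` across all records."""
    return max(
        datetime.strptime(get_nested(record, path), date_format)
        for record in records
    )


# Usage with assumed layouts: a nested 'created' and a top-level 'created_at'.
jira_like = [
    {"fields": {"created": "2024-03-01T08:00:00.000+0000"}},
    {"fields": {"created": "2024-03-28T10:15:00.000+0000"}},
]
github_like = [
    {"created_at": "2024-03-28T10:15:00Z"},
    {"created_at": "2024-02-11T23:59:59Z"},
]
newest_jira = latest_date(jira_like, ("fields", "created"), "%Y-%m-%dT%H:%M:%S.%f%z")
newest_github = latest_date(github_like, ("created_at",), "%Y-%m-%dT%H:%M:%SZ")
```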

@Ssunoo2
Collaborator

Ssunoo2 commented Apr 20, 2024

The general idea from our 4/19 notes is that if two functions use the same API endpoint, you should make a separate function that calls the API and accepts an optional query parameter. The refresher and the download-by-date functions are then essentially wrapper functions that construct the query and call that function.

# Wrapper function
download_by_date(..., lowerbound, upperbound) {
    # Construct the query
    if lowerbound and not upperbound
        construct query > lowerbound
    if upperbound and not lowerbound
        construct query < upperbound
    if upperbound and lowerbound
        construct query > lowerbound & < upperbound
    # Call the API function
    bugzilla_api_call_function(..., query, ...)
}

# Wrapper function
refresher_function(..., filepath, ...) {
    # Construct the query
    get the greatest date from the file path
    construct query > created_date
    # Call the API function
    bugzilla_api_call_function(..., query, ...)
}

# API call function
bugzilla_api_call_function(..., query, ...) {
    if (query) {
        append query to API call
    }
    call API
}
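As a runnable illustration of the pattern above, here is a minimal Python sketch. Everything in it is hypothetical: the function names, the `created_after` and `created_before` parameter names, and the injected `fetch` callable (used so the example runs without a network) are assumptions, not the real Bugzilla API. The refresher here takes the latest date directly; reading it from the file path is left out.

```python
from urllib.parse import urlencode


def build_date_query(lower_bound=None, upper_bound=None):
    """Construct the query from whichever bounds are given."""
    query = {}
    if lower_bound is not None:
        query["created_after"] = lower_bound    # hypothetical parameter name
    if upper_bound is not None:
        query["created_before"] = upper_bound   # hypothetical parameter name
    return query


def api_call(base_url, fetch, query=None):
    """Single place that talks to the API; the query is optional."""
    url = base_url
    if query:
        url = f"{base_url}?{urlencode(query)}"
    return fetch(url)


def download_by_date(base_url, fetch, lower_bound=None, upper_bound=None):
    """Wrapper: build a bounded query, then delegate to api_call."""
    return api_call(base_url, fetch, build_date_query(lower_bound, upper_bound))


def refresher(base_url, fetch, latest_created):
    """Wrapper: ask only for items created after the latest one on disk."""
    return api_call(base_url, fetch, build_date_query(lower_bound=latest_created))
```

Passing `fetch` in keeps the API-calling function easy to test, and both wrappers stay a few lines long because the query construction is shared.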
