[Data Export] shorten the oss uploading time in the beginning of every month #1306
Comments
@tyn1998 I think we had this discussion before, and the solution is to put the update time into the metadata of each repo. For example, in the file https://oss.x-lab.info/open_digger/github/X-lab2017/open-digger/meta.json , there will be a field called
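As a rough illustration of how a downstream app might use such a field, here is a minimal sketch. The field name is not given above, so `updateTimeField` is only a placeholder, and the assumption that the value is an epoch timestamp in milliseconds is mine rather than something confirmed in this thread.

```python
from datetime import datetime, timezone


def updated_since(meta: dict, field: str, since: datetime) -> bool:
    """Return True if meta[field] is at or after `since`.

    Assumes the field holds epoch milliseconds (an unconfirmed assumption).
    """
    value = meta.get(field)
    if value is None:
        return False
    updated = datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    return updated >= since


# Made-up metadata; "updateTimeField" is a placeholder, not the real field name.
meta = {"updateTimeField": 1685664000000}  # 2023-06-02T00:00:00Z
month_start = datetime(2023, 6, 1, tzinfo=timezone.utc)
print(updated_since(meta, "updateTimeField", month_start))
```

A consumer could poll the repo's meta.json and only refresh its own cache once this check passes for the current month.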
Hi @frank-zsy, thanks for your reply. I know the existence of I noticed that I also assume that after the cron tasks are executed, another set of scripts (not included in this repository) is run to upload the exported files to Aliyun OSS. Could those upload scripts be improved to shorten the uploading time? What is the bottleneck now: computing or uploading?
Understood, so I will elaborate on the tasks here. Several steps are needed for the data update process.
So if we start all the tasks at 11am on the first day of a month, OpenRank data import, calculation, and export may take about 2 hours, then metrics computation and network export may take 5 hours, and the data upload may also take 5-6 hours to complete. So if we can make the whole process parallel and automated, it may take about 12-13 hours to complete, which means finishing around midnight on the first day of the month. But right now the process is not fully automated, so the data may not be updated until about the 2nd day of a month. For example, for 2023.5, the data was updated this morning.
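The estimate above can be checked with a short calculation. This is only a sketch: the stage durations are the figures quoted in the comment, the stages are assumed to run back to back, and the start date is an arbitrary example of a first day of the month.

```python
from datetime import datetime, timedelta

# Hypothetical start: 11:00 on the first day of a month (the date is only an example).
start = datetime(2023, 6, 1, 11, 0)

# Stage durations in hours as (name, min, max), taken from the estimates above.
stages = [
    ("OpenRank import, calculation and export", 2, 2),
    ("metrics computation and network export", 5, 5),
    ("data upload", 5, 6),
]


def finish_window(start, stages):
    """Return (earliest, latest) finish times when the stages run back to back."""
    total_min = sum(lo for _, lo, _ in stages)
    total_max = sum(hi for _, _, hi in stages)
    return start + timedelta(hours=total_min), start + timedelta(hours=total_max)


earliest, latest = finish_window(start, stages)
print(earliest.isoformat(), latest.isoformat())
```

With these numbers the run ends between 23:00 on the 1st and 00:00 on the 2nd, which matches the 12-13 hour figure.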
@frank-zsy Thanks for your detailed elaboration! This is the first time I have seen the complete steps for exporting the monthly data, and I am convinced that the tasks are indeed time-consuming. I recommend writing the steps mentioned above into
Agreed, I will add the information to the README file. To improve the performance, I think several things can be done.
ossutilmac64 sync ~/github_data/open_digger/github oss://xlab-open-source/open_digger/github --force --job=1000 --meta "Expires:2023-07-01T22:00:00+08:00" --config-file=~/.ossutilconfig-xlab
The script uploads files in 1000 parallel threads and sets
Description
Hi community,
Is it possible to shorten the OSS uploading time at the beginning of every month? Or could you choose a fixed day of the month and announce it as a due date before which all data export tasks are completed?
This is very important for downstream apps that consume OpenDigger's valuable data.