-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/automated staging publish and configure on PR #165
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More general questions than anything specific I'd change!
id: init-comment | ||
env: | ||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
run: | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen a few of our other workflows define shell: bash
- Worth doing for these run
's?
run: | | ||
if [ -z "$WORKFLOWS_URL" ]; then | ||
echo "WORKFLOWS_URL is not set" | ||
exit 1 | ||
fi | ||
|
||
if [ -z "$AUTH_TOKEN" ]; then | ||
echo "AUTH_TOKEN is not set" | ||
exit 1 | ||
fi | ||
|
||
publish_url="${WORKFLOWS_URL%/}/dataset/publish" | ||
bearer_token=$AUTH_TOKEN | ||
|
||
# Track successful publications | ||
all_failed=true | ||
success_collections=() | ||
status_message='### Collection Publication Status | ||
' | ||
|
||
for file in "${ALL_CHANGED_FILES[@]}"; do | ||
echo $file | ||
if [ -f "$file" ]; then | ||
dataset_config=$(jq '.' "$file") | ||
collection_id=$(jq -r '.collection' "$file") | ||
|
||
response=$(curl -s -w "%{http_code}" -o response.txt -X POST "$publish_url" \ | ||
-H "Content-Type: application/json" \ | ||
-H "Authorization: Bearer $AUTH_TOKEN" \ | ||
-d "$dataset_config" | ||
) | ||
|
||
status_code=$(tail -n1 <<< "$response") | ||
|
||
# Update status message based on response code | ||
if [ "$status_code" -eq 200 ] || [ "$status_code" -eq 201 ]; then | ||
echo "$collection_id successfully published ✅" | ||
status_message+="- **$collection_id**: Successfully published ✅ | ||
" | ||
success_collections+=("$file") | ||
all_failed=false | ||
else | ||
echo "$collection_id failed to publish ❌" | ||
status_message+="- **$collection_id**: Failed to publish ❌ | ||
" | ||
fi | ||
else | ||
echo "File $file does not exist" | ||
exit 1 | ||
fi | ||
done | ||
|
||
# Exit workflow if all the requests fail | ||
if [ "$all_failed" = true ]; then | ||
echo "All collections failed to publish." | ||
exit 1 | ||
fi | ||
|
||
# Output only successful collections to be used in subsequent steps | ||
echo "success_collections=$(IFS=','; echo "${success_collections[*]}")" >> $GITHUB_OUTPUT | ||
|
||
# Update PR comment | ||
CURRENT_BODY=$(gh api -H "Authorization: token $GITHUB_TOKEN" /repos/${{ github.repository }}/issues/comments/$COMMENT_ID --jq '.body') | ||
UPDATED_BODY="$CURRENT_BODY | ||
|
||
$status_message" | ||
gh api -X PATCH -H "Authorization: token $GITHUB_TOKEN" /repos/${{ github.repository }}/issues/comments/$COMMENT_ID -f body="$UPDATED_BODY" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question - Do we have a preference for scripting and what language we use? Would we prefer this in Python for example? It's pretty involved.
fi | ||
|
||
# Output only successful collections to be used in subsequent steps | ||
echo "success_collections=$(IFS=','; echo "${success_collections[*]}")" >> $GITHUB_OUTPUT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth also publishing the failed ones?
Saves someone having to find those manually?
- name: Update PR comment for PR creation | ||
if: success() | ||
env: | ||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
COMMENT_ID: ${{ steps.init-comment.outputs.COMMENT_ID }} | ||
run: | | ||
CURRENT_BODY=$(gh api -H "Authorization: token $GITHUB_TOKEN" /repos/${{ github.repository }}/issues/comments/$COMMENT_ID --jq '.body') | ||
UPDATED_BODY="$CURRENT_BODY | ||
|
||
**Creating a PR in veda-config...**" | ||
gh api -X PATCH -H "Authorization: token $GITHUB_TOKEN" /repos/${{ github.repository }}/issues/comments/$COMMENT_ID -f body="$UPDATED_BODY" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be wrong, but if this is just adding a note to say it's creating the PR to the comment in the step above, why a separate statement? Seems like this 4 extra run lines could go in there, not sure a separate step gives much extra value?
pip install -r scripts/requirements.txt | ||
for file in "${PUBLISHED_COLLECTION_FILES[@]}" | ||
do | ||
python3 scripts/mdx.py "$file" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the script accept a list of files and do the iteration in that?
@@ -0,0 +1,5 @@ | |||
<Block> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain what the purpose of this file is?
From what I can tell, we'll generate a file in the form:
<yaml populated from json>
<This files contents unchanged?>
Does this filler text get changed elsewhere?
@@ -0,0 +1,126 @@ | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question again (sorry 😅) - Do we have a decision point on when we do type python code and not?
What about adding a publication PR template with a couple required fields (collection_id and publication stage). This would be enough information for us to select only the collection(s) of that is intended for publication (instead of diffing all of the jsons in a folder) and it would tell us whether this is intended for staging or production. That would let us choose between
|
|
||
# Update status message based on response code | ||
if [ "$status_code" -eq 200 ] || [ "$status_code" -eq 201 ]; then | ||
echo "$collection_id successfully published ✅" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
🤩 |
Changes
Testing
See a test run in this ghgc-data PR - https://github.com/US-GHG-Center/ghgc-data/pull/45
Disclaimer
I haven't tested this in
veda-data
yet, but wanted to open this PR for visibility and discussion.If anyone else wants to test it out, go for it!
Planned future work
We also want to automatically publish to production once everything in staging looks good.
So after review and approval, the merge to
main
would do the following:veda-data-store
iftransfer
is set toTrue
in the dataset config (work needs to be done in airflow dag to support this) before publishing items to STACveda-config
to revert the changes to.env
fileQuestions?
staging/
andproduction/
folders right now, we might need to figure out where thedataset-config
folder is gonna go? ideally we'd avoid the need to create two PRs forstaging
andproduction
ingest