Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/automated staging publish and configure on PR #165

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

slesaad
Copy link
Member

@slesaad slesaad commented Aug 15, 2024

Changes

  • Adds a workflow that automatically publishes data from a dataset-config file to staging and creates a PR in veda-config with a minimal .data.mdx file representing the data.

Testing

See a test run in this ghgc-data PR - https://github.com/US-GHG-Center/ghgc-data/pull/45

Disclaimer

I haven't tested this in veda-data yet, but wanted to open this PR for visibility and discussion.
If anyone else wants to test it out, go for it!

Planned future work

We also want to automatically publish to production once everything in staging looks good.
So after review and approval, the merge to main would do the following:

  1. Automatically transfer the files to veda-data-store if transfer is set to True in the dataset config (work needs to be done in airflow dag to support this) before publishing items to STAC
  2. Update the PR in veda-config to revert the changes to .env file

Questions?

  • Since we have staging/ and production/ folders right now, we might need to figure out where the dataset-config folder is gonna go? ideally we'd avoid the need to create two PRs for staging and production ingest

@ciaransweet
Copy link

image

Never changes wherever I work 😆

Copy link

@ciaransweet ciaransweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More general questions than anything specific I'd change!

id: init-comment
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen a few of our other workflows define shell: bash - Worth doing for these run's?

Comment on lines +103 to +169
run: |
if [ -z "$WORKFLOWS_URL" ]; then
echo "WORKFLOWS_URL is not set"
exit 1
fi

if [ -z "$AUTH_TOKEN" ]; then
echo "AUTH_TOKEN is not set"
exit 1
fi

publish_url="${WORKFLOWS_URL%/}/dataset/publish"
bearer_token=$AUTH_TOKEN

# Track successful publications
all_failed=true
success_collections=()
status_message='### Collection Publication Status
'

for file in "${ALL_CHANGED_FILES[@]}"; do
echo $file
if [ -f "$file" ]; then
dataset_config=$(jq '.' "$file")
collection_id=$(jq -r '.collection' "$file")

response=$(curl -s -w "%{http_code}" -o response.txt -X POST "$publish_url" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AUTH_TOKEN" \
-d "$dataset_config"
)

status_code=$(tail -n1 <<< "$response")

# Update status message based on response code
if [ "$status_code" -eq 200 ] || [ "$status_code" -eq 201 ]; then
echo "$collection_id successfully published ✅"
status_message+="- **$collection_id**: Successfully published ✅
"
success_collections+=("$file")
all_failed=false
else
echo "$collection_id failed to publish ❌"
status_message+="- **$collection_id**: Failed to publish ❌
"
fi
else
echo "File $file does not exist"
exit 1
fi
done

# Exit workflow if all the requests fail
if [ "$all_failed" = true ]; then
echo "All collections failed to publish."
exit 1
fi

# Output only successful collections to be used in subsequent steps
echo "success_collections=$(IFS=','; echo "${success_collections[*]}")" >> $GITHUB_OUTPUT

# Update PR comment
CURRENT_BODY=$(gh api -H "Authorization: token $GITHUB_TOKEN" /repos/${{ github.repository }}/issues/comments/$COMMENT_ID --jq '.body')
UPDATED_BODY="$CURRENT_BODY

$status_message"
gh api -X PATCH -H "Authorization: token $GITHUB_TOKEN" /repos/${{ github.repository }}/issues/comments/$COMMENT_ID -f body="$UPDATED_BODY"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question - Do we have a preference for scripting and what language we use? Would we prefer this in Python for example? It's pretty involved.

fi

# Output only successful collections to be used in subsequent steps
echo "success_collections=$(IFS=','; echo "${success_collections[*]}")" >> $GITHUB_OUTPUT

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth also publishing the failed ones?

Saves someone having to find those manually?

Comment on lines +172 to +182
- name: Update PR comment for PR creation
if: success()
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
COMMENT_ID: ${{ steps.init-comment.outputs.COMMENT_ID }}
run: |
CURRENT_BODY=$(gh api -H "Authorization: token $GITHUB_TOKEN" /repos/${{ github.repository }}/issues/comments/$COMMENT_ID --jq '.body')
UPDATED_BODY="$CURRENT_BODY

**Creating a PR in veda-config...**"
gh api -X PATCH -H "Authorization: token $GITHUB_TOKEN" /repos/${{ github.repository }}/issues/comments/$COMMENT_ID -f body="$UPDATED_BODY"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong, but if this is just adding a note to say it's creating the PR to the comment in the step above, why a separate statement? Seems like this 4 extra run lines could go in there, not sure a separate step gives much extra value?

pip install -r scripts/requirements.txt
for file in "${PUBLISHED_COLLECTION_FILES[@]}"
do
python3 scripts/mdx.py "$file"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the script accept a list of files and do the iteration in that?

@@ -0,0 +1,5 @@
<Block>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain what the purpose of this file is?

From what I can tell, we'll generate a file in the form:

<yaml populated from json>
<This files contents unchanged?>

Does this filler text get changed elsewhere?

@@ -0,0 +1,126 @@
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question again (sorry 😅) - Do we have a decision point on when we do type python code and not?

@anayeaye
Copy link
Contributor

Since we have staging/ and production/ folders right now, we might need to figure out where the dataset-config folder is gonna go? ideally we'd avoid the need to create two PRs for staging and production ingest

What about adding a publication PR template with a couple required fields (collection_id and publication stage). This would be enough information for us to select only the collection(s) of that is intended for publication (instead of diffing all of the jsons in a folder) and it would tell us whether this is intended for staging or production. That would let us choose between vars.STAGING_WORKFLOWS_URL and vars.PRODUCTION_WORKFLOWS_URL based on stage and infer the full path to the json input configuration ingestion-inputs/{stage}/dataset-config/{collection_id}.json

- type: textarea
  attributes:
    label: Collection ID
    description: What is the id of the collection you are publishing, i.e. openveda.cloud/collections/{collection_id}
    placeholder: "collection_id"
    value: collection_id
  validations:
    required: true
- type: dropdown
  attributes:
    label: Publication stage
    description: What stage do you want to publish this collection to
    multiple: false
    options:
      - staging (Default)
      - production
    default: 0
    value: publication_stage
  validations:
    required: true


# Update status message based on response code
if [ "$status_code" -eq 200 ] || [ "$status_code" -eq 201 ]; then
echo "$collection_id successfully published ✅"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@anayeaye
Copy link
Contributor

🤩

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants