
google_bigquery_job re-run after 6 months #9768

Open
onuruluag opened this issue Aug 11, 2021 · 9 comments
Labels
forward/linked, persistent-bug, service/bigquery, size/m
Milestone

Comments

@onuruluag

onuruluag commented Aug 11, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Terraform v1.0.1
on linux_amd64

  • provider registry.terraform.io/hashicorp/archive v2.2.0
  • provider registry.terraform.io/hashicorp/external v2.1.0
  • provider registry.terraform.io/hashicorp/google v3.77.0
  • provider registry.terraform.io/hashicorp/google-beta v3.77.0
  • provider registry.terraform.io/hashicorp/null v3.1.0
  • provider registry.terraform.io/hashicorp/template v2.2.0

Your version of Terraform is out of date! The latest version
is 1.0.4. You can update by downloading from https://www.terraform.io/downloads.html

Affected Resource(s)

  • google_bigquery_job (load job)

Terraform Configuration Files

resource "google_bigquery_job" "util_tables_load_job" {
  for_each = local.load_job
  job_id   = format("iter2_util_table_load_%s_%s", trimsuffix(each.value, ".json"), base64encode(google_storage_bucket_object.util_files[each.value].md5hash))
  location = "EU"

  load {
    source_uris = [
      format("gs://%s/%s", google_storage_bucket.deployment_staging_bucket.name, google_storage_bucket_object.util_files[each.value].output_name)
    ]

    destination_table {
      project_id = terraform.workspace
      dataset_id = google_bigquery_dataset.util.dataset_id
      table_id   = trimsuffix(each.value, ".json")
    }

    source_format     = "NEWLINE_DELIMITED_JSON"
    write_disposition = "WRITE_TRUNCATE"

    autodetect = true
  }
}

Expected Behavior

Since the job already ran and there is no change in the JSON file, the BQ load job should not be re-created.

Actual Behavior

After 6 months, GCP deletes the BQ job history, and load jobs are re-created even though there is no change in JSON files.

Steps to Reproduce

  1. terraform apply
  2. confirm load job is created
  3. terraform apply
  4. observe job is not in the deployment plan
  5. wait for 6 months
  6. terraform apply
  7. deployment plan includes BQ load job

The issue is caused by the interaction between Terraform's state management for google_bigquery_job load jobs and GCP's job history retention period (6 months).

b/371632037

@onuruluag onuruluag added the bug label Aug 11, 2021
@rileykarson
Collaborator

From an (admittedly naive) provider's perspective, this is working as intended. Since the job has disappeared from the API, it should be recreated, and we can't distinguish between a nonexistent job and one that used to exist.

The only potential fix we could consider is to not check if the job is present in the API, but that would actually break import for the resource: we don't know in the Read function how it has been called, so if the resource was just imported, Terraform would report success. Admittedly, this isn't a resource where there is much reason to import it.

@rileykarson rileykarson added the persistent-bug label and removed the bug label Aug 13, 2021
@onuruluag
Author

onuruluag commented Aug 14, 2021

I think storing the hash of the source file in state would be appropriate for load jobs, since the load job should only run when the source file changes. We tried to achieve this behavior by including the google_storage_bucket_object's md5hash in the job_id, but that failed once the job history disappeared.
As a workaround, we could use null_resource with triggers, but that would mean not using google_bigquery_job at all.
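
For reference, here is a minimal sketch of that null_resource workaround. It assumes the same bucket, dataset, and locals as the configuration above, that the bq CLI is available wherever Terraform runs, and that the hashicorp/null provider (already in the provider list above) is configured; the resource name and exact bq flags are illustrative.

resource "null_resource" "util_tables_load" {
  for_each = local.load_job

  # Re-run the load only when the source object's content changes,
  # independent of BigQuery's job history retention.
  triggers = {
    source_md5 = google_storage_bucket_object.util_files[each.value].md5hash
  }

  provisioner "local-exec" {
    command = <<-EOT
      bq --location=EU load \
        --source_format=NEWLINE_DELIMITED_JSON \
        --replace \
        --autodetect \
        "${terraform.workspace}:${google_bigquery_dataset.util.dataset_id}.${trimsuffix(each.value, ".json")}" \
        "gs://${google_storage_bucket.deployment_staging_bucket.name}/${google_storage_bucket_object.util_files[each.value].output_name}"
    EOT
  }
}

The trade-off is that the load is no longer represented as a google_bigquery_job resource, so failures surface only through the local-exec provisioner.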

@rileykarson rileykarson added this to the Goals milestone Aug 16, 2021
helperbot-recidiviz pushed a commit to Recidiviz/pulse-data that referenced this issue Dec 1, 2021
Applying tf plan broke because of [this](hashicorp/terraform-provider-google#9768) outstanding tf issue. This is a temporary workaround until that's resolved.

GitOrigin-RevId: 6506c3cf5b1d2f3e3604693e835a32a95ef0f311
helperbot-recidiviz pushed a commit to Recidiviz/pulse-data that referenced this issue May 5, 2022
Applying tf plan broke because of [this](hashicorp/terraform-provider-google#9768) outstanding tf issue. This is a temporary workaround until that's resolved.

GitOrigin-RevId: a4ae4e02b30239aae971fe59c2d9c17a495f658a
@SunilPaiEmpire

The first terraform apply for a BigQuery job 183 days after it was created caused the same issue for me, but instead of re-creating the job, I get a 404 Not Found and the apply exits with a failure.

Referencing the response by @rileykarson

> The only potential fix we could consider is to not check if the job is present in the API

I think this is appropriate given how the provider works natively. A BigQuery Job is created once and cannot be modified. If there is a change to a BigQuery Job resource, this will result in a new Job being created in BigQuery.

> but that would actually break import for the resource: we don't know in the Read function how it has been called, so if the resource was just imported, Terraform would report success

I think this is appropriate too. Importing a job, whether it still exists in BigQuery or has been purged after the 180-day period, should indicate to Terraform not to create the resource because it already exists/existed. Isn't this in line with other uses of import?

@wmcg

wmcg commented Jan 25, 2023

I'm inclined to agree with @SunilPaiEmpire's points above.

It is frustrating because google_bigquery_job could be a really nice way of managing DML changes, but I don't see how it is currently usable on its own.

We are considering a script we can run as part of our workflow that will check for and remove jobs over a certain age, but that complicates what would otherwise be a very tidy process.

helperbot-recidiviz pushed a commit to Recidiviz/pulse-data that referenced this issue Apr 19, 2023
Applying tf plan broke because of [this](hashicorp/terraform-provider-google#9768) outstanding tf issue. This is a temporary workaround until that's resolved.

GitOrigin-RevId: 9a62d5d7ab6375d53578769a2a1b3ea3c1d7b624
helperbot-recidiviz pushed a commit to Recidiviz/pulse-data that referenced this issue Apr 19, 2023
…7736)

## Description of the change

Bump job_id in `google_bigquery_job` to `v5`, since it's been ~6 months
since the previous version bump (see issue for context).

GitOrigin-RevId: e8b29bdd30cdd29a8ce07c495693f3158deaf003
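
For context, the workaround in the commit above amounts to embedding a manually bumped version marker in the job_id so that a fresh, unique job is created once the previous one may have aged out of BigQuery's roughly six-month job history. A rough sketch against the original configuration from this issue (the v5 marker and naming are illustrative, not the exact code from that repository):

resource "google_bigquery_job" "util_tables_load_job" {
  for_each = local.load_job

  # The "v5" marker is bumped by hand (roughly every 6 months) to force a
  # new job_id once the previous job has been purged from job history.
  job_id = format(
    "iter2_util_table_load_v5_%s_%s",
    trimsuffix(each.value, ".json"),
    base64encode(google_storage_bucket_object.util_files[each.value].md5hash)
  )
  location = "EU"

  load {
    source_uris = [
      format("gs://%s/%s", google_storage_bucket.deployment_staging_bucket.name, google_storage_bucket_object.util_files[each.value].output_name)
    ]

    destination_table {
      project_id = terraform.workspace
      dataset_id = google_bigquery_dataset.util.dataset_id
      table_id   = trimsuffix(each.value, ".json")
    }

    source_format     = "NEWLINE_DELIMITED_JSON"
    write_disposition = "WRITE_TRUNCATE"
    autodetect        = true
  }
}

This keeps google_bigquery_job in use, at the cost of a manual bump and a one-time re-run of every load job whenever the marker changes.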
@github-actions github-actions bot added the service/bigquery and forward/review labels Aug 17, 2023
modular-magician added a commit to modular-magician/terraform-provider-google that referenced this issue Jan 8, 2024
…corp#9768)

Co-authored-by: Shivang Dixit <[email protected]>
[upstream:3e3c54bbd42010a2bf2eb821d2ec397e43a189a1]

Signed-off-by: Modular Magician <[email protected]>
modular-magician added a commit that referenced this issue Jan 8, 2024
#16928)

[upstream:3e3c54bbd42010a2bf2eb821d2ec397e43a189a1]

Signed-off-by: Modular Magician <[email protected]>
@wj-chen

wj-chen commented Oct 4, 2024

Revisiting this as the same issue was recently brought up again by another user.

@rileykarson Could you forward it to our internal issue tracker too?

Since the jobs are not persistent past 6 months on the server side, we'll need to think of something at the provider level. Could you elaborate on what "not check if the job is present in the API" means? Is it that we don't read the live state at all, since we would never attempt to re-create any BigQuery job in Terraform?

@rileykarson rileykarson removed the forward/review label Oct 4, 2024
@rileykarson
Collaborator

rileykarson commented Oct 4, 2024

Yeah: instead of clearing the resource id on a 404 error (this code), we'd preserve the old state of the resource by immediately exiting Read without an error.

We'd have to figure out what updates to a purged job definition mean as part of that, I think. We could track purged jobs with a boolean client-side, and then use a CustomizeDiff to upgrade update plans to deletions, or add a guard to Update so that updates are no-ops.

I think for the import case I was thinking about, we could only allow importing valid jobs by tracking a flag client-side for that: we'd start it false and set it to true after a successful read. Since Import is ultimately a Read call, we'd return an error on a 404 where that flag is still false. Or just take the "import of nonexisting jobs always succeeds" approach from #9768 (comment).

@wj-chen

wj-chen commented Oct 16, 2024

> Yeah: instead of clearing the resource id on a 404 error (this code), we'd preserve the old state of the resource by immediately exiting Read without an error.
>
> We'd have to figure out what updates to a purged job definition mean as part of that, I think. We could track purged jobs with a boolean client-side, and then use a CustomizeDiff to upgrade update plans to deletions, or add a guard to Update so that updates are no-ops.
>
> I think for the import case I was thinking about, we could only allow importing valid jobs by tracking a flag client-side for that: we'd start it false and set it to true after a successful read. Since Import is ultimately a Read call, we'd return an error on a 404 where that flag is still false. Or just take the "import of nonexisting jobs always succeeds" approach from #9768 (comment).

Thanks again for the details. I'm trying to see if we can simplify the handling by just preserving any existing state, without having to rely on an additional client-side field. Can you confirm that if we return nil on a 404 from the Get API response inside JobRead(), the job resource in the state file will remain unchanged, and that will be used for comparison with the user's config file, i.e. diff detection?

  • For first-time Create, no change: we save the resource to state if creation is successful and leave it empty if not.
  • For Read, if it's a 404 from the Get API, print a message indicating the job might have been purged.
  • Rerunning a plan with no change after 6 months: because Read is a no-op on a 404 from Get, there should be no diff or any attempt to recreate, which should solve this issue.
  • For Update: first of all, I don't think we support updating jobs today or in the foreseeable future. There is no corresponding API, and the current TF implementation seems to be a no-op (the comment mentions something about labels, but I'm just not seeing that in the code...). We might even just want to throw an error saying we don't support updates at all, whether the job still exists or not.
  • For Delete: as it stands we only remove the resource from the state file; this behavior can stay unchanged.
  • For Import: I think it's like "first-time Create" and "Rerunning a plan with no change after 6 months" above. If the job has been purged, we don't try to re-import: basically the "always succeeds" behavior.

@rileykarson Could you correct any misunderstanding above? What do you think of the overall plan?

@wmcg

wmcg commented Dec 12, 2024

I wonder if ephemeral resources might be a good fit for this? https://developer.hashicorp.com/terraform/language/resources/ephemeral

@rileykarson
Collaborator

@wj-chen sorry I missed that when you sent it! Looks reasonable, with minor notes below.

> Can you confirm that if we return nil on a 404 from the Get API response inside JobRead(), the job resource in the state file will remain unchanged, and that will be used for comparison with the user's config file, i.e. diff detection?

Yes

> For Read, if it's a 404 from the Get API, print a message indicating the job might have been purged

We can't readily print arbitrary messages, unfortunately; we could update a state value on the resource, but it's hard to display that to an end user without triggering a diff.


@wmcg:

> I wonder if ephemeral resources might be a good fit for this?

Ephemeral resources are more akin to a secret value than a job resource! They're ephemeral in the sense that they don't get permanently recorded in state and only exist for the length of terraform apply. Consider an access token source, for example: access tokens are short-lived and secret, and a Terraform resource or data source, prior to ephemeral resources, would permanently record a secret, soon-stale value in state.

I haven't worked directly with them yet and am not a HashiCorp employee, but the mental model I've used based on what I've read is "temporary datasource" more than anything.
