
Removal of .proto files from googleapis-common-protos broke us #13545

Open
1 task done
stewartmiles opened this issue Feb 21, 2025 · 10 comments · May be fixed by #13614
Assignees
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@stewartmiles

Determine this is the right repository

  • I determined this is the correct repository in which to report this bug.

Summary of the issue

Context
23e0f96 removed the source .proto files for Google APIs. Since we use these protos in our project to build our own services with gRPC, this broke our build process.

Expected Behavior:
I would have expected the .proto files to have stayed in place or another pip package to be provided that published the source proto files for existing users.

Actual Behavior:
.proto files removed, build broken

API client name and version

No response

Reproduction steps: code

file: main.py

   def reproduce():
    # complete code here

Reproduction steps: supporting files

file: mydata.csv

alpha,1,3
beta,2,5

Reproduction steps: actual results

file: output.txt

Calculated: foo

Reproduction steps: expected results

file: output.txt

Calculated: bar

OS & version + platform

No response

Python environment

No response

Python dependencies

No response

Additional context

No response

@stewartmiles stewartmiles added triage me I really want to be triaged. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Feb 21, 2025
@stewartmiles
Author

I see that the source .proto files are in https://github.com/googleapis/googleapis; however, that repo is not tagged with the release information applied to https://pypi.org/project/googleapis-common-protos and this repository. As far as I can tell, it's therefore not possible to determine which revision of the https://github.com/googleapis/googleapis repo to pull to grab the source that corresponds to a specific version of googleapis-common-protos.

@vchudnov-g vchudnov-g added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed triage me I really want to be triaged. labels Feb 21, 2025
@vchudnov-g
Contributor

Thanks for reporting this issue! We will take a look and figure out how to best address it.

@micolous

micolous commented Feb 25, 2025

I believe this is a duplicate of my previously-reported issue: #13526

@parthea
Contributor

parthea commented Mar 2, 2025

Following up on #13545 (comment): the links in the GitHub release notes point to the specific version of googleapis used to generate the pb2 files. This does require 2 clicks, though.

For example, see release 1.68.0

[Screenshot of the release notes for 1.68.0]

A new field unversioned_package_disabled is added to message .google.api.PythonSettings (eb554e8)

Click on commit eb554e8 and you can find the specific commit of googleapis used.

Source-Link:
googleapis/googleapis@0a459af

[Screenshot of the commit showing the Source-Link]

@parthea
Contributor

parthea commented Mar 2, 2025

It's recommended to use googleapis/googleapis to retrieve the source protos, as that is the source of truth for them. The proto (*.proto) files themselves are not part of the Python library and cannot be used in Python code. I'm going to close this issue, but please feel free to open a new one if you're blocked on this.

@parthea parthea closed this as completed Mar 2, 2025
@parthea parthea added type: question Request for information or clarification. Not an issue. and removed type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Mar 2, 2025
@0x26res

0x26res commented Mar 3, 2025

This does require 2 clicks though.

@parthea I'm not sure this solution works for existing users of the library. A lot of us have been relying on the fact that the core API protos can be installed with pip, in order to use them when calling protoc.

Say I have a code base with a .proto file that imports common protos from googleapis (e.g. timestamp).

syntax = "proto3";

package demo;

import "google/type/date.proto";
import "google/protobuf/timestamp.proto";

message Demo {
  google.protobuf.Timestamp timestamp = 1;
  google.type.Date trade_date = 5;
}

With a previous version of googleapis-common-protos installed,

 pip install "googleapis-common-protos<=1.67.0" grpcio-tools

I can simply build my proto from the command line:

python -m grpc_tools.protoc \
  --proto_path=$(python -c 'import pathlib;import grpc_tools;print(pathlib.Path(grpc_tools.__file__).parent.absolute() / "_proto")') \
  --proto_path=$(python -c 'import pathlib;import google;print(pathlib.Path(google.__path__[0]).parent)') \
  --proto_path=./ \
  --python_out=./ \
  demo.proto

This is no longer possible because half of the protos are missing. The suggested solution requires manually fetching the protos from GitHub, which isn't suitable for automated CI pipelines.
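As an aside, one way to see exactly which external protos a build like the one above depends on is to scan the import statements in your .proto files. A stdlib-only sketch (the regex is a simplification for illustration, not a full proto parser):

```python
import re

# Matches lines like: import "google/type/date.proto";
# (also tolerates the `import public` / `import weak` modifiers)
IMPORT_RE = re.compile(
    r'^\s*import\s+(?:public\s+|weak\s+)?"([^"]+)"\s*;', re.MULTILINE
)

def proto_imports(proto_source: str) -> list[str]:
    """Return the paths of all .proto files imported by the given source."""
    return IMPORT_RE.findall(proto_source)

demo = '''
syntax = "proto3";
package demo;
import "google/type/date.proto";
import "google/protobuf/timestamp.proto";
'''

print(proto_imports(demo))
# ['google/type/date.proto', 'google/protobuf/timestamp.proto']
```

Running this over a tree of schemas would tell a CI job which googleapis protos it needs on the protoc include path before the build starts.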

@parthea parthea reopened this Mar 3, 2025
@parthea
Contributor

parthea commented Mar 3, 2025

In terms of adding this to CI, if you know the SHA of the googleapis commit you're looking for, you can simply run

wget https://raw.githubusercontent.com/googleapis/googleapis/<SHA>/<PATH TO PROTO>

For example,
wget https://raw.githubusercontent.com/googleapis/googleapis/9415ba048aa587b1b2df2b96fc00aa009c831597/google/api/client.proto

In https://github.com/googleapis/python-api-common-protos/pull/266/files, we were simply cloning googleapis/googleapis, so this is something that technically could be done in CI.

# Clone googleapis (here `git` is a helper module from this repo's internal
# tooling, not the GitPython package)
googleapis_url = git.make_repo_clone_url(GOOGLEAPIS_REPO)
subprocess.run(["git", "clone", googleapis_url])

Alternatively, you can maintain your own copy of the protos by syncing with googleapis/googleapis as needed.

If we were to add the protos in google-cloud-python, we would be doing the same thing that is being proposed, which is to clone googleapis/googleapis or download the protos via SHA. There are a few reasons that it would be better to provide a link to the protos via the SHA, rather than duplicate them:

  • Reduce code duplication
  • Reduce unnecessary toil of having to re-review code which has already been reviewed in googleapis/googleapis
  • Reduce the possibility of shipping out of date or out of sync protos
  • Remove the need to have a separate pipeline simply for cloning protos (our existing pipeline doesn't have this functionality). A GitHub Action could work if we identify a need to restore the protos.
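The "download via SHA" approach above is easy to script. A minimal sketch (`proto_url` is a hypothetical helper, using only string formatting; actual fetching is left to wget/curl or urllib):

```python
RAW_BASE = "https://raw.githubusercontent.com/googleapis/googleapis"

def proto_url(sha: str, proto_path: str) -> str:
    """Build the raw.githubusercontent.com URL for a proto pinned to a commit."""
    return f"{RAW_BASE}/{sha}/{proto_path}"

# Reproduces the wget example above:
print(proto_url("9415ba048aa587b1b2df2b96fc00aa009c831597",
                "google/api/client.proto"))
# https://raw.githubusercontent.com/googleapis/googleapis/9415ba048aa587b1b2df2b96fc00aa009c831597/google/api/client.proto
```

A CI step could loop over the import list of the project's schemas and fetch each pinned proto into a local include directory before invoking protoc.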

@parthea parthea added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. and removed type: question Request for information or clarification. Not an issue. labels Mar 3, 2025
@micolous

micolous commented Mar 4, 2025

@parthea You're proposing that clients use a project-specific side channel (the GitHub Releases API) to collect metadata associated with the googleapis-common-protos package that a Python package manager selected, and then go and fetch some arbitrary git commit. This means it is no longer possible to fetch a package from PyPI and have everything we need to build, because GitHub is now a build-time dependency.

This introduces a lot of toil to downstream users, and runs contrary to the repository's stated goal of providing "idiomatic Python".

Why having .protos in Python wheels is important

When protoc builds a _pb2.py module from a .proto that includes external imports, it generates code with (Python) import statements referencing that Python module directly. For protoc to generate correct code, this means the external .protos used when building your module and the external packaged module need to be the same.

Library users got this effectively for free before – they could point protoc at their Python site-packages directory, and would always get whatever version of googleapis-common-protos the package resolver picked, and build fully functioning _pb2.py files from inside the Python package build process.

The only source a library would need to consult was PyPI, so the build could run entirely offline using your favourite Python package manager's existing tooling.

The way around this is that every library user builds googleapis-common-protos for themselves from googleapis/googleapis .proto files, and overrides any other package's dependency on it (eg: via gRPC).

Side note: one limitation of this approach is that if you're shipping wheels of something that depends on googleapis-common-protos, it should ideally depend on the exact version of googleapis-common-protos used to build it. But pinning some version of googleapis-common-protos could cause a dependency hell for downstream users if they depend on it via some other path (eg: gRPC).

But if you're building in a workspace/monorepo-like environment or only distributing source packages, it's not a problem: the final "binary" always gets whatever version of googleapis-common-protos the package manager's resolver picked, and builds against that.
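The exact-version pinning mentioned in the side note above could be recorded at build time with importlib.metadata. A sketch (the package name and the None fallback are illustrative, not part of any existing tooling):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def resolved_version(package: str) -> Optional[str]:
    """Return the version of `package` that the resolver installed, or None."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# At wheel-build time one might record, e.g.,
# resolved_version("googleapis-common-protos") into the built package's
# metadata, so consumers can check that the protos they compile against
# match the version the wheel was built with.
```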

How Bazel solves this

Bazel solves this in a different way: you have a proto_library() target for a .proto schema, and then a py_proto_library() which depends on that proto_library() to generate the *_pb2.py files.

Your schema's proto_library() target is the only thing to have deps on external schemas, via their proto_library() targets. Those dependencies are exposed via a ProtoInfo provider.

py_proto_library() exposes a py_library() target and a PyProtoInfo provider that links a proto_library() target to a py_proto_library()/py_library() target.

At build time, py_proto_library() looks at the deps of your proto_library(), and tries to find:

  • ProtoInfo providers generated by other proto_library() targets to add as dependencies for running protoc.
  • PyProtoInfo providers generated by other py_proto_library() targets to add as runtime dependencies of the final py_library() target.

In effect, Bazel has access to both .proto and _pb2.py files at all times.

The cost of a downstream user taking this approach is that they'd need to vendor in googleapis/googleapis and build it all themselves - they can no longer rely on PyPI or other standard Python package management tooling, so it is no longer idiomatic.

Toil for googleapis-common-protos

You've raised a concern about the toil of adding .proto files back into the process, and the risk of skew.

My understanding of the new build process is that it's pulling .protos from googleapis/googleapis (which is, in turn, based on a copybara transform of Google3), generating _pb2.py files with Bazel, then committing those files into git and pushing them to GitHub as PRs.

The generated code is already not in a reviewable state, because it contains a bunch of binary FileDescriptorSets encoded as Python bytestrings. Ideally, you'd have some CI process that replicates the build process for any generated files, and closely scrutinise changes to code generation scripts.

The thing is that Bazel already has all the .proto files that are needed. All the pipeline needs is a genrule() to copy the .proto files into genfiles, as is already done for the _pb2.py files, and then commit them as well. There are probably nicer ways to do this.

Another way to do this without duplicating code or checking in generated code is to implement it with something like hatch-protobuf – that calls protoc as part of the Python package build process and bundles everything into wheels, but doesn't commit any of the _pb2.py or .proto files. This repository would then become a bunch of pyproject.toml files and a submodule reference. This would be more similar to how you'd use Protobuf with Bazel, but some Python developers aren't used to the idea of a module only containing generated code.

What is the goal?

If the goal of this repository is to provide idiomatic Python bindings, then it should be easy to use with the tooling that the rest of the Python community uses.

Compared to other Google APIs, google/{api,rpc,type}/*.proto are a special case, in that they're not specific to any GCP service. You'd normally use them in conjunction with another Protobuf schema (eg: annotations for Envoy's gRPC-JSON transcoder or grpc-gateway).

Asking downstream users to introduce build-time dependencies on GitHub that replicate parts of this repository's code generation pipeline when they depend on schemas published via this repository is far from idiomatic. 😉

From our perspective, .protos were removed in a minor release without so much as a release note, which is an exceptionally harsh deprecation strategy.

@parthea
Contributor

parthea commented Mar 4, 2025

Thanks for the detailed responses. I agree with the sentiment discussed in this issue. Since *.proto files were shipped in versions prior to 1.67.0, the removal of the protos is technically a breaking change according to open source policies, including https://opensource.google/documentation/policies/library-breaking-change#breaking_change, because there was no documentation to state that usage of the shipped protos was not a supported use case for the library.

From https://opensource.google/documentation/policies/library-breaking-change#breaking_change,

A breaking change is a change to supported functionality between released versions of a library that would require a customer to do work in order to upgrade to the newer version. It is advised that library owners (or "language leads") document for customers what constitutes approved and supported usage of their libraries.

If/When googleapis/gapic-generator-python#2349 is merged and released, I will move forward with applying the necessary changes to BUILD.bazel files in googleapis as per https://gist.github.com/parthea/757f4839877f7509e542596a4f5d72d6. Once changes land in googleapis, we should see the protos restored as per the prototype PR: #13605

@parthea
Contributor

parthea commented Mar 4, 2025

Waiting for googleapis/gapic-generator-python#2349 and #13606 to be merged. Done (Googlers see cl/733376620 and cl/733498676). See googleapis/googleapis@ada88fe
