Google Cloud Tasks with Google App Engine: 14 UNAVAILABLE: read ECONNRESET #5852

Open
AllanOliveiraM opened this issue Dec 3, 2024 Discussed in #5799 · 7 comments

@AllanOliveiraM

Discussed in #5799

Originally posted by AllanOliveiraM November 12, 2024
Hi everyone, how's it going?

I need some help with the Cloud Tasks client library. I have an application running on App Engine, and I use Cloud Tasks to queue the processing of some critical application data.

I'm using the createTask method to do this, and everything works normally most of the time. The problem is that occasionally, a task cannot be created, and I encounter a "14 UNAVAILABLE: read ECONNRESET" error in the logs.

I believe the issue is not with the API itself, but with my client library configuration.

I found this issue in the grpc-node GitHub repository, but I'm still having trouble configuring the client correctly.

If anyone could help me understand what I’m doing wrong, I’d be very grateful :)

image

I don’t think the error is here, but I’ll include it as context.

image
image

@sofisl
Contributor

sofisl commented Jan 22, 2025

  • What version are you using of:
    • Node
    • tasks
    • gax
    • grpc-js?
  • Is there any opportunity to reuse an existing CloudTasksClient in your code instead of creating a new one? (Perhaps it's already fully optimized, I don't know, but I wonder whether the issue has anything to do with creating new clients.)
  • Since you're using NestJS, are you sure you're not using grpc (the deprecated package)?
  • Did you enable logging?

Importantly, does this issue happen outside of NestJS (can you recreate it with just App Engine)? And lastly, can you provide a valid reproduction? Without one, I can't really tell whether it's a client library issue, and I wouldn't be able to help much.

@sofisl sofisl added the needs more info This issue needs more information from the customer to proceed. label Jan 22, 2025
@sofisl sofisl self-assigned this Jan 22, 2025
@BenjaminDish

BenjaminDish commented Jan 31, 2025

I'm facing exactly the same problem as @AllanOliveiraM in my app.

I'm using the @google-cloud/run library to run Cloud Run jobs. The calls are made every 5 minutes, triggered by a cron table, and more than 99% of them work well. But sometimes, seemingly at random, I get this error.

The method I'm calling is:

import { JobsClient, protos } from '@google-cloud/run';
const runClient = new JobsClient();

// Construct request
const request: Partial<protos.google.cloud.run.v2.RunJobRequest> = {
  name: myJobName,
};

// Run request
await runClient.runJob(request);

And sometimes (< 1% of the calls), I get this error:

[ExceptionsHandler] error: 14 UNAVAILABLE: read ECONNRESET

There are no further logs in my back end, and there is no trace of this call on the GCP side.
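
For an intermittent failure like this one, a common stopgap is to retry the call only when the error carries gRPC status code 14 (UNAVAILABLE). The following is a minimal sketch, not something the client libraries provide: `withUnavailableRetry`, its parameters, and the backoff values are hypothetical names chosen for illustration, and it assumes the thrown error exposes a numeric `code` property.

```typescript
// gRPC status code 14 is UNAVAILABLE.
const GRPC_UNAVAILABLE = 14;

// Hypothetical helper (not part of any Google client library): retries a call
// that fails with UNAVAILABLE, using exponential backoff between attempts.
async function withUnavailableRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await call();
    } catch (error) {
      lastError = error;
      // Assumption: the error object carries the gRPC status in `code`.
      const code = (error as { code?: number } | null)?.code;
      // Rethrow anything that is not UNAVAILABLE, and give up on the last attempt.
      if (code !== GRPC_UNAVAILABLE || attempt === maxAttempts) {
        throw error;
      }
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>(resolve => setTimeout(resolve, delayMs));
    }
  }
  // Only reached when maxAttempts is less than 1.
  throw lastError;
}
```

Usage would look like `await withUnavailableRetry(() => runClient.runJob(request))`. A retry can duplicate work if the first request actually reached the server before the connection was reset, so this is only reasonable where a duplicate call is acceptable or can be detected.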

@AllanOliveiraM
Author

I'm compiling better data for debugging and will post feedback shortly.

@github-actions github-actions bot removed the needs more info This issue needs more information from the customer to proceed. label Jan 31, 2025
@sofisl sofisl added the needs more info This issue needs more information from the customer to proceed. label Feb 5, 2025
Contributor

This has been closed since a request for information has not been answered for 15 days. It can be reopened when the requested information is provided.

@AllanOliveiraM
Author

More info

Node: v23.7.0

Used package versions:

  • @google-cloud/tasks: 5.5.1
  • google-gax: 4.4.1
  • @grpc/grpc-js: 1.9.9 (this version is internally resolved by Yarn through google-gax)

I'm already reusing the same client instance across the project. In NestJS, I'm using a singleton class where the client is instantiated only once.

I'm also sure that I'm not using the deprecated grpc package because I'm directly using the @google-cloud/tasks library, which depends on a recent version of google-gax.

I also use other Google libraries in the project, but I believe none of them should interfere with this:

{
    "@google-cloud/logging-winston": "^6.0.0",
    "@google-cloud/secret-manager": "^5.6.0",
    "@google-cloud/storage": "^7.14.0",
    "@google-cloud/tasks": "^5.5.1",
    "@google/generative-ai": "^0.21.0"
}

Other libraries installed directly in the project:

  • @grpc/grpc-js: ^1.10.10
  • @grpc/proto-loader: ^0.7.13
  • @nestjs/axios: ^3.0.1
  • @nestjs/cache-manager: ^2.1.0
  • @nestjs/common: ^10.2.3
  • @nestjs/config: ^3.0.1
  • @nestjs/core: ^10.2.3
  • @nestjs/microservices: ^10.3.10
  • @nestjs/platform-express: ^10.4.8
  • @nestjs/terminus: ^10.0.1
  • @nestjs/throttler: ^4.2.1

I also use gRPC directly in another service — this is a monorepo.
There are 4 apps in total, and the one experiencing the issue is the one that does not use @grpc/grpc-js directly. The affected app only uses @google-cloud/tasks.

I have already enabled logging, but the only log that appears is the error mentioned above.
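
When the only visible log is the error itself, tracing from @grpc/grpc-js may show what happened on the connection beforehand. That library reads the `GRPC_VERBOSITY` and `GRPC_TRACE` environment variables when it is first loaded. The sketch below only builds those values: `grpcTraceEnv` is a hypothetical helper, and the default tracer names are an assumption about which ones are useful for a connection reset.

```typescript
// Hypothetical helper: builds the environment variables that @grpc/grpc-js
// reads to enable its internal tracing.
function grpcTraceEnv(
  tracers: string[] = ['channel', 'subchannel', 'keepalive', 'transport'],
): Record<string, string> {
  return {
    // DEBUG is the most verbose level.
    GRPC_VERBOSITY: 'DEBUG',
    // Comma-separated list of tracer names (assumed to be relevant here).
    GRPC_TRACE: tracers.join(','),
  };
}

// Print the values so they can be copied into the deployment configuration.
console.log(grpcTraceEnv());
```

The values have to be in the environment before any Google client library is imported, for example through `env_variables` in `app.yaml`. Tracing is verbose, so it is probably best enabled only while investigating.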


After spending a lot of time trying to understand what’s happening, it seems to me that there might be some kind of connection timeout, specifically when running on App Engine. However, I couldn’t find a way to configure reconnection options.

I came across this in another issue (the one I mentioned above). Please take a look there. I found this configuration for gRPC, but I couldn’t figure out how to apply these settings when using @google-cloud/tasks:

const channelOptions: ChannelOptions = {
  // Send keepalive pings every 10 seconds, default is 2 hours.
  'grpc.keepalive_time_ms': 10 * 1000,
  // Keepalive ping timeout after 5 seconds, default is 20 seconds.
  'grpc.keepalive_timeout_ms': 5 * 1000,
  // Allow keepalive pings when there are no gRPC calls.
  'grpc.keepalive_permit_without_calls': 1,
};
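
One possible way to apply these settings, offered as an unverified sketch: it assumes that google-gax forwards constructor options whose keys start with `grpc.` to the underlying gRPC channel, which should be checked against the installed google-gax version. `keepaliveChannelArgs` and its parameters are hypothetical names; the keys and default values are the ones quoted above.

```typescript
// gRPC channel arguments keyed by their full "grpc."-prefixed names.
type GrpcChannelArgs = Record<string, number>;

// Hypothetical helper: builds the keepalive settings quoted above.
function keepaliveChannelArgs(
  pingIntervalSec = 10,
  pingTimeoutSec = 5,
): GrpcChannelArgs {
  return {
    // Send a keepalive ping every pingIntervalSec seconds.
    'grpc.keepalive_time_ms': pingIntervalSec * 1000,
    // Treat the connection as dead if a ping is not answered in time.
    'grpc.keepalive_timeout_ms': pingTimeoutSec * 1000,
    // Allow pings even when no gRPC call is in flight.
    'grpc.keepalive_permit_without_calls': 1,
  };
}

// Assumed usage (not confirmed in this thread):
//   import { CloudTasksClient } from '@google-cloud/tasks';
//   const client = new CloudTasksClient({ ...keepaliveChannelArgs() });
console.log(keepaliveChannelArgs());
```

If the options are forwarded as assumed, an idle connection that was silently dropped would be detected by a failed ping rather than by the next `createTask` call.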

I wasn’t able to reproduce this issue outside the production environment at all.

@github-actions github-actions bot removed the needs more info This issue needs more information from the customer to proceed. label Mar 2, 2025
@AllanOliveiraM
Author

@sofisl I can't reopen the issue 😑

@AllanOliveiraM
Author

Full code from my project:

import { HttpService } from '@nestjs/axios'
import {
  HttpStatus,
  Injectable,
  InternalServerErrorException,
  Logger,
} from '@nestjs/common'
import { ConfigService } from '@nestjs/config'

import { CloudTasksClient } from '@google-cloud/tasks'
import { captureException } from '@sentry/node'
import { AxiosError } from 'axios'
import { catchError, firstValueFrom } from 'rxjs'

import { GlobalEnvVariables } from '@app/common/constants/environment'
import { AvailableQueueDefinition } from '@app/common/definitions/queues'

@Injectable()
export class TaskManagerService {
  private readonly logger = new Logger(TaskManagerService.name)
  private cloudTasksClient: CloudTasksClient | null = null
  private googleCloudProjectId: string | null
  private googleCloudTasksLocation: string | null
  private isCloudTasksEnabled: boolean

  constructor(
    private readonly configService: ConfigService,
    private readonly httpService: HttpService
  ) {
    this.googleCloudProjectId =
      this.configService.get(GlobalEnvVariables.GOOGLE_CLOUD_PROJECT_ID) || null
    this.googleCloudTasksLocation =
      this.configService.get(GlobalEnvVariables.GOOGLE_CLOUD_TASKS_LOCATION) || null

    this.isCloudTasksEnabled = (() => {
      if (this.configService.getOrThrow(GlobalEnvVariables.NODE_ENV) !== 'production') {
        return false
      }

      if (!this.configService.getOrThrow(GlobalEnvVariables.GOOGLE_CLOUD_TASKS_ENABLED)) {
        return false
      }

      return true
    })()

    if (this.isCloudTasksEnabled) {
      this.cloudTasksClient = new CloudTasksClient()
    }
  }

  private async createGoogleCloudTask({
    queue,
    payload,
  }: {
    queue: AvailableQueueDefinition
    payload: Record<string, any>
  }) {
    if (
      [
        this.cloudTasksClient,
        this.googleCloudProjectId,
        this.googleCloudTasksLocation,
      ].some(val => !val)
    ) {
      const err = new InternalServerErrorException({
        statusCode: HttpStatus.INTERNAL_SERVER_ERROR,
        message: 'Google APIs not enabled. ERR: GCT',
      })

      captureException(err, {
        extra: {
          envs: [
            this.cloudTasksClient,
            this.googleCloudProjectId,
            this.googleCloudTasksLocation,
          ],
          queue,
        },
      })

      throw err
    }

    const parent = this.cloudTasksClient.queuePath(
      this.googleCloudProjectId,
      this.googleCloudTasksLocation,
      queue.queueName
    )

    const task = {
      appEngineHttpRequest: {
        headers: {
          'Content-Type': 'application/json',
        },
        httpMethod: 'POST' as const,
        relativeUri: queue.uri,
        body: Buffer.from(JSON.stringify(payload)).toString('base64'),
      },
    }

    try {
      await this.cloudTasksClient.createTask({
        parent,
        task,
      })
    } catch (error) {
      this.logger.error(error)

      captureException(error, {
        extra: {
          queue,
          payload,
        },
      })

      throw new InternalServerErrorException({
        statusCode: HttpStatus.INTERNAL_SERVER_ERROR,
        message: 'Task Manager failed to receive a task.',
      })
    }
  }

  async createTask({
    queue,
    payload,
  }: {
    queue: AvailableQueueDefinition
    payload: Record<string, any>
  }) {
    if (this.isCloudTasksEnabled) {
      await this.createGoogleCloudTask({
        queue,
        payload,
      })

      return
    }

    if (this.configService.getOrThrow(GlobalEnvVariables.NODE_ENV) === 'development') {
      await firstValueFrom(
        this.httpService
          .post(queue.uri, payload, {
            headers: {
              'x-appengine-queuename': queue.queueName,
            },
          })
          .pipe(
            catchError((error: AxiosError) => {
              this.logger.error('[DEV TASK MANAGER ERROR] ', error.response.data)

              throw new InternalServerErrorException({
                statusCode: HttpStatus.INTERNAL_SERVER_ERROR,
                message: 'Development task manager error.',
                _responseData: error.response.data,
              })
            })
          )
      )

      return
    }

    captureException(
      new InternalServerErrorException({
        statusCode: HttpStatus.INTERNAL_SERVER_ERROR,
        message: 'No Task Manager defined to receive tasks.',
      }),
      {
        extra: {
          queue,
          payload,
        },
      }
    )

    throw new InternalServerErrorException({
      statusCode: HttpStatus.INTERNAL_SERVER_ERROR,
      message: 'No Task Manager defined to receive tasks.',
    })
  }
}

@sofisl sofisl reopened this Mar 3, 2025

3 participants