when disk full hit, return an error to clients stating disk full condition #24492

Open
jason-da-redpanda opened this issue Dec 9, 2024 · 2 comments
Labels
kind/enhance New feature or request

Comments


jason-da-redpanda commented Dec 9, 2024

Who is this for and what problem do they have today?

When free disk space drops below storage_min_free_bytes, we see:

  • This message in the Redpanda logs:
    rejecting produce request: no disk space; bytes free less than configurable threshold

  • The redpanda_storage_disk_free_space_alert metric goes to the degraded state

  • Clients see a variety of errors and behaviours, none of which mention disk space (at least 2 customers have hit this recently)

With franz-go, the producer just hangs (and eventually times out).

You have to turn on DEBUG logging to see the underlying error:

14:33:46.086 DEBUG wrote Produce v7 {"broker": "2", "bytes_written": 123, "write_wait": "26.571µs", "time_to_write": "23.094µs", "err": null}
14:33:46.087 DEBUG read Produce v7 {"broker": "2", "bytes_read": 62, "read_wait": "52.466µs", "time_to_read": "958.691µs", "err": null}
14:33:46.087 DEBUG retry batches processed {"wanted_metadata_update": true, "triggering_metadata_update": true, "should_backoff": false}
14:33:46.087 DEBUG produced {"broker": "2", "to": "jason-test[0{retrying@-1,1(BROKER_NOT_AVAILABLE: The broker is not available.)}]"}
14:33:46.087 INFO metadata update triggered {"why": "produce request had retry batches"}
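
Until the error is surfaced more clearly, one way to spot this condition is to scan client debug logs for the retry error. The following is a minimal sketch in Go; the line format is inferred only from the sample "produced" line above (it is not a documented franz-go log contract), and `extractKafkaError` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// kafkaErrRe matches "(ERROR_NAME: human readable message)" as it appears in
// the sample "produced" debug line above. The format is an assumption based
// on that single sample.
var kafkaErrRe = regexp.MustCompile(`\(([A-Z_]+): ([^)]*)\)`)

// extractKafkaError returns the error name and message embedded in a log
// line, or ok=false when the line carries no such error.
func extractKafkaError(line string) (name, msg string, ok bool) {
	m := kafkaErrRe.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	line := `14:33:46.087 DEBUG produced {"broker": "2", "to": "jason-test[0{retrying@-1,1(BROKER_NOT_AVAILABLE: The broker is not available.)}]"}`
	if name, msg, ok := extractKafkaError(line); ok {
		fmt.Printf("retrying on %s (%s)\n", name, msg)
	}
}
```

Note that BROKER_NOT_AVAILABLE on its own does not prove a disk-full condition; it only tells you to go and check the broker logs and the disk-space alert metric.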
 

With transactions (Confluent.Kafka .NET client), the commit times out:

/app/CharityWorker/Kafka/TitanProducer.cs:line 87
   at Confluent.Kafka.Impl.SafeKafkaHandle.CommitTransaction(Int32 millisecondsTimeout)
   at CharityWorker.Kafka.TitanProducer.BatchWrite(IEnumerable1 titanMessages) in /a....
10:59:34.532  CharityWorker  ERROR  CharityWorker.Services.OutboxBackgroundService Error when sending outbox messages, error=Error when writing to kafka
System.InvalidOperationException: Error when writing to kafka
 ---> Confluent.Kafka.KafkaTxnRequiresAbortException: 1 message(s) timed out on geo_charity-charity_transaction_detail-itp1 [3]
   at Confluent.Kafka.Impl.SafeKafkaHandle.CommitTransaction(Int32 millisecondsTimeout)
   at

What are the success criteria?

In the disk-full scenario, clients see an error message similar to:

cannot write to redpanda - disk full

Why is solving this problem impactful?

Customers can and have spent a fair bit of time troubleshooting without realising that it is a disk-space issue.

Customers have specifically asked whether anything could be changed in the product to make the error more specific for clients, e.g. something like "cannot write to redpanda - disk full".

Additional notes

@jason-da-redpanda jason-da-redpanda added the kind/enhance New feature or request label Dec 9, 2024
Member

dotnwat commented Dec 9, 2024

hey @jason-da-redpanda is this a tracking ticket for a customer/community user? if not, i think we should close and recreate in jira?

Contributor

michael-redpanda commented Dec 12, 2024

Jira automation was slightly broken. Here's the linked Jira Issue: CORE-8496
