Concurrent flush on Appender triggers corrupt row #339
Hi @paulsputer, thanks for raising this. This is likely the same issue as duckdb/duckdb-rs#331 (comment). My guess is that this is not an
This is connected to raising RandomEngine to use 64-bit state, see duckdb#13920, but initialization needs to be done on the full range, otherwise entropy is limited when many repeated statements are executed. Note that here we only solve the problem for RandomLocalState; a wider rework of the RandomEngine API might also be handy. Test case provided by @taniabogatsch. Fixes duckdb/duckdb-rs#331 Fixes marcboeker/go-duckdb#339
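To illustrate the entropy point in Go (DuckDB's RandomEngine is C++, so this is an analogy only, not its actual code): if a 64-bit state is initialized from less than its full range, distinct seeds collapse into identical random streams, which is how many repeated statements can end up generating duplicate "random" values.

package main

import (
	"fmt"
	"math/rand"
)

func main() {
	// Two distinct 64-bit seeds...
	seedA := int64(1)<<32 | 42
	seedB := int64(2)<<32 | 42
	// ...collapse to the same state if initialization only uses the low 32 bits.
	a := rand.New(rand.NewSource(int64(uint32(seedA))))
	b := rand.New(rand.NewSource(int64(uint32(seedB))))
	fmt.Println(a.Int63() == b.Int63()) // true: the two streams are identical
}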
Thanks @taniabogatsch, I wasn't aware of that issue. My thought process was that the UUIDs in this reproduction are generated in Go, so the SQL default clause should never be triggered. However, the fact that it is would imply the chunk is somehow null for the UUID column, and so triggering the default.
To be sure, the example below calls the appender from multiple goroutines while flushing on a ticker. Given the panic:

unexpected error while flushing: database/sql/driver: could not flush appender: duckdb error: PRIMARY KEY or UNIQUE constraint violated: duplicate key "0": appended data has been invalidated due to corrupt row

Note that in this updated reproduction no UUIDs are being appended at all.
package main

import (
	"context"
	"database/sql/driver"
	"fmt"
	"sync"
	"time"

	"github.com/marcboeker/go-duckdb"
)

// main starts the periodic flusher and two appending goroutines whose
// AppendRow calls race with the ticker-driven Flush.
func main() {
	fmt.Println("Start")
	db := NewDuckDb()
	db.RunEventLoop()
	wg := &sync.WaitGroup{}
	asyncAppend(db, 1, wg, 1)
	asyncAppend(db, 2, wg, 5000)
	wg.Wait()
	fmt.Println("End")
}

// asyncAppend launches a goroutine that appends 1000 rows, one every
// 20ms, with ids starting at offset so the two routines never overlap.
func asyncAppend(db *DuckDbAppender, id int, wg *sync.WaitGroup, offset int) {
	wg.Add(1)
	go func() {
		for i := 0; i < 1000; i++ {
			db.Append(offset+i, time.Now(), fmt.Sprintf("row from routine %d iteration %d", id, i))
			time.Sleep(time.Millisecond * 20)
		}
		wg.Done()
	}()
}

// DuckDbAppender bundles the connector, connection, appender, and the
// ticker that drives periodic flushes.
type DuckDbAppender struct {
	ddb    *duckdb.Connector
	con    driver.Conn
	app    *duckdb.Appender
	ticker *time.Ticker
}

func NewDuckDb() *DuckDbAppender {
	dbFilepath := "test.duckdb"
	tableName := "test"
	schema := `CREATE TABLE IF NOT EXISTS test (
		id INTEGER PRIMARY KEY,
		created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
		description VARCHAR(255) NULL
	);`
	o := &DuckDbAppender{
		ticker: time.NewTicker(time.Second * 5),
	}
	var err error
	o.ddb, err = duckdb.NewConnector(fmt.Sprintf("%s?threads=1", dbFilepath), func(execer driver.ExecerContext) error {
		_, err := execer.ExecContext(context.TODO(), schema, nil)
		if err != nil {
			panic(fmt.Errorf("unexpected error executing connection query: %w", err))
		}
		return nil
	})
	if err != nil {
		panic(fmt.Errorf("unexpected error creating connector: %w", err))
	}
	o.con, err = o.ddb.Connect(context.TODO())
	if err != nil {
		panic(fmt.Errorf("unexpected error while connecting: %w", err))
	}
	o.app, err = duckdb.NewAppenderFromConn(o.con, "", tableName)
	if err != nil {
		panic(fmt.Errorf("unexpected error creating appender: %w", err))
	}
	return o
}

// RunEventLoop flushes the appender on every tick; these flushes are
// what race with the concurrent AppendRow calls.
func (o *DuckDbAppender) RunEventLoop() {
	go func() {
		for range o.ticker.C {
			if err := o.app.Flush(); err != nil {
				panic(fmt.Errorf("unexpected error while flushing: %w", err))
			}
		}
	}()
}

// Append writes one row via the shared appender. There is no locking
// here, so AppendRow races with the ticker-driven Flush above.
func (o *DuckDbAppender) Append(id int, date time.Time, desc string) error {
	return o.app.AppendRow(id, date, desc)
}
This is connected to raising RandomEngine to use 64-bit state, see #13920, but initialization needs to be done on the full range, otherwise entropy is limited when many repeated statements are executed. Note that here we only solve the problem for RandomLocalState. A wider rework of the RandomEngine API would also be handy, since currently it takes an int64_t that can be either invalid (-1) or valid (>=0), but that is out of scope here given interactions with sampling PRs make it a larger change. Test case by @taniabogatsch. Fixes duckdb/duckdb-rs#331 Fixes marcboeker/go-duckdb#339
Oh yeah, I missed that! Looking at your updated example again, this is happening on the go-duckdb side, as the asyncAppend(db, 1, wg, 1) and asyncAppend(db, 2, wg, 5000) goroutines call AppendRow concurrently with the ticker's Flush.
Using the Appender I can write from multiple goroutines perfectly fine. However, when I set up a ticker to call Flush at a regular interval, it appears a race condition is created that materialises with the following error:

database/sql/driver: could not flush appender: duckdb error: PRIMARY KEY or UNIQUE constraint violated: duplicate key "80000000-0000-0000-0000-000000000000": appended data has been invalidated due to corrupt row
My reason for calling Flush on a regular basis is to minimise the risk of data loss in the event of a process or system issue. Where data loss does occur, I would like to be able to quantify it, i.e. "30 seconds of data". The documentation mentions that concurrency works within a single process, which this is.
If I surround all calls to Append and Flush with calls to Lock and Unlock on a sync.Mutex it operates fine, though this would then seem to cancel out the concurrency benefits (see the sketch after this section). While in my use case I can switch to a channel, this seems like something that could impact many other users. Digging into the C implementation of the Appender, I can't find any race protection there. Happy to look into contributing a PR to fix this, but keen to scope it out first: is this a feature/use-case that simply needs a little extra documentation, or a bug where the fix may be either a lower-level use of a mutex or a SafeFlush function being added?
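For reference, a minimal sketch of that mutex workaround, assuming a hypothetical SafeAppender wrapper around go-duckdb's Appender; it simply serialises AppendRow and Flush on one lock:

package appendutil // hypothetical package, not part of go-duckdb

import (
	"database/sql/driver"
	"sync"

	"github.com/marcboeker/go-duckdb"
)

// SafeAppender (hypothetical name) guards the shared appender so a
// ticker-driven Flush can never interleave with an in-flight AppendRow.
type SafeAppender struct {
	mu  sync.Mutex
	app *duckdb.Appender
}

func (s *SafeAppender) AppendRow(args ...driver.Value) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.app.AppendRow(args...)
}

func (s *SafeAppender) Flush() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.app.Flush()
}

An alternative with the same effect is to have a single goroutine own the appender and receive rows over a channel, which serialises appends and flushes without a lock.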
Environment
Replicated on: iOS, Linux
Go-DuckDB: v1.8.3
DuckDB: v1.0.0 1f98600c2c
Replication
The following code sample can be used to replicate.