Description
$ cd examples
$ cat >consumer.c <<EOF
#include <assert.h>
#include <stdio.h>

#include "rdkafka.h"

rd_kafka_resp_err_t callback(void *opaque) {
        printf("callback called with %p\n", opaque);
        return RD_KAFKA_RESP_ERR_NO_ERROR;
}

int main(int argc, char **argv) {
        rd_kafka_conf_res_t err; /* librdkafka API error code */
        char errstr[512];        /* librdkafka API error reporting buffer */

        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        err = rd_kafka_conf_set(conf, "security.protocol", "ssl", errstr, sizeof(errstr));
        assert(err == RD_KAFKA_CONF_OK);
        err = rd_kafka_conf_set(conf, "ssl.ca.location", "/dev/null", errstr, sizeof(errstr));
        assert(err == RD_KAFKA_CONF_OK);

        rd_kafka_resp_err_t rk_err = rd_kafka_conf_interceptor_add_on_conf_destroy(conf, "testing", callback, NULL);
        assert(rk_err == RD_KAFKA_RESP_ERR_NO_ERROR);

        rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_CONSUMER, conf, errstr, sizeof(errstr));
        assert(rk == NULL);
        fprintf(stderr, "%% Failed to create new consumer: %s\n", errstr);

        // Since rk==NULL, the conf is still owned by us, and must be cleaned up to avoid leaks.
        rd_kafka_conf_destroy(conf);
        return 0;
}
EOF
$ make consumer
gcc -g -O2 -fPIC -Wall -Wsign-compare [...] -ldl -lpthread
$ ./consumer
% Failed to create new consumer: ssl.ca.location failed: No further error information available
Segmentation fault: 11
[...]
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
* frame #0: 0x0000000100094950 consumer`rd_kafka_interceptors_on_conf_destroy + 48
frame #1: 0x000000010002ac9f consumer`rd_kafka_conf_destroy + 15
frame #2: 0x00000001000030f9 consumer`main(argc=<unavailable>, argv=<unavailable>) at consumer.c:30:9 [opt]
frame #3: 0x00000001004b152e dyld`start + 462
The problem seems to be that rd_kafka_interceptors_on_conf_destroy is called twice: first (incorrectly) from the failing rd_kafka_new,
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: 0x0000000100094920 consumer`rd_kafka_interceptors_on_conf_destroy
frame #1: 0x000000010002aa3d consumer`rd_kafka_anyconf_destroy + 29
frame #2: 0x000000010000442b consumer`rd_kafka_destroy_final + 827
frame #3: 0x00000001000057a6 consumer`rd_kafka_new + 3062
frame #4: 0x00000001000030cd consumer`main(argc=<unavailable>, argv=<unavailable>) at consumer.c:25:26 [opt]
frame #5: 0x00000001004b152e dyld`start + 462
and then a second time (correctly) from rd_kafka_conf_destroy.
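For anyone trying to reproduce the breakpoint trace above, it can be captured with standard lldb commands along these lines (the exact invocation is an assumption, not from the original session):

$ lldb ./consumer
(lldb) breakpoint set --name rd_kafka_interceptors_on_conf_destroy
(lldb) run
[...]
(lldb) bt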
It appears that rd_kafka_new sometimes returns NULL without destroying the conf, and sometimes returns NULL after destroying it (i.e., any codepath that passes through goto fail), so the caller cannot know whether it still owns the conf and must clean it up.
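In the meantime, a caller-side workaround seems possible. The following is only a sketch (the sentinel pattern and the name on_destroy_sentinel are illustrative, not a librdkafka idiom): it reuses the same on_conf_destroy interceptor hook as the repro above to record whether the conf's destructor has already run, and skips the explicit destroy in that case.

#include <stdio.h>
#include "rdkafka.h"

static int conf_destroyed = 0;

/* Sentinel interceptor: records that the conf's destructor has run. */
static rd_kafka_resp_err_t on_destroy_sentinel(void *opaque) {
        *(int *)opaque = 1;
        return RD_KAFKA_RESP_ERR_NO_ERROR;
}

int main(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        rd_kafka_conf_interceptor_add_on_conf_destroy(conf, "sentinel",
                                                      on_destroy_sentinel,
                                                      &conf_destroyed);

        rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_CONSUMER, conf, errstr, sizeof(errstr));
        if (rk == NULL) {
                fprintf(stderr, "%% Failed to create new consumer: %s\n", errstr);
                /* Destroy the conf only if rd_kafka_new did not already
                 * destroy it on its way to returning NULL. */
                if (!conf_destroyed)
                        rd_kafka_conf_destroy(conf);
        }
        return 0;
}

On the goto fail path the first interceptor invocation (which the breakpoint trace shows running cleanly) sets the flag, so the crashing second destroy is skipped; on an early-failure path the flag stays clear and the conf is still freed.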
The underlying bug is similar to the "should never happen" bug in #4100, but worse.
Checklist
- librdkafka version (release number or git tag): 0e4b551
- Apache Kafka version: doesn't matter; Kafka needn't be running
- librdkafka client configuration: as shown above
- Operating system: macOS, but probably reproduces everywhere
- Critical issue