
Create C library wrapper over existing Java library using GraalVM #196

Open: wants to merge 2 commits into the base branch native-schema-registry-release


@blacktooth (Contributor) commented Jul 14, 2022

Issue #, if available:
#43

Description of changes:

Ports the Glue Schema Registry Java library to multiple languages (C#, Python, NodeJS, Go, etc.). Support is added by compiling the existing Java library with GraalVM into a C shared library (.so, .dll) and building Swig bindings on top of it.

Design Choices

  1. Memory allocation - We use the AWS SDK Common library (aws-c-common) for all explicit memory allocation and deallocation. Memory owned by GraalVM/Java and by the target language is managed by those runtimes.
  2. Exceptions - When an exception occurs, the function returns NULL and passes the error code and message back through an out-parameter that holds a pointer to a glue_schema_registry_error object: mutable_byte_array * encode(..., glue_schema_registry_error **p_err). Swig intercepts this error object and throws it as an exception in the target language.
  3. Platforms - The Java, C and target-language libraries are supported on Linux, OSX and Windows on x86-64. ARM testing is pending.
  4. Testing - We use cmocka to test the C code. There is no good way to test the GraalVM code because we cannot mock its network calls; the target-language tests exercise this code as well. We will add integration tests that cover all the use cases.
  5. Build - CMake builds work across all platforms. We use Google's AddressSanitizer and MemorySanitizer to detect memory leaks and other memory errors during builds.
  6. Language Bindings - Swig generates all the language-binding code and CMake compiles it. The generated binaries are placed in the corresponding language projects for those languages' builds to use.
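A minimal sketch of the "return NULL, report through an error out-parameter" convention from design choice 2. The fields of glue_schema_registry_error (msg, code) and the helpers throw_error, delete_error and encode_bytes are hypothetical, since the PR text does not show them, and plain malloc/free stand in for the aws-c-common allocator so the example compiles on its own.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error object: the real struct's fields are not shown in the PR. */
typedef struct glue_schema_registry_error {
    char *msg;
    int code;
} glue_schema_registry_error;

/* Hypothetical helper: allocates an error and stores it in *p_err, if the caller passed a holder. */
static void throw_error(glue_schema_registry_error **p_err, const char *msg, int code) {
    if (p_err == NULL) {
        return; /* caller is not interested in error details */
    }
    glue_schema_registry_error *err = malloc(sizeof(*err));
    if (err == NULL) {
        return;
    }
    err->msg = malloc(strlen(msg) + 1);
    if (err->msg != NULL) {
        strcpy(err->msg, msg);
    }
    err->code = code;
    *p_err = err;
}

/* Hypothetical helper: releases an error created by throw_error. */
static void delete_error(glue_schema_registry_error *err) {
    if (err == NULL) {
        return;
    }
    free(err->msg);
    free(err);
}

/* Hypothetical operation following the convention: NULL on failure, details via *p_err. */
static unsigned char *encode_bytes(const unsigned char *data, size_t len,
                                   glue_schema_registry_error **p_err) {
    if (data == NULL || len == 0) {
        throw_error(p_err, "data must be non-empty", 1);
        return NULL;
    }
    unsigned char *out = malloc(len);
    if (out == NULL) {
        throw_error(p_err, "out of memory", 2);
        return NULL;
    }
    memcpy(out, data, len); /* placeholder for the real encoding step */
    return out;
}
```

A caller checks the return value first and only then reads and frees `*p_err`; in the bindings, this is the step Swig intercepts before raising a target-language exception.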

Open questions

  1. Logging - The C code logs to stderr. We still need to figure out how to route these logs to the target language's logging.
  2. Review our usage of the aws-c-common library.

Testing

Tested using unit tests. The target-language tests currently make network calls and need credentials to run. We will update them to use a local Docker-based Glue service.

Coverage

Set up code coverage using gcov; the build fails if code coverage is below 100%.

Linting

clang-format and clang-tidy are set up to fail the build if the code does not meet the formatting and linting standards.

This is one of the initial PRs to cover the data types and header files. Implementation PRs will follow.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@blacktooth force-pushed the native-schema-registry-release branch from 4964d6f to 939148f on July 26, 2022 01:03
@blacktooth changed the title from "Add support for serialization for C and C# using GraalVM and Swig" to "Create C library wrapper over existing Java library using GraalVM" on Jul 26, 2022
native-schema-registry/c/CMakeLists.txt (resolved)
#include "read_only_byte_array.h"
#include <stdbool.h>

typedef struct glue_schema_registry_deserializer {


Not blocking anything here, because it's 100% your call, but the typedef struct idiom goes off the rails when used too frequently. I tend to just use `struct type` declarations.

Contributor Author

I will check how it plays with Swig and decide.

Contributor Author

It plays well with Swig, as Swig generates the class structures in the target languages from typedef structs: https://www.swig.org/Doc4.0/SWIGDocumentation.html#SWIG_nn32
Leaving it as is.
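For comparing the two styles discussed in this thread: a typedef'd struct can be named with or without the `struct` keyword, while a tag-only declaration must always be spelled with `struct`. The member instance_context and the glue_schema_registry_serializer type are hypothetical placeholders, since the diff does not show the real definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* typedef style (used in the PR): introduces both the tag
 * `struct glue_schema_registry_deserializer` and the bare alias. */
typedef struct glue_schema_registry_deserializer {
    void *instance_context; /* hypothetical member */
} glue_schema_registry_deserializer;

/* Tag-only style (the reviewer's preference): no alias, so every use
 * spells out `struct`. This type name is hypothetical. */
struct glue_schema_registry_serializer {
    void *instance_context; /* hypothetical member */
};

/* With the typedef, both spellings refer to the same type. */
static bool deserializer_ready(const glue_schema_registry_deserializer *d) {
    return d != NULL && d->instance_context != NULL;
}

static bool deserializer_ready_tag(const struct glue_schema_registry_deserializer *d) {
    return deserializer_ready(d); /* no cast needed: the types are identical */
}

/* Without a typedef, only the `struct` spelling compiles. */
static bool serializer_ready(const struct glue_schema_registry_serializer *s) {
    return s != NULL && s->instance_context != NULL;
}
```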

/**
* A mutable byte array that allows writing/updating bytes in a fixed array of size `max_len`.
*/
typedef struct mutable_byte_array {


If you're already using aws-c-common, why not use aws_byte_cursor and aws_byte_buf?

aws_byte_cursor is a read-only slice, and aws_byte_buf is mutable.

Contributor Author

Thanks for the suggestion. I have modified mutable_byte_array to wrap an aws_byte_buf so that I don't have to manage the buffer myself. The outer structure is still useful because it fits our exception-handling and binding-generation model.

I wanted to do the same by wrapping aws_byte_cursor in read_only_byte_array, but couldn't find a function that returns a pointer to the underlying bytes (I might be missing something). I could only find this method.


You can return `cursor.ptr` in that case.
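One way to follow this suggestion, sketched under assumptions: read_only_byte_array wraps a cursor, and the getter simply returns `cursor.ptr`, so callers read the bytes in place without a copy. To keep the example self-contained, `struct byte_cursor` is a stand-in mirroring the `{ size_t len; uint8_t *ptr; }` layout of aws_byte_cursor; with the real library the cursor would come from aws_byte_cursor_from_array. The wrapper's layout and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in mirroring aws_byte_cursor's layout so this compiles without aws-c-common. */
struct byte_cursor {
    size_t len;
    uint8_t *ptr;
};

/* Hypothetical wrapper: the real read_only_byte_array fields are not shown in the PR. */
typedef struct read_only_byte_array {
    struct byte_cursor cursor;
} read_only_byte_array;

/* Wraps existing bytes; the cursor neither owns nor copies them. */
static read_only_byte_array read_only_byte_array_from(const uint8_t *data, size_t len) {
    read_only_byte_array array;
    array.cursor.len = len;
    array.cursor.ptr = (uint8_t *)data; /* like aws_byte_cursor, ptr is non-const */
    return array;
}

/* Returning cursor.ptr exposes the underlying bytes; const keeps callers read-only. */
static const uint8_t *read_only_byte_array_get_data(const read_only_byte_array *array) {
    return array->cursor.ptr;
}

static size_t read_only_byte_array_get_len(const read_only_byte_array *array) {
    return array->cursor.len;
}
```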

#include <stdlib.h>
#include "glue_schema_registry_error.h"

typedef struct read_only_byte_array {


I don't see what in the type system makes this read-only.

Contributor Author

My intention is to create a structure that doesn't manage the memory of the underlying array and only provides a way to read it.
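One way to make that intent visible to the compiler is to const-qualify the data pointer: writes through it then fail to compile, and a non-owning struct simply never frees the buffer. This is a sketch with hypothetical fields and function names; the PR's actual struct members are not shown in the diff.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout. `const uint8_t *` makes the compiler reject writes
 * through `data`; the struct never frees it, so it does not own the memory. */
typedef struct read_only_byte_array {
    const uint8_t *data;
    size_t len;
} read_only_byte_array;

/* Borrows an existing buffer: no allocation, no copy. */
static read_only_byte_array read_only_byte_array_borrow(const uint8_t *data, size_t len) {
    read_only_byte_array array = { data, len };
    return array;
}

/* Read access only. Precondition (not checked here): i < array->len. */
static uint8_t read_only_byte_array_at(const read_only_byte_array *array, size_t i) {
    return array->data[i];
}

/* This would not compile, because data points to const bytes:
 *     array->data[0] = 0;
 */
```

`const` only restricts access through this view: the owner of the buffer can still modify it, and the borrowed array sees those changes.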

native-schema-registry/c/include/read_only_byte_array.h (outdated, resolved)
* Uses the default memory allocator from AWS SDK Common library to allocate / free memory.
*/
void *aws_common_malloc(size_t size) {
struct aws_allocator *allocator = aws_default_allocator();


aws_default_allocator doesn't change, so you could store it at file scope to save on function calls if you'd like. Though, to be fair, you're allocating memory, so a single function call is unlikely to be noticeable.

Contributor Author

Are you suggesting storing it in a static variable? We don't have a global initializer yet, so I couldn't find a way to initialize it only once.


You can do it lazily, which that function does under the hood anyway, or use aws_thread_call_once.
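A sketch of the lazy, once-only initialization suggested here. In the real code the cached value would come from a call to aws_default_allocator(), and C does not allow a function call in a file-scope initializer, so the pointer is filled on first use instead. The reviewer suggests aws_thread_call_once; this sketch substitutes POSIX pthread_once so it compiles without aws-c-common, which also means it does not build on Windows as written. `struct allocator`, `common_malloc` and `common_free` are hypothetical stand-ins.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct aws_allocator: just acquire/release hooks. */
struct allocator {
    void *(*mem_acquire)(size_t size);
    void (*mem_release)(void *ptr);
};

static struct allocator s_malloc_allocator = { malloc, free };

/* Cached at file scope and filled in exactly once, on first use. */
static struct allocator *s_allocator = NULL;
static pthread_once_t s_allocator_once = PTHREAD_ONCE_INIT;

/* Stands in for the one-time call to aws_default_allocator(). */
static void init_allocator(void) {
    s_allocator = &s_malloc_allocator;
}

static struct allocator *get_allocator(void) {
    /* pthread_once runs init_allocator once, even with concurrent callers. */
    pthread_once(&s_allocator_once, init_allocator);
    return s_allocator;
}

static void *common_malloc(size_t size) {
    return get_allocator()->mem_acquire(size);
}

static void common_free(void *ptr) {
    get_allocator()->mem_release(ptr);
}
```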

1. Mutable byte arrays use aws_byte_buf
2. clang-tidy and clang-format integration

@JonathanHenson left a comment


See comments above. I don’t have any blockers, but I do recommend deciding on the questions relevant to you before merging.

2 participants