Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long living client #108

Closed
wants to merge 349 commits into from
Closed

Long living client #108

wants to merge 349 commits into from

Conversation

Dennitz
Copy link

@Dennitz Dennitz commented Sep 11, 2023

As requested in Discord, adding my draft implementation for a long-living client.

This branch is based off an older revision, e503e5b to be precise. So there are conflicts that would need to be resolved.

Also, the code for the chdb Python library is currently commented out, as I'm only using libchdb personally and wanted to avoid compile errors.

Please see this PR as a starting point or just as inspiration, and feel free to make any necessary changes that would be needed to get this integrated.


The main difference of my fork is that it allows creating a “chdb instance” (a LocalServer object), and reusing that for multiple queries.

The difference can probably be seen best based on the new interface:

chdb_init_result * chdb_connect(int argc, char ** argv);
void chdb_disconnect(ChdbLocalServerPtr obj);
chdb_local_result * chdb_query(ChdbLocalServerPtr obj, char * query, char * format);
void chdb_free_result(ChdbLocalServerPtr obj, chdb_local_result * result);

You'd call chdb_connect once which returns a pointer to a client. You'd then call chdb_query (each time followed by chdb_free_result) any number of times, passing in the pointer to that client. And at the end call chdb_disconnect.

I implemented this mainly because it means queries finish quite a bit faster in certain cases. But a nice side effect is that this also adds support for stateful queries like USE some_db.

How do queries get faster? I noticed that the cleanup function of LocalServer can take quite a bit of time. And this gets amplified with each additional created MergeTree table. In my tests (on M1 Mac), each (empty) MergeTree table adds about 0.15s of cleanup time, even when the table isn’t actually used in the respective query. This happens also in clickhouse-local, so it’s not specific to chdb.

The cleanup function normally gets executed after each query before returning results in chdb. In the fork it gets executed only when destroying the instance, i.e. when calling chdb_disconnect. So on each individual query, there is no need to wait for cleanup.

I guess this is essentially the same difference as between running clickhouse local --path . --query "select 1" (cleanup right after the query), and running the query in an interactive session started with clickhouse local --path . (cleanup only when killing the session).


This PR (unfortunately) also contains some other changes, as it would have been hard to separate those now.

The other changes are:

  • Add an error_message field to the result (a similar thing also seems to be implemented in Add has_error and error_message methods #105 now)
  • Scripts to build libchdb for Mac locally
    • Note that the new script also enables the postgres integration with -DENABLE_LIBPQXX=1, as I needed it in my project. I guess you might want to disable it though

lmangani and others added 17 commits August 15, 2023 03:51
* Backport #52643

Backporting changes from CH #52643

https://github.com/ClickHouse/ClickHouse/pull/52643/files

* Update positional options in Client

Backporting changes from CH #52643

https://github.com/ClickHouse/ClickHouse/pull/52643/files
Add legal disclaimer
(cherry picked from commit ccd5c3fa0c3bd9891089e3bcac0191886a18be48)
(cherry picked from commit fef0cd4695aa77280f13c030bb2a98b22f1702cb)
(cherry picked from commit f91fd51e199d814730bd70646a6461f23fa40be9)
(cherry picked from commit 3430ed8c527dc0163b701ca8ca810f57ca105ad7)
(cherry picked from commit 538f0b4018b927044c887c745af4bd08f460a5ef)
(cherry picked from commit 63f865693a2144e9b7a58a71b1b8f7bab0880fc7)
(cherry picked from commit 7ba2bc1639c9a1e75d4bfb72a940b98e04406e79)
@Dennitz
Copy link
Author

Dennitz commented Sep 11, 2023

Also in case it's useful, I use this in Rust through this binding:

use std::ffi::{CStr, CString};
use std::os::raw as libc;

use log::debug;

use crate::Error::GenericError;

#[repr(C)]
struct chdb_local_result {
    buf: *mut libc::c_char,
    len: usize,
    error_message: *mut libc::c_char,
}

#[repr(C)]
struct chdb_init_result
{
    local_server: *mut libc::c_void,
    error_message: *mut libc::c_char,
}

#[link(name = "chdb")]
extern "C" {
    fn chdb_free_result(local_server: *mut libc::c_void, result: *mut chdb_local_result);
    fn chdb_connect(argc: i32, argv: *const *const libc::c_char) -> *mut chdb_init_result;
    fn chdb_disconnect(local_server: *mut libc::c_void);
    fn chdb_query(
        local_server: *mut libc::c_void,
        query: *const libc::c_char,
        format: *const libc::c_char,
    ) -> *mut chdb_local_result;
}

pub struct RawChdbClient {
    local_server: *mut libc::c_void,
}


impl RawChdbClient {
    pub fn connect(path: &str) -> crate::Result<RawChdbClient> {
        let argv: [*const libc::c_char; 3] = [
            CString::new("clickhouse").unwrap().into_raw(),
            CString::new("--multiquery").unwrap().into_raw(),
            CString::new(format!("--path={}", path)).unwrap().into_raw(),
        ];

        let result = unsafe { chdb_connect(3, argv.as_ptr()) };

        unsafe {
            drop(CString::from_raw(argv[0] as *mut libc::c_char));
            drop(CString::from_raw(argv[1] as *mut libc::c_char));
            drop(CString::from_raw(argv[2] as *mut libc::c_char));
        }

        if result.is_null() {
            debug!("connect returned nullptr");
            return Err(GenericError("Could not connect to chdb".to_string()));
        }

        unsafe {
            if !(*result).error_message.is_null() {
                let error = CStr::from_ptr((*result).error_message).to_string_lossy().into_owned();
                debug!("result has error message: {}", error);
                return Err(GenericError(error));
            }
        }

        unsafe {
            if (*result).local_server.is_null() {
                return Err(GenericError("Could not connect to chdb, nullptr returned".to_string()));
            }
        }

        Ok(RawChdbClient { local_server: unsafe { (*result).local_server } })
    }

    pub fn query(&self, query: &str, format: &str) -> crate::Result<Option<Vec<u8>>> {
        debug!("Rust: starting chdb query process for query: {}", query);
        let query_c = CString::new(query).unwrap();
        let format_c = CString::new(format).unwrap();
        let result = unsafe {
            chdb_query(self.local_server, query_c.as_ptr(), format_c.as_ptr())
        };
        debug!("Rust: running query done.");

        if result.is_null() {
            debug!("result is null");
            return Ok(None);
        }

        unsafe {
            if !(*result).error_message.is_null() {
                let error = CStr::from_ptr((*result).error_message).to_string_lossy().into_owned();
                chdb_free_result(self.local_server, result);
                debug!("result has error messsage: {}", error);
                return Err(GenericError(error));
            }
        }

        unsafe {
            if (*result).buf.is_null() {
                chdb_free_result(self.local_server, result);
                return Err(GenericError("Unknown error occurred".to_string()));
            }
        }

        let slice = unsafe { std::slice::from_raw_parts((*result).buf as *const u8, (*result).len) };
        let output = slice.to_vec();

        unsafe { chdb_free_result(self.local_server, result) };

        Ok(Some(output))
    }

    pub fn disconnect(&self) {
        unsafe {
            chdb_disconnect(self.local_server)
        }
    }
}

Where crate::Result is a type wrapping a custom Error from my project:

#[derive(Debug, thiserror::Error)]
pub enum Error {
    #[error("{0}")]
    GenericError(String),
    // More errors here...
}

pub type Result<T> = std::result::Result<T, Error>;

@lmangani
Copy link
Contributor

@Dennitz thanks! 💪 This is useful. We'll most likely use this draft as a reference to reimplement the changes!
As of the rust part, any chance you could merge your binding into chdb-rust to make it a proper crate for libchdb?

@lmangani lmangani added the enhancement New feature or request label Sep 11, 2023
@Dennitz
Copy link
Author

Dennitz commented Sep 11, 2023

As of the rust part, any chance you could merge your binding into chdb-rust to make it a proper crate for libchdb?

Sure, once the other changes make it into chdb, I can do that.

@Dennitz
Copy link
Author

Dennitz commented Sep 11, 2023

One important thing to note: This implementation is not thread safe.

So all chdb_query and chdb_free_result calls and the chdb_disconnect call must be made from the same thread as the respective chdb_connect call.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 3 committers have signed the CLA.

❌ lmangani
❌ nmreadelf
❌ Dennitz
You have signed the CLA already but the status is still pending? Let us recheck it.

@auxten
Copy link
Member

auxten commented Dec 3, 2023

the main branch is rebased, will try to use interactive clickhouse-local to optimize chdb later

@auxten
Copy link
Member

auxten commented Nov 23, 2024

Done in #283

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants