The performance is really impressive. Thanks a lot!
I wrote a simple benchmark:
Table:
CREATE TABLE bench.insert (ClusterName String, Metric String, Time DateTime, Value Float32) ENGINE = MergeTree PARTITION BY (ClusterName, toYYYYMM(Time)) ORDER BY (ClusterName, Metric, Time)
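The benchmark script itself isn't shown in the issue; a minimal sketch of the row-wise version might look like the following. The body of `get_fake_data` is a guess — only its signature, `get_fake_data(200, 720)`, appears in the original:

```python
import random
import time
from datetime import datetime, timedelta


def get_fake_data(n_clusters, n_points):
    """Hypothetical reconstruction: build (ClusterName, Time, Metric, Value)
    row tuples; the original helper's body is not shown in the issue."""
    start = datetime(2021, 1, 1)
    rows = []
    for c in range(n_clusters):
        for p in range(n_points):
            rows.append((
                'cluster-%03d' % c,
                start + timedelta(minutes=p),
                'cpu_usage',
                random.random(),
            ))
    return rows


def bench_insert(client, rows, repeats=5):
    """Run the INSERT several times and return the average wall time in seconds."""
    timings = []
    for _ in range(repeats):
        t0 = time.monotonic()
        client.execute(
            'INSERT INTO bench.insert (ClusterName, Time, Metric, Value) VALUES',
            rows,
        )
        timings.append(time.monotonic() - t0)
    return sum(timings) / len(timings)


# Usage (needs a running ClickHouse server):
#   from clickhouse_driver import Client
#   avg = bench_insert(Client('localhost'), get_fake_data(200, 720))
#   print('%.0f ms' % (avg * 1000))
```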
The average time needed to insert the data is 211 ms. Compared to C++ (`clickhouse-cpp`), where the same operation takes around 114 ms, it's really impressive.

However, using `datetime.datetime` instead of `int` increases the average time to 267 ms.

I also tried to insert the data by columns, but the script isn't finished. (`data = {'ClusterName': [...], 'Metric': [...], ...}; ...; client.execute("INSERT INTO bench.insert (ClusterName, Time, Metric, Value) VALUES", data, columnar=True)`)

Additionally, I tried to insert a dataframe. The result is 3.4 s. (`df = pd.DataFrame(get_fake_data(200, 720), columns=['ClusterName', 'Time', 'Metric', 'Value']); ...; client.insert_dataframe("INSERT INTO bench.insert (ClusterName, Time, Metric, Value) VALUES", df)`)

I'm not an expert on ClickHouse benchmarks, but possibly we should add them (this article is not enough)? Perhaps we could also compare it with other engines.
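For reference, the unfinished columnar variant mainly needs the row tuples from `get_fake_data` transposed into per-column sequences. Note that the snippet above passes a dict of columns, while the form documented for clickhouse-driver's `columnar=True` is a sequence of columns, which is what this sketch produces:

```python
def rows_to_columns(rows):
    """Transpose row tuples into a list of per-column lists,
    the column-oriented layout expected with columnar=True."""
    return [list(col) for col in zip(*rows)]


# Example transpose on two rows:
columns = rows_to_columns([
    ('cluster-0', 1, 'cpu', 0.5),
    ('cluster-1', 2, 'cpu', 0.7),
])
# columns == [['cluster-0', 'cluster-1'], [1, 2], ['cpu', 'cpu'], [0.5, 0.7]]

# With a live server (not run here):
# client.execute(
#     'INSERT INTO bench.insert (ClusterName, Time, Metric, Value) VALUES',
#     rows_to_columns(get_fake_data(200, 720)),
#     columnar=True,
# )
```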
A little bit about performance
I read there that:

> The problem is that `struct.pack`/`struct.unpack` is slow compared to `array.array.tobytes`.

As far as I know, `struct.pack`/`struct.unpack` is not used today, so there's no problem.
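To illustrate the difference being discussed: packing floats one at a time with `struct.pack` produces the same bytes as a single `array.tobytes()` call, but does Python-level work per value instead of one bulk C-level copy. A small sketch:

```python
import struct
from array import array

values = [1.5, -2.25, 3.75]

# One struct.pack call per value (the slow, per-element path)
packed_struct = b''.join(struct.pack('f', v) for v in values)

# One bulk serialization of the whole column (the fast path)
packed_array = array('f', values).tobytes()

# Both produce identical native-order 4-byte floats
assert packed_struct == packed_array
```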