use flat array for field indexing instead of hashmap #247
base: main
Luacheck Report: 3 tests, 3 ✅, 0s ⏱️. Results for commit 0600b83. ♻️ This comment has been updated with latest results.
@@ -25,38 +24,73 @@ impl Default for Match { | |||
} | |||
} | |||
|
|||
pub struct Context<'a> { | |||
schema: &'a Schema, |
Schema is no longer needed here: Schema uses the field name as its hash key, while Context no longer adds values by field name, only by index into Router's fields array.
schema: &'a Schema, | ||
values: FnvHashMap<String, Vec<Value>>, | ||
pub struct Context { | ||
values: Vec<Option<Vec<Value>>>, |
Context uses a values array to store all field values; each index in values matches the index of the same field in Router's fields array.
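A minimal sketch of this index-based storage, assuming a simplified `Value` enum and hypothetical method names (`new`, `value_of`) that are not taken from the PR:

```rust
/// Hypothetical stand-in for the crate's value type.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    Int(i64),
}

/// Per-request state: slot `i` holds the values for the field at index `i`
/// of the router's field list. `None` means no value was supplied.
pub struct Context {
    values: Vec<Option<Vec<Value>>>,
}

impl Context {
    /// `field_count` is assumed to equal the length of the router's field list.
    pub fn new(field_count: usize) -> Self {
        Context { values: vec![None; field_count] }
    }

    /// Appends a value by index; returns false when the index is out of range.
    pub fn add_value(&mut self, index: usize, value: Value) -> bool {
        match self.values.get_mut(index) {
            Some(slot) => {
                slot.get_or_insert_with(Vec::new).push(value);
                true
            }
            None => false,
        }
    }

    /// Looks up the values of a field by index: plain vector indexing, no hashing.
    pub fn value_of(&self, index: usize) -> Option<&[Value]> {
        self.values.get(index)?.as_deref()
    }
}
```

No field name is stored or hashed here; the caller is expected to know which index belongs to which field.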
src/router.rs
Outdated
@@ -13,15 +13,17 @@ struct MatcherKey(usize, Uuid); | |||
pub struct Router<'a> { | |||
schema: &'a Schema, | |||
matchers: BTreeMap<MatcherKey, Expression>, | |||
pub fields: HashMap<String, usize>, | |||
pub fields: Vec<(String, usize)>, // fileds array of tuple(name, count) | |||
pub fields_map: HashMap<String, usize>, // field name -> index map |
fields is a flat array that stores all field names and their corresponding counts.
Here we use a hash map, fields_map, to store each field's index in the fields array. fields_map is not used at runtime, only when router matchers are added or deleted.
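A rough sketch of how the two structures could work together. The type name `FieldRegistry`, the method names, and the removal policy (keeping a zero-count slot in place) are assumptions made for illustration, not the PR's actual code:

```rust
use std::collections::HashMap;

/// Hypothetical, reduced stand-in for the field bookkeeping in `Router`.
#[derive(Default)]
pub struct FieldRegistry {
    /// Flat array of (field name, number of matchers using the field).
    pub fields: Vec<(String, usize)>,
    /// Field name -> index into `fields`; only touched on add and delete.
    pub fields_map: HashMap<String, usize>,
}

impl FieldRegistry {
    /// Registers one use of `name` and returns its index in `fields`.
    pub fn add_field(&mut self, name: &str) -> usize {
        if let Some(&index) = self.fields_map.get(name) {
            self.fields[index].1 += 1;
            return index;
        }
        let index = self.fields.len();
        self.fields.push((name.to_string(), 1));
        self.fields_map.insert(name.to_string(), index);
        index
    }

    /// Drops one use of `name`. Assumption: a slot whose count reaches zero
    /// stays in place so that the indexes of other fields remain stable.
    pub fn remove_field(&mut self, name: &str) -> bool {
        match self.fields_map.get(name) {
            Some(&index) if self.fields[index].1 > 0 => {
                self.fields[index].1 -= 1;
                true
            }
            _ => false,
        }
    }
}
```

The hash lookup happens only in `add_field` and `remove_field`, which matches the comment's point that the map stays off the matching path.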
src/router.rs
Outdated
@@ -13,15 +13,17 @@ struct MatcherKey(usize, Uuid); | |||
pub struct Router<'a> { | |||
schema: &'a Schema, | |||
matchers: BTreeMap<MatcherKey, Expression>, | |||
pub fields: HashMap<String, usize>, | |||
pub fields: Vec<(String, usize)>, // fileds array of tuple(name, count) |
fileds is just a typo.
lib/resty/router/context.lua
Outdated
}, _MT) | ||
|
||
return c | ||
end | ||
|
||
|
||
function _M:add_value(field, value) | ||
function _M:add_value(index, field, value) |
Does this mean the caller has to maintain a map from field_name to index?
I am curious about this too. Could we use something like HashMap<String, usize> from Route to do the internal field-name-to-index mapping instead of exposing the internal index to the outside world? But this sounds like it would make the whole improvement meaningless.
Currently, I am aware that ATC-Router has three primary components: Route, Schema, and Context. Route is created from Schema and performs execution on Context. The coupling is relatively loose: Context is related to Schema but not to Route.
Is it necessary to create Context from nothing rather than from Route? Creating it from Route would allow us to integrate the index into Context and to use an updated Schema::fields, like Route::fields but with an index inside the map. Since the HashMap lookup is not avoided inside self.schema:get_field_type(field), I think it is better to get the index from it as well.
I do not fully understand the ATC-Router; correct me if that's impossible.
The caller uses router:get_fields to maintain an array of fields. Here is the usage in Kong:
    function _M:fields_visitor(params, ctx, cb, cb_arg)
      for idx, field in ipairs(self.fields) do
        local value = self:get_value(field, params, ctx)
        -- add_value invoked in the callback
        local res, err = cb(field, value, cb_arg, idx)
        if not res then
          return nil, err
        end
      end -- for fields
      return true
    end
This is a loose coupling between Context::values and Router::fields, which are matched only by index.
> Is it necessary to create Context from nothing rather than from Route? Creating it from Route would allow us to integrate the index into Context and to use an updated Schema::fields, like Route::fields but with an index inside the map. Since the HashMap lookup is not avoided inside self.schema:get_field_type(field), I think it is better to get the index from it as well.
Yes, the previous Context implementation keeps Schema, which is used for type validation. But note that Schema::fields is different from Router::fields. Router::fields contains exactly the fields used in the configured router rules, like ['http.path', 'http.header.foo', 'http.header.bar'], whereas Schema::fields maintains the fixed schema of fields, like {'http.path': 'string', 'http.headers.*': 'string', 'net.src.port': 'ipaddr'}, so there is no one-to-one mapping between router fields and schema fields.
In the previous version, we kept the schema in Context to do field type validation. Now that we no longer pass the field name into Context, the schema is not needed there. However, we still check the field type against the schema on the Lua side before invoking add_value.
> This is a loose coupling between Context::values and Router::fields, which are matched only by index.
As the end developer, I don't want to maintain this map myself; this should be done inside the crate.
> This is a loose coupling between Context::values and Router::fields, which are matched only by index.
> As the end developer, I don't want to maintain this map myself; this should be done inside the crate.
Sure, if we consider the ease of use and robustness of this module, it's better to maintain the router fields inside the Context. Assume that we have a fields table (a map of name to index) inside the context, and we provide an API to the end user like add_value(field, value). Then the implementation logic may look like:
    function _M:add_value(field_name, value)
      local index = self.fields[field_name] -- a hash lookup occurs here
      ...
      clib.context_add_value(index, ...)
    end
What this KAG tries to do is avoid unnecessary hash lookups to improve performance, inspired by nginx's indexed variable mechanism.
If we map this to nginx, the Context is basically like r->variables, which stores a flat array of variable values (without storing a name-to-value or name-to-index mapping inside). All r->variables knows is that an index can be used to get or set a variable defined in a global structure, cmcf (which plays a role like Router::fields).
Context and r->variables both represent runtime state that changes frequently, so we want to transfer and access them as fast as possible (and keep them as small as we can). That makes them look not so "easy to use", but I think the trade-off is worth it.
What if we store a reference to the used Schema::fields, along with the actual index values used in Route, in some field inside Context, and do the HashMap search there to get both the index and the type?
fields_visitor eventually invokes add_value, which means the HashMap lookup still happens. Is there any chance to avoid exposing the index by linking the Context and Route together?
> What if we store a reference to the used Schema::fields, along with the actual index values used in Route, in some field inside Context, and do the HashMap search there to get both the index and the type?
> fields_visitor eventually invokes add_value, which means the HashMap lookup still happens. Is there any chance to avoid exposing the index by linking the Context and Route together?
The previous implementation has several hashmap lookups:
- Context::add_value():
  - find schema to validate field type by field name (which I think is unnecessary because we already check on the Lua side)
  - insert value into self.values hashmap by field name
- Expression::execute():
  - find context.values by lhs's field name (which is the critical path that will be executed a lot)
I think our initial goal is to avoid such "heavy" computation as much as possible. That's why I removed the hashmap in Context and replaced self.values with a vector. Assuming users already have the fields list, what we can achieve is:
- The field name is no longer needed in Context's self.values, which reduces Context's size.
- Inserting a value into self.values no longer needs a hash lookup, just vector indexing.
- Finding a value during Expression execution no longer needs a hash lookup, just vector indexing.
That would bring a considerable performance improvement at runtime.
However, since we removed the field name from Context, one concern is concurrent use of these APIs (e.g. router fields change after the context values are constructed but before matching is executed): we may get wrong field matching, because the router's field indexes changed without the context being aware of it.
If we MUST guarantee correctness in this case, maybe we should keep a version in the context and match it against the router version (a version in Context, Router, and maybe also Expression, which makes the version invasive to every data structure). Or we can change Context.values to something like Vec<struct {field_name, field_index, counter}> and make sure the field name matches lhs.name in every matching execution (however, string comparison also takes time).
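The version idea could be sketched roughly as follows. Everything here is hypothetical (reduced `Router` and `Context` types, a `version` counter, a placeholder match rule) and only illustrates the stale-context check, not anything implemented in the PR:

```rust
/// Hypothetical, reduced router: only the field list and a version counter.
pub struct Router {
    version: u64,
    fields: Vec<String>,
}

/// Hypothetical context that remembers which router version it was built for.
pub struct Context {
    router_version: u64,
    values: Vec<Option<Vec<String>>>,
}

impl Router {
    pub fn new() -> Self {
        Router { version: 0, fields: Vec::new() }
    }

    /// Any change to the field list bumps the version.
    pub fn add_field(&mut self, name: &str) -> usize {
        self.fields.push(name.to_string());
        self.version += 1;
        self.fields.len() - 1
    }

    /// A context is tied to the field layout that existed when it was created.
    pub fn new_context(&self) -> Context {
        Context {
            router_version: self.version,
            values: vec![None; self.fields.len()],
        }
    }

    /// Refuses to match with a context built for a different field layout.
    pub fn execute(&self, ctx: &Context) -> Result<bool, &'static str> {
        if ctx.router_version != self.version {
            return Err("context is stale: router fields changed");
        }
        // Placeholder match rule (assumption): succeed if any field has a value.
        Ok(ctx.values.iter().any(|slot| slot.is_some()))
    }
}

impl Context {
    /// Appends a value by index; out-of-range indexes are ignored in this sketch.
    pub fn add_value(&mut self, index: usize, value: &str) {
        if let Some(slot) = self.values.get_mut(index) {
            slot.get_or_insert_with(Vec::new).push(value.to_string());
        }
    }
}
```

The check costs one integer comparison per execution, which is cheaper than the per-field string comparison of the second alternative, but it requires the version to be threaded through every structure involved.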
> find schema to validate field type by field name (which I think is unnecessary because we already check on the Lua side)
I believe we check the type twice for different purposes. The Lua-side type check guarantees that the value passed into the Rust side is always valid, and the Rust side checks that validity again.
Should we just remove the type check on the Lua side and make the Rust side return an error if the type is invalid, to avoid the redundant type check? I am not sure if there are some special concerns.
I do see that the newly added code has internal index storage in Router::fields. But since the Schema probably always outlives the Router, could we use string references from Schema to build the fields by linking the Schema with the Router? We already link Context with Schema on the Lua side. I am not very sure about that.
The current implementation already looks good to me.
@@ -26,37 +25,165 @@ impl Default for Match { | |||
} | |||
|
|||
pub struct Context<'a> { | |||
schema: &'a Schema, | |||
values: FnvHashMap<String, Vec<Value>>, | |||
router: &'a Router<'a>, |
Bind Context with Router instead, since Router can provide more precise field validation.
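One way such a binding might look, as a sketch only. The reduced `Router`, the `String` value type, and the `field_name` helper are assumptions for illustration and are not taken from the PR:

```rust
/// Hypothetical, reduced router holding only the flat field array.
pub struct Router {
    /// Flat array of (field name, use count).
    pub fields: Vec<(String, usize)>,
}

/// Context borrows the router it was created from.
pub struct Context<'a> {
    router: &'a Router,
    values: Vec<Option<Vec<String>>>,
}

impl<'a> Context<'a> {
    /// Sizes the value slots from the router's own field list.
    pub fn new(router: &'a Router) -> Self {
        Context { router, values: vec![None; router.fields.len()] }
    }

    /// Validates the index against the bound router before storing the value.
    pub fn add_value(&mut self, index: usize, value: &str) -> Result<(), String> {
        let count = self.router.fields.len();
        if index >= count {
            return Err(format!("field index {} out of range ({} fields)", index, count));
        }
        self.values[index]
            .get_or_insert_with(Vec::new)
            .push(value.to_string());
        Ok(())
    }

    /// Resolves an index back to its field name, e.g. for error messages.
    pub fn field_name(&self, index: usize) -> Option<&str> {
        self.router.fields.get(index).map(|(name, _)| name.as_str())
    }
}
```

In safe Rust the shared borrow also prevents the router from being mutated while a context created from it is alive, although that guarantee does not automatically carry across an FFI boundary.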
@ProBrian This PR needs to be rebased.
Local benchmark result attached (macOS, M3 chip with 36 GB memory), comparing the PR branch with the main branch.
To resolve KAG-5155