-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support lenient parser #114
Conversation
|
||
// TODO(https://github.com/PyO3/pyo3/issues/1190): Expose this to bindings once trait <-> ABC is | ||
// supported in PyO3. | ||
pub(crate) trait QueryParserError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a base class because it seems base classes make it tough to do things like IntoPy
and there isn't support for upcasting.
See: PyO3/pyo3#1836 and PyO3/pyo3#787
Cargo.toml
Outdated
chrono = "0.4.23" | ||
tantivy = "0.20.1" | ||
tantivy = { git = "https://github.com/quickwit-oss/tantivy.git" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because there hasn't been a release with lenient parser support in tantivy
yet.
@@ -14,8 +14,9 @@ crate-type = ["cdylib"] | |||
pyo3-build-config = "0.19.1" | |||
|
|||
[dependencies] | |||
base64 = "0.21" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is necessary to access the base64
error that tantivy
holds. This dependency won't change much otherwise, since it's also being used by tantivy
anyways.
8dd0639
to
e2d6309
Compare
e2d6309
to
df4497d
Compare
df4497d
to
f8d1ad6
Compare
f8d1ad6
to
cfa93d7
Compare
Ready for review now |
Hoping to get around to it this weekend. |
Cargo.lock
Outdated
@@ -15,9 +15,9 @@ dependencies = [ | |||
|
|||
[[package]] | |||
name = "aho-corasick" | |||
version = "1.0.2" | |||
version = "1.1.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many transitive dependencies are also getting updated in this PR. It makes me a little nervous but I am ok to continue with fixing this up. In future though, it would be better if we can restrict the dep changes to only what is required by the other changes in the same PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've undone these changes now.
src/index.rs
Outdated
pub fn parse_query_lenient( | ||
&self, | ||
query: &str, | ||
default_field_names: Option<Vec<String>>, | ||
) -> PyResult<(Query, Vec<PyObject>)> { | ||
let mut default_fields = vec![]; | ||
let schema = self.index.schema(); | ||
|
||
if let Some(default_field_names_vec) = default_field_names { | ||
for default_field_name in &default_field_names_vec { | ||
if let Ok(field) = schema.get_field(default_field_name) { | ||
let field_entry = schema.get_field_entry(field); | ||
if !field_entry.is_indexed() { | ||
return Err(exceptions::PyValueError::new_err( | ||
format!( | ||
"Field `{default_field_name}` is not set as indexed in the schema." | ||
), | ||
)); | ||
} | ||
default_fields.push(field); | ||
} else { | ||
return Err(exceptions::PyValueError::new_err(format!( | ||
"Field `{default_field_name}` is not defined in the schema." | ||
))); | ||
} | ||
} | ||
} else { | ||
for (field, field_entry) in self.index.schema().fields() { | ||
if field_entry.is_indexed() { | ||
default_fields.push(field); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of mainly curiosity, I looked at removing the mutability of default_fields
by returning the vec from the if-expression:
pub fn parse_query_lenient(
&self,
query: &str,
default_field_names: Option<Vec<String>>,
) -> PyResult<(Query, Vec<PyObject>)> {
let schema = self.index.schema();
let default_fields = if let Some(default_field_names_vec) =
default_field_names
{
default_field_names_vec
.iter()
.map(|field_name| {
schema
.get_field(field_name)
.map_err(|_err| {
exceptions::PyValueError::new_err(format!(
"Field `{field_name}` is not defined in the schema."
))
})
.and_then(|field| {
schema.get_field_entry(field).is_indexed().then_some(field).ok_or(
exceptions::PyValueError::new_err(
format!(
"Field `{field_name}` is not set as indexed in the schema."
),
))
})
}).collect::<Result<Vec<_>, _>>()?
} else {
self.index
.schema()
.fields()
.filter_map(|(f, fe)| fe.is_indexed().then_some(f))
.collect::<Vec<_>>()
};
I don't find it super compelling to make this change though. What do you think? I'm going to approve the PR regardless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not. I have a preference for functional-styling here.
@GodTamIt you should be good to merge this and ill cut a new release, as the lenient parser was in tantivy 0.21 |
Note: this is a strawman for supporting lenient parser in the bindings. Of course, this can't be landed until there is a new release of the
tantivy
crate with the lenient parser included.Closes #109