Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add ranges support #14

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

thePanz
Copy link
Contributor

@thePanz thePanz commented Apr 4, 2018

  • Support for [a TO b] ranges
  • Support for {a TO b} ranges
  • Support for asymmetric ranges (such as [a TO b})
  • Support for ranges with dates xxxx-xx-xx format
  • Support for * in ranges, example: [123 TO *]
  • Support for quoted string in ranges, example: ["2008-07-28T14:47:31Z" TO NOW]

@codecov-io
Copy link

codecov-io commented Apr 4, 2018

Codecov Report

Merging #14 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##             master    #14   +/-   ##
=======================================
  Coverage       100%   100%           
- Complexity      260    279   +19     
=======================================
  Files            47     49    +2     
  Lines           689    743   +54     
=======================================
+ Hits            689    743   +54
Flag Coverage Δ Complexity Δ
#all 100% <100%> (ø) 279 <17> (+19) ⬆️
Impacted Files Coverage Δ Complexity Δ
lib/Languages/Galach/Tokenizer.php 100% <ø> (ø) 3 <0> (ø) ⬇️
lib/Languages/Galach/TokenExtractor/Full.php 100% <100%> (ø) 10 <2> (+4) ⬆️
lib/Languages/Galach/Values/Token/Range.php 100% <100%> (ø) 3 <3> (?)
lib/Languages/Galach/Generators/Native/Range.php 100% <100%> (ø) 12 <12> (?)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a06fe70...e9ca9e8. Read the comment docs.


/**
* @param string $lexeme
* @param int $position
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No alignment in PHPDoc, it unnecessarily adds to diff when updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just run the php-cs-fixer with the current settings, I guess it was never run before on the code.
I guess it is a good thing to comply to a CS here :)

@@ -46,43 +48,63 @@ protected function getExpressionTypeMap()
protected function createTermToken($position, array $data)
{
$lexeme = $data['lexeme'];
$token = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I think Code Climate is a bit too aggressive, this now looks harder to read than before :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change the code climate settings? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I reverted that change! :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now disabled argument count as well. Unfortunately it has no separate config for constructors.

@pspanja
Copy link
Member

pspanja commented Apr 5, 2018

This looks great :) Do you plan to go for the exclusive ranges here as well?

@thePanz
Copy link
Contributor Author

thePanz commented Apr 5, 2018

@pspanja the exclusive should be "easy" to achive, but there is something not clear to me in the regexp you're using in the phrase token:
What's the (?<!\\\\) needed for? should it be used in the range too?

And for now, the "208-04-05" format in the range is not supported yet

@thePanz
Copy link
Contributor Author

thePanz commented Apr 5, 2018

@pspanja inclusive/exclusive range added, I'll add later the support for the date ranges

@pspanja
Copy link
Member

pspanja commented Apr 5, 2018

What's the (?<!\\\\) needed for? should it be used in the range too?

It's negative lookbehind, see here: http://www.rexegg.com/regex-lookarounds.html
Basically it forbids preceding backslash, useful in phrase matching to allow everything until unescaped quote.

About left and right value in range matching - I think we should somehow reuse the pattern for matching a word. Will take a closer look in the evening.

@thePanz
Copy link
Contributor Author

thePanz commented Apr 5, 2018

Thanks for the explanation @pspanja!
Should the range implementation be split into inclusive/exclusive (as this patch) and an additional MR for the other cases?

*/
public static function getTypeByStart($startSymbol)
{
if ('[' === $startSymbol) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this check here means if someone customizes the symbol, this class will also need to be changed. So it should be rather done outside, in the Full TokenExtractor implementation.


switch ($token->type) {
case RangeToken::TYPE_INCLUSIVE:
return $domainPrefix . '[' . $token->rangeFrom . ' TO ' . $token->rangeTo . ']';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Symbols should be captured by the expression and contained in the token, so if they are customized this class does not need to know about it. And we should support mixed case as well, {a TO b] and [a TO b}.

It will probably mean a truckload of constructor arguments, but I'm OK with that :)

@pspanja
Copy link
Member

pspanja commented Apr 5, 2018

Should the range implementation be split into inclusive/exclusive (as this patch) and an additional MR for the other cases?

It can be split, I'm OK with that.

Thinking about the implementation a bit, handling both two-sided and one-sided ranges seems like too much both for the regex and token, so should probably be handled with a separate pattern and a separate token.

@TomasPilar
Copy link

Hi guys, thank you for this cool PR!
@pspanja Can it be merged?

@pspanja
Copy link
Member

pspanja commented Oct 9, 2018

Hi @TomasPilar, I'll try to find time in the next few days to see how to push this further.

@thePanz
Copy link
Contributor Author

thePanz commented Oct 9, 2018

FYI: this MR has been used in a pre-production system from April 6 :)

@j13k
Copy link

j13k commented Jun 18, 2019

This looks like a very useful extension — any chance it will be merged/released in the near future?

@pspanja
Copy link
Member

pspanja commented Jun 19, 2019

Hi @j13k, I've neglected this because I was too busy, but I'll find time in the next weeks to take care of it. It's obviously needed :)

@j13k
Copy link

j13k commented Jun 20, 2019

Hi @j13k, I've neglected this because I was to busy, but I'll find time in the next weeks to take care of it. It's obviously needed :)

That's fine @pspanja, I know what it's like! I'm struggling to find time for an outstanding PR on a small project that I maintain. ;)

Just so you know, it's a very useful library and really well-executed. I've implemented a set of generator classes that convert queries to Elasticsearch DSL.

@thePanz
Copy link
Contributor Author

thePanz commented Jun 20, 2019

I agree with @j13k , this library was very useful for me to:

  1. validate the user-query (parsing the query from a string and building the corresponding tree)
  2. re-write the query-tree with the specific field names in SOLR (the user doesn't know the real fields the query will be executed on, like the specific SOLR pre- or sub- fixes)

@ilukac
Copy link
Member

ilukac commented Jun 21, 2019

@thePanz @j13k feel free to PR whatever you think is generally useful :)

@drigolin
Copy link

Any plan to merge this PR? Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants