
Benchmarks ? #5

Open · ddorian opened this issue Jun 13, 2017 · 17 comments

ddorian commented Jun 13, 2017

People love benchmarks. Especially against Lucene.

markpapadakis (Member) commented:

Apologies, I just saw this ticket. What would you be interested in? Indexing time? Search time? Something else?

joelchen commented:
I would be interested in both indexing and search time side by side (https://sematext.com/blog/2015/05/27/side-by-side-with-elasticsearch-and-solr-performance-and-scalability). Please include Bleve (https://github.com/blevesearch/bleve) in the benchmark, and put them alongside Elasticsearch and Solr in the table of features (https://sematext.com/blog/2017/06/19/solr-vs-elasticsearch-differences) if possible.

ddorian (Author) commented Jul 20, 2017

The guy above is asking too much, imho.

I'm only interested (for now) in an in-memory search benchmark against Lucene and tantivy, to see the C++ difference.

markpapadakis (Member) commented:

For what it's worth, you may want to check https://www.bestprice.gr/ -- Trinity replaced the previous-generation Trinity tech. a few days ago. It's in Greek, but you may get a sense of the speed and accuracy. We'll see to those benchmarks; we don't actively use Lucene, so we'd need to build whatever's required to run them. Our ops folks will look into it.

@ddorian Please note that Trinity supports any configuration and arrangement of sources: in-memory, on-disk, whatever -- as long as you provide the implementation, it doesn't care. It's very flexible that way. You could use it to build, e.g., a Gmail clone where you rely on both on-disk "segments" and in-memory segments, periodically flushing (or not), with separate segments/indices for, e.g., the is:starred operator, where you want a flag set/unset per message but don't want to re-index the whole message every time.
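
(To make that concrete, here is a minimal sketch of such a mixed arrangement. Every type name below is hypothetical, invented for illustration; none of it is Trinity's actual API.)

```cpp
// Hypothetical sketch: an application-defined mix of RAM and disk segments.
// None of these names come from Trinity itself.
#include <cstdint>
#include <memory>
#include <vector>

struct Segment {                 // anything that can serve postings for a term
    virtual ~Segment() = default;
    virtual std::vector<uint32_t> postings(uint32_t term_id) const = 0;
};

struct InMemorySegment : Segment {        // e.g. volatile is:starred flags
    std::vector<uint32_t> postings(uint32_t) const override { return {}; }
};

struct OnDiskSegment : Segment {          // immutable, periodically merged
    std::vector<uint32_t> postings(uint32_t) const override { return {}; }
};

// The application decides the arrangement; the engine just consumes it.
using IndexSources = std::vector<std::unique_ptr<Segment>>;
```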

ddorian (Author) commented Jul 20, 2017

@markpapadakis Was it worth it, building a custom library? Do you have replication/sharding on the site? (You might want to look at netflix/dynomite or Redis Cluster/modules if not.)

I'm only interested in seeing the difference for fully in-memory search: Java vs C++ (with SIMD etc.) vs Rust (tantivy), based also on this blog post: http://blog.mikemccandless.com/2013/06/screaming-fast-lucene-searches-using-c.html

markpapadakis (Member) commented:

@ddorian Yes, it was absolutely worth it. It took maybe 2-3 months for Trinity, and maybe another month for integrating it into BestPrice.gr (and soon, elsewhere), which includes figuring out the kinks and building a powerful query rewrite engine (it uses Trinity's queries_rewrite.h functionality). Having access to this kind of technology is extremely important.

Re: replication, for BestPrice specifically, we flush segments, periodically merge them, and just sync them over to the various nodes -- but that's because it was a very simple design and we didn't need anything else at the time, though it should be easy to implement many if not any design.

Other than Trinity's explicit use of SIMD, the compiler makes use of SIMD instructions as well -- it certainly helps. Trinity also makes use of prefetching, which provided a useful boost. Beyond that, if there is ever a need for it (it's very fast as it is, so there's no pressing need now), we will just compile the resulting execution tree to machine code and execute it directly (a query is compiled to an execution tree/plan, which is then executed), like Bing does. It shouldn't be too hard.
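
(For readers unfamiliar with the technique: prefetching in a postings scan looks roughly like the sketch below. This is a generic illustration, not Trinity's actual code.)

```cpp
#include <cstddef>
#include <cstdint>

// Walk a postings list of document IDs, prefetching a few entries ahead so
// the next cache lines are already in flight when the loop reaches them.
uint64_t sum_matching(const uint32_t *docs, size_t n) {
    uint64_t acc = 0;
    constexpr size_t kAhead = 16; // tune per access pattern / cache line size
    for (size_t i = 0; i < n; ++i) {
        if (i + kAhead < n)
            __builtin_prefetch(docs + i + kAhead, /*rw=*/0, /*locality=*/1);
        acc += docs[i];
    }
    return acc;
}
```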

markpapadakis (Member) commented:

@ddorian Also, re: that blog post, there are 2 codecs included in Trinity. One of them is based on Lucene's default codec -- more or less the same design with a few changes. It uses FastPFOR, which deals with blocks of integers and relies on SIMD instructions for encoding/decoding (Lucene uses a PFOR variant itself). It's really, really fast. (You can also use MaskedVByte instead -- see LUCENE_USE_MASKEDVBYTE in the codebase to use it in place of FastPFOR; it's almost as fast as PFOR and faster for some datasets.)
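
(For reference, encoding a block of doc-ID deltas with Lemire's FastPFor library looks roughly like this. The API and codec name follow that project's README (https://github.com/lemire/FastPFor); treat the details, including the header path, as assumptions, and this as a sketch rather than Trinity's integration.)

```cpp
// Compress a block of doc-ID deltas with FastPFor's SIMD codec.
#include "fastpfor/codecfactory.h" // install path assumed; see the README
#include <cstdint>
#include <vector>

std::vector<uint32_t> compress_deltas(const std::vector<uint32_t> &deltas) {
    using namespace FastPForLib;
    CODECFactory factory;
    IntegerCODEC &codec = *factory.getFromName("simdfastpfor256");

    std::vector<uint32_t> out(deltas.size() + 1024); // headroom, per README
    size_t outsize = out.size();
    codec.encodeArray(deltas.data(), deltas.size(), out.data(), outsize);
    out.resize(outsize); // outsize now holds the compressed word count
    return out;
}
```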

ddorian (Author) commented Jul 20, 2017

Have you ever checked https://github.com/powturbo/TurboPFor? https://twitter.com/powturbo (mostly interested in search-engine/big-data stuff)

What about sharding? Or do you not need it yet?

markpapadakis (Member) commented:

@ddorian Yes, I have. It's excellent. You can replace Lemire's FastPFOR impl. with powturbo's. We have no applications or services that require sharding yet. Queries take a few ms, and if you are dealing with multiple index sources, you can use Trinity::exec_query_par(), which parallelises execution across all available CPU cores. You could easily build support for sharding -- Trinity is a framework/library for building search engines and other IR applications, and sharding is a higher-level concept; you can use Trinity's lower-level building blocks to implement it.
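
(A scatter-gather layer of that kind can be quite small. The sketch below is generic; Shard and Hit are made-up types, and nothing here is Trinity API.)

```cpp
// Broadcast a query to every shard in parallel, then merge the top-k hits.
#include <algorithm>
#include <cstdint>
#include <future>
#include <string>
#include <vector>

struct Hit { uint32_t doc; float score; };

struct Shard {
    std::vector<Hit> search(const std::string &query) const {
        (void)query;
        return {}; // a real shard would run the query against its own index
    }
};

std::vector<Hit> search_all(const std::vector<Shard> &shards,
                            const std::string &query, size_t k) {
    std::vector<std::future<std::vector<Hit>>> futs;
    for (const auto &s : shards)
        futs.push_back(std::async(std::launch::async,
                                  [&s, &query] { return s.search(query); }));
    std::vector<Hit> merged;
    for (auto &f : futs) {
        auto part = f.get();
        merged.insert(merged.end(), part.begin(), part.end());
    }
    const size_t cut = std::min(k, merged.size());
    std::partial_sort(merged.begin(), merged.begin() + cut, merged.end(),
                      [](const Hit &a, const Hit &b) { return a.score > b.score; });
    merged.resize(cut);
    return merged;
}
```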

ddorian (Author) commented Aug 4, 2017

Mark, just curious if you've looked at something like this: http://www.scylladb.com/2017/07/06/scyllas-approach-improve-performance-cpu-bound-workloads/

markpapadakis (Member) commented:

@ddorian I have. I love the Scylla engineers' write-ups; they are very pragmatic and insightful. (I also can't recommend studying the Seastar codebase enough; it's a great engineering achievement, and perhaps the finest C++ codebase, along with Facebook's Folly and HHVM, that I have studied.)

As for the benchmarks, we found a way to do this, so we will likely release something very soon. In the meantime, a major Trinity upgrade is in the works (lots of performance and usability improvements).

markpapadakis (Member) commented Aug 31, 2017

A benchmarks page will be up soon, but in the meantime, here is a not-exactly-fair comparison that involved (naively) indexing a partial English Wikipedia dump of 563MB (XML) into a single segment:
Lucene: 71s
Trinity: 47s

This is a bit unfair, though, because Lucene skips all stop words -- and that's a lot of stop words, really -- whereas for this benchmark Trinity is not set to skip anything. Furthermore, Trinity's default query parser is used to parse the tokens, which is fairly sophisticated and considers different strategies for properly parsing different types of tokens, whereas Lucene's default parser is naive in that it just figures out the beginning and end of a token, and that's pretty much all there is to it. With stop words excluded and a simple parser similar to Lucene's, I expect the difference to be closer to 100%, i.e. twice as fast -- and this is running on a single thread, whereas Lucene's indexer, I believe, uses multiple threads (I should look into that, and into ways to configure it, if possible, to use a single thread).
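
(To illustrate why stop-word skipping matters at indexing time: every skipped token is a posting that never gets built or compressed. A minimal Lucene-style tokenizer with an optional stop list might look like this; the stop list is a tiny sample of my own, not Lucene's.)

```cpp
// Naive whitespace/punctuation tokenizer with optional stop-word skipping.
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

std::vector<std::string> tokenize(const std::string &text, bool skip_stop) {
    static const std::unordered_set<std::string> kStop = {
        "the", "a", "an", "and", "of", "to", "in", "is"}; // sample list only
    std::vector<std::string> out;
    std::string tok;
    auto flush = [&] {
        if (!tok.empty() && (!skip_stop || !kStop.count(tok)))
            out.push_back(tok);
        tok.clear();
    };
    for (char c : text) {
        if (std::isalnum(static_cast<unsigned char>(c)))
            tok += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        else
            flush();
    }
    flush();
    return out;
}
```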

For search, there is no practical way to compare performance between Lucene and Trinity, because they have different scoring models. Lucene simply aggregates the scores of each term or phrase match (almost always a simple TF-IDF that doesn't require access to the positions/hits of a term in the matched document), which is pretty simplistic and very fast to compute (just a sum of values) -- and it also allows for optimisations that are just not possible with Trinity's richer model. Trinity, on the other hand, will invoke an application-provided function, passing it rich information about every term matched, including all hits for that term in the matched document, so that the application can perform sophisticated ranking. This is vastly more complicated and expensive than simply aggregating numbers and computing sums, but it's extremely important for computing better and more accurate relevance scores/similarity measures.

With that said, Trinity is as fast as or faster than Lucene for most types of queries. The exception is a query made up of a sequence of OR sub-expressions and 0 or 1 NOT sub-expressions (because there Lucene can just compute sums and not track all the information I described earlier), in which case Lucene is maybe 30% faster. If you have AND or other operators, Trinity is faster. Trinity has an ExecFlags::DocumentsOnly flag, which can be used to pass only the matched documents to the callback, in which case it behaves a lot like Lucene's impl. When that's set, Trinity is over 100% faster for such queries comprised of OR sub-queries (100ms Lucene vs 44ms Trinity). In the future, there will likely be a new ExecFlag for scoring based on TF-IDF or some other simple model, to get Lucene-style relevancy at Trinity's speed (though I still don't think people should rely on those models).
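
(The difference between the two models, reduced to interfaces -- illustrative types only, not either library's actual API. Lucene-style scoring needs one number per matched term; the callback model hands the application every hit position.)

```cpp
#include <cstdint>
#include <vector>

// Lucene-style: the engine hands the scorer one precomputed number per
// matched term; the document score is just their sum.
float sum_style_score(const std::vector<float> &per_term_scores) {
    float s = 0;
    for (float v : per_term_scores) s += v;
    return s;
}

// Callback-style: the application receives every matched term with all of
// its hits (positions) in the document and can rank however it likes,
// e.g. rewarding proximity between query terms.
struct TermHits {
    uint32_t term_id;
    std::vector<uint32_t> positions; // every occurrence in the matched doc
};
using ScoreCallback = float (*)(uint32_t doc_id,
                                const std::vector<TermHits> &matches);
```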

ddorian (Author) commented Aug 31, 2017

Lucene 6+ uses BM25 by default, which is a little slower than TF-IDF (now deprecated in Lucene).

Do you intend to create performance benchmarks that you can run yourselves, to make sure new changes don't introduce slowdowns? (Like Lucene does.)

markpapadakis (Member) commented:

@ddorian We have such a benchmark suite, but it's internal to our org., i.e. it won't be open sourced. I will bundle a Lucene and a Trinity app so that others can see for themselves (i.e. the applications that index and search that Wikipedia dump) -- thanks to @rkrambovitis for building that Lucene Java application.

BM25 yields somewhat better results but, just like TF-IDF, is not going to work for "short" documents (e.g. products, or records), where you don't really have enough terms in the "document" for those models to produce meaningful results.
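
(The standard BM25 per-term formula makes the short-document problem visible: with a handful of terms, tf is almost always 1 and the length normalisation is noise, so scores barely discriminate. A plain implementation with the usual k1 and b defaults:)

```cpp
#include <cmath>

// BM25 contribution of a single term to a document's score.
// For a 3-term "document" such as a product title, tf ~= 1 and
// doc_len / avg_doc_len carries little signal.
float bm25_term(float tf, float doc_len, float avg_doc_len,
                float idf, float k1 = 1.2f, float b = 0.75f) {
    return idf * (tf * (k1 + 1.0f)) /
           (tf + k1 * (1.0f - b + b * doc_len / avg_doc_len));
}
```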

markpapadakis (Member) commented:

Trinity now supports the recently (yesterday) released Streaming VByte encoding scheme by Daniel Lemire. Read the blog post for an introduction.

When selected (see lucene_codec.h), it generally results in at least a 30% improvement in query processing time, albeit at the cost of slightly larger indices. That's a pretty impressive performance boost.
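
(For reference, a round trip through Lemire's streamvbyte C library (https://github.com/lemire/streamvbyte) looks roughly like this. Function names follow that project's README; treat the exact signatures as assumptions rather than gospel.)

```cpp
// Encode and decode a block of uint32 values with streamvbyte.
#include "streamvbyte.h"
#include <cstdint>
#include <vector>

int main() {
    std::vector<uint32_t> in(5000, 42);

    // The library provides an upper bound on the compressed size.
    std::vector<uint8_t> buf(streamvbyte_max_compressedbytes(in.size()));
    size_t used = streamvbyte_encode(in.data(), in.size(), buf.data());

    std::vector<uint32_t> out(in.size());
    streamvbyte_decode(buf.data(), out.data(), out.size());
    return used > 0 && out == in ? 0 : 1;
}
```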

ddorian (Author) commented Sep 28, 2017

Yeah, saw that yesterday too. It also works without SIMD, though:

"I should stress that if you are stuck with a machine without vector instructions, or using a programming language that does not support vector instructions, then it is not hard to code and decode that data efficiently with ordinary (scalar) code. You are never going to get the same performance as with SIMD instructions, but it won't be terrible."

Vespa, which was released yesterday, looks nice as a clustering solution (assuming you can replace their index with Trinity); ex-employees who have worked with it say it's 100x better than Elasticsearch.

markpapadakis (Member) commented:

@ddorian You can choose between PFOR, Streaming VByte, and Masked VByte; see lucene_codec.h for how to select the encoding scheme. In practice, most systems (well, CPUs) nowadays support SSE 4.0+, so it should work as expected in most environments.

I've been talking to Jon Bratseth about Vespa. It's really interesting, and I plan to do a deep dive into the codebase to understand the impl. specifics; I just haven't had a chance to do that yet.
