
Are the experiments fair enough? #55

Open
jyczwys opened this issue May 5, 2023 · 1 comment
jyczwys commented May 5, 2023

Hello,

I really enjoyed your paper and code, but when running the LKH-3 baseline for the CVRP, I noticed something strange. In vrp_baseline.py, line 101, the parameter MAX_TRIALS is set to:

```
"MAX_TRIALS": 10000,
```

However, the LKH-3 documentation says the default value of MAX_TRIALS is the number of nodes (which ranges from 20 to 100 in the experiments). So I ran further experiments on several random graphs with 20 nodes, setting MAX_TRIALS to 20 and 10000 respectively. Here are the results:

| # | Obj (MAX_TRIALS = 20) | Time (s) | Obj (MAX_TRIALS = 10000) | Time (s) |
|---|----------------------:|---------:|-------------------------:|---------:|
| 1 | 407 | 0.08 | 407 | 17.99 |
| 2 | 432 | 0.09 | 432 | 22.27 |
| 3 | 365 | 0.08 | 365 | 17.81 |
| 4 | 341 | 0.07 | 341 | 15.14 |
| 5 | 221 | 0.10 | 221 | 20.44 |

It seems that setting MAX_TRIALS to 20 or 10000 returns the same objective values, while the run times differ by roughly 200x. So I have doubts about whether the experiments are fair to the baselines. Or did I do something wrong?
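
For reference, here is a minimal sketch of the timing comparison above, assuming an `LKH` binary on the PATH and the hypothetical `write_lkh_par` helper sketched earlier in this thread:

```python
# Minimal sketch: time LKH-3 under two MAX_TRIALS settings.
# Assumes an `LKH` binary on the PATH (invoked as `LKH <par_file>`)
# and the hypothetical write_lkh_par helper sketched above.
import subprocess
import time

for max_trials in (20, 10000):
    write_lkh_par("instance.par", "instance.vrp", "instance.tour",
                  max_trials=max_trials)
    start = time.time()
    subprocess.run(["LKH", "instance.par"], check=True, capture_output=True)
    print(f"MAX_TRIALS={max_trials}: {time.time() - start:.2f}s")
    # the objective can then be read back from instance.tour
```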

wouterkool (Owner) commented

Hi @jyczwys,

I'm sorry for the slow reply; I have little time to maintain this repository. Thanks a lot for your feedback, you raise a fair point. I am not sure why I set MAX_TRIALS to 10000 (it has been some years); it was probably to improve performance on the 100-node instances.

In any case, I think the 20-node experiments are not really representative: TSP (and also VRP) with just 20 nodes can be trivially solved to optimality in a very short time, so those experiments are mostly there to compare to previous works and show the gap to optimality. I fully acknowledge that LKH is better, and I'm sorry if the paper gives a different impression. I think the conclusion is much fairer for the 100-node instances (where LKH is still better).

In general, it is hard to do a fair comparison that takes into account hardware differences (CPU/GPU) and trade-offs (solution quality vs. running time), which is acknowledged in the paper. In hindsight, I would do things differently; indeed, in a follow-up paper we did a better job by also reporting HGS (which outperforms LKH) and by reporting solution quality as a function of runtime (see https://arxiv.org/pdf/2102.11756.pdf).
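
For illustration, reporting quality as a function of runtime could look like the sketch below; every number here is invented purely for the example and is not a result from either paper:

```python
# Illustrative sketch: plot average objective vs. per-instance runtime.
# All data points below are made up for the example.
import matplotlib.pyplot as plt

curves = {  # solver -> list of (runtime in s, average objective)
    "LKH-3": [(1, 15.8), (10, 15.7), (100, 15.65)],
    "HGS": [(1, 15.7), (10, 15.6), (100, 15.55)],
    "Learned policy": [(0.1, 16.1)],  # one-shot construction, fixed budget
}

for name, points in curves.items():
    times, objs = zip(*points)
    plt.plot(times, objs, marker="o", label=name)

plt.xscale("log")
plt.xlabel("Runtime per instance (s)")
plt.ylabel("Average objective (lower is better)")
plt.legend()
plt.show()
```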

All in all, I think your findings are correct (I have not validated them), but I also think they do not affect the main conclusions and value of the paper: we show that it is possible to learn constructive policies, from scratch, that construct good solutions quickly for different vehicle routing problems.

My compliments on your observations; it is great that you point this out, as we should hold each other to high standards and fair evaluation of algorithms. Please let me know if you have further questions or concerns.
