
Are the experiments fair enough? #55

Open
jyczwys opened this issue May 5, 2023 · 1 comment
jyczwys commented May 5, 2023

Hello,

I really enjoyed your paper and code, but when running the LKH-3 baseline for the CVRP, I noticed something strange. In vrp_baseline.py, line 101, the parameter MAX_TRIALS is set to:

```
"MAX_TRIALS": 10000,
```

However, the LKH-3 documentation says the default value of MAX_TRIALS is the number of nodes (which ranges from 20 to 100 in the experiments). So I ran further experiments on several random graphs with 20 nodes, setting MAX_TRIALS to 20 and 10000 respectively. Here are the results:

| # | Obj (MAX_TRIALS = 20) | Time (s) | Obj (MAX_TRIALS = 10000) | Time (s) |
|---|----------------------:|---------:|-------------------------:|---------:|
| 1 | 407 | 0.08 | 407 | 17.99 |
| 2 | 432 | 0.09 | 432 | 22.27 |
| 3 | 365 | 0.08 | 365 | 17.81 |
| 4 | 341 | 0.07 | 341 | 15.14 |
| 5 | 221 | 0.10 | 221 | 20.44 |

It seems that setting MAX_TRIALS to 20 or 10000 returns the same objective values, while the run times differ by roughly 200x. So I have doubts about whether the experiments are fair to the baselines. Or did I do something wrong?
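
For reference, here is a minimal sketch of the timing comparison above, assuming an `LKH` binary on the PATH and the hypothetical `write_lkh_par` helper sketched earlier in this thread:

```python
# Minimal sketch: time LKH-3 under two MAX_TRIALS settings.
# Assumes an `LKH` binary on the PATH (invoked as `LKH <par_file>`)
# and the hypothetical write_lkh_par helper sketched above.
import subprocess
import time

for max_trials in (20, 10000):
    write_lkh_par("instance.par", "instance.vrp", "instance.tour",
                  max_trials=max_trials)
    start = time.time()
    subprocess.run(["LKH", "instance.par"], check=True, capture_output=True)
    print(f"MAX_TRIALS={max_trials}: {time.time() - start:.2f}s")
    # the objective can then be read back from instance.tour
```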

wouterkool (Owner) commented

Hi @jyczwys,

I'm sorry for the slow reply; I have little time to maintain this repository. Thanks a lot for your feedback, you raise a fair point. I am not sure why I set MAX_TRIALS to 10000 (it has been some years); it was probably to improve performance on the 100-node instances.

In any case, I think the 20-node experiments are not really representative: TSP (and also VRP) with just 20 nodes can be trivially solved to optimality in a very short time, so those experiments are mostly there to compare to previous works and show the gap to optimality. I fully acknowledge that LKH is better, and I'm sorry if the paper gives a different impression. I think the conclusion is much fairer for the 100-node instances (where LKH is still better).

In general, it is hard to do a fair comparison that takes into account hardware differences (CPU/GPU) and trade-offs (solution quality vs. running time), which is acknowledged in the paper. In hindsight, I would do things differently; indeed, in a follow-up paper we did a better job by also reporting HGS (which outperforms LKH) and by reporting solution quality as a function of runtime (see https://arxiv.org/pdf/2102.11756.pdf).
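
For illustration, reporting quality as a function of runtime could look like the sketch below; every number here is invented purely for the example and is not a result from either paper:

```python
# Illustrative sketch: plot average objective vs. per-instance runtime.
# All data points below are made up for the example.
import matplotlib.pyplot as plt

curves = {  # solver -> list of (runtime in s, average objective)
    "LKH-3": [(1, 15.8), (10, 15.7), (100, 15.65)],
    "HGS": [(1, 15.7), (10, 15.6), (100, 15.55)],
    "Learned policy": [(0.1, 16.1)],  # one-shot construction, fixed budget
}

for name, points in curves.items():
    times, objs = zip(*points)
    plt.plot(times, objs, marker="o", label=name)

plt.xscale("log")
plt.xlabel("Runtime per instance (s)")
plt.ylabel("Average objective (lower is better)")
plt.legend()
plt.show()
```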

All in all, I think your findings are correct (I have not validated them), but I also think they do not affect the main conclusions and value of the paper: we show that it is possible to learn constructive policies, from scratch, that construct good solutions quickly for different vehicle routing problems.

My compliments on your observations; it is great that you point this out, as we should hold each other to high standards and fair evaluation of algorithms. Please let me know if you have further questions or concerns.
