Solver returned SCS_SOLVED with an objective close to 0.0 with wrong results #191

Open
angzhiping opened this issue Dec 24, 2021 · 4 comments

@angzhiping

angzhiping commented Dec 24, 2021

  • Hardware: NVIDIA Tegra X2
  • OS/Kernel: Ubuntu 18.04, 4.9.140-tegra aarch64
  • SCS Version: 3.0.0
  • Compiler: gcc 7.5.0

I'm solving a series of second-order cone programs that are pretty small (<100 unknowns and constraints), and noticed that the solver sometimes returns an SCS_SOLVED status with an objective that is very close to 0.0 (sometimes slightly negative). This does not make sense, as my problems are over-determined; typical objective values for well-behaved problems are around 10-100. The returned solutions also give wrong results when used in calculations within my model. Here is a Dropbox link to the data dump and logs of 4 such scenarios.

Thank you.
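As far as we understand, an SCS_SOLVED status only certifies that the solver's own stopping test was met, so one way to double-check a suspicious result is to recompute the primal residual and cone membership of the returned point outside the solver. The sketch below assumes the SCS standard form A x + s = b with s in K, where K lists the zero-cone entries of s first and then each second-order-cone block (t, u) with ||u||_2 <= t. The helper names and the dense list-of-rows representation of A are hypothetical, chosen only to keep the example dependency-free.

```python
# Hypothetical helpers for checking an SCS-style point (x, s) against
# A x + s = b, s in {0}^z x SOC(q[0]) x SOC(q[1]) x ...
# A is a dense list of rows here purely for illustration.


def primal_residual_inf(A, x, s, b):
    """Return ||A x + s - b||_inf."""
    worst = 0.0
    for row, s_i, b_i in zip(A, s, b):
        ax_i = sum(a_ij * x_j for a_ij, x_j in zip(row, x))
        worst = max(worst, abs(ax_i + s_i - b_i))
    return worst


def cone_violation(s, z, q):
    """Largest violation of s in {0}^z x SOC(q[0]) x SOC(q[1]) x ...

    Zero-cone entries contribute |s_i|; each SOC block (t, u) contributes
    max(0, ||u||_2 - t). A return value of 0.0 means s is in the cone.
    """
    if z + sum(q) != len(s):
        raise ValueError("cone sizes do not match len(s)")
    worst = max((abs(v) for v in s[:z]), default=0.0)
    idx = z
    for size in q:
        t, u = s[idx], s[idx + 1: idx + size]
        norm_u = sum(v * v for v in u) ** 0.5
        worst = max(worst, norm_u - t)
        idx += size
    return worst


# Toy data: one zero-cone row followed by one 3-dimensional SOC block.
A = [[1.0, 0.0], [0.0, 0.0], [-1.0, 0.0], [0.0, -1.0]]
b = [1.0, 2.0, 0.0, 0.0]
x = [1.0, 1.0]
s = [0.0, 2.0, 1.0, 1.0]

print(primal_residual_inf(A, x, s, b))  # 0.0
print(cone_violation(s, z=1, q=[3]))     # 0.0
```

With real data, A, b and the cone sizes would come from the problem dump, and x, s from the solver output; a large value from either helper for a point reported as solved would suggest a problem in the solve rather than in the model.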

@bodono
Member

bodono commented Dec 31, 2021

First, thank you for submitting a nicely detailed issue!

The first thing I did was run the data you sent on my machine (Mac OSX laptop). As you can see below, I get somewhat different behavior. Does the optimal objective of about 7.63 look right? If so, the problem is likely architecture specific. I also ran it for longer, and the solver had trouble reducing the primal residual much further than this. That could be a bug, but more likely some numerical issue is preventing tighter accuracy.

I wonder if there is some problem running on a Tegra X2 / aarch64, which I don't think we have tested on. It might be worth creating a GitHub Action to run on non-x86 architectures.

=> out/run_from_file_direct ~/Downloads/scs_solver_data_and_logs/problem.00000000.dat verbose 1
Reading data from /Users/bodonoghue/Downloads/scs_solver_data_and_logs/problem.00000000.dat
Attempting to override verbose with value 1.
Success.
------------------------------------------------------------------
	       SCS v3.0.0 - Splitting Conic Solver
	(c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 37, constraints m: 49
cones: 	  z: primal zero / dual free vars: 21
	  q: soc vars: 28, qsize: 7
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
	  alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
	  max_iters: 100000, normalize: 1, warm_start: 0
	  acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-direct
	  nnz(A): 154, nnz(P): 37
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 1.27e+07  1.05e+01  8.31e+07 -4.01e+07  1.00e-01  4.27e-05
   250| 1.51e+05  1.00e+00  1.12e+05 -1.31e+05  1.00e-01  8.37e-04
   500| 1.22e+01  3.44e-05  9.79e+00  1.47e+01  1.00e-01  1.61e-03
   750| 5.24e+00  3.60e-06  1.26e+00  1.89e+00  1.00e-01  2.31e-03
  1000| 1.87e+00  2.42e-09  6.14e-03 -9.12e-03  1.00e-01  3.05e-03
  1250| 1.87e+00  1.85e-09  4.94e-03 -7.33e-03  1.00e-01  3.91e-03
  1500| 1.87e+00  1.66e-09  4.56e-03 -6.74e-03  1.00e-01  4.75e-03
  1750| 1.87e+00  4.23e-09  3.05e-03 -1.71e-03  8.95e+00  5.50e-03
  2000| 1.66e+00  6.44e-05  3.26e+01 -1.33e+01  2.41e+03  6.20e-03
  2250| 4.87e-01  2.52e-08  9.73e-02  6.92e+00  2.77e+02  6.87e-03
  2500| 4.83e-01  3.01e-08  1.38e-01  6.94e+00  2.77e+02  7.50e-03
  2750| 4.79e-01  3.50e-08  1.77e-01  6.97e+00  2.77e+02  8.14e-03
  3000| 6.38e-02  1.52e-08  5.25e-03  7.58e+00  2.77e+02  8.77e-03
  3250| 5.59e-02  2.10e-09  2.61e-02  7.58e+00  2.77e+02  9.38e-03
  3500| 5.04e-02  1.82e-09  2.36e-02  7.58e+00  2.77e+02  9.99e-03
  3750| 4.66e-02  1.59e-09  2.13e-02  7.59e+00  2.77e+02  1.06e-02
  4000| 1.21e+00  5.27e-04  6.35e+01  3.94e+01  2.77e+02  1.12e-02
  4250| 4.01e-02  1.22e-09  1.71e-02  7.59e+00  2.77e+02  1.19e-02
  4500| 3.72e-02  1.10e-09  1.54e-02  7.60e+00  2.77e+02  1.26e-02
  4750| 3.45e-02  1.03e-09  1.38e-02  7.60e+00  2.77e+02  1.34e-02
  5000| 2.74e-02  6.35e-09  1.42e-01  7.54e+00  8.79e+02  1.41e-02
  5250| 2.17e-02  4.34e-09  1.06e-01  7.56e+00  8.79e+02  1.48e-02
  5500| 1.72e-02  3.44e-09  7.99e-02  7.58e+00  8.79e+02  1.55e-02
  5750| 1.37e-02  2.81e-09  6.06e-02  7.59e+00  8.79e+02  1.62e-02
  6000| 1.09e-02  2.29e-09  4.62e-02  7.60e+00  8.79e+02  1.69e-02
  6250| 8.62e-03  1.85e-09  3.54e-02  7.61e+00  8.79e+02  1.76e-02
  6450| 1.91e-04  3.56e-10  8.38e-04  7.63e+00  8.79e+02  1.82e-02
------------------------------------------------------------------
status:  solved
timings: total: 1.85e-02s = setup: 2.82e-04s + solve: 1.82e-02s
	 lin-sys: 1.01e-02s, cones: 2.36e-03s, accel: 1.46e-03s
------------------------------------------------------------------
objective = 7.628260
------------------------------------------------------------------
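The run above stops once the primal residual, dual residual and gap all fall below tolerances built from eps_abs and eps_rel. As we read the SCS 3.x documentation, the primal part of that test is the one encoded in the sketch below; the function names are hypothetical, and we assume the pri res column reports the unscaled ||Ax + s - b||_inf.

```python
# Sketch of the primal part of an SCS 3.x-style stopping test (our reading of
# the documented criterion; the function names are hypothetical):
#   ||Ax + s - b||_inf <= eps_abs + eps_rel * max(||Ax||_inf, ||s||_inf, ||b||_inf)


def inf_norm(v):
    """Max absolute entry of v (0.0 for an empty vector)."""
    return max((abs(t) for t in v), default=0.0)


def primal_converged(Ax, s, b, eps_abs=1e-4, eps_rel=1e-4):
    """Return (passed, residual, threshold) for the primal test above."""
    residual = inf_norm([a + si - bi for a, si, bi in zip(Ax, s, b)])
    scale = max(inf_norm(Ax), inf_norm(s), inf_norm(b))
    threshold = eps_abs + eps_rel * scale
    return residual <= threshold, residual, threshold


# The same absolute residual (1.91e-4, as in the last log line) at two data scales.
print(primal_converged([1.0], [0.0], [1.0 + 1.91e-4])[0])  # True
print(primal_converged([0.0], [0.0], [1.91e-4])[0])        # False
```

With eps_abs = eps_rel = 1e-4, as in the settings line of the log, a residual of 1.91e-4 passes only when max(||Ax||, ||s||, ||b||) is at least (1.91e-4 - 1e-4) / 1e-4 = 0.91, so the same absolute residual can count as converged on one problem and not on another.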

@bodono
Member

bodono commented Dec 31, 2021

Oh, and I presume you weren't running on the GPU?

@angzhiping
Author

An objective of 7.63 sounds reasonable. Might it be the compilation flags that are used, or differences in the BLAS/LAPACK libraries? I remember modifying scs.mk to link with the BLAS libraries that ship with the TX2.

I'm not using the GPU; only the SCS direct library is used to run the attached test cases.
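Both suspects are plausible because floating-point addition is not associative: a different BLAS build, or compiler flags that permit reassociation (such as -ffast-math) or contraction into fused multiply-adds, can change the order or rounding of a reduction and therefore its result. The toy sketch below is plain Python, not SCS or BLAS code, and its function names are hypothetical; it shows two accumulation orders for the same dot product giving different answers.

```python
def dot_sequential(u, v):
    """Accumulate u . v strictly left to right, as a naive loop would."""
    acc = 0.0
    for a, b in zip(u, v):
        acc += a * b
    return acc


def dot_two_lane(u, v):
    """Accumulate even and odd indices separately, then combine.

    This loosely mimics how a vectorized BLAS kernel or an auto-vectorized
    loop may split a reduction across SIMD lanes.
    """
    lanes = [0.0, 0.0]
    for i, (a, b) in enumerate(zip(u, v)):
        lanes[i % 2] += a * b
    return lanes[0] + lanes[1]


u = [1e16, 1.0, -1e16, 1.0]
v = [1.0, 1.0, 1.0, 1.0]
print(dot_sequential(u, v))  # 1.0 (the first 1.0 is absorbed by 1e16)
print(dot_two_lane(u, v))    # 2.0 (the exact answer)
```

Differences of this kind are usually tiny, but in an ill-conditioned problem they can compound over thousands of iterations, which is one reason the same data can behave differently on two machines.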

@bodono
Copy link
Member

bodono commented Jan 1, 2022

Yes, it may well be due to the BLAS libraries. For your problem, BLAS is only used in the acceleration step, which you can disable by setting acceleration_lookback = 0. When I do that for your problem it does much worse, but it would be useful to see whether you get the same behaviour, which could help us debug:

 99750| 1.87e+00  1.82e-08  1.19e-01  1.78e-01  1.00e-01  2.25e-01
100000| 1.87e+00  1.82e-08  1.19e-01  1.79e-01  1.00e-01  2.25e-01
------------------------------------------------------------------
status:  solved (inaccurate - reached max_iters)
timings: total: 2.25e-01s = setup: 2.15e-04s + solve: 2.25e-01s
	 lin-sys: 1.38e-01s, cones: 2.84e-02s, accel: 0.00e+00s
------------------------------------------------------------------
objective = 0.178815 (inaccurate)
------------------------------------------------------------------
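For intuition about why disabling acceleration slows convergence this much, here is a toy scalar illustration of Anderson acceleration, the scheme SCS uses, with acceleration_lookback setting its memory. It is not SCS's implementation, which accelerates the full iterate vector with a longer memory, solves a small least-squares problem (presumably the part that calls BLAS) and, as we understand it, adds safeguards. The function name and the x = cos(x) test map are hypothetical choices for illustration.

```python
import math


def fixed_point(f, x0, iters, lookback=1):
    """Iterate x <- f(x) on a scalar map.

    lookback=0: plain fixed-point iteration.
    lookback!=0: memory-1 (type-II) Anderson acceleration in this toy; for a
    scalar map it reduces to a secant step on the residual g(x) = f(x) - x.
    """
    x = x0
    f_prev = g_prev = None
    for _ in range(iters):
        fx = f(x)
        g = fx - x  # fixed-point residual
        if lookback == 0 or g_prev is None or g == g_prev:
            x_next = fx  # unaccelerated step (also the fallback)
        else:
            gamma = g / (g - g_prev)  # minimizes |g - gamma * (g - g_prev)|
            x_next = fx - gamma * (fx - f_prev)
        f_prev, g_prev = fx, g
        x = x_next
    return x


root = 0.7390851332151607  # fixed point of cos, for reference
plain = fixed_point(math.cos, 1.0, 10, lookback=0)
accel = fixed_point(math.cos, 1.0, 10, lookback=1)
print(abs(plain - root) > 1e-4)  # True: still visibly off after 10 steps
print(abs(accel - root) < 1e-9)  # True: the accelerated run has converged
```

The plain iteration only contracts by a constant factor per step, while the accelerated one extrapolates from the last two residuals; SCS's full-vector version is more involved, but the same effect is a plausible reading of why the run above stalls at max_iters without it.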
