Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add benchmark of integrated probability #55

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

lpsinger
Copy link
Collaborator

Add benchmark of integrated probability within the union of a set of fields.

As reported by @mcoughlin and @Tohuvavohu, this operation is slower than it should be:

pytest --benchmark-group-by=fullfunc --benchmark-sort=fullname --benchmark-json benchmark_results.json healpix_alchemy/tests/benchmarks/test_benchmarks.py ======================================================================================================== test session starts =========================================================================================================
platform darwin -- Python 3.8.13, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/lpsinger/src/healpix-alchemy, configfile: pyproject.toml
plugins: benchmark-3.4.1, doctestplus-0.11.2, postgresql-3.1.2, cov-3.0.0
collected 10 items                                                                                                                                                                                                                   

healpix_alchemy/tests/benchmarks/test_benchmarks.py ..........                                                                                                                                                                 [100%]
Wrote benchmark data in: <_io.BufferedWriter name='benchmark_results.json'>



------------------------------------------------- benchmark 'healpix_alchemy/tests/benchmarks/test_benchmarks.py::test_integrated_probability': 10 tests ------------------------------------------------
Name (time in s)                            Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_integrated_probability[10000]       2.2970 (4.37)       2.3157 (4.17)       2.3075 (4.24)     0.0069 (1.0)        2.3079 (4.16)     0.0081 (1.0)           2;0  0.4334 (0.24)          5           1
test_integrated_probability[1291]      129.9643 (247.25)   131.1182 (235.83)   130.4707 (239.47)   0.4317 (62.57)    130.4522 (235.29)   0.5523 (68.46)         2;0  0.0077 (0.00)          5           1
test_integrated_probability[166]        61.9228 (117.81)    65.9366 (118.60)    62.9024 (115.45)   1.7139 (248.39)    62.0978 (112.00)   1.4208 (176.13)        1;1  0.0159 (0.01)          5           1
test_integrated_probability[1]           0.5256 (1.0)        0.5560 (1.0)        0.5448 (1.0)      0.0146 (2.11)       0.5544 (1.0)      0.0247 (3.07)          1;0  1.8354 (1.0)           5           1
test_integrated_probability[21]         10.4779 (19.93)     10.9063 (19.62)     10.6447 (19.54)    0.1678 (24.32)     10.5684 (19.06)    0.2152 (26.68)         1;0  0.0939 (0.05)          5           1
test_integrated_probability[2]           0.9655 (1.84)       1.0280 (1.85)       0.9987 (1.83)     0.0251 (3.64)       1.0072 (1.82)     0.0389 (4.83)          2;0  1.0013 (0.55)          5           1
test_integrated_probability[3593]       25.7560 (49.00)     25.9841 (46.74)     25.8426 (47.43)    0.0882 (12.79)     25.8226 (46.57)    0.1113 (13.79)         1;0  0.0387 (0.02)          5           1
test_integrated_probability[464]       120.1516 (228.58)   121.1324 (217.87)   120.4673 (221.11)   0.3862 (55.97)    120.3677 (217.10)   0.3683 (45.66)         1;0  0.0083 (0.00)          5           1
test_integrated_probability[59]         27.8548 (52.99)     28.8851 (51.95)     28.3581 (52.05)    0.4449 (64.48)     28.5542 (51.50)    0.7297 (90.45)         2;0  0.0353 (0.02)          5           1
test_integrated_probability[7]           3.5211 (6.70)       3.6586 (6.58)       3.5808 (6.57)     0.0544 (7.88)       3.5570 (6.42)     0.0771 (9.55)          2;0  0.2793 (0.15)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
================================================================================================== 10 passed in 2787.96s (0:46:27) ===================================================================================================

@lpsinger
Copy link
Collaborator Author

lpsinger commented Sep 7, 2022

@mcoughlin, I had time to isolate this test case for the performance of the integrated probability calculation, but I haven't had time yet to try to tune the query and optimize it. Do you have a student who is an SQL wizard that you could assign to take a look at it?

@codecov
Copy link

codecov bot commented Sep 7, 2022

Codecov Report

Base: 96.34% // Head: 94.93% // Decreases project coverage by -1.41% ⚠️

Coverage data is based on head (30181c0) compared to base (8be93e9).
Patch has no changes to coverable lines.

❗ Current head 30181c0 differs from pull request most recent head 21cec74. Consider uploading reports for the commit 21cec74 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #55      +/-   ##
==========================================
- Coverage   96.34%   94.93%   -1.41%     
==========================================
  Files           4        4              
  Lines          82       79       -3     
==========================================
- Hits           79       75       -4     
- Misses          3        4       +1     
Impacted Files Coverage Δ
healpix_alchemy/types.py 93.54% <0.00%> (-1.62%) ⬇️
healpix_alchemy/constants.py 100.00% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@jadalilleboe
Copy link

Hi @lpsinger @mcoughlin I'd be happy to take a look at this and try optimizing the query. I'm definitely not a SQL wizard but I'm taking a class on spatial databases rn so I'm learning more about SQL and such

@mcoughlin
Copy link
Contributor

Great idea @jadalilleboe!

@jadalilleboe
Copy link

so the query looks like this in SQL:

SELECT sum( skymaptile.probdensity * (upper(anon_1.hpx * skymaptile.hpx) - lower(anon_1.hpx * skymaptile.hpx)) * : param_1) AS sum_1 
FROM skymaptile, (SELECT unnest(range_agg(fieldtile.hpx)) AS hpx FROM fieldtile) AS anon_1 
WHERE skymaptile.id = :id_1 AND (anon_1.hpx && skymaptile.hpx)

I think the bottleneck is in the WHERE clause with finding the overlap of a skymap tile with a specific id and the field tile ranges, because removing the SkymapTile.id == 1 in the .filter caused the query to execute much faster (3 minutes vs 46). So I think calculating the union, area, and prob is fast enough and the issue lies with finding overlapping tiles quickly. Any ideas on how to speed that up @lpsinger ? I saw the Tile type uses SP-GiST indexing which is supposedly good for speeding up an OVERLAPS operation but I don't know if that works with the SkymapTile model as well

@lpsinger
Copy link
Collaborator Author

Interestingly, simply removing the SkymapTile.id == 1 filter clause from the query speeds it up to just a few seconds. I think that this is an indication that if we could coax the query optimizer to reorder the scans a bit it would be about as performant.

------------------------------------------------------------------------------------------------ benchmark: 10 tests -------------------------------------------------------------------------------------------------
Name (time in ms)                             Min                   Max                  Mean              StdDev                Median                 IQR            Outliers      OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_integrated_probability[1]            10.2987 (1.0)         16.3231 (1.0)         11.8116 (1.0)        1.1705 (1.0)         11.4592 (1.0)        1.3440 (1.00)          9;2  84.6628 (1.0)          56           1
test_integrated_probability[2]            22.8495 (2.22)        32.3029 (1.98)        25.1355 (2.13)       1.6119 (1.38)        24.8391 (2.17)       1.3385 (1.0)           5;2  39.7844 (0.47)         38           1
test_integrated_probability[7]            88.1038 (8.55)       100.8049 (6.18)        91.7998 (7.77)       3.2549 (2.78)        90.9960 (7.94)       2.8598 (2.14)          2;1  10.8933 (0.13)         12           1
test_integrated_probability[21]          226.6382 (22.01)      236.2924 (14.48)      231.7343 (19.62)      4.2733 (3.65)       233.9832 (20.42)      7.1305 (5.33)          2;0   4.3153 (0.05)          5           1
test_integrated_probability[59]          689.0763 (66.91)      709.6576 (43.48)      697.9505 (59.09)     10.3485 (8.84)       692.1832 (60.40)     19.1763 (14.33)         2;0   1.4328 (0.02)          5           1
test_integrated_probability[3593]      1,471.8220 (142.91)   1,850.3964 (113.36)   1,624.8050 (137.56)   162.2893 (138.65)   1,537.0167 (134.13)   256.3170 (191.50)        1;0   0.6155 (0.01)          5           1
test_integrated_probability[166]       1,655.4875 (160.75)   1,800.0999 (110.28)   1,713.9674 (145.11)    52.8085 (45.12)    1,702.6641 (148.58)    44.0080 (32.88)         2;1   0.5834 (0.01)          5           1
test_integrated_probability[10000]     2,471.0722 (239.94)   2,578.0584 (157.94)   2,537.5146 (214.83)    46.6182 (39.83)    2,563.3955 (223.70)    73.7618 (55.11)         1;0   0.3941 (0.00)          5           1
test_integrated_probability[464]       3,279.4863 (318.44)   3,396.5979 (208.09)   3,354.7103 (284.02)    49.7878 (42.54)    3,370.6874 (294.15)    77.8387 (58.15)         1;0   0.2981 (0.00)          5           1
test_integrated_probability[1291]      3,531.6656 (342.92)   3,621.7460 (221.88)   3,568.7107 (302.14)    35.9082 (30.68)    3,572.5450 (311.76)    51.7857 (38.69)         2;0   0.2802 (0.00)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@lpsinger
Copy link
Collaborator Author

lpsinger commented Oct 28, 2022

Even more curious, "group by" is also fast. It's just filtering that is slow.

Nothing

    query = sa.select(
        prob
    ).filter(
        union.columns.hpx.overlaps(SkymapTile.hpx)
    )
-------------------------------------------------------------------------------------------------- benchmark: 10 tests ---------------------------------------------------------------------------------------------------
Name (time in ms)                             Min                   Max                  Mean                StdDev                Median                   IQR            Outliers      OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_integrated_probability[1]            12.7070 (1.0)         17.6377 (1.0)         13.7699 (1.0)          1.0000 (1.62)        13.3544 (1.0)          1.0893 (1.23)          9;1  72.6223 (1.0)          44           1
test_integrated_probability[2]            27.3178 (2.15)        29.6046 (1.68)        28.3043 (2.06)         0.6174 (1.0)         28.3433 (2.12)         0.8857 (1.0)           9;0  35.3303 (0.49)         31           1
test_integrated_probability[7]           103.4644 (8.14)       114.8462 (6.51)       111.0710 (8.07)         3.7426 (6.06)       111.7471 (8.37)         5.5942 (6.32)          2;0   9.0032 (0.12)          9           1
test_integrated_probability[21]          223.4636 (17.59)      231.4038 (13.12)      226.6374 (16.46)        3.3149 (5.37)       225.0004 (16.85)        5.0741 (5.73)          1;0   4.4123 (0.06)          5           1
test_integrated_probability[59]          687.1763 (54.08)      738.7924 (41.89)      708.6908 (51.47)       22.3360 (36.18)      702.5104 (52.61)       38.5693 (43.55)         1;0   1.4111 (0.02)          5           1
test_integrated_probability[166]       1,705.0963 (134.19)   2,001.6424 (113.49)   1,882.5663 (136.72)     118.1214 (191.31)   1,870.1833 (140.04)     165.8570 (187.26)        2;0   0.5312 (0.01)          5           1
test_integrated_probability[3593]      1,797.3303 (141.44)   2,310.2434 (130.98)   1,997.8205 (145.09)     215.4188 (348.89)   1,986.5051 (148.75)     347.2145 (392.01)        1;0   0.5005 (0.01)          5           1
test_integrated_probability[10000]     2,999.3698 (236.04)   3,304.8324 (187.37)   3,150.1793 (228.77)     135.2509 (219.05)   3,091.9167 (231.53)     232.3760 (262.36)        2;0   0.3174 (0.00)          5           1
test_integrated_probability[464]       3,497.7362 (275.26)   7,492.2488 (424.79)   5,365.0780 (389.62)   1,956.6981 (>1000.0)  4,515.0568 (338.09)   3,683.1984 (>1000.0)       2;0   0.1864 (0.00)          5           1
test_integrated_probability[1291]      4,259.0673 (335.17)   5,203.8290 (295.04)   4,551.1291 (330.51)     393.3545 (637.08)   4,378.1167 (327.84)     496.1256 (560.14)        1;0   0.2197 (0.00)          5           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Group by

    query = sa.select(
        prob
    ).filter(
        union.columns.hpx.overlaps(SkymapTile.hpx)
    ).group_by(
        SkymapTile.id
    )
-------------------------------------------------------------------------------------------------- benchmark: 10 tests ---------------------------------------------------------------------------------------------------
Name (time in ms)                             Min                   Max                  Mean                StdDev                Median                   IQR            Outliers      OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_integrated_probability[1]            14.8917 (1.0)         74.5656 (1.78)        19.9120 (1.0)         11.5436 (5.36)        17.1354 (1.0)          1.9610 (1.0)           1;4  50.2210 (1.0)          26           1
test_integrated_probability[2]            34.0893 (2.29)        41.9271 (1.0)         36.8572 (1.85)         2.1546 (1.0)         36.4172 (2.13)         3.5601 (1.82)          6;0  27.1317 (0.54)         20           1
test_integrated_probability[7]           130.8256 (8.79)       209.1253 (4.99)       169.9596 (8.54)        26.3329 (12.22)      175.7286 (10.26)       35.6039 (18.16)         2;0   5.8838 (0.12)          7           1
test_integrated_probability[21]          299.0502 (20.08)      318.2287 (7.59)       309.7009 (15.55)        8.1037 (3.76)       311.2289 (18.16)       13.9192 (7.10)          2;0   3.2289 (0.06)          5           1
test_integrated_probability[59]          786.4575 (52.81)    1,114.9106 (26.59)      922.2906 (46.32)      127.3165 (59.09)      939.2223 (54.81)      166.1731 (84.74)         2;0   1.0843 (0.02)          5           1
test_integrated_probability[3593]      1,600.3195 (107.46)   1,761.2495 (42.01)    1,686.1349 (84.68)       70.7754 (32.85)    1,703.2191 (99.40)      127.1671 (64.85)         2;0   0.5931 (0.01)          5           1
test_integrated_probability[166]       1,782.4032 (119.69)   2,144.3674 (51.15)    1,945.9419 (97.73)      152.4452 (70.75)    1,931.0694 (112.69)     262.8904 (134.06)        2;0   0.5139 (0.01)          5           1
test_integrated_probability[10000]     2,937.3806 (197.25)   5,028.4893 (119.93)   3,978.6034 (199.81)     913.7850 (424.10)   3,846.8217 (224.50)   1,649.6063 (841.21)        2;0   0.2513 (0.01)          5           1
test_integrated_probability[464]       3,411.5155 (229.09)   6,833.5123 (162.99)   4,487.7287 (225.38)   1,402.2718 (650.82)   4,000.5000 (233.46)   1,717.9066 (876.04)        1;0   0.2228 (0.00)          5           1
test_integrated_probability[1291]      5,614.0634 (376.99)   9,631.9416 (229.73)   7,683.0581 (385.85)   1,479.6320 (686.72)   7,926.8736 (462.60)   1,833.7267 (935.10)        2;0   0.1302 (0.00)          5           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Filter

    query = sa.select(
        prob
    ).filter(
        SkymapTile.id == 1,
        union.columns.hpx.overlaps(SkymapTile.hpx)
    )
------------------------------------------------------------------------------------------------------ benchmark: 10 tests -------------------------------------------------------------------------------------------------------
Name (time in ms)                               Min                     Max                    Mean                StdDev                  Median                    IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_integrated_probability[1]             799.1577 (1.0)          875.7332 (1.0)          831.0050 (1.0)         39.1664 (1.0)          806.7523 (1.0)          71.8397 (1.0)           2;0  1.2034 (1.0)           5           1
test_integrated_probability[2]           1,134.7305 (1.42)       1,292.9483 (1.48)       1,216.3195 (1.46)        67.2266 (1.72)       1,235.5916 (1.53)        115.1157 (1.60)          2;0  0.8222 (0.68)          5           1
test_integrated_probability[10000]       2,522.6801 (3.16)       3,223.5144 (3.68)       2,783.8955 (3.35)       289.6717 (7.40)       2,757.2590 (3.42)        439.1286 (6.11)          1;0  0.3592 (0.30)          5           1
test_integrated_probability[7]           4,036.7135 (5.05)       4,367.3510 (4.99)       4,234.8391 (5.10)       126.5182 (3.23)       4,270.2012 (5.29)        161.3643 (2.25)          2;0  0.2361 (0.20)          5           1
test_integrated_probability[21]         12,200.9976 (15.27)     14,813.3418 (16.92)     13,438.7056 (16.17)    1,108.7471 (28.31)     13,558.6754 (16.81)     1,929.5603 (26.86)         2;0  0.0744 (0.06)          5           1
test_integrated_probability[3593]       28,299.6022 (35.41)     30,119.7907 (34.39)     28,750.3942 (34.60)      768.5056 (19.62)     28,428.5582 (35.24)       505.8970 (7.04)          1;1  0.0348 (0.03)          5           1
test_integrated_probability[59]         34,897.7921 (43.67)     46,224.6987 (52.78)     38,371.6834 (46.18)    4,769.2706 (121.77)    36,361.9995 (45.07)     6,252.2607 (87.03)         1;0  0.0261 (0.02)          5           1
test_integrated_probability[166]        80,958.0754 (101.30)    93,877.5142 (107.20)    86,377.6202 (103.94)   5,556.5277 (141.87)    83,728.7932 (103.79)    9,106.9370 (126.77)        1;0  0.0116 (0.01)          5           1
test_integrated_probability[464]       164,728.8323 (206.13)   177,364.2913 (202.53)   170,980.4794 (205.75)   4,668.1878 (119.19)   171,734.1085 (212.87)    5,836.1636 (81.24)         2;0  0.0058 (0.00)          5           1
test_integrated_probability[1291]      170,526.7840 (213.38)   190,640.8472 (217.69)   182,093.0816 (219.12)   8,005.5448 (204.40)   181,310.8338 (224.74)   11,875.8621 (165.31)        2;0  0.0055 (0.00)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@lpsinger
Copy link
Collaborator Author

lpsinger commented Nov 1, 2022

Here is the EXPLAIN output for each of the above variants of the query.

Nothing

EXPLAIN SELECT sum(skymaptile.probdensity * (upper(anon_1.hpx * skymaptile.hpx) - lower(anon_1.hpx * skymaptile.hpx)) * 3.6331963520923245e-18) AS sum_1 
FROM skymaptile, (SELECT unnest(range_agg(fieldtile.hpx)) AS hpx 
FROM fieldtile) AS anon_1 
WHERE anon_1.hpx && skymaptile.hpx;
Aggregate  (cost=49577.51..49577.52 rows=1 width=8)
  ->  Nested Loop  (cost=47356.70..49365.20 rows=9436 width=72)
        ->  ProjectSet  (cost=47356.43..47356.95 rows=100 width=32)
              ->  Aggregate  (cost=47356.43..47356.44 rows=1 width=32)
                    ->  Seq Scan on fieldtile  (cost=0.00..41545.54 rows=2324354 width=32)
        ->  Index Scan using ix_skymaptile_hpx on skymaptile  (cost=0.28..18.18 rows=189 width=40)
              Index Cond: (hpx && (unnest((range_agg(fieldtile.hpx)))))

Group by

EXPLAIN SELECT sum(skymaptile.probdensity * (upper(anon_1.hpx * skymaptile.hpx) - lower(anon_1.hpx * skymaptile.hpx)) * 3.6331963520923245e-18) AS sum_1 
FROM skymaptile, (SELECT unnest(range_agg(fieldtile.hpx)) AS hpx 
FROM fieldtile) AS anon_1 
WHERE anon_1.hpx && skymaptile.hpx GROUP BY skymaptile.id;
HashAggregate  (cost=49601.10..49603.10 rows=200 width=12)
  Group Key: skymaptile.id
  ->  Nested Loop  (cost=47356.70..49365.20 rows=9436 width=76)
        ->  ProjectSet  (cost=47356.43..47356.95 rows=100 width=32)
              ->  Aggregate  (cost=47356.43..47356.44 rows=1 width=32)
                    ->  Seq Scan on fieldtile  (cost=0.00..41545.54 rows=2324354 width=32)
        ->  Index Scan using ix_skymaptile_hpx on skymaptile  (cost=0.28..18.18 rows=189 width=44)
              Index Cond: (hpx && (unnest((range_agg(fieldtile.hpx)))))

Filter

EXPLAIN SELECT sum(skymaptile.probdensity * (upper(anon_1.hpx * skymaptile.hpx) - lower(anon_1.hpx * skymaptile.hpx)) * 3.6331963520923245e-18) AS sum_1 
FROM skymaptile, (SELECT unnest(range_agg(fieldtile.hpx)) AS hpx 
FROM fieldtile) AS anon_1 
WHERE skymaptile.id = 1 AND (anon_1.hpx && skymaptile.hpx);
Aggregate  (cost=47654.65..47654.66 rows=1 width=8)
  ->  Nested Loop  (cost=47361.44..47653.59 rows=47 width=72)
        Join Filter: ((unnest((range_agg(fieldtile.hpx)))) && skymaptile.hpx)
        ->  ProjectSet  (cost=47356.43..47356.95 rows=100 width=32)
              ->  Aggregate  (cost=47356.43..47356.44 rows=1 width=32)
                    ->  Seq Scan on fieldtile  (cost=0.00..41545.54 rows=2324354 width=32)
        ->  Materialize  (cost=5.02..154.88 rows=94 width=40)
              ->  Bitmap Heap Scan on skymaptile  (cost=5.02..154.41 rows=94 width=40)
                    Recheck Cond: (id = 1)
                    ->  Bitmap Index Scan on skymaptile_pkey  (cost=0.00..4.99 rows=94 width=0)
                          Index Cond: (id = 1)

@lpsinger
Copy link
Collaborator Author

lpsinger commented Nov 1, 2022

...no way. If I just run ANALYZE after populating the database but before doing any queries, it goes fast:

------------------------------------------------------------------------------------------------ benchmark: 10 tests -------------------------------------------------------------------------------------------------
Name (time in ms)                             Min                   Max                  Mean              StdDev                Median                 IQR            Outliers      OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_integrated_probability[1]            18.8879 (1.0)        124.9208 (1.0)         29.1893 (1.0)       19.4207 (1.0)         23.0895 (1.0)        7.3412 (1.0)           2;3  34.2591 (1.0)          31           1
test_integrated_probability[2]            56.2973 (2.98)       181.3801 (1.45)        95.6783 (3.28)      32.8393 (1.69)        91.3968 (3.96)      23.9493 (3.26)          4;1  10.4517 (0.31)         14           1
test_integrated_probability[7]           200.9781 (10.64)      375.3460 (3.00)       243.2496 (8.33)      75.4169 (3.88)       201.7489 (8.74)      70.0991 (9.55)          1;0   4.1110 (0.12)          5           1
test_integrated_probability[21]          360.8992 (19.11)      459.1277 (3.68)       394.2777 (13.51)     39.8368 (2.05)       383.2310 (16.60)     52.5718 (7.16)          1;0   2.5363 (0.07)          5           1
test_integrated_probability[59]          801.1130 (42.41)      890.5245 (7.13)       837.0155 (28.68)     44.3504 (2.28)       811.8800 (35.16)     80.9824 (11.03)         1;0   1.1947 (0.03)          5           1
test_integrated_probability[3593]      1,472.8077 (77.98)    1,582.6067 (12.67)    1,520.1468 (52.08)     44.3360 (2.28)     1,525.8348 (66.08)     68.1387 (9.28)          2;0   0.6578 (0.02)          5           1
test_integrated_probability[166]       2,004.6373 (106.13)   2,258.9078 (18.08)    2,184.3280 (74.83)    102.1068 (5.26)     2,219.2485 (96.11)     77.3767 (10.54)         1;1   0.4578 (0.01)          5           1
test_integrated_probability[10000]     2,741.4926 (145.15)   2,954.1121 (23.65)    2,861.6750 (98.04)     91.6088 (4.72)     2,876.8370 (124.59)   160.6494 (21.88)         2;0   0.3494 (0.01)          5           1
test_integrated_probability[464]       3,466.4168 (183.53)   3,818.0543 (30.56)    3,688.6601 (126.37)   134.8199 (6.94)     3,713.0300 (160.81)   151.8321 (20.68)         1;0   0.2711 (0.01)          5           1
test_integrated_probability[1291]      3,844.2408 (203.53)   4,152.9144 (33.24)    4,048.4129 (138.70)   130.7179 (6.73)     4,123.0790 (178.57)   182.4583 (24.85)         1;0   0.2470 (0.01)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@mcoughlin, do you confirm?

@lpsinger
Copy link
Collaborator Author

lpsinger commented Nov 8, 2022

One difference between this minimal reproducer and Fritz is that this test only has 1 sky map loaded, whereas Fritz has several. Here are the results if in this test I populate the database with 10 sky maps. I do not see a massive slowdown.

With 1 sample sky map

------------------------------------------------------------------------------------------------ benchmark: 10 tests -------------------------------------------------------------------------------------------------
Name (time in ms)                             Min                   Max                  Mean              StdDev                Median                 IQR            Outliers      OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_integrated_probability[1]            11.0577 (1.0)         13.5263 (1.0)         11.3985 (1.0)        0.4287 (1.0)         11.2261 (1.0)        0.2912 (1.0)           5;5  87.7305 (1.0)          59           1
test_integrated_probability[2]            23.5081 (2.13)        26.8126 (1.98)        24.2058 (2.12)       0.7767 (1.81)        23.9705 (2.14)       0.5240 (1.80)          5;5  41.3125 (0.47)         35           1
test_integrated_probability[7]            90.1381 (8.15)       110.2759 (8.15)        94.0476 (8.25)       5.5991 (13.06)       92.9080 (8.28)       2.3952 (8.22)          1;1  10.6329 (0.12)         11           1
test_integrated_probability[21]          232.0645 (20.99)      243.4479 (18.00)      235.7928 (20.69)      4.4554 (10.39)      234.4177 (20.88)      4.2510 (14.60)         1;0   4.2410 (0.05)          5           1
test_integrated_probability[59]          729.5692 (65.98)      746.5468 (55.19)      738.2481 (64.77)      6.8751 (16.04)      739.3323 (65.86)     11.2363 (38.58)         2;0   1.3546 (0.02)          5           1
test_integrated_probability[166]       1,531.3682 (138.49)   1,581.7891 (116.94)   1,558.1001 (136.69)    19.3321 (45.10)    1,554.0402 (138.43)    26.7340 (91.79)         2;0   0.6418 (0.01)          5           1
test_integrated_probability[464]       3,464.9519 (313.35)   3,495.5526 (258.43)   3,474.2212 (304.80)    12.5363 (29.24)    3,467.8746 (308.91)    13.5634 (46.57)         1;0   0.2878 (0.00)          5           1
test_integrated_probability[1291]      3,433.1696 (310.48)   3,623.8435 (267.91)   3,514.6698 (308.34)    73.3296 (171.06)   3,493.7594 (311.22)   100.5699 (345.32)        2;0   0.2845 (0.00)          5           1
test_integrated_probability[3593]      1,486.1203 (134.40)   1,540.4620 (113.89)   1,508.1834 (132.31)    21.1889 (49.43)    1,503.7501 (133.95)    29.6999 (101.98)        2;0   0.6630 (0.01)          5           1
test_integrated_probability[10000]     2,589.9102 (234.22)   3,093.5477 (228.71)   2,883.7734 (252.99)   225.9956 (527.18)   2,948.9611 (262.69)   405.5440 (>1000.0)       1;0   0.3468 (0.00)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

With 10 sample sky maps

-------------------------------------------------------------------------------------------------- benchmark: 10 tests --------------------------------------------------------------------------------------------------
Name (time in ms)                             Min                    Max                   Mean              StdDev                 Median                 IQR            Outliers      OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_integrated_probability[1]            21.1706 (1.0)          22.7905 (1.0)          21.6838 (1.0)        0.3631 (1.0)          21.5935 (1.0)        0.4209 (1.0)          10;2  46.1174 (1.0)          38           1
test_integrated_probability[2]            52.3320 (2.47)         54.3355 (2.38)         53.2419 (2.46)       0.4790 (1.32)         53.2635 (2.47)       0.6622 (1.57)          5;0  18.7822 (0.41)         18           1
test_integrated_probability[7]           218.8883 (10.34)       224.0288 (9.83)        220.5078 (10.17)      2.3000 (6.33)        218.9911 (10.14)      3.3430 (7.94)          1;0   4.5350 (0.10)          5           1
test_integrated_probability[21]          565.1482 (26.70)       573.2150 (25.15)       570.4496 (26.31)      3.2121 (8.85)        571.2060 (26.45)      3.9618 (9.41)          1;0   1.7530 (0.04)          5           1
test_integrated_probability[59]        1,725.4219 (81.50)     2,382.3478 (104.53)    2,035.5994 (93.88)    290.2558 (799.44)    2,030.8478 (94.05)    531.6862 (>1000.0)       2;0   0.4913 (0.01)          5           1
test_integrated_probability[166]       4,087.6836 (193.08)    5,880.2014 (258.01)    4,485.0970 (206.84)   781.9216 (>1000.0)   4,121.7756 (190.88)   543.8086 (>1000.0)       1;1   0.2230 (0.00)          5           1
test_integrated_probability[464]       9,118.3175 (430.71)   11,132.3930 (488.47)   10,332.1933 (476.49)   774.3901 (>1000.0)  10,384.4950 (480.91)   997.4667 (>1000.0)       2;0   0.0968 (0.00)          5           1
test_integrated_probability[1291]      9,218.3386 (435.43)    9,863.1732 (432.77)    9,416.2044 (434.25)   255.8738 (704.75)    9,341.6910 (432.62)   210.5112 (500.09)        1;1   0.1062 (0.00)          5           1
test_integrated_probability[3593]      2,590.5036 (122.36)    2,691.2778 (118.09)    2,639.7896 (121.74)    41.0162 (112.97)    2,633.0964 (121.94)    67.5441 (160.46)        2;0   0.3788 (0.01)          5           1
test_integrated_probability[10000]     2,835.7612 (133.95)    3,270.6751 (143.51)    3,060.5100 (141.14)   158.1637 (435.63)    3,058.3568 (141.63)   185.4849 (440.64)        2;0   0.3267 (0.01)          5           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@mcoughlin
Copy link
Contributor

@lpsinger Are these ZTF fields you are trying? How many are in the query?

@lpsinger
Copy link
Collaborator Author

lpsinger commented Nov 8, 2022

@lpsinger Are these ZTF fields you are trying? How many are in the query?

The fields are squares the size of the ZTF field of view, but they are randomly and isotropically distributed. Chip gaps are not represented in the geometry. The number of fields is in the name of the test. For example, in test_integrated_probability[464], there are 464 fields.

@lpsinger
Copy link
Collaborator Author

lpsinger commented Nov 10, 2022

This is what I find using the full ZTF field geometry including the 64 chips.

There is 1 sky map in the database in this example:

------------------------------------------------------------------------------------------------- benchmark: 10 tests -------------------------------------------------------------------------------------------------
Name (time in ms)                             Min                    Max                  Mean              StdDev                Median                 IQR            Outliers      OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_integrated_probability[1]            34.4387 (1.0)          62.4766 (1.0)         37.4379 (1.0)        5.5208 (1.0)         36.1265 (1.0)        1.2000 (1.0)           1;3  26.7109 (1.0)          25           1
test_integrated_probability[2]            91.9531 (2.67)        155.1994 (2.48)        99.3950 (2.65)      18.5475 (3.36)        93.5385 (2.59)       1.6859 (1.40)          1;1  10.0609 (0.38)         11           1
test_integrated_probability[7]           384.1602 (11.15)       423.6755 (6.78)       406.4560 (10.86)     16.0407 (2.91)       413.7318 (11.45)     24.1189 (20.10)         2;0   2.4603 (0.09)          5           1
test_integrated_probability[21]          830.2037 (24.11)       931.9891 (14.92)      875.8630 (23.40)     43.4034 (7.86)       869.2347 (24.06)     75.7489 (63.13)         2;0   1.1417 (0.04)          5           1
test_integrated_probability[59]        2,310.3455 (67.09)     2,678.8383 (42.88)    2,512.1459 (67.10)    144.8364 (26.23)    2,493.1771 (69.01)    215.7189 (179.77)        2;0   0.3981 (0.01)          5           1
test_integrated_probability[166]       4,586.8276 (133.19)    4,863.7372 (77.85)    4,700.9019 (125.57)   113.0962 (20.49)    4,692.7142 (129.90)   179.1712 (149.31)        2;0   0.2127 (0.01)          5           1
test_integrated_probability[464]       8,925.8934 (259.18)    9,590.7493 (153.51)   9,190.6345 (245.49)   282.5487 (51.18)    9,157.2159 (253.48)   471.4945 (392.92)        1;0   0.1088 (0.00)          5           1
test_integrated_probability[1291]      9,604.4272 (278.88)   10,660.5884 (170.63)   9,834.7949 (262.70)   462.6364 (83.80)    9,620.4849 (266.30)   317.8310 (264.87)        1;1   0.1017 (0.00)          5           1
test_integrated_probability[3593]      3,853.1448 (111.88)    3,978.8382 (63.69)    3,896.3338 (104.07)    48.2145 (8.73)     3,881.2212 (107.43)    42.2920 (35.24)         1;1   0.2567 (0.01)          5           1
test_integrated_probability[10000]     7,837.0196 (227.56)    8,017.1056 (128.32)   7,888.3839 (210.71)    73.2443 (13.27)    7,863.0755 (217.65)    62.0016 (51.67)         1;1   0.1268 (0.00)          5           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

And there are 10 sky maps in the database in this example:

--------------------------------------------------------------------------------------------------- benchmark: 10 tests ----------------------------------------------------------------------------------------------------
Name (time in ms)                              Min                    Max                   Mean              StdDev                 Median                   IQR            Outliers      OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_integrated_probability[1]             56.5861 (1.0)          61.4518 (1.0)          59.1155 (1.0)        1.9068 (1.0)          58.7125 (1.0)          3.6693 (1.0)           7;0  16.9160 (1.0)          15           1
test_integrated_probability[2]            150.1879 (2.65)        174.4265 (2.84)        159.5015 (2.70)       7.6776 (4.03)        159.6743 (2.72)         6.3202 (1.72)          2;1   6.2695 (0.37)          7           1
test_integrated_probability[7]            645.7571 (11.41)       714.0363 (11.62)       675.7648 (11.43)     25.1640 (13.20)       677.3949 (11.54)       28.5004 (7.77)          2;0   1.4798 (0.09)          5           1
test_integrated_probability[21]         1,648.6004 (29.13)     1,693.6755 (27.56)     1,664.0826 (28.15)     19.1674 (10.05)     1,654.8210 (28.19)       28.2588 (7.70)          1;0   0.6009 (0.04)          5           1
test_integrated_probability[59]         4,779.0983 (84.46)     5,010.1037 (81.53)     4,862.0501 (82.25)     88.7476 (46.54)     4,843.8398 (82.50)       95.5894 (26.05)         1;0   0.2057 (0.01)          5           1
test_integrated_probability[166]       10,712.0158 (189.30)   11,109.7142 (180.79)   10,903.5250 (184.44)   161.9381 (84.92)    10,919.7859 (185.99)     267.0053 (72.77)         2;0   0.0917 (0.01)          5           1
test_integrated_probability[464]       21,817.7918 (385.57)   23,830.8242 (387.80)   22,591.9119 (382.17)   823.4461 (431.84)   22,442.8427 (382.25)   1,257.7642 (342.78)        1;0   0.0443 (0.00)          5           1
test_integrated_probability[1291]      22,519.3937 (397.97)   23,161.2688 (376.90)   22,926.6532 (387.83)   279.6536 (146.66)   23,035.6077 (392.35)     456.7341 (124.47)        1;0   0.0436 (0.00)          5           1
test_integrated_probability[3593]       5,856.6421 (103.50)    5,995.5162 (97.56)     5,916.0688 (100.08)    50.1478 (26.30)     5,906.3381 (100.60)      43.4767 (11.85)         2;0   0.1690 (0.01)          5           1
test_integrated_probability[10000]      8,136.5761 (143.79)    9,341.2205 (152.01)    8,517.2315 (144.08)   509.3666 (267.13)    8,217.8855 (139.97)     659.2601 (179.67)        1;0   0.1174 (0.01)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@lpsinger
Copy link
Collaborator Author

If I have 10 sky maps in the database, and I use the full ZTF field geometry, and I neglect to run ANALYZE, then this is what I get:

------------------------------------------------------------------------------------------ benchmark: 10 tests -------------------------------------------------------------------------------------------
Name (time in s)                            Min                 Max                Mean            StdDev              Median                IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_integrated_probability[1]           1.7442 (1.0)        1.8662 (1.0)        1.7792 (1.0)      0.0521 (1.53)       1.7505 (1.0)       0.0636 (1.21)          1;0  0.5621 (1.0)           5           1
test_integrated_probability[2]           3.3370 (1.91)       3.5745 (1.92)       3.4521 (1.94)     0.0905 (2.67)       3.4385 (1.96)      0.1295 (2.46)          2;0  0.2897 (0.52)          5           1
test_integrated_probability[7]          11.7087 (6.71)      11.9278 (6.39)      11.8080 (6.64)     0.0983 (2.89)      11.8124 (6.75)      0.1806 (3.44)          2;0  0.0847 (0.15)          5           1
test_integrated_probability[21]         33.8385 (19.40)     38.4501 (20.60)     36.2667 (20.38)    2.1651 (63.76)     37.3291 (21.32)     3.8827 (73.89)         3;0  0.0276 (0.05)          5           1
test_integrated_probability[59]         87.7415 (50.30)     91.1571 (48.85)     88.8030 (49.91)    1.4680 (43.23)     87.9102 (50.22)     1.9537 (37.18)         1;0  0.0113 (0.02)          5           1
test_integrated_probability[166]       273.7356 (156.94)   285.0346 (152.73)   278.1129 (156.31)   5.7804 (170.22)   274.0334 (156.54)   10.2311 (194.69)        1;0  0.0036 (0.01)          5           1
test_integrated_probability[464]       538.7984 (308.91)   549.9840 (294.70)   545.2591 (306.46)   5.1591 (151.92)   548.1085 (313.11)    8.9501 (170.32)        1;0  0.0018 (0.00)          5           1
test_integrated_probability[1291]      522.7390 (299.70)   533.7231 (285.99)   529.3225 (297.51)   4.6281 (136.29)   529.5399 (302.50)    7.6057 (144.73)        1;0  0.0019 (0.00)          5           1
test_integrated_probability[3593]       59.3302 (34.02)     68.1017 (36.49)     62.2357 (34.98)    4.0168 (118.28)    59.6336 (34.07)     6.2806 (119.52)        1;0  0.0161 (0.03)          5           1
test_integrated_probability[10000]       7.6889 (4.41)       7.7757 (4.17)       7.7330 (4.35)     0.0340 (1.0)        7.7345 (4.42)      0.0526 (1.0)           2;0  0.1293 (0.23)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

And at this point, it does take minutes.

@lpsinger lpsinger mentioned this pull request Nov 10, 2022
@lpsinger lpsinger force-pushed the benchmark-integrated-probability branch 4 times, most recently from 2d9e4b8 to edbbb5c Compare January 27, 2023 20:03
@jvansanten
Copy link

@mcoughlin asked me to take a look at this, and I could reproduce the slowdown that @lpsinger observed with the full ZTF field geometry and ~50 skymaps in the database (following the README and taking the latest 50 superevents from GraceDB). The main problem is the unnest(range_agg(fieldtile.hpx)) subquery, which is a useful optimization, but unfortunately opaque to the query planner. This can be worked around by

  1. Adjusting the expected result size for unnest(multirange)`, and/or
  2. Adjusting the default planner cost constants to better reflect backing storage performance.

A more detailed explanation is below.

Looking at the "bad" plan, I see that the planner expects unnest(range_agg(fieldtile.hpx)) to return 100 rows, when in fact it returns 2062 rows for the sample query:

 Aggregate  (cost=40287.11..40287.12 rows=1 width=8) (actual time=4678.400..4678.402 rows=1 loops=1)
   ->  Nested Loop  (cost=1078.94..40056.71 rows=10240 width=62) (actual time=233.216..4677.648 rows=2495 loops=1)
         Join Filter: ((unnest((range_agg(fieldtile.hpx)))) && skymaptile.hpx)
         Rows Removed by Join Filter: 39587905
         ->  Bitmap Heap Scan on skymaptile  (cost=907.14..9163.15 rows=20480 width=30) (actual time=4.501..7.367 rows=19200 loops=1)
               Recheck Cond: (id = 50)
               Heap Blocks: exact=160
               ->  Bitmap Index Scan on skymaptile_pkey  (cost=0.00..902.02 rows=20480 width=0) (actual time=4.452..4.452 rows=19200 loops=1)
                     Index Cond: (id = 50)
         ->  Materialize  (cost=171.80..173.82 rows=100 width=32) (actual time=0.000..0.083 rows=2062 loops=19200)
               ->  ProjectSet  (cost=171.80..172.32 rows=100 width=32) (actual time=7.955..8.323 rows=2062 loops=1)
 !result size underestimated by a factor 20!  --->  ^^^^^^^^                                     ^^^^^^^^^
                     ->  Aggregate  (cost=171.80..171.81 rows=1 width=32) (actual time=7.943..7.944 rows=1 loops=1)
                           ->  Index Scan using ix_fieldtile_id on fieldtile  (cost=0.42..159.42 rows=4950 width=22) (actual time=0.116..1.888 ro
ws=5661 loops=1)
                                 Index Cond: ((id >= 200) AND (id <= 220))
 Planning Time: 0.312 ms
 Execution Time: 4678.499 ms

Since the query planner expects only 100 32-byte rows (i.e. less than a memory page), it also expects that the outer product of (filtered subsets of) skymaptile and fieldtile will be quicker to form from sequential reads than potentially slow random reads into skymaptile (more on this later). This happens because the query planner treats functions as black boxes by default, and assumes constants for the number of rows they return. These are however configurable and stored in pg_catalog.pg_proc. In a default instance from hub.docker.io/postgres:14, for example:

postgres=# select oid,proname, prosrc,procost,prorows,prosupport from pg_catalog.pg_proc where proname='unnest';
 oid  | proname |      prosrc       | procost | prorows |      prosupport      
------+---------+-------------------+---------+---------+----------------------
 2331 | unnest  | array_unnest      |       1 |     100 | array_unnest_support
 3322 | unnest  | tsvector_unnest   |       1 |      10 | -
 1293 | unnest  | multirange_unnest |       1 |     100 | -

Here you can see that both unnest(array) and unnest(multirange) are expected to return 100 rows (though the array version does have a support function that passes the estimated array size on to the planner). If I increase the default estimate to 2000, the query planner reverts to the "good" plan that is nearly 40x faster (on my M1 MacBook):

postgres=# UPDATE pg_catalog.pg_proc SET prorows = 2000 WHERE prosrc = 'multirange_unnest';
UPDATE 1
postgres=# explain (analyze) SELECT sum(skymaptile.probdensity * (upper(anon_1.hpx * skymaptile.hpx) - lower(anon_1.hpx * skymaptile.hpx)) * 3.6331963520923245e-18) AS sum_1 
        FROM skymaptile, (SELECT unnest(range_agg(fieldtile.hpx)) AS hpx 
        FROM fieldtile 
        WHERE fieldtile.id BETWEEN 200 AND 220) AS anon_1 
        WHERE skymaptile.id = 50 AND (anon_1.hpx && skymaptile.hpx);
                                                                         QUERY PLAN                                                              
           
-------------------------------------------------------------------------------------------------------------------------------------------------
-----------
 Aggregate  (cost=461531.82..461531.83 rows=1 width=8) (actual time=110.478..110.479 rows=1 loops=1)
   ->  Nested Loop  (cost=172.09..456923.82 rows=204800 width=62) (actual time=32.839..109.390 rows=2495 loops=1)
         ->  ProjectSet  (cost=171.80..181.82 rows=2000 width=32) (actual time=32.364..32.648 rows=2062 loops=1)
               ->  Aggregate  (cost=171.80..171.81 rows=1 width=32) (actual time=32.359..32.359 rows=1 loops=1)
                     ->  Index Scan using ix_fieldtile_id on fieldtile  (cost=0.42..159.42 rows=4950 width=22) (actual time=0.099..2.736 rows=566
1 loops=1)
                           Index Cond: ((id >= 200) AND (id <= 220))
         ->  Index Scan using ix_skymaptile_hpx on skymaptile  (cost=0.29..226.31 rows=205 width=30) (actual time=0.017..0.037 rows=1 loops=2062)
               Index Cond: (hpx && (unnest((range_agg(fieldtile.hpx)))))
               Filter: (id = 50)
               Rows Removed by Filter: 62
 Planning Time: 3.445 ms
 JIT:
   Functions: 15
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 10.532 ms, Inlining 0.000 ms, Optimization 3.562 ms, Emission 17.687 ms, Total 31.781 ms
 Execution Time: 121.454 ms

The other factor that the planner uses to find the best plan are the estimated costs of each operation, in an arbitrary time unit. Out of the box Postgres assumes that random reads are on average 4x slower than sequential reads, which will lead the query planner to prefer sequential scans (or their cousins, bitmap index scans) that examine up to 4x as many rows as the equivalent index scan. This makes sense for mechanical storage, but not for solid state, where random reads are not that much different. If you're using storage backed by SSDs in production, it may make sense to adjust the planner constants accordingly. On my system I can recover the "good" plan by setting the row estimate for unnest back to 100, bu the random page cost to 1, but YMMV:

postgres=# UPDATE pg_catalog.pg_proc SET prorows = 100 WHERE prosrc = 'multirange_unnest';
UPDATE 1
postgres=# SET random_page_cost=1;
SET
postgres=# explain (analyze) SELECT sum(skymaptile.probdensity * (upper(anon_1.hpx * skymaptile.hpx) - lower(anon_1.hpx * skymaptile.hpx)) * 3.6331963520923245e-18) AS sum_1 
        FROM skymaptile, (SELECT unnest(range_agg(fieldtile.hpx)) AS hpx 
        FROM fieldtile 
        WHERE fieldtile.id BETWEEN 200 AND 220) AS anon_1 
        WHERE skymaptile.id = 50 AND (anon_1.hpx && skymaptile.hpx);
                                                                           QUERY PLAN                                                            
               
-------------------------------------------------------------------------------------------------------------------------------------------------
---------------
 Aggregate  (cost=33856.22..33856.23 rows=1 width=8) (actual time=116.355..116.357 rows=1 loops=1)
   ->  Nested Loop  (cost=143.09..33625.82 rows=10240 width=62) (actual time=9.596..115.052 rows=2495 loops=1)
         ->  ProjectSet  (cost=142.80..143.32 rows=100 width=32) (actual time=8.963..9.344 rows=2062 loops=1)
               ->  Aggregate  (cost=142.80..142.81 rows=1 width=32) (actual time=8.917..8.918 rows=1 loops=1)
                     ->  Index Only Scan using fieldtile_pkey on fieldtile  (cost=0.42..130.42 rows=4950 width=22) (actual time=0.147..1.976 rows
=5661 loops=1)
                           Index Cond: ((id >= 200) AND (id <= 220))
                           Heap Fetches: 0
         ->  Index Scan using ix_skymaptile_hpx on skymaptile  (cost=0.29..332.77 rows=205 width=30) (actual time=0.022..0.051 rows=1 loops=2062)
               Index Cond: (hpx && (unnest((range_agg(fieldtile.hpx)))))
               Filter: (id = 50)
               Rows Removed by Filter: 62
 Planning Time: 1.176 ms
 Execution Time: 116.647 ms

Hope that helps!

@jvansanten
Copy link

While investigating temporary tables as a workaround for the unnest(multirange) size misestimation detailed above, @lpsinger found that the query planner can still flip back to the bad plan given enough skymaps. I could also reproduce by loading 48 more skymaps than in the test above, for a total of 98, with 1881600 tiles (a number that will become important later). I think this happens because the query planner vastly overestimates the selectivity (fraction of rows selected) of the range && range operator.

Looking at a plan for a similar query (without the skymaptile.id restriction, and that still uses the "good" plan), I see that the planner expects each iteration of the loop to match 18816 rows, whereas it actually matches 125 on average:

postgres=# explain (analyze) SELECT sum(skymaptile.probdensity * (upper(bigfieldtile.hpx * skymaptile.hpx) - lower(bigfieldtile.hpx * skymaptile.hpx)) * 3.6331963520923245e-18) AS sum_1 
        FROM skymaptile, bigfieldtile
        WHERE (bigfieldtile.hpx && skymaptile.hpx);
                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1633814.64..1633814.65 rows=1 width=8) (actual time=284.006..284.007 rows=1 loops=1)
->  Nested Loop  (cost=0.41..1197330.48 rows=19399296 width=52) (actual time=77.402..205.777 rows=257065 loops=1)
      ->  Seq Scan on bigfieldtile  (cost=0.00..34.62 rows=2062 width=22) (actual time=0.021..0.182 rows=2062 loops=1)
      ->  Index Scan using ix_skymaptile_hpx on skymaptile  (cost=0.41..392.49 rows=18816 width=30) (actual time=0.013..0.052 rows=125 loops=2062)
      !result size overestimated by a factor 150! ---------------------------> ^^^^^^^^^^                                     ^^^^^^^^ 
            Index Cond: (hpx && bigfieldtile.hpx)

This happens because the selectivity function just returns 0.01 whenever one of one of the operands is not a constant. For a constant operand, it uses sample ranges to estimate the overlap fraction. There may be a way to extend this to non-constant operands, but I imagine it involves quite a bit of extra plumbing.

For this particular case I was able to convince the planner to use the right plan by adding a multicolumn index that includes both id and hpx. This should at least break the scaling of the result set with number of skymaps, since the (properly estimated) selectivity of the integer-equality operator gets multiplied into the total selectivity:

postgres=# create index ix_skymaptile_id_hpx on skymaptile using gist (id gist_int4_ops,hpx);
CREATE INDEX
postgres=# analyze skymaptile;
ANALYZE
postgres=# explain (analyze) SELECT sum(skymaptile.probdensity * (upper(bigfieldtile.hpx * skymaptile.hpx) - lower(bigfieldtile.hpx * skymaptile.hpx)) * 3.6331963520923245e-18) AS sum_1 
        FROM skymaptile, bigfieldtile
        WHERE (bigfieldtile.hpx && skymaptile.hpx) AND skymaptile.id=50;
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=98945.54..98945.55 rows=1 width=8) (actual time=77.283..77.285 rows=1 loops=1)
   ->  Nested Loop  (cost=0.41..94727.08 rows=187487 width=62) (actual time=0.181..75.762 rows=2495 loops=1)
         ->  Seq Scan on bigfieldtile  (cost=0.00..33.04 rows=1904 width=32) (actual time=0.019..0.318 rows=2062 loops=1)
         ->  Index Scan using ix_skymaptile_id_hpx on skymaptile  (cost=0.41..47.76 rows=197 width=30) (actual time=0.034..0.036 rows=1 loops=2062)
               Index Cond: ((id = 50) AND (hpx && bigfieldtile.hpx))
 Planning Time: 3.851 ms
 Execution Time: 77.333 ms

This index is ~50% larger than the existing spgist index (which itself was almost as large as the relation itself), so keeping both around is likely not an option.

@lpsinger lpsinger force-pushed the benchmark-integrated-probability branch from edbbb5c to 21cec74 Compare February 7, 2023 20:22
@lpsinger
Copy link
Collaborator Author

lpsinger commented Feb 7, 2023

@jvansanten, I tried adding a custom function with hardcoded planner stats as you suggested (21cec74), but the plan is still flipping between a "good" one and a "bad" one.

Good

2023-02-07 14:30:21.492 EST [55253] LOG:  duration: 759.789 ms  plan:
	Query Text: SELECT sum(skymaptile.probdensity * (upper(anon_1.hpx * skymaptile.hpx) - lower(anon_1.hpx * skymaptile.hpx)) * 3.6331963520923245e-18) AS sum_1 
	FROM skymaptile, (SELECT multirange_unnest(range_agg(fieldtile.hpx)) AS hpx 
	FROM fieldtile 
	WHERE fieldtile.id BETWEEN 100 AND 120) AS anon_1 
	WHERE skymaptile.id = 1 AND (anon_1.hpx && skymaptile.hpx)
	Aggregate  (cost=30546.21..30546.22 rows=1 width=8) (actual time=759.774..759.775 rows=1 loops=1)
	  ->  Nested Loop  (cost=11634.47..26046.21 rows=200000 width=62) (actual time=19.271..753.897 rows=8008 loops=1)
	        ->  ProjectSet  (cost=11634.19..11644.21 rows=2000 width=32) (actual time=19.200..27.420 rows=13401 loops=1)
	              ->  Aggregate  (cost=11634.19..11634.20 rows=1 width=32) (actual time=19.076..19.077 rows=1 loops=1)
	                    ->  Bitmap Heap Scan on fieldtile  (cost=642.06..11587.94 rows=18501 width=22) (actual time=3.830..5.599 rows=17110 loops=1)
	                          Recheck Cond: ((id >= 100) AND (id <= 120))
	                          Heap Blocks: exact=127
	                          ->  Bitmap Index Scan on fieldtile_pkey  (cost=0.00..637.44 rows=18501 width=0) (actual time=3.812..3.812 rows=17110 loops=1)
	                                Index Cond: ((id >= 100) AND (id <= 120))
	        ->  Index Scan using ix_skymaptile_hpx on skymaptile  (cost=0.28..5.19 rows=200 width=30) (actual time=0.054..0.054 rows=1 loops=13401)
	              Index Cond: (hpx && (multirange_unnest((range_agg(fieldtile.hpx)))))
	              Filter: (id = 1)

Bad

2023-02-07 14:43:46.175 EST [61922] LOG:  duration: 61627.196 ms  plan:
	Query Text: SELECT sum(skymaptile.probdensity * (upper(anon_1.hpx * skymaptile.hpx) - lower(anon_1.hpx * skymaptile.hpx)) * 3.6331963520923245e-18) AS sum_1 
	FROM skymaptile, (SELECT multirange_unnest(range_agg(fieldtile.hpx)) AS hpx 
	FROM fieldtile 
	WHERE fieldtile.id BETWEEN 100 AND 120) AS anon_1 
	WHERE skymaptile.id = 1 AND (anon_1.hpx && skymaptile.hpx)
	Aggregate  (cost=642873.84..642873.85 rows=1 width=8) (actual time=61627.177..61627.180 rows=1 loops=1)
	  ->  Nested Loop  (cost=11689.12..638388.47 rows=199350 width=62) (actual time=23282.378..61622.446 rows=8199 loops=1)
	        Join Filter: ((multirange_unnest((range_agg(fieldtile.hpx)))) && skymaptile.hpx)
	        Rows Removed by Join Filter: 268011801
	        ->  Index Scan using skymaptile_pkey on skymaptile  (cost=0.56..28614.89 rows=19935 width=30) (actual time=0.009..51.661 rows=20000 loops=1)
	              Index Cond: (id = 1)
	        ->  Materialize  (cost=11688.56..11728.58 rows=2000 width=32) (actual time=0.001..0.887 rows=13401 loops=20000)
	              ->  ProjectSet  (cost=11688.56..11698.58 rows=2000 width=32) (actual time=17.491..22.253 rows=13401 loops=1)
	                    ->  Aggregate  (cost=11688.56..11688.57 rows=1 width=32) (actual time=17.362..17.364 rows=1 loops=1)
	                          ->  Bitmap Heap Scan on fieldtile  (cost=596.67..11645.57 rows=17194 width=22) (actual time=3.047..4.924 rows=17110 loops=1)
	                                Recheck Cond: ((id >= 100) AND (id <= 120))
	                                Heap Blocks: exact=127
	                                ->  Bitmap Index Scan on fieldtile_pkey  (cost=0.00..592.37 rows=17194 width=0) (actual time=3.027..3.027 rows=17110 loops=1)
	                                      Index Cond: ((id >= 100) AND (id <= 120))

@jvansanten
Copy link

jvansanten commented Feb 7, 2023

I think the row estimate for unnest() may not have been the primary issue after all. A bigger problem, especially when the skymaptile table becomes large, is that postgres assumes that the selectivity of range && range is 0.01 when neither is a constant, i.e. joining range tables of size N and M will return N*M*0.01 rows. I suspect that since the rows are narrow enough that 100 of them fit in a memory page, the query planner thinks that it will have to read every page anyway, and flips to the bad, sequential-read plan. To test this, I built postgres (16-dev) from source, and explained the query (with 1881600 entries in skymaptile, as above), getting the bad plan as expected:

 Aggregate  (cost=594947.83..594947.84 rows=1 width=8) (actual time=3186.216..3186.219 rows=1 loops=1)
   ->  Nested Loop  (cost=643.97..590648.85 rows=191066 width=62) (actual time=158.270..3185.510 rows=2495 loops=1)
         Join Filter: (bigfieldtile.hpx && skymaptile.hpx)
         Rows Removed by Join Filter: 39587905
         ->  Bitmap Heap Scan on skymaptile  (cost=643.97..17411.85 rows=20070 width=30) (actual time=3.896..7.465 rows=19200 loops=1)
               Recheck Cond: (id = 50)
               Heap Blocks: exact=160
               ->  Bitmap Index Scan on skymaptile_pkey  (cost=0.00..638.95 rows=20070 width=0) (actual time=3.630..3.630 rows=19200 loops=1)
                     Index Cond: (id = 50)
         ->  Materialize  (cost=0.00..42.56 rows=1904 width=32) (actual time=0.000..0.042 rows=2062 loops=19200)
               ->  Seq Scan on bigfieldtile  (cost=0.00..33.04 rows=1904 width=32) (actual time=0.024..0.225 rows=2062 loops=1)
 Planning Time: 3.004 ms
 Execution Time: 3186.332 ms

Reducing the default selectivity for && by a factor 10 restored the good plan:

 Aggregate  (cost=197480.79..197480.80 rows=1 width=8) (actual time=409.447..409.447 rows=1 loops=1)
   ->  Nested Loop  (cost=0.41..193181.80 rows=191066 width=62) (actual time=6.226..408.563 rows=2495 loops=1)
         ->  Seq Scan on bigfieldtile  (cost=0.00..33.04 rows=1904 width=32) (actual time=0.050..0.224 rows=2062 loops=1)
         ->  Index Scan using ix_skymaptile_hpx on skymaptile  (cost=0.41..101.24 rows=20 width=30) (actual time=0.080..0.198 rows=1 loops=2062)
               Index Cond: (hpx && bigfieldtile.hpx)
               Filter: (id = 50)
               Rows Removed by Filter: 123
 Planning Time: 7.473 ms
 Execution Time: 409.583 ms

It might be worth taking this question to pgsql-performance and see what the experts make of it. The answer may well end up being "your ranges are too small; use bigger ones," but maybe there's something we haven't thought of yet.

Another option that may work well enough for now (and in versions of postgres that cloud providers will give you) is to use a multicolumn index that includes both the skymap id and pixel range. In my limited testing that also coaxed postgres to use the "good" plan, but again I'm not sure what impact that has on other queries you might need to support.

@mcoughlin
Copy link
Contributor

@lpsinger This seems like it may be good to raise at some higher level? What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants