Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs

From: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs
Date: 2023-06-26 18:15:07
Message-ID: 7af1464e-2e24-cfb1-b6d4-1544757f8cfa@yandex.ru
Lists: pgsql-hackers

Hi, all!

On 24.06.2023 14:23, Tomas Vondra wrote:
> On 6/24/23 02:08, Tom Lane wrote:
>> Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
>>> The problem is that the selectivity for "IS NULL" is estimated using the
>>> table-level statistics. But the LEFT JOIN entirely breaks the idea that
>>> the null_frac has anything to do with NULLs in the join result.
>> Right.
>>
>>> I wonder how to improve this, say by adjusting the IS NULL selectivity
>>> when we know to operate on the outer side of the join. We're able to
>>> do this for antijoins, so maybe we could do that here, somehow?
>> This mess is part of the long-term plan around the work I've been doing
>> on outer-join-aware Vars. We now have infrastructure that can let
>> the estimator routines see "oh, this Var isn't directly from a scan
>> of its table, it's been passed through a potentially-nulling outer
>> join --- and I can see which one". I don't have more than vague ideas
>> about what happens next, but that is clearly an essential step on the
>> road to doing better.
>>
> I was wondering if that work on outer-join-aware Vars could help with
> this, but I wasn't following it very closely. I agree the ability to
> check if the Var could be NULL due to an outer join seems useful, as it
> says whether applying raw attribute statistics makes sense or not.
>
> I was thinking about what to do for the case when that's not possible,
> i.e. when the Var refers to nullable side of the join. Knowing that this
> is happening is clearly not enough - we need to know how many new NULLs
> are "injected" into the join result, and "communicate" that to the
> estimation routines.
>
> Attached is a very ugly experimental patch doing that, and with it the
> estimate changes to this:
>
> QUERY PLAN
> ----------------------------------------------------------------------
> Hash Left Join (cost=3.25..18179.88 rows=999900 width=16)
> (actual time=0.528..596.151 rows=999900 loops=1)
> Hash Cond: (large.id = small.id)
> Filter: ((small.id IS NULL) OR
> (large.a = ANY ('{1000,2000,3000,4000,5000}'::integer[])))
> Rows Removed by Filter: 100
> -> Seq Scan on large (cost=0.00..14425.00 rows=1000000 width=8)
> (actual time=0.069..176.138 rows=1000000 loops=1)
> -> Hash (cost=2.00..2.00 rows=100 width=8)
> (actual time=0.371..0.373 rows=100 loops=1)
> Buckets: 1024 Batches: 1 Memory Usage: 12kB
> -> Seq Scan on small (cost=0.00..2.00 rows=100 width=8)
> (actual time=0.032..0.146 rows=100 loops=1)
> Planning Time: 3.845 ms
> Execution Time: 712.405 ms
> (10 rows)
>
> Seems nice, but. The patch is pretty ugly, I don't claim it works for
> other queries or that this is exactly what we should do. It calculates
> "unmatched frequency" next to eqjoinsel_inner, stashes that info into
> sjinfo and the estimator (nulltestsel) then uses that to adjust the
> nullfrac it gets from the statistics.
>
> The good thing is this helps even for IS NULL checks on non-join-key
> columns (where we don't switch to an antijoin), but there's a couple
> things that I dislike ...
>
> 1) It's not restricted to outer joins or anything like that (this is
> mostly just my laziness / interest in one particular query, but also
> something the outer-join-aware patch might help with).
>
> 2) We probably don't want to pass this kind of information through
> sjinfo. That was the simplest thing for an experimental patch, but I
> suspect it's not the only piece of information we may need to pass to
> the lower levels of estimation code.
>
> 3) I kinda doubt we actually want to move this responsibility (to
> consider fraction of unmatched rows) to the low-level estimation
> routines (e.g. nulltestsel and various others). AFAICS this just
> "introduces NULLs" into the relation, so maybe we could "adjust" the
> attribute statistics (in examine_variable?) by inflating null_frac and
> modifying the other frequencies in MCV/histogram.
>
> 4) But I'm not sure we actually want to do that in these low-level
> selectivity functions. The outer join essentially produces output with
> two subsets - one with matches on the outer side, one without them. But
> the side without matches has NULLs in all columns. In a way, we know
> exactly how these columns are correlated - if we do the usual estimation
> (even with the null_frac adjusted), we just throw this information away.
> And when there's a lot of rows without a match, that seems bad.
>
> So maybe we should split the join estimate into two parts, one for each
> subset of the join result. One for the rows with a match (and then we
> can just do what we do now, with the attribute stats we already have).
> And one for the "unmatched part" where we know the values on the outer
> side are NULL (and then we can easily "fake" stats with null_frac=1.0).
>
>
> I really hope what I just wrote makes at least a little bit of sense.
>
>
> regards
>
I am also interested in this problem.

I did some refactoring of the source code in the patch and moved the
calculation of unmatched_fraction into eqjoinsel_inner.
I added myself as a co-author of this commit, if you don't mind, and I'm
going to continue working on it.
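
For anyone following along, my understanding of the approach is roughly
the following; the names below are invented for this email and are not
the actual code from the patch. eqjoinsel_inner() additionally estimates
which fraction of the outer relation finds no join partner, and
nulltestsel() uses that fraction to inflate the null_frac it reads from
the statistics, since every unmatched outer row is NULL-extended on the
nullable side:

/*
 * Illustrative sketch only (invented name, not the patch's function):
 * unmatched outer rows are NULL-extended, so the null fraction seen
 * after the join is roughly the unmatched fraction plus the ordinary
 * null_frac of the remaining, matched rows.
 */
static double
adjusted_nullfrac(double nullfrac, double unmatched_frac)
{
    return unmatched_frac + (1.0 - unmatched_frac) * nullfrac;
}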

On 26.06.2023 12:22, Andrey Lepikhov wrote:
> On 24/6/2023 17:23, Tomas Vondra wrote:
>> I really hope what I just wrote makes at least a little bit of sense.
> Throw in one more example:
>
> SELECT i AS id INTO l FROM generate_series(1,100000) i;
> CREATE TABLE r (id int8, v text);
> INSERT INTO r (id, v) VALUES (1, 't'), (-1, 'f');
> ANALYZE l,r;
> EXPLAIN ANALYZE
> SELECT * FROM l LEFT OUTER JOIN r ON (r.id = l.id) WHERE r.v IS NULL;
>
> Here you can see the same kind of underestimation:
> Hash Left Join  (... rows=500 width=14) (... rows=99999 ...)
>
> So the eqjoinsel_unmatch_left() function should be modified for the
> case where nd1<nd2.
>
Unfortunately, the patch does not fix the cardinality estimate for this
query; I'll try to figure out what is missing here.

postgres=# SELECT i AS id INTO l FROM generate_series(1,100000) i;
CREATE TABLE r (id int8, v text);
INSERT INTO r (id, v) VALUES (1, 't'), (-1, 'f');
ANALYZE l,r;
EXPLAIN ANALYZE
SELECT * FROM l LEFT OUTER JOIN r ON (r.id = l.id) WHERE r.v IS NULL;
SELECT 100000
CREATE TABLE
INSERT 0 2
ANALYZE
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=1.04..1819.07 rows=1 width=14) (actual time=0.143..114.792 rows=99999 loops=1)
   Hash Cond: (l.id = r.id)
   Filter: (r.v IS NULL)
   Rows Removed by Filter: 1
   ->  Seq Scan on l  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.027..35.278 rows=100000 loops=1)
   ->  Hash  (cost=1.02..1.02 rows=2 width=10) (actual time=0.014..0.017 rows=2 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on r  (cost=0.00..1.02 rows=2 width=10) (actual time=0.005..0.007 rows=2 loops=1)
 Planning Time: 0.900 ms
 Execution Time: 126.180 ms
(10 rows)
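
If I read Andrey's remark correctly, the unmatched fraction has to be
derivable from the ndistinct estimates regardless of which side of the
clause has fewer distinct values. Something along these lines, purely as
an illustration (invented name, not the patch code), assuming the
smaller set of join values is contained in the larger one:

/*
 * Illustration only: with nd_outer distinct values on the outer side
 * and nd_inner distinct values on the inner side, at least
 * (nd_outer - nd_inner) / nd_outer of the outer groups cannot find a
 * match.
 */
static double
unmatched_frac_from_ndistinct(double nd_outer, double nd_inner)
{
    if (nd_outer <= 0.0 || nd_outer <= nd_inner)
        return 0.0;             /* assume every outer value can match */
    return (nd_outer - nd_inner) / nd_outer;
}

For l LEFT OUTER JOIN r that would give (100000 - 2) / 100000 = 0.99998,
i.e. about 99998 NULL-extended rows, which is much closer to the actual
99999 than the current estimate of 1.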

As in the previous query, the cardinality is estimated poorly here even
with the patch applied; I would even say it has become worse:

EXPLAIN ANALYZE
  SELECT * FROM large FULL JOIN small ON (large.id = small.id)
WHERE (large.a IS NULL);

MASTER:

                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Merge Full Join  (cost=127921.69..299941.59 rows=56503 width=16) (actual time=795.092..795.094 rows=0 loops=1)
   Merge Cond: (small.id = large.id)
   Filter: (large.a IS NULL)
   Rows Removed by Filter: 1000000
   ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual time=0.038..0.046 rows=100 loops=1)
         Sort Key: small.id
         Sort Method: quicksort  Memory: 29kB
         ->  Seq Scan on small  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.013..0.022 rows=100 loops=1)
   ->  Materialize  (cost=127763.19..132763.44 rows=1000050 width=8) (actual time=363.016..649.103 rows=1000000 loops=1)
         ->  Sort  (cost=127763.19..130263.31 rows=1000050 width=8) (actual time=363.012..481.480 rows=1000000 loops=1)
               Sort Key: large.id
               Sort Method: external merge  Disk: 17664kB
               ->  Seq Scan on large  (cost=0.00..14425.50 rows=1000050 width=8) (actual time=0.009..111.166 rows=1000000 loops=1)
 Planning Time: 0.124 ms
 Execution Time: 797.139 ms
(15 rows)

With patch:

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Full Join  (cost=3.25..18179.25 rows=999900 width=16) (actual time=261.480..261.482 rows=0 loops=1)
   Hash Cond: (large.id = small.id)
   Filter: (large.a IS NULL)
   Rows Removed by Filter: 1000000
   ->  Seq Scan on large  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.006..92.827 rows=1000000 loops=1)
   ->  Hash  (cost=2.00..2.00 rows=100 width=8) (actual time=0.032..0.034 rows=100 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 12kB
         ->  Seq Scan on small  (cost=0.00..2.00 rows=100 width=8) (actual time=0.008..0.015 rows=100 loops=1)
 Planning Time: 0.151 ms
 Execution Time: 261.529 ms
(10 rows)
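
If I understand this FULL JOIN case correctly, NULLs in large.a can only
come from small-side rows that have no match in large, and here every
row of small does match (and large.a itself has no NULLs, which the
actual rows=0 confirms), so the true result is empty; the unmatched
fraction of large itself tells us nothing about large.a IS NULL. A
back-of-the-envelope check (illustrative only, not the patch code):

/*
 * Illustration only: in a FULL JOIN, a column of one input is NULL in
 * exactly those result rows that were produced for unmatched rows of
 * the other input.
 */
static double
fulljoin_nullfrac_for_side(double other_side_unmatched_rows,
                           double total_join_rows)
{
    if (total_join_rows <= 0.0)
        return 0.0;
    return other_side_unmatched_rows / total_join_rows;
}

Here small has no unmatched rows, so the fraction is 0 / 1000000 = 0,
matching the actual rows=0, while the estimate of 999900 looks as if the
unmatched fraction of large was applied instead.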

In addition, I found a few more queries where the cardinality estimate
has become better with the patch:

EXPLAIN ANALYZE
  SELECT * FROM small LEFT JOIN large ON (large.id = small.id)
WHERE (small.b IS NULL);

MASTER:

                                                  QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=32.74..18758.45 rows=55003 width=16) (actual time=0.100..0.104 rows=0 loops=1)
   Hash Cond: (large.id = small.id)
   ->  Seq Scan on large  (cost=0.00..14425.50 rows=1000050 width=8) (never executed)
   ->  Hash  (cost=32.60..32.60 rows=11 width=8) (actual time=0.089..0.091 rows=0 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 8kB
         ->  Seq Scan on small  (cost=0.00..32.60 rows=11 width=8) (actual time=0.088..0.088 rows=0 loops=1)
               Filter: (b IS NULL)
               Rows Removed by Filter: 100
 Planning Time: 0.312 ms
 Execution Time: 0.192 ms
(10 rows)

With patch:

                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=2.01..18177.02 rows=1 width=16) (actual time=0.127..0.132 rows=0 loops=1)
   Hash Cond: (large.id = small.id)
   ->  Seq Scan on large  (cost=0.00..14425.00 rows=1000000 width=8) (never executed)
   ->  Hash  (cost=2.00..2.00 rows=1 width=8) (actual time=0.112..0.114 rows=0 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 8kB
         ->  Seq Scan on small  (cost=0.00..2.00 rows=1 width=8) (actual time=0.111..0.111 rows=0 loops=1)
               Filter: (b IS NULL)
               Rows Removed by Filter: 100
 Planning Time: 0.984 ms
 Execution Time: 0.237 ms
(10 rows)

EXPLAIN ANALYZE
  SELECT * FROM large FULL JOIN small ON (large.id = small.id)
WHERE (small.b IS NULL);

MASTER:

                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Merge Full Join  (cost=127921.69..299941.59 rows=56503 width=16) (actual time=339.478..819.232 rows=999900 loops=1)
   Merge Cond: (small.id = large.id)
   Filter: (small.b IS NULL)
   Rows Removed by Filter: 100
   ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual time=0.129..0.136 rows=100 loops=1)
         Sort Key: small.id
         Sort Method: quicksort  Memory: 29kB
         ->  Seq Scan on small  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.044..0.075 rows=100 loops=1)
   ->  Materialize  (cost=127763.19..132763.44 rows=1000050 width=8) (actual time=339.260..605.444 rows=1000000 loops=1)
         ->  Sort  (cost=127763.19..130263.31 rows=1000050 width=8) (actual time=339.254..449.930 rows=1000000 loops=1)
               Sort Key: large.id
               Sort Method: external merge  Disk: 17664kB
               ->  Seq Scan on large  (cost=0.00..14425.50 rows=1000050 width=8) (actual time=0.032..104.484 rows=1000000 loops=1)
 Planning Time: 0.324 ms
 Execution Time: 859.705 ms
(15 rows)

With patch:

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Full Join  (cost=3.25..18179.25 rows=999900 width=16) (actual time=0.162..349.683 rows=999900 loops=1)
   Hash Cond: (large.id = small.id)
   Filter: (small.b IS NULL)
   Rows Removed by Filter: 100
   ->  Seq Scan on large  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.021..95.972 rows=1000000 loops=1)
   ->  Hash  (cost=2.00..2.00 rows=100 width=8) (actual time=0.125..0.127 rows=100 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 12kB
         ->  Seq Scan on small  (cost=0.00..2.00 rows=100 width=8) (actual time=0.030..0.059 rows=100 loops=1)
 Planning Time: 0.218 ms
 Execution Time: 385.819 ms
(10 rows)


EXPLAIN ANALYZE
  SELECT * FROM large RIGHT JOIN small ON (large.id = small.id)
  WHERE (large.a IS NULL);

MASTER:

                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=127921.69..299941.59 rows=56503 width=16) (actual time=345.403..345.404 rows=0 loops=1)
   Merge Cond: (small.id = large.id)
   Filter: (large.a IS NULL)
   Rows Removed by Filter: 100
   ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual time=0.033..0.039 rows=100 loops=1)
         Sort Key: small.id
         Sort Method: quicksort  Memory: 29kB
         ->  Seq Scan on small  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.012..0.020 rows=100 loops=1)
   ->  Materialize  (cost=127763.19..132763.44 rows=1000050 width=8) (actual time=345.287..345.315 rows=101 loops=1)
         ->  Sort  (cost=127763.19..130263.31 rows=1000050 width=8) (actual time=345.283..345.295 rows=101 loops=1)
               Sort Key: large.id
               Sort Method: external merge  Disk: 17664kB
               ->  Seq Scan on large  (cost=0.00..14425.50 rows=1000050 width=8) (actual time=0.009..104.648 rows=1000000 loops=1)
 Planning Time: 0.098 ms
 Execution Time: 347.807 ms
(15 rows)

With patch:

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=3.25..18179.25 rows=100 width=16) (actual time=209.838..209.842 rows=0 loops=1)
   Hash Cond: (large.id = small.id)
   Filter: (large.a IS NULL)
   Rows Removed by Filter: 100
   ->  Seq Scan on large  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.006..91.571 rows=1000000 loops=1)
   ->  Hash  (cost=2.00..2.00 rows=100 width=8) (actual time=0.034..0.036 rows=100 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 12kB
         ->  Seq Scan on small  (cost=0.00..2.00 rows=100 width=8) (actual time=0.008..0.016 rows=100 loops=1)
 Planning Time: 0.168 ms
 Execution Time: 209.883 ms
(10 rows)

Attachment: 0001-Fixed-the-case-of-calculating-underestimated-cardina.patch (text/x-patch, 4.4 KB)
