
Re: Use extended statistics to estimate (Var op Var) clauses

From:	Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Use extended statistics to estimate (Var op Var) clauses
Date:	2021-08-11 15:17:11
Message-ID:	7C0F91B5-8A43-428B-8D31-556458720305@enterprisedb.com
Lists:	pgsql-hackers

> On Aug 11, 2021, at 7:51 AM, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>
> I'll go test random data designed to have mcv lists of significance....

Done. The data for column_i is set to floor(random()^i*20). column_1 therefore is evenly distributed between 0..19, with successive columns weighted more towards smaller values.
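For readers who want to see the shape of this data, here is a minimal Python sketch of the formula above. It is not the script used for the test: the row count, the number of columns, the seed, and the helper name make_rows are all assumptions made for illustration, and it assumes each column gets its own independent random() draw, as separate random() calls in SQL would.

```python
import math
import random
from collections import Counter


def make_rows(n_rows, n_cols, seed=0):
    """Build rows where column_i = floor(random()^i * 20), for i = 1..n_cols.

    Hypothetical helper: one independent draw per column per row.
    """
    rng = random.Random(seed)
    return [
        tuple(math.floor(rng.random() ** i * 20) for i in range(1, n_cols + 1))
        for _ in range(n_rows)
    ]


rows = make_rows(10_000, 3)  # sizes chosen arbitrarily for the sketch

for col in range(3):
    counts = Counter(row[col] for row in rows)
    # The most frequent values are the ones an MCV list would tend to capture.
    print(f"column_{col + 1}: most common values {counts.most_common(3)}")
```

Because random() is below 1, raising it to a higher power pushes the result towards 0, so the higher-numbered columns concentrate on small values while column_1 stays roughly uniform over 0..19.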

This still gives (marginally) worse results than the original test I posted, but better than the completely random data from the last post. After the patch, 72294 estimates got better and 30654 got worse. The biggest losers from this data set are:

better:0, worse:31: A >= B or A = A or not A = A
better:0, worse:31: A >= B or A = A
better:0, worse:31: A >= B or not A <> A
better:0, worse:31: A >= A or A = B or not B = A
better:0, worse:31: A >= B and not A < A or A = A
better:0, worse:31: A = A or not A > B or B <> A
better:0, worse:31: A >= B or not A <> A or not A >= A
better:0, worse:32: B < A and B > C and not C < B <----
better:1, worse:65: A <> C and A <= B <----
better:0, worse:33: B <> A or B >= B
better:0, worse:33: B <> A or B <= B
better:0, worse:33: B <= A or B = B or not B > B
better:0, worse:33: B <> A or not B >= B or not B < B
better:0, worse:33: B = A or not B > B or B = B
better:0, worse:44: A = B or not A > A or A = A
better:0, worse:44: A <> B or A <= A
better:0, worse:44: A <> B or not A >= A or not A < A
better:0, worse:44: A <= B or A = A or not A > A
better:0, worse:44: A <> B or A >= A

A few of these do not compare a column against itself; they are marked with <---- above.
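The distinction matters because a self-comparison has a fixed truth value on non-NULL data: A = A is always true and A <> A is always false, so a clause such as "A >= B or A = A" matches every row no matter how the data is distributed. The sketch below computes the actual selectivity of one clause of each kind on data generated as described earlier. It is only an illustration, not the test harness behind the numbers above; the mapping of A, B, C to column_1..column_3, the row count, and the helper name selectivity are assumptions, and the data contains no NULLs.

```python
import math
import random

rng = random.Random(1)

# Assumed mapping for illustration: A, B, C correspond to column_1..column_3.
rows = [
    {name: math.floor(rng.random() ** i * 20)
     for i, name in enumerate("ABC", start=1)}
    for _ in range(10_000)
]


def selectivity(rows, predicate):
    """Fraction of rows for which the predicate holds (hypothetical helper)."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


# "A >= B or A = A": the A = A term is true for every non-NULL A.
self_cmp = selectivity(rows, lambda r: r["A"] >= r["B"] or r["A"] == r["A"])

# "A <> C and A <= B": no self-comparison, so the result depends on the data.
cross_cmp = selectivity(rows, lambda r: r["A"] != r["C"] and r["A"] <= r["B"])

print(f"A >= B or A = A   -> {self_cmp:.3f}")
print(f"A <> C and A <= B -> {cross_cmp:.3f}")
```

The first clause always comes out at exactly 1.0, while the second depends on the joint distribution of the columns, which is the kind of case where a regression in the estimate would be visible to users.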

I don't really know what to make of these results. It doesn't bother me that any particular estimate gets worse after the patch. That's just the nature of estimating. But it does bother me a bit that some types of estimates consistently get worse. We should either show that my analysis is wrong about that, or find a way to address it to avoid performance regressions. If I'm right that there are whole classes of estimates that are made consistently worse, then it stands to reason some users will have those data distributions and queries, and could easily notice.

--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: Use extended statistics to estimate (Var op Var) clauses at 2021-08-11 14:51:36 from Mark Dilger

Responses

Re: Use extended statistics to estimate (Var op Var) clauses at 2021-08-11 22:00:12 from Tomas Vondra
