Implement missing join selectivity estimation for range types

From: Mahmoud Sakr <mahmoud(dot)sakr(at)ulb(dot)be>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: SCHOEMANS Maxime <Maxime(dot)Schoemans(at)ulb(dot)be>, Diogo Repas <diogo(dot)repas(at)gmail(dot)com>, Luo Zhicheng <zhicheng(dot)luo(at)ulb(dot)be>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Implement missing join selectivity estimation for range types
Date: 2022-06-30 14:31:30
Message-ID: CAB4o4asMq3k6HN9WfDsssQ5DDVfAziB4TpiFJ8RBJgZTVuwC7g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,
Given a query:
SELECT * FROM t1, t2 WHERE t1.r << t2.r
where t1.r, t2.r are of range type,
currently PostgreSQL will estimate a constant selectivity for the << predicate,
which is equal to 0.005, not utilizing the statistics that the optimizer
collects for range attributes.

We have worked out a theory for inequality join selectivity estimation
(http://arxiv.org/abs/2206.07396), and implemented it for range
types it in this patch.

The algorithm in this patch re-uses the currently collected statistics for
range types, which is the bounds histogram. It works fairly accurate for the
operations <<, >>, &&, &<, &>, <=, >= with estimation error of about 0.5%.
The patch also implements selectivity estimation for the
operations @>, <@ (contains and is contained in), but their accuracy is not
stable, since the bounds histograms assume independence between the range
bounds. A point to discuss is whether or not to keep these last two operations.
The patch also includes the selectivity estimation for multirange types,
treating a multirange as a single range which is its bounding box.

The same algorithm in this patch is applicable to inequality joins of scalar
types. We, however, don't implement it for scalars, since more work is needed
to make use of the other statistics available for scalars, such as the MCV.
This is left as a future work.

--
Mahmoud SAKR - Univeristé Libre de Bruxelles
This work is done by Diogo Repas, Zhicheng Luo, Maxime Schoemans, and myself

Attachment Content-Type Size
v1-0001-Join-Selectivity-Estimation-for-Range-types.patch text/x-patch 74.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-06-30 15:23:19 Re: pg_auth_members.grantor is bunk
Previous Message Andrey Lepikhov 2022-06-30 14:11:51 Re: Removing unneeded self joins