Re: Experimental evaluation of PostgreSQL's query optimizer

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Viktor Leis <leis(at)in(dot)tum(dot)de>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Experimental evaluation of PostgreSQL's query optimizer
Date: 2015-12-22 01:49:44
Message-ID: CAMsr+YGpbK_OvSp+gE_Q4PuV2dWHc1ByJSv4MbM2vTHi4pNieg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21 December 2015 at 20:53, Viktor Leis <leis(at)in(dot)tum(dot)de> wrote:

> I think your suggestion amounts to caching the cardinalities of all
> two-way joins. One major issue is that for a query like
>
> select * from r1, r2 where r1.x = r2.y and r1.a = ? and r2.b;
>
> it depends on the specific values of r1.a and r2.b whether there is
> any (anti-)correlation. And obviously one cannot store correction
> factors for each value of a and b.
>
>
I see a parallel with indexing and partial indexes here.

We obviously cannot afford to keep cross-table correlations for every
possible pairing of join conditions across every possible set of joined
tables. Much like we can't afford to keep indexes for every possible set of
columns, but even worse.

Much as we let users CREATE INDEX to tell us what cols to index, maybe we
should let them CREATE a cross-table join statistics collector for a
particular set of tables, optionally qualified with a filter condition just
like we do on partial indexes, and optionally transformed via an immutable
expression like we do for expression indexes, e.g.:

CREATE JOIN STATISTICS ON t1 JOIN t2 ON (t1.col1 = t2.col2);

CREATE JOIN STATISTICS ON t1 JOIN t2 ON (lower(t1.col1) = lower(t2.col2))
WHERE t1.othercol IS NOT NULL;

CREATE JOIN STATISTICS ON t1 JOIN t2 ON (t1.colx = t2.colx AND t1.coly =
t2.coly);

plus a simplified form like

CREATE JOIN STATISTICS ON t1 JOIN t2 USING (somecol);

That way we let an admin who's tuning queries direct effort at problem
areas. It's not automagical, but it's an area where tools could analyze
pg_stat_statements to direct effort, much like is currently done for index
creation. Like index creation I don't think it's practical to do this
entirely automatically and behind the scenes since collecting the stats for
all possibilities rapidly gets prohibitive.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2015-12-22 01:51:56 Re: Declarative partitioning
Previous Message Craig Ringer 2015-12-22 01:40:38 Re: Experimental evaluation of PostgreSQL's query optimizer