Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, david(at)fetter(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, jgh(at)wizmail(dot)org, Евгений Шишкин <itparanoia(at)gmail(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Subject: Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)
Date: 2015-12-31 03:32:34
Message-ID: 5684A1D2.1000601@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/30/2015 10:30 PM, David Rowley wrote:
> On 31 December 2015 at 01:24, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com
> <mailto:tomas(dot)vondra(at)2ndquadrant(dot)com>> wrote:
>
> Personally I'd like to see automatic plan caching occur in
> PostgreSQL. There is a bit of a problem with it in regards to a query
> such as: select * from t where mySkewedColumn = 1; where the value 1
> appears 99% of the time. Initially we may plan to seqscan, where with
> other values we'd likely prefer to index scan. I imagine with my
> unique joins patch, that it could be expanded to test
> baserestrictinfos to see if they contain quals with a unique index.
> This knowledge could later permit plan caching to occur on queries
> which are safe from having any skew in the results. It might sound
> rather restrictive, but likely this would cover 99% of UPDATE and
> DELETE operations in an OLTP workload, and likely a good deal of the
> SELECTs too. The safety of caching could be analyzed during planning,
> and a flag could be set somewhere, perhaps in PlannedStmt to mark if
> the plan is safe to cache. The planner() function could initially
> hash the query string and check if any cached plan exists with that
> hash, if not, plan the statement, and then check if the "safeToCache"
> flag is set, and if so, stick that plan into a hash table. Also plans
> with no baserestrictinfos could be "safeToCache" too.

Yeah, that's what I meant by "non-trivial". I don't know a good approach
to this problem, or if such a "good" approach even exists, but I'd say
being able to decide whether to rebuild a plan in such cases is a
"must-have" feature. Otherwise we could easily loose any gains from the
more thorough optimization because of poor plans.

In other words, we'd have to come up with a way to decide whether to use
the same plan as before, or try building another plan (for the same
query with different parameter values). I can think of two approaches:

(1) Try to measure how "different" are the parameter values used in the
new query, compared to the existing plan(s). This probably means
difference in terms of probability / frequencies etc.

(2) Compute the cost of the existing plan for the new parameters. I.e.
don't perform the whole optimization, just the costing for the
single plan. If the costs are "close" then use the existing plan.

Of course, none of this is trivial and still may fail for some cases.

I wonder what the other databases do, or if there are papers about this
topic (I'd guess there are, but I haven't looked too much).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2015-12-31 03:50:29 Re: dynloader.h missing in prebuilt package for Windows?
Previous Message Alvaro Herrera 2015-12-31 02:18:45 Re: dynloader.h missing in prebuilt package for Windows?