Re: Experimental evaluation of PostgreSQL's query optimizer

From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "'Viktor Leis *EXTERN*'" <leis(at)in(dot)tum(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Experimental evaluation of PostgreSQL's query optimizer
Date: 2015-12-21 08:22:45
Message-ID: A737B7A37273E048B164557ADEF4A58B5378A976@ntex2010i.host.magwien.gv.at
Lists: pgsql-hackers

Viktor Leis wrote:
> We have recently performed an experimental evaluation of PostgreSQL's
> query optimizer. For example, we measured the contributions of
> cardinality estimation and the cost model on the overall query
> performance. You can download the resulting paper here:
> http://www.vldb.org/pvldb/vol9/p204-leis.pdf
>
> Some findings:
> 1. Perhaps unsurprisingly, we found that cardinality
> estimation is the biggest problem in query optimization.
> 2. The quality of Postgres' cardinality estimates is not generally worse
> than that of the major commercial systems.
> 3. It seems to me that one obvious way to avoid many bad situations
> would be to disable nested loop joins when the inner relation is NOT
> an index scan.
>
> I hope this will be of interest to some of you.

I have read the paper with great interest, and I have some comments.

- The paper mentions that the "Join Order Benchmark" has high cross-table
correlation, and that this correlation is responsible for the bad cardinality
estimates that cause bad plans in all of the tested RDBMS.
Wouldn't it be interesting to repeat the experiment with different
real-world data sets to see whether that is indeed typical and not an
idiosyncrasy of that specific benchmark?

- The paper suggests that sampling the base tables is preferable to
using statistics because it gives better estimates, but I think that
is only a win for long-running, complicated, data-warehouse-style queries.
For the typical OLTP query it would incur intolerable planning times.
Any ideas on that?
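
To make that cost concrete, here is a rough sketch of what a plan-time
sampling probe could look like, written as plain SQL using the TABLESAMPLE
clause from the upcoming 9.5 (table and column names are made up; a real
implementation would of course live inside the planner rather than issue
a query):

  -- Estimate the selectivity of a filter by scanning roughly 1% of the
  -- table and extrapolating the fraction of matching rows.
  SELECT (count(*) FILTER (WHERE l_shipdate > DATE '1998-09-01'))::float8
         / greatest(count(*), 1) AS est_selectivity
  FROM lineitem TABLESAMPLE BERNOULLI (1);

Even a cheap probe like this has to visit a fair number of pages per
predicate, which is exactly the planning-time cost I am worried about
for OLTP.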

- From my experience in tuning SQL queries I can confirm one of your findings,
namely that bad cardinality estimates are the prime source of bad
plan choices.
Perhaps it would be valuable to start thinking about statistics for
inter-table correlation. What about something as "simple" as a factor
per (joinable) attribute pair that approximates the total row count
of a join on these attributes, divided by the planner's estimate?
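
To illustrate how such a factor could be calibrated offline, this is what
I have in mind, spelled out as plain SQL (the tables and join columns are
just placeholders):

  -- Step 1: the planner's estimate for the join; read the top-level
  -- "rows=" figure from the EXPLAIN output.
  EXPLAIN SELECT *
  FROM orders o JOIN lineitem l ON o.o_orderkey = l.l_orderkey;

  -- Step 2: the actual cardinality of the same join.
  SELECT count(*)
  FROM orders o JOIN lineitem l ON o.o_orderkey = l.l_orderkey;

  -- The correction factor for the pair (o_orderkey, l_orderkey) would be
  -- (actual rows from step 2) / (estimated rows from step 1); the planner
  -- could multiply its join estimate by that factor the next time around.

Keeping such a factor up to date as the data changes is of course the
hard part.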

- I can also corroborate your finding that nested loop joins are often
harmful, particularly when the inner side is a sequential scan.
One of the first things I do when investigating bad performance of a query
whose plan has a nested loop join is to set enable_nestloop to "off"
and see if that makes a difference, and it often does (a short example
follows below).
Maybe it would be a win to bias the planner against nested loop joins.
This is dreaming, but it might be nice to have some measure of how
reliable a certain estimate is, which would be high if the estimate is, say,
derived from a single filter on a base table, and would decrease as more
conditions are involved or numbers are pulled out of thin air.
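
For completeness, the procedure I mean is nothing fancier than this
(the query and table names are just placeholders):

  -- Compare plan and runtime with and without nested loop joins.
  EXPLAIN ANALYZE
  SELECT * FROM big_table b JOIN other_table o ON b.id = o.big_id;

  SET enable_nestloop = off;  -- affects only the current session

  EXPLAIN ANALYZE
  SELECT * FROM big_table b JOIN other_table o ON b.id = o.big_id;

  RESET enable_nestloop;

If the second plan is clearly faster, a misestimated nested loop join,
typically with a sequential scan on the inner side, is usually the culprit.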

Yours,
Laurenz Albe
