Re: Experimental evaluation of PostgreSQL's query optimizer

From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "'Viktor Leis *EXTERN*'" <leis(at)in(dot)tum(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Experimental evaluation of PostgreSQL's query optimizer
Date: 2015-12-21 08:22:45
Message-ID: A737B7A37273E048B164557ADEF4A58B5378A976@ntex2010i.host.magwien.gv.at
Lists: pgsql-hackers

Viktor Leis wrote:
> We have recently performed an experimental evaluation of PostgreSQL's
> query optimizer. For example, we measured the contributions of
> cardinality estimation and the cost model on the overall query
> performance. You can download the resulting paper here:
> http://www.vldb.org/pvldb/vol9/p204-leis.pdf
>
> Some findings:
> 1. Perhaps unsurprisingly, we found that cardinality
> estimation is the biggest problem in query optimization.
> 2. The quality of Postgres' cardinality estimates is not generally worse
> than that of the major commercial systems.
> 3. It seems to me that one obvious way to avoid many bad situations
> would be to disable nested loop joins when the inner relation is NOT
> an index scan.
>
> I hope this will be of interest to some of you.

I have read the paper with great interest, and I have some comments.

- The paper mentions that the "Join Order Benchmark" has high cross-table
correlation, and that this correlation is responsible for the bad cardinality
estimates that cause bad plans in all of the tested RDBMS.
Wouldn't it be interesting to repeat the experiment with different
real-world data sets to see whether that is indeed typical and not an
idiosyncrasy of that specific benchmark?

- The paper suggests that sampling the base tables is preferable to
using statistics because it gives better estimates, but I think that
is only a win for long-running, complicated, data-warehouse-style queries.
For the typical OLTP query it would incur intolerable planning times.
Any ideas on that?
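
To make that cost concrete, here is a rough sketch of what a plan-time
sampling probe could look like, written as plain SQL using the TABLESAMPLE
clause from the upcoming 9.5 (table and column names are made up; a real
implementation would of course live inside the planner rather than issue
a query):

  -- Estimate the selectivity of a filter by scanning roughly 1% of the
  -- table and extrapolating the fraction of matching rows.
  SELECT (count(*) FILTER (WHERE l_shipdate > DATE '1998-09-01'))::float8
         / greatest(count(*), 1) AS est_selectivity
  FROM lineitem TABLESAMPLE BERNOULLI (1);

Even a cheap probe like this has to visit a fair number of pages per
predicate, which is exactly the planning-time cost I am worried about
for OLTP.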

- From my experience in tuning SQL queries I can confirm one of your findings,
namely that bad cardinality estimates are the prime source of bad
plan choices.
Perhaps it would be valuable to start thinking about statistics for
inter-table correlation. What about something as "simple" as a factor
per (joinable) attribute pair that approximates the total row count
of a join on these attributes, divided by the planner's estimate?
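
To illustrate how such a factor could be calibrated offline, this is what
I have in mind, spelled out as plain SQL (the tables and join columns are
just placeholders):

  -- Step 1: the planner's estimate for the join; read the top-level
  -- "rows=" figure from the EXPLAIN output.
  EXPLAIN SELECT *
  FROM orders o JOIN lineitem l ON o.o_orderkey = l.l_orderkey;

  -- Step 2: the actual cardinality of the same join.
  SELECT count(*)
  FROM orders o JOIN lineitem l ON o.o_orderkey = l.l_orderkey;

  -- The correction factor for the pair (o_orderkey, l_orderkey) would be
  -- (actual rows from step 2) / (estimated rows from step 1); the planner
  -- could multiply its join estimate by that factor the next time around.

Keeping such a factor up to date as the data changes is of course the
hard part.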

- I can also corroborate your finding that nested loop joins are often
harmful, particularly when the inner side is a sequential scan.
One of the first things I do when investigating bad performance of a query
whose plan has a nested loop join is to set enable_nestloop to "off"
and see if that makes a difference, and it often does (a short example
follows below).
Maybe it would be a win to bias the planner against nested loop joins.
This is dreaming, but it might be nice to have some measure of how
reliable a certain estimate is, which would be high if the estimate is, say,
derived from a single filter on a base table, and would decrease as more
conditions are involved or numbers are pulled out of thin air.
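
For completeness, the procedure I mean is nothing fancier than this
(the query and table names are just placeholders):

  -- Compare plan and runtime with and without nested loop joins.
  EXPLAIN ANALYZE
  SELECT * FROM big_table b JOIN other_table o ON b.id = o.big_id;

  SET enable_nestloop = off;  -- affects only the current session

  EXPLAIN ANALYZE
  SELECT * FROM big_table b JOIN other_table o ON b.id = o.big_id;

  RESET enable_nestloop;

If the second plan is clearly faster, a misestimated nested loop join,
typically with a sequential scan on the inner side, is usually the culprit.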

Yours,
Laurenz Albe
