Re: Bunch o' dead code in GEQO

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bunch o' dead code in GEQO
Date: 2004-01-23 05:50:57
Message-ID: 24457.1074837057@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Where are we going to find a representative test set of
>> dozen-or-more- way SQL join queries?

> As far as getting good lotsa-join queries, I think we can either:

> (1) generate the queries programmatically

> For example, star-schema join queries might be tractable via this
> method.

Sure, we can generate umpteen thousand star joins in no time, but they
are all the same problem. I don't think this is particularly helpful
either for finding optimizer bugs or for making choices about
performance issues.

An example of the kind of thing I'm worried about: I realized just
yesterday that GEQO is actively broken in 7.4 because it cannot generate
"bushy" plans. As of 7.4 there are cases involving IN constructs where
the only valid plans are bushy. For example, in the regression database:

regression=# set geqo_threshold to 3;
SET
regression=# explain select * from tenk1 where
regression-# unique1 in (select unique2 from tenk1 t2, int4_tbl t3 where hundred = f1) and
regression-# unique2 in (select unique1 from tenk1 t4, int4_tbl t5 where hundred = f1);
ERROR: failed to make a valid plan

You could test star joins all day long and not find that bug.

> (2) get the queries manually

> This would involve either writing schema and a bunch of queries for
> an "example app" (a la the Java Web Store), or getting a sanitized
> version of the schema & common queries used by a few large PG
> users. The latter might be the better way to go...

The only thing I'd really trust is a sampling of complex queries from
different real-world applications. This will probably be hard to get,
and we can only hope to have dozens of queries not hundreds or
thousands. We will also need to think about how we will get the
pg_statistic entries to correspond to the real-world situations.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Garamond 2004-01-23 07:41:40 Re: [pgsql-hackers-win32] What's left?
Previous Message Tom Lane 2004-01-23 05:35:14 Re: RFC: bufmgr locking changes