Re: Postgresql 8.1.4 - performance issues for select on

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Dimitri Fontaine <dim(at)dalibo(dot)com>, pgsql-performance(at)postgresql(dot)org, Ioana Danes <ioanasoftware(at)yahoo(dot)ca>
Subject: Re: Postgresql 8.1.4 - performance issues for select on
Date: 2006-10-18 23:05:49
Message-ID: 1161212749.31645.285.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 2006-10-18 at 17:35 -0500, Jim C. Nasby wrote:
> On Wed, Oct 18, 2006 at 03:32:15PM -0700, Jeff Davis wrote:
> > On Wed, 2006-10-18 at 17:10 -0500, Jim C. Nasby wrote:
> > > Sorry, don't have the earlier part of this thread, but what about...
> > >
> > > SELECT greatest(max(a), max(b)) ...
> > >
> > > ?
> >
> > To fill you in, we're trying to get the max of a union (a view across
> > two physical tables).
>
> UNION or UNION ALL? You definitely don't want to do a plain UNION if you
> can possibly avoid it.

Oops, of course he must be doing UNION ALL, but for some reason I ran my
test queries with plain UNION (thanks for reminding me). However, it
didn't make a difference, see below.

> > It can be done if you're creative with the query; I suggested a query
> > that selected the max of the max()es of the individual tables. Your
> > query could work too. However, the trick would be getting postgresql to
> > recognize that it can transform "SELECT max(x) FROM foo" into that,
> > where foo is a view of a union.
> >
> > If PostgreSQL could sort the result of a union by merging the results of
> > two index scans, I think the problem would be solved. Is there something
> > preventing this, or is it just something that needs to be added to the
> > planner?
>
> Hrm... it'd be worth trying the old ORDER BY ... LIMIT 1 trick just to
> see if that worked in this case, but I don't have much hope for that.

Yeah, that's the solution. Here's the problem:

=> set enable_seqscan = false;
SET
=> EXPLAIN SELECT i FROM (SELECT i FROM t10 UNION ALL SELECT i FROM t11)
t ORDER BY i DESC;
QUERY PLAN
------------------------------------------------------------------------------------
Sort (cost=200026772.96..200027272.96 rows=200000 width=4)
Sort Key: t.i
-> Append (cost=100000000.00..200004882.00 rows=200000 width=4)
-> Seq Scan on t10 (cost=100000000.00..100001441.00
rows=100000 width=4)
-> Seq Scan on t11 (cost=100000000.00..100001441.00
rows=100000 width=4)
(5 rows)

=> EXPLAIN SELECT i FROM (SELECT i FROM t10) t ORDER BY i DESC;
QUERY PLAN
------------------------------------------------------------------------------------
Index Scan Backward using t10_idx on t10 (cost=0.00..1762.00
rows=100000 width=4)
(1 row)

=> EXPLAIN SELECT i FROM (SELECT i FROM t11) t ORDER BY i DESC;
QUERY PLAN
------------------------------------------------------------------------------------
Index Scan Backward using t11_idx on t11 (cost=0.00..1762.00
rows=100000 width=4)
(1 row)

=>

But if PostgreSQL could just merge the index scan results, it could
"ORDER BY i" the result of a UNION ALL without a problem. But it can't
do that, so the syntactical trick introduced for min/max won't work in
his case :(

He'll probably have to change his application to make that query perform
decently if the tables are split.

Ideas?

Regards,
Jeff Davis

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Mark Kirkwood 2006-10-19 04:58:47 Re: measuring shared memory usage on Windows
Previous Message Tom Lane 2006-10-18 23:05:17 Re: Postgresql 8.1.4 - performance issues for select on