Re: slow plan for min/max

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: slow plan for min/max
Date: 2003-09-08 19:32:16
Message-ID: 60n0df5b5r.fsf@dev6.int.libertyrms.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

scott(dot)marlowe(at)ihs(dot)com ("scott.marlowe") writes:
> On Sun, 7 Sep 2003, Pailloncy Jean-Gérard wrote:
>
> Asking a question about why max(id) is so much slower than select id order
> by id desc limit 1, Pailloncy said:
>
>> I ask for the same thing. That's better !
>
> This is a Frequently asked question about something that isn't
> likely to change any time soon.
>
> Basically, Postgresql uses an MVCC locking system that makes
> massively parallel operation possible, but costs in certain areas,
> and one of those areas is aggregate performance over large sets.
> MVCC makes it very hard to optimize all but the simplest of
> aggregates, and even those optimzations which are possible would
> wind up being quite ugly at the parser level.

MVCC makes it difficult to optimize aggregates resembling COUNT(*) or
SUM(*), at least vis-a-vis having this available for a whole table
(e.g. - you have to be doing 'SELECT COUNT(*), SUM(SOMEFIELD) FROM
THIS_TABLE' with NO "WHERE" clause).

But there is nothing about MVCC that makes it particularly difficult
to handle the transformation:

select max(field) from some_table where another_field <
still_another_field;

(which isn't particularly efficient) into

select field from some_table where another_field <
still_another_field order by field desc limit 1;

The problems observed are thus:

1. If the query asks for other data, it might be necessary to scan
the table to get the other data, making the optimization
irrelevant;

2. If there's a good index to key on, the transformed version might
be a bunch quicker, but it is nontrivial to determine that, a
priori;

3. It would be a fairly hairy optimization to throw into the query
optimizer, so people are reluctant to try to do so.

Note that MVCC has _nothing_ to do with any of those three problems.

The MVCC-related point is that there is reluctance to create some
special case that will be troublesome to maintain instead of having
some comprehensive handling of _all_ aggregates. It seems a better
idea to "fix them all" rather than to kludge things up by fixing one
after another.
--
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/lisp.html
Signs of a Klingon Programmer - 10. "A TRUE Klingon Warrior does not
comment his code!"

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message scott.marlowe 2003-09-08 20:37:10 Re: slow plan for min/max
Previous Message Neil Conway 2003-09-08 18:49:12 Re: slow plan for min/max