Re: Speeding up Aggregates

From: Dror Matalon <dror(at)zapatec(dot)com>
To: Postgresql Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Speeding up Aggregates
Date: 2003-10-08 18:18:19
Message-ID: 20031008181819.GJ2979@rlx11.zapatec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Actually what finally sovled the problem is repeating the
dtstamp > last_viewed
in the sub select

select articlenumber, channel, description, title, link, dtstamp from items i1, my_channels where ((i1.channel = '22222' and
my_channels.id = '22222' and owner = 'drormata' and (dtstamp > last_viewed)) ) and (dtstamp = (select max (dtstamp) from items i2
where channel = '22222' and i1.link = i2.link));

to
explain analyze select articlenumber, channel, description, title, link, dtstamp from items i1, my_channels where ((i1.channel = '22222' and
my_channels.id = '22222' and owner = 'drormata' and (dtstamp > last_viewed)) ) and (dtstamp = (select max (dtstamp) from items i2 where
channel = '22222' and i1.link = i2.link and dtstamp > last_viewed));

Which in the stored procedure looks like this:
CREATE or REPLACE FUNCTION item_max_date (int4, varchar, timestamptz)
RETURNS
timestamptz AS '
select max(dtstamp) from items where channel = $1 and link = $2 and
dtstamp > $3;
' LANGUAGE 'sql';

Basically I have hundreds or thousands of items but only a few that
satisfy "dtstamp > last_viewed". Obviously I want to run the max() only on
on a few items. Repeating "dtstamp > last_viewed" did the trick, but it
seems like there should be a more elegant/clear way to tell the planner
which constraint to apply first.

Dror

On Wed, Oct 08, 2003 at 10:54:24AM -0400, Greg Stark wrote:
> Rod Taylor <rbt(at)rbt(dot)ca> writes:
>
> > On Fri, 2003-10-03 at 17:53, Dror Matalon wrote:
> > > On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote:
> > > >
> > > > It is too bad the (channel, link) index doesn't have dtstamp at the end
> > > > of it, otherwise the below query would be a gain (might be a small one
> > > > anyway).
> > > >
> > > > select dtstamp
> > > > from items
> > > > where channel = $1
> > > > and link = $2
> > > > ORDER BY dtstamp DESC
> > > > LIMIT 1;
> >
> > It didn't make a difference even with the 3 term index? I guess you
> > don't have very many common values for channel / link combination.
>
> You need to do:
>
> ORDER BY channel DESC, link DESC, dtstamp DESC
>
> This is an optimizer nit. It doesn't notice that since it selected on channel
> and link already the remaining tuples in the index will be ordered simply by
> dtstamp.
>
> (This is the thing i pointed out previously in
> <87el6ckrlu(dot)fsf(at)stark(dot)dyndns(dot)tv> on Feb 13th 2003 on pgsql-general)
>
>
> --
> greg
>

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Neil Conway 2003-10-08 18:28:45 Re: PostgreSQL vs. MySQL
Previous Message Bruce Momjian 2003-10-08 18:15:58 Re: PostgreSQL vs. MySQL