Re: Optimization with dates

From: Jean-Christophe Boggio <cat(at)thefreecat(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Optimization with dates
Date: 2001-11-14 12:02:51
Message-ID: 868508534.20011114130251@thefreecat.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hello,

Thanks for all the answers. A little feedback :

>> I have tried :
>> where dategain>=(select now()-30);
>> and many other, syntactically absurd :-)

TL> dategain >= ago(30)
TL> where "ago" is a function that computes "date(now()) - n" and is
TL> marked "iscachable".

create function ago(interval) returns timestamp as '
select now() - $1
' language 'sql' with (iscachable);

explain select count(*) from gains where dategain>=ago('30 0:00');

Aggregate (cost=180640.90..180640.90 rows=1 width=0)
-> Seq Scan on gains (cost=0.00..179761.71 rows=351676 width=0)

=====

explain select count(*) from gains where dategain>=ago('5 days');

Aggregate (cost=172340.65..172340.65 rows=1 width=0)
-> Index Scan using ix_gains_dategain on gains (cost=0.00..172202.94 rows=55084 width=0)

=====

explain select count(*) from gains where dategain>=ago('6 days');

Aggregate (cost=179929.06..179929.06 rows=1 width=0)
-> Seq Scan on gains (cost=0.00..179761.71 rows=66940 width=0)

TL> Just out of curiosity, do the indexed timestamps correlate closely to
TL> the physical order of the table? I'd expect that to happen if you
TL> are timestamping records by insertion time and there are few or no
TL> updates.

That's right, there are very few updates.

===========================================
Now, for Jason's idea :

set enable_seqscan to off;
SET VARIABLE

explain select count(*) from gains where dategain>=now()-30;

Aggregate (cost=100256770.86..100256770.86 rows=1 width=0)
-> Seq Scan on gains (cost=100000000.00..100250847.08 rows=2369512 width=0)

Strange isn't it ?

Is it possible to do the equivalent of "set enable_seqscan to off"
out of psql (in php or perl code) ?

===========================================

To answer Stephan and Josh :

SS> Is 2367640 a reasonable estimate for the number of
SS> rows that match the condition?

JB> Hmmm... if the number of rows is actually accurate (2M out of 10M in the
JB> last 30 days) then a Seq Scan seems like a good plan to me. If the
JB> numbers aren't accurate, it's time to run a VACUUM ANALYZE.

select avg(cnt) from (select count(*) as cnt from gains group by
date(dategain)) as foo;
avg
------------------
12009.6131756757

If I did it right, this should be the average number of rows per day.
The db exists since April 1st 2000.

select date('now')-date('2000-04-01');
592

select 592*12009;
7109328

select count(*) from gains;
count
---------
7109753

As you see, dategain is *quite* linear !

So to answer your question, a reasonable estimate for the number of
rows that match the condition is :

select 30*12009;
360270

The real answer is :

select count(*) from gains where dategain>=now()-30;
231781

SS> Have you run vacuum analyze?

Every night (and it's a VEERRYYYY long process, even dropping the
indexes before and recreating them afterwards, maybe that's the real
problem ?) Keeping the index makes the VACUUM process several hours.
We'll try 7.2 which should solve part of this problem but since these
are production systems, we wait a little feedback from 7.2 users.

SS> If the estimate is right, you'll probably find that
SS> the sequence scan is actually faster than an index
SS> scan since about 1/4 of the table is being selected.

It should select 1/592 of the table !

Any further advises VERY appreciated. Thanks again everyone for your
help.

--
Jean-Christophe Boggio
cat(at)thefreecat(dot)org -o)
Independant Consultant and Developer /\\
Delphi, Linux, Perl, PostgreSQL, Debian _\_V

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Haller Christoph 2001-11-14 13:02:08 ago()
Previous Message Eaglet 2001-11-14 11:51:30 handling exceptions, really not simple... :-((