Re: I/O on select count(*)

From: Luke Lonergan <llonergan(at)greenplum(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: I/O on select count(*)
Date: 2008-05-15 02:52:01
Message-ID: C451C651.5EE8B%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

BTW ­ we¹ve removed HINT bit checking in Greenplum DB and improved the
visibility caching which was enough to provide performance at the same level
as with the HINT bit optimization, but avoids this whole ³write the data,
write it to the log also, then write it again just for good measure²
behavior.

For people doing data warehousing work like the poster, this Postgres
behavior is miserable. It should be fixed for 8.4 for sure (volunteers?)

BTW ­ for the poster¹s benefit, you should implement partitioning by date,
then load each partition and VACUUM ANALYZE after each load. You probably
won¹t need the date index anymore ­ so your load times will vastly improve
(no indexes), you¹ll store less data (no indexes) and you¹ll be able to do
simpler data management with the partitions.

You may also want to partition AND index if you do a lot of short range
selective date predicates. Example would be: partition by day, index on
date field, queries selective on date ranges by hour will then select out
only the day needed, then index scan to get the hourly values. Typically
time-oriented data is nearly time sorted anyway, so you¹ll also get the
benefit of a clustered index.

- Luke

On 5/15/08 10:40 AM, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> wrote:

> On Thu, May 15, 2008 at 7:51 AM, Greg Smith <gsmith(at)gregsmith(dot)com> wrote:
>> >
>> >
>> > So is vacuum helpful here because it will force all that to happen in one
>> > batch? To put that another way: if I've run a manual vacuum, is it true
>> > that it will have updated all the hint bits to XMIN_COMMITTED for all the
>> > tuples that were all done when the vacuum started?
>> >
>
> Yes. For that matter, even a plain SELECT or count(*) on the entire
> table is good enough. That will check every tuple for visibility and
> set it's hint bits.
>
> Another point to note is that the hint bits are checked and set on a
> per tuple basis. So especially during index scan, the same heap page
> may get rewritten many times. I had suggested in the past that
> whenever we set hint bits for a tuple, we should check all other
> tuples in the page and set their hint bits too to avoid multiple
> writes of the same page. I guess the idea got rejected because of lack
> of benchmarks to prove the benefit.
>
> Thanks,
> Pavan
>
> --
> Pavan Deolasee
> EnterpriseDB http://www.enterprisedb.com
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Smith 2008-05-15 03:03:37 Re: I/O on select count(*)
Previous Message Pavan Deolasee 2008-05-15 02:40:58 Re: I/O on select count(*)