Re: I/O on select count(*)

From: Jan de Visser <jdevisser(at)digitalfairway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Greg Smith" <gsmith(at)gregsmith(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: I/O on select count(*)
Date: 2008-05-15 13:15:40
Message-ID: 200805150915.40793.jdevisser@digitalfairway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thursday 15 May 2008 03:02:19 Tom Lane wrote:
> "Jan de Visser" <jdevisser(at)digitalfairway(dot)com> writes:
> > Obviously, this issue is tied to the slow count(*) one, as I found out
> > the hard way. Consider the following scenario:
> > * Insert row
> > * Update that row a couple of times
> > * Rinse and repeat many times
> >
> > Now somewhere during that cycle, do a select count(*) just to see
> > where you are. You will be appalled by how slow that is, due to not
> > only the usual 'slow count(*)' reasons. This whole hint bit business
> > makes it even worse, as demonstrated by the fact that running a vacuum
> > before the count(*) makes the latter noticably faster.
>
> Uh, well, you can't blame that entirely on hint-bit updates. The vacuum
> has simply *removed* two-thirds of the rows in the system, resulting in
> a large drop in the number of rows that the select even has to look at.
>
> It's certainly true that hint-bit updates cost something, but
> quantifying how much isn't easy. The off-the-cuff answer is to do the
> select count(*) twice and see how much cheaper the second one is. But
> there are two big holes in that answer: the first is the possible cache
> effects from having already read in the pages, and the second is that
> the follow-up scan gets to avoid the visits to pg_clog that the first
> scan had to make (which after all is the point of the hint bits).
>
> I don't know any easy way to disambiguate the three effects that are at
> work here. But blaming it all on the costs of writing out hint-bit
> updates is wrong.
>
> regards, tom lane

True. But it still contributes to the fact that queries sometimes behave in a
non-deterministic way, which IMHO is the major annoyance when starting to
work with pgsql. And contrary to other causes (vacuum, checkpoints) this is
woefully underdocumented.

jan

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Matthew Wakeling 2008-05-15 13:38:48 Re: I/O on select count(*)
Previous Message Matthew Wakeling 2008-05-15 12:54:13 Re: I/O on select count(*)