Quick Links

Re: I/O on select count(*)

From:	Jan de Visser <jdevisser(at)digitalfairway(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Greg Smith" <gsmith(at)gregsmith(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: I/O on select count(*)
Date:	2008-05-15 13:15:40
Message-ID:	200805150915.40793.jdevisser@digitalfairway.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Thursday 15 May 2008 03:02:19 Tom Lane wrote:
> "Jan de Visser" <jdevisser(at)digitalfairway(dot)com> writes:
> > Obviously, this issue is tied to the slow count(*) one, as I found out
> > the hard way. Consider the following scenario:
> > * Insert row
> > * Update that row a couple of times
> > * Rinse and repeat many times
> >
> > Now somewhere during that cycle, do a select count(*) just to see
> > where you are. You will be appalled by how slow that is, due to not
> > only the usual 'slow count(*)' reasons. This whole hint bit business
> > makes it even worse, as demonstrated by the fact that running a vacuum
> > before the count(*) makes the latter noticably faster.
>
> Uh, well, you can't blame that entirely on hint-bit updates. The vacuum
> has simply *removed* two-thirds of the rows in the system, resulting in
> a large drop in the number of rows that the select even has to look at.
>
> It's certainly true that hint-bit updates cost something, but
> quantifying how much isn't easy. The off-the-cuff answer is to do the
> select count(*) twice and see how much cheaper the second one is. But
> there are two big holes in that answer: the first is the possible cache
> effects from having already read in the pages, and the second is that
> the follow-up scan gets to avoid the visits to pg_clog that the first
> scan had to make (which after all is the point of the hint bits).
>
> I don't know any easy way to disambiguate the three effects that are at
> work here. But blaming it all on the costs of writing out hint-bit
> updates is wrong.
>
> regards, tom lane

True. But it still contributes to the fact that queries sometimes behave in a
non-deterministic way, which IMHO is the major annoyance when starting to
work with pgsql. And contrary to other causes (vacuum, checkpoints) this is
woefully underdocumented.

jan

In response to

Re: I/O on select count(*) at 2008-05-15 07:02:19 from Tom Lane

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Matthew Wakeling	2008-05-15 13:38:48	Re: I/O on select count(*)
Previous Message	Matthew Wakeling	2008-05-15 12:54:13	Re: I/O on select count(*)