Re: I/O on select count(*)

From: "Jan de Visser" <jdevisser(at)digitalfairway(dot)com>
To: "Greg Smith" <gsmith(at)gregsmith(dot)com>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: I/O on select count(*)
Date: 2008-05-15 02:38:08
Message-ID: 1159c1e90805141938j5f85b490rb5c1cfdf9e428bce@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 5/14/08, Greg Smith <gsmith(at)gregsmith(dot)com> wrote:
> On Wed, 14 May 2008, Alvaro Herrera wrote:
>
>
> > If neither of the bits is set, then the transaction is either in progress
> (which you can check by examining the list of running transactions in shared
> memory) or your process is the first one to check (in which case, you need
> to consult pg_clog to know the status, and you can update the hint bits if
> you find out a permanent state).
> >
>
> So is vacuum helpful here because it will force all that to happen in one
> batch? To put that another way: if I've run a manual vacuum, is it true
> that it will have updated all the hint bits to XMIN_COMMITTED for all the
> tuples that were all done when the vacuum started?

From my benchmarking experience: Yes, vacuum helps. See also below.

>
>
> > Regarding FAQs, I'm having trouble imagining putting this in the user
> > FAQ; I think it belongs into the developer's FAQ. However, a
> > benchmarker is not going to look there. Maybe we should start "a
> > benchmarker's FAQ"?
> >
>
> On the wiki I've started adding a series of things that are
> performance-related FAQs. There's three of them mixed in the bottom of
> http://wiki.postgresql.org/wiki/Frequently_Asked_Questions
> right now, about slow count(*) and dealing with slow queries.
>
> Here the FAQ would be "Why am I seeing all these writes when I'm just doing
> selects on my table?", and if it's mixed in with a lot of other performance
> related notes people should be able to find it. The answer and suggestions
> should be simple enough to be useful to a user who just noticed this
> behavior, while perhaps going into developer land for those who want to know
> more about the internals.

Obviously, this issue is tied to the slow count(*) one, as I found out
the hard way. Consider the following scenario:
* Insert row
* Update that row a couple of times
* Rinse and repeat many times

Now somewhere during that cycle, do a select count(*) just to see
where you are. You will be appalled by how slow that is, due to not
only the usual 'slow count(*)' reasons. This whole hint bit business
makes it even worse, as demonstrated by the fact that running a vacuum
before the count(*) makes the latter noticably faster.

jan

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Pavan Deolasee 2008-05-15 02:40:58 Re: I/O on select count(*)
Previous Message Greg Smith 2008-05-15 02:21:45 Re: I/O on select count(*)