Re: So, is COUNT(*) fast now?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: So, is COUNT(*) fast now?
Date: 2011-10-24 14:51:07
Message-ID: CA+TgmobKtqjhHU6gquNOmhbHcu=zUhBNWwbfv-weKNpS8-NwgA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Oct 23, 2011 at 7:01 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Fri, Oct 21, 2011 at 12:52 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> Also, this line is kind of expensive:
>>
>>        if (!visibilitymap_test(scandesc->heapRelation,
>>                                ItemPointerGetBlockNumber(tid),
>>                                &node->ioss_VMBuffer))
>>
>> Around 2%.  But I don't see any way to avoid that, or even make it cheaper.
>
> Could we cache by ItemPointerGetBlockNumber(tid) the results of those
> tests, for groups of tids on the same index page?
>
> How useful this would be would depend on how well-clustered the table
> and index are.

I thought about that, but the existing code is so ridiculously cheap
that it's hard to believe a caching layer would save enough to pay for
itself. I mean, I presume that the cost attributed to that line has
to be associated with (a) one of the pointer dereferences, (b)
the expense of evaluating ItemPointerGetBlockNumber(), (c) setting up
the function call, or perhaps (d) overhead incurred as a result of
branch mis-prediction. The actual time spent *inside*
visibilitymap_test will be attributed to that function, not this one.
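
(For reference, roughly what that macro boils down to. This is a
simplified, standalone mirror of the itemptr.h/block.h definitions,
not the actual PostgreSQL source:)

    /* Simplified mirror of the ItemPointer layout, for illustration only. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct
    {
        uint16_t bi_hi;         /* high 16 bits of the heap block number */
        uint16_t bi_lo;         /* low 16 bits of the heap block number */
    } BlockIdData;

    typedef struct
    {
        BlockIdData ip_blkid;   /* heap block number, split across two uint16s */
        uint16_t    ip_posid;   /* line pointer offset within that block */
    } ItemPointerData;

    /* Equivalent of ItemPointerGetBlockNumber(): a shift and an OR. */
    static uint32_t
    item_pointer_get_block_number(const ItemPointerData *tid)
    {
        return ((uint32_t) tid->ip_blkid.bi_hi << 16) | tid->ip_blkid.bi_lo;
    }

    int
    main(void)
    {
        ItemPointerData tid = {{0x0001, 0x0002}, 5};

        printf("block %u\n", item_pointer_get_block_number(&tid));  /* 65538 */
        return 0;
    }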

If you add up the time for this line and visibilitymap_test(), it's
like 10% of the runtime, which seems pretty significant. But it's 10%
of the runtime that is spent on basically a handful of arithmetic
operations and then reading a byte from shared memory. It's
astonishing to find that so expensive on a test with just one backend
running. If you stick some kind of cache in there, it's going to
involve adding a branch that isn't there now, and I think that's going
to be pricey given how hot this code apparently is.
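
(And the byte read itself is about this much work once the right VM
page is pinned. The constants and the lack of buffer handling here are
simplified assumptions based on the one-bit-per-heap-block map, not the
real visibilitymap.c; picking and pinning the VM page, which is
heap_blk / HEAPBLOCKS_PER_PAGE, is omitted:)

    #include <stdbool.h>
    #include <stdint.h>

    #define MAPSIZE              8160   /* assumed: ~BLCKSZ minus page header */
    #define HEAPBLOCKS_PER_BYTE  8      /* one all-visible bit per heap block */
    #define HEAPBLOCKS_PER_PAGE  (MAPSIZE * HEAPBLOCKS_PER_BYTE)

    /* Test the all-visible bit for a heap block within an already-pinned
     * VM page: a mod, a divide, a shift, and one byte read. */
    static bool
    vm_test_bit(uint32_t heap_blk, const uint8_t *map)
    {
        uint32_t map_byte = (heap_blk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
        uint32_t map_bit  = heap_blk % HEAPBLOCKS_PER_BYTE;

        return (map[map_byte] & (1 << map_bit)) != 0;
    }

    int
    main(void)
    {
        uint8_t  fake_map[MAPSIZE] = {0};
        uint32_t heap_blk = 10;

        /* Mark heap block 10 all-visible, then read the bit back. */
        fake_map[heap_blk / HEAPBLOCKS_PER_BYTE] |= 1 << (heap_blk % HEAPBLOCKS_PER_BYTE);
        return vm_test_bit(heap_blk, fake_map) ? 0 : 1;
    }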

Also, I'm not sure it's safe. Suppose that we lock the index page,
return a tuple, check the visibility map, and find the page all
visible. Another transaction comes along, adds a tuple to the index
page, and clears the visibility map bit. We then go back, relock the
index page, and return another tuple. We'd better notice that the
visibility map bit has now been cleared, or we're in trouble.

I wonder if it might help to create some method for the index to
return all the matching keys on a single index page in one call. If
you're dealing with an MVCC snapshot, any new tuples added after the
start of the scan can't be visible anyway. That would save the
overhead of unlocking and relocking the buffer once per tuple, and
probably some overhead associated with validating and decoding the
possibly-changed page contents each time. If you did it that way, it
would also be safe to do what you're proposing - if a bunch of the
tuples on the index page are also on the same heap page, you could do
one visibility map check for all of them.
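
(Very roughly, the shape I have in mind. The types and the vm_check
callback here are invented purely for illustration; the real thing
would go through visibilitymap_test() and ioss_VMBuffer rather than a
callback, and would fall back to a heap fetch for pages that aren't
all-visible:)

    /* Standalone sketch: process all TIDs returned from one index page in
     * a single pass, consulting the visibility map only when the heap
     * block number changes from the previous TID. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct
    {
        uint32_t block;     /* heap block number */
        uint16_t offset;    /* line pointer offset */
    } Tid;

    typedef bool (*vm_check_fn) (uint32_t block);

    /* Count TIDs whose heap page is all-visible, amortizing the VM check
     * across consecutive TIDs that share a heap block. */
    static long
    count_visible(const Tid *tids, int ntids, vm_check_fn vm_check)
    {
        long     count = 0;
        uint32_t cached_block = 0;
        bool     cached_result = false;

        for (int i = 0; i < ntids; i++)
        {
            if (i == 0 || tids[i].block != cached_block)
            {
                cached_block = tids[i].block;
                cached_result = vm_check(cached_block); /* one check per block */
            }
            if (cached_result)
                count++;
            /* else: fall back to a heap visibility check, not shown */
        }
        return count;
    }

    /* Toy callback: pretend even-numbered heap blocks are all-visible. */
    static bool
    fake_vm_check(uint32_t block)
    {
        return (block % 2) == 0;
    }

    int
    main(void)
    {
        Tid tids[] = {{4, 1}, {4, 2}, {4, 7}, {5, 1}, {6, 3}, {6, 4}};

        printf("%ld visible\n",
               count_visible(tids, 6, fake_vm_check));  /* prints "5 visible" */
        return 0;
    }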

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
