Re: Slow count(*) again...

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: david(at)lang(dot)hm
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-12 11:44:59
Message-ID: 4CB44A3B.5090409@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On 10/12/2010 04:22 PM, david(at)lang(dot)hm wrote:

> from a PR point of view, speeding up the trivil count(*) case could be
> worth it, just to avoid people complaining about it not being fast.

At the cost of a fair bit more complexity, though, and slowing
everything else down.

The proper solution here remains, IMO, support for visibility
information in indexes, whether by storing it once in the index and once
in the heap (ouch!), storing it out-of-line, or using a covering index
where one or more columns are stored wholly in the index not in the
table heap at all.

Here are a few of the many past discussions about this that have already
covered some of the same ground:

http://stackoverflow.com/questions/839015/postgres-could-an-index-organized-tables-paved-way-for-faster-select-count-fr

http://osdir.com/ml/db.postgresql.performance/2003-10/msg00075.html
(and the rest of the thread)

A decent look with Google will find many, many more.

> in the case where you are doing a count(*) where query and the where is
> on an indexed column, could the search just look at the index + the
> visibility mapping rather than doing an sequential search through the
> table?

Nope, because the visibility map, which is IIRC only one bit per page,
doesn't record how many tuples there are on the page, or enough
information about them to determine how many of them are visible to the
current transaction*.

> as for your worries about the accuracy of a visibility based count in
> the face of other transactions, wouldn't you run into the same issues if
> you are doing a sequential scan with the same transactions in process?

No. Every tuple in a table heap in postgresql has hidden fields, some of
which are used to determine whether the current transaction* can "see"
the tuple - it may have been inserted after this transaction started, or
deleted before this transaction started, so it's not visible to this
transaction but may still be to others.

http://www.postgresql.org/docs/current/static/ddl-system-columns.html

This information isn't available in the visibility map, or in indexes.
That's why PostgreSQL has to hit the heap to find it.

* current transaction should really be "current snapshot". The snapshot
is taken at the start of the whole transaction for SERIALIZABLE
isolation, and at the start of each statement for READ COMMITTED isolation.

--
Craig Ringer

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-10-12 11:59:07 Re: security hook on table creation
Previous Message Radosław Smogura 2010-10-12 09:04:56 Re: [JDBC] Support for JDBC setQueryTimeout, et al.

Browse pgsql-performance by date

  From Date Subject
Next Message Mladen Gogala 2010-10-12 12:27:26 Re: Slow count(*) again...
Previous Message Vitalii Tymchyshyn 2010-10-12 08:34:22 Re: Slow count(*) again...