Re: Slow count(*) again...

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-12 08:14:58
Message-ID: 4CB41902.1020503@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On 10/12/2010 03:56 PM, Vitalii Tymchyshyn wrote:

> BTW: There is a lot of talk about MVCC, but is next solution possible:
> 1) Create a page information map that for each page in the table will
> tell you how may rows are within and if any write (either successful or
> not) were done to this page. This even can be two maps to make second
> one really small (a bit per page) - so that it could be most time
> in-memory.
> 2) When you need to to count(*) or index check - first check if there
> were no writes to the page. If not - you can use count information from
> page info/index data without going to the page itself
> 3) Let vacuum clear the bit after frozing all the tuples in the page (am
> I using terminology correctly?).

Part of this already exists. It's called the visibility map, and is
present in 8.4 and above. It's not currently used for queries, but can
potentially be used to aid some kinds of query.

http://www.postgresql.org/docs/8.4/static/storage-vm.html

> In this case all read-only (archive) data will be this bit off and
> index/count(*) will be really fast.

A count with any joins or filter criteria would still have to scan all
pages with visible tuples in them. So the visibility map helps speed up
scanning of bloated tables, but doesn't provide a magical "fast count"
except in the utterly trivial "select count(*) from tablename;" case,
and can probably only be used for accurate results when there are no
read/write transactions currently open. Even if you kept a count of
tuples in each page along with the mvcc transaction ID information
required to determine for which transactions that count is valid, it'd
only be useful if you didn't have to do any condition checks, and it'd
be yet another thing to update with every insert/delete/update.

Perhaps for some users that'd be worth having, but it seems to me like
it'd have pretty narrow utility. I'm not sure that's the answer.

--
Craig Ringer

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message david 2010-10-12 08:22:39 Re: Slow count(*) again...
Previous Message Vitalii Tymchyshyn 2010-10-12 07:56:13 Re: Slow count(*) again...

Browse pgsql-performance by date

  From Date Subject
Next Message david 2010-10-12 08:22:39 Re: Slow count(*) again...
Previous Message Vitalii Tymchyshyn 2010-10-12 07:56:13 Re: Slow count(*) again...