fast count(*) through statistics collector

From: Artem Yazkov <markkrass(at)inbox(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: fast count(*) through statistics collector
Date: 2008-03-19 03:16:54
Message-ID: 264101906.20080319101654@kemnet.ru
Lists: pgsql-hackers

Hi hackers,

I'm a novice in the PostgreSQL codebase (and in English too :-)), but I'd
be glad to make a modest contribution to this great project.

Reading this list, I see a lot of discussion of the "fast count(*)"
problem, but no acceptable solution has been formulated. So I'll be bold
and propose my own view of the problem. It can be summed up as: "Improve
the existing statistics collector infrastructure and use it to cache the
number of rows in each table."

I plan to take the following steps to implement this idea:

1. Eliminate the 500 ms lag between transaction commit and the transfer
of the statistics portion from backend to collector. For this purpose,
pgstat_report_tabstat() would be called in "force" mode only. The price
is increased backend<-->collector traffic. As some compensation, we could
skip sending TabStatusArray items that have not changed since the last
shipment; this would reduce the size of the messages.
I see one more pitfall here: a new transaction can start after the
changes made by an earlier transaction become visible to other backends,
but before the statistics collector has managed to receive and process
the data (despite the forced transfer). To avoid this, the backend could
transfer the information before the transaction's changes become visible,
and the collector, in its turn, apply it only afterwards.
It is also possible that using shared memory instead of pipes would help
performance.
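
To illustrate what I mean by "force" mode, here is a toy model of the
reporting throttle (not PostgreSQL source; all names except the 500 ms
figure are mine). Today the send is skipped unless roughly 500 ms have
passed since the last report; with force == true that check would be
bypassed, so the accumulated counters reach the collector at every commit:

    #include <stdbool.h>

    #define STAT_INTERVAL_MS 500    /* models the 500 ms lag discussed above */

    static long last_report_ms = 0;

    static bool
    should_send_tabstat(long now_ms, bool force)
    {
        if (!force && now_ms - last_report_ms < STAT_INTERVAL_MS)
            return false;           /* throttled: keep accumulating locally */

        last_report_ms = now_ms;
        return true;                /* ship the per-table counters now */
    }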

2. Eliminate the 500 ms lag between receiving a statistics portion and
writing the pgstat.stat file. Implement the following TODO item for this
purpose:
"Allow statistics collector information to be pulled from the
collector process directly, rather than requiring the collector to
write a filesystem file twice a second".
As an additional effect, we would also reduce the load on the I/O
channels.
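
To make the idea a bit more concrete, here is a rough sketch of the
message shapes such a direct pull might use. This is purely illustrative:
both struct names are invented, and the real collector protocol in
pgstat.c would of course look different.

    /*
     * Invented message shapes for pulling one table's statistics straight
     * from the collector instead of re-reading a pgstat.stat file that is
     * rewritten twice a second.  The backend would send an inquiry and
     * block for the matching reply.
     */
    typedef unsigned int TableOid;      /* stands in for PostgreSQL's Oid */

    typedef struct StatInquiryMsg
    {
        TableOid    databaseid;
        TableOid    tableid;
    } StatInquiryMsg;

    typedef struct StatInquiryReply
    {
        TableOid    tableid;
        long        live_tuples;        /* -1 if the counter is undefined */
    } StatInquiryReply;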

3. The n_live_tuples field of the PgStat_StatTabEntry structure currently
holds the number of inserted minus deleted tuples for the successful
transactions known to the collector. But we need a field that contains
the number of inserted minus deleted tuples for ALL successful
transactions in the history of the table, or else is undefined (e.g. -1).
If n_live_tuples is not suited for this role, an additional field can be
created. In any case, I will call this field the "live_tuples counter"
below. (A toy model covering steps 3-5 is sketched after step 5.)

4. The values in the live_tuples counters become questionable if
statistics collection is interrupted in any way. Therefore, if
track_counts is set to false cluster-wide, or the collector process
crashes, live_tuples becomes undefined for all tables in the cluster. If
pg_stat_reset() is called, live_tuples becomes undefined for all tables
in the database. If pg_stat_clear_snapshot() is called, or track_counts
is set to false during a user session, the live_tuples counters should
become undefined for all tables touched during that transaction/session;
if compiling such a list of tables is not possible, then for all tables
in the database.

5. If a live_tuples counter contains an undefined value but the
statistics collector is working normally, the counter is restored by the
first seqscan.
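
To summarize steps 3-5, here is a toy model of the life cycle of the
live_tuples counter. Again, this is not real PostgreSQL code: the struct
and function names are mine, and -1 is just one possible "undefined"
marker.

    #include <stdint.h>

    #define LIVE_TUPLES_UNDEFINED (-1)  /* "unknown, fall back to seqscan" */

    typedef struct LiveTuplesCounter
    {
        int64_t     live_tuples;
    } LiveTuplesCounter;

    /* Step 3: keep the counter in sync with committed inserts/deletes. */
    static void
    counter_apply_commit(LiveTuplesCounter *c, int64_t inserted, int64_t deleted)
    {
        if (c->live_tuples != LIVE_TUPLES_UNDEFINED)
            c->live_tuples += inserted - deleted;
    }

    /* Step 4: any interruption of statistics collection poisons the counter. */
    static void
    counter_invalidate(LiveTuplesCounter *c)
    {
        c->live_tuples = LIVE_TUPLES_UNDEFINED;
    }

    /* Step 5: the first complete seqscan afterwards restores the exact value. */
    static void
    counter_restore_from_seqscan(LiveTuplesCounter *c, int64_t tuples_seen)
    {
        c->live_tuples = tuples_seen;
    }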

I hope these steps will give us MVCC-compliant counters with only a small
increase in overhead.

The next step is relatively simple:

6. In optimizer/plan/planagg.c, add a function similar to
optimize_minmax_aggregates() that returns NULL when the live_tuples
counter is undefined (so count(*) is computed the regular way, through a
seqscan), and otherwise returns a plan that computes:

PgStat_StatTabEntry.live_tuples +
PgStat_TableCounts.t_new_lived_tuples +
PgStat_TableXactStatus.tuples_inserted -
PgStat_TableXactStatus.tuples_deleted
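
In C, the computation that plan would perform looks roughly like this (a
sketch only: the function name is mine, the arguments stand for the three
counter sources named above, and -1 is the "undefined" convention from
step 3):

    #include <stdint.h>

    #define LIVE_TUPLES_UNDEFINED (-1)

    /*
     * Sketch of the count(*) shortcut.  A return value of -1 tells the
     * caller the cached value is unusable and count(*) must be computed
     * by a regular seqscan.
     */
    static int64_t
    cached_row_count(int64_t live_tuples,       /* PgStat_StatTabEntry */
                     int64_t new_live_tuples,   /* PgStat_TableCounts, not yet sent */
                     int64_t xact_inserted,     /* PgStat_TableXactStatus */
                     int64_t xact_deleted)      /* PgStat_TableXactStatus */
    {
        if (live_tuples == LIVE_TUPLES_UNDEFINED)
            return -1;          /* fall back to a regular seqscan */

        return live_tuples + new_live_tuples + xact_inserted - xact_deleted;
    }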

Restrictions:
1. An uninterrupted flow of statistics to the collector is necessary for
this algorithm to be used efficiently.
2. It works only for the simplest queries, such as:
select count(*) from regular_table

Any comments are welcome

--
regards, Artem Yazkov
