Autovacuum versus rolled-back transactions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Autovacuum versus rolled-back transactions
Date: 2007-05-25 23:20:38
Message-ID: 28005.1180135238@sss.pgh.pa.us
Lists: pgsql-hackers

The pgstats subsystem does not correctly account for the effects of
failed transactions. Note the live/dead tuple counts in this example:

regression=# create table foo (f1 int);
CREATE TABLE
regression=# insert into foo select x from generate_series(1,1000) x;
INSERT 0 1000
-- wait a second for stats to catch up
regression=# select * from pg_stat_all_tables where relname = 'foo';
 relid  | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze
--------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------+------------+------------+-------------+-----------------+--------------+------------------
 496849 | public     | foo     |        0 |            0 |          |               |      1000 |         0 |         0 |       1000 |          0 |             |                 |              |
(1 row)

regression=# begin;
BEGIN
regression=# insert into foo select x from generate_series(1,1000) x;
INSERT 0 1000
regression=# rollback;
ROLLBACK
-- wait a second for stats to catch up
regression=# select * from pg_stat_all_tables where relname = 'foo';
 relid  | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze
--------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------+------------+------------+-------------+-----------------+--------------+------------------
 496849 | public     | foo     |        0 |            0 |          |               |      2000 |         0 |         0 |       2000 |          0 |             |                 |              |
(1 row)

This means that a table could easily be full of dead tuples from failed
transactions, and yet autovacuum won't do a thing because it doesn't
know there are any. Perhaps this explains some of the reports we've
heard of tables bloating despite having autovac on.

It seems to me this is a "must fix" if we expect people to rely on
autovacuum for real in 8.3.

I think it's fairly obvious how n_live_tup and n_dead_tup ought to
change in response to a failed xact, but maybe not so obvious for the
other counters. I suggest that the scan/fetch counters (seq_scan,
seq_tup_read, idx_scan, idx_tup_fetch) as well as all the block I/O
counters should increment the same for committed and failed xacts,
since they are meant to count work done regardless of whether the work
was in vain. I am much less sure how we want n_tup_ins, n_tup_upd,
n_tup_del to act though. Should they be advanced "as normal" by a
failed xact? That's what the code is doing now, and if you think they
are counters for work done, it's not so unreasonable.

It may boil down to whether we would like the identity
n_live_tup = n_tup_ins - n_tup_del
to continue to hold, or the similar one for n_dead_tup. The problem
basically is that pgstats is computing n_live_tup and n_dead_tup
using those identities rather than by tracking what really happens.
I don't think we can have those identities if failed xacts update the
counts "normally". Is it worth having separate counters for the numbers
of failed inserts/updates? (Failed deletes perhaps need not be counted,
since they change nothing.) Or we could change the backends so that the
reported n_tup_ins/del/upd are made to still produce the right live/dead
tup counts according to the identities, but then those counts would not
reflect work done. Another alternative is for transactions to tally
the number of live and dead tuples they create, with understanding of
rollbacks, and send those to the stats collector independently of the
action counters.
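
Plugging the numbers from the example session above into those identities shows the mismatch directly (a quick arithmetic check; the n_dead_tup identity assumed here is n_tup_upd + n_tup_del):

```python
# Counter values after the rolled-back insert, from the example above.
n_tup_ins, n_tup_upd, n_tup_del = 2000, 0, 0

# What pgstats derives via the identities:
derived_live = n_tup_ins - n_tup_del      # 2000
derived_dead = n_tup_upd + n_tup_del      # 0

# What is actually in the table: 1000 live rows, plus 1000 dead
# row versions left behind by the aborted insert.
actual_live, actual_dead = 1000, 1000

assert derived_live != actual_live   # live count overstated by 1000
assert derived_dead != actual_dead   # dead tuples invisible to autovacuum
```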

I don't think I want to add separate failed-insert/update counters,
because that will bloat the stats reporting file, which is uncomfortably
large already when you have lots of tables. The separate-tally method
would avoid that, at the price of more stats UDP traffic.
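
The separate-tally method might work roughly like this (a Python sketch with invented names; the real thing would live in the backend's stats code, and the rollback-side accounting shown is my reading of the MVCC semantics):

```python
# Sketch: each transaction tallies its own tuple actions per table, and
# at end of transaction converts them into live/dead deltas that depend
# on commit vs. rollback, reported alongside the raw action counters.
from dataclasses import dataclass

@dataclass
class TableTally:
    tuples_inserted: int = 0
    tuples_updated: int = 0
    tuples_deleted: int = 0

def end_xact(t: TableTally, committed: bool):
    """Return (delta_live, delta_dead) to send to the stats collector."""
    if committed:
        # Inserts add live tuples; an update kills the old version and
        # creates a new live one (net live change 0, one new dead tuple);
        # a delete turns a live tuple into a dead one.
        delta_live = t.tuples_inserted - t.tuples_deleted
        delta_dead = t.tuples_updated + t.tuples_deleted
    else:
        # On rollback, inserted tuples and update-created versions are
        # dead on arrival; deletes change nothing, so they're ignored.
        delta_live = 0
        delta_dead = t.tuples_inserted + t.tuples_updated
    return delta_live, delta_dead
```

With this, the rolled-back insert in the example would report (0, 1000): no change to the live count, and 1000 new dead tuples for autovacuum to notice.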

I'm kind of leaning to the separate-tally method and abandoning the
assumption that the identities hold. I'm not wedded to the idea
though. Any thoughts?

regards, tom lane
