Re: Does VACUUM reorder tables on clustered indices

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Does VACUUM reorder tables on clustered indices
Date: 2005-12-19 15:19:30
Message-ID: 29359.1135005570@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-sql

Andrew Sullivan <ajs(at)crankycanuck(dot)ca> writes:
> On Sun, Dec 18, 2005 at 10:08:22PM -0500, Tom Lane wrote:
>> Just for the record, that behavior is seriously broken: it violates
>> MVCC if any of the deleted tuples are still visible to anyone else.

> Does it remove tuples that VACUUM FULL wouldn't?

Yes. CLUSTER works on SnapshotNow, so it will remove committed-dead
tuples even if there are still open transactions that could see them.
Of course, said transactions couldn't be actively using the table
while the CLUSTER runs, because it takes an exclusive table lock.
But they *could* look at it afterwards. Offhand I think you'd only
be likely to notice the difference if the open transactions were
SERIALIZABLE --- in READ COMMITTED mode, by the time they could look
at the clustered table, they'd likely be using a snapshot that postdates
the DELETE.

[ experiments a bit... ] Hmm. Actually, it's far worse than I
thought. It looks like CLUSTER puts the tuples into the new table with
its own xid, which means that concurrent serializable transactions will
see the new table as completely empty!

<< session 1 >>

regression=# select * from int4_tbl;
f1
-------------
0
123456
-123456
2147483647
-2147483647
(5 rows)

regression=# create index fooi on int4_tbl(f1);
CREATE INDEX
regression=# begin isolation level serializable;
BEGIN
regression=# select 2+2; -- establish transaction snapshot
?column?
----------
4
(1 row)

<< session 2 >>

regression=# delete from int4_tbl where f1 = -123456;
DELETE 1
regression=# cluster fooi on int4_tbl;
CLUSTER

<< back to session 1 >>

regression=# select * from int4_tbl;
f1
----
(0 rows)

regression=# commit;
COMMIT
regression=# select * from int4_tbl;
f1
-------------
-2147483647
0
123456
2147483647
(4 rows)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Luke Lonergan 2005-12-19 17:36:44 Re: Re: Which qsort is used
Previous Message Dave Page 2005-12-19 15:16:38 Re: [pgadmin-hackers] Client-side password encryption

Browse pgsql-sql by date

  From Date Subject
Next Message Achilleus Mantzios 2005-12-19 15:22:25 Re: How to Force Transactions to Process Serially on A Table
Previous Message Andrew Sullivan 2005-12-19 14:28:10 Re: How to Force Transactions to Process Serially on A Table