| From: | Bernice Southey <bernice(dot)southey(at)gmail(dot)com> |
|---|---|
| To: | Bruce Momjian <bruce(at)momjian(dot)us> |
| Cc: | pgsql-docs(at)lists(dot)postgresql(dot)org |
| Subject: | Re: More guidance on ctid |
| Date: | 2025-11-29 22:45:45 |
| Message-ID: | CAEDh4nzDMoRDZMDGGxKHzJ5hZscmvT2FWioTzQ6A0aJVtCO8NQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-docs |
Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> I think the reporter above extrapolated this example to use ctid in
> other, non-appropriate cases. Should we add a warning to the docs to
> explain that the general use of ctid is discouraged?
I've been explaining this badly. Here's an example that I hope will
make clear why I'm so concerned.
(Read Committed default isolation level)
drop table if exists b1, b2;
create table b1 as select 1 id, 100 balance;
create table b2 as select 1 id, 100 balance;
With an immutable column self-join, it works as expected and
serializes the balance changes.
--session1
begin;
with x as (select id from b1 order by id for update)
update b1 set balance = balance + 1 from x where b1.id = x.id;
--session2
with x as (select id from b1 order by id for update)
update b1 set balance = balance - 100 from x where b1.id = x.id;
select * from b1;
--session1
commit;
--session2
UPDATE 1
id | balance
----+---------
1 | 1
(1 row)
But with ctid, the second update is lost.
--session1
begin;
with x as (select ctid from b2 order by id for update)
update b2 set balance = balance + 1 from x where b2.ctid = x.ctid;
--session2
with x as (select ctid from b2 order by id for update)
update b2 set balance = balance - 100 from x where b2.ctid = x.ctid;
select * from b2;
--session1
commit;
--session2
UPDATE 0
id | balance
----+---------
1 | 101
(1 row)
The session2 CTE blocks until it gets the updated version of the row
from session 1 commit. It gets b2.ctid = (0,2) giving x.ctid = (0,2).
But the UPDATE gets b2.ctid = (0,1) at the start of the transaction.
This doesn't change even after session 1 commits. The join fails. This
is correct, but not obvious.
This is why I think the docs should say something different to what
they currently do. The UPDATE example is fine only because it's run
repeatedly until there are no more rows to find.
I learned the primary key self-join pattern for order by/limit in
delete/update soon after I began with postgres. I'm pretty sure ctid
would have confused me. So I think the doc examples with an id column
will be very helpful, and safer.
If users discover ctid on internet forums or LLMs and then check the
docs, I think they should get some notice of the concurrency risk.
Here's a patch with an attempt to do the above.
Forum examples:
https://stackoverflow.com/questions/10245560/deadlocks-in-postgresql-when-running-update/71163671#71163671
https://www.reddit.com/r/PostgreSQL/comments/1cn0q1c/comment/l36ppve/
Thanks, Bernice
| Attachment | Content-Type | Size |
|---|---|---|
| v1-add-ctid-guidance-and-remove-from-examples.patch | text/x-patch | 3.1 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Daniel Gustafsson | 2025-12-02 09:36:30 | restart point vs restartpoint in the docs |
| Previous Message | Andrew Jackson | 2025-11-27 18:55:11 | Add Restart=on-failure To Example Systemd File |