Re: Adding REPACK [concurrently]

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net>
Subject: Re: Adding REPACK [concurrently]
Date: 2026-04-15 14:50:11
Message-ID: 25514.1776264611@localhost
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2026-04-14 15:37:56 +0200, Antonin Houska wrote:
> > Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > > On 2026-04-12 15:31:20 +0200, Mihail Nikalayeu wrote:
> > > > Instead of cancelling the backend entered the deadlock detector - it
> > > > cancel some another (nearest hard edge) until it is possible to get the
> > > > lock (either by
> > > > reordering or directly).
> > >
> > > I don't think that's as good. The problem is that that way you're only
> > > detecting the deadlocks once they have materialized (i.e. once repack actually
> > > does the lock upgrade), rather than cancelling when we know that the problem
> > > starts.
> >
> > This is my hack that tries to do that.
>
> I still think this needs to be in the deadlock detector. The lock cycle just
> needs to be a bit more complicated for a hack in JoinWaitQueue not to work.
> There's no guarantee that the wait that triggers the deadlock is actually on
> the relation being repacked.

ok, I see.

I thought of a "hypothetical graph", which would include the to-be-granted
lock, but the major issue is that it will not work correctly without the
locking the LMGR's LW locks we do in CheckDeadLock():

for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
LWLockAcquire(LockHashPartitionLockByIndex(i), LW_EXCLUSIVE);

And obviously, doing this each time we want to insert a lock into the queue
would be bad for performance. It's even mentioned in the storage/lmgr/README
that the current approach is optimistic, so I think that major rework would be
needed if we wanted to entirely avoid waiting that leads to deadlock.

The approach proposed by Mihail [1] seems the least problematic to me, and
something like that occurred to me when I thought about the problem the first
time. However, when we wake up the other processes in order to run the
deadlock detection, they should do that immediately. I've got no good idea
about implementation at the moment, since latch can be set for unrelated
reasons. (Besides that, I have some more questions about this patch, which I
can post separately.)

[1] https://www.postgresql.org/message-id/CADzfLwURKVNQ%2B%2BDpi7bjoGfj-8pchDQEVex3eWBx0NCYn6TbDQ%40mail.gmail.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2026-04-15 14:59:13 Re: First draft of PG 19 release notes
Previous Message Tom Lane 2026-04-15 14:37:32 Re: Add errdetail() with PID and UID about source of termination signal