| From: | Antonin Houska <ah(at)cybertec(dot)at> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net> |
| Subject: | Re: Adding REPACK [concurrently] |
| Date: | 2026-04-09 08:43:14 |
| Message-ID: | 9539.1775724194@localhost |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Apr 9, 2026 at 5:08 AM Mihail Nikalayeu
> <mihailnikalayeu(at)gmail(dot)com> wrote:
> >
> > On Wed, Apr 8, 2026 at 7:22 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > I don't think this is a viable path. You need to prevent any further lock
> > > acquisitions on the relation to be able to swap it, not just conflicting DDL.
> >
> > AFAIU, Amit's main idea is that we currently upgrade the lock instead
> > of **releasing and re-acquiring** it because we fear DDL between those
> > actions.
> > DML actions are ok for us; REPACK will wait for them while getting
> > AEL. Later changes will be applied by REPACK backend while holding
> > AEL.
> >
>
> Yes, this is exactly what I had in mind.
>
> > One more thing we may prevent from sneaking into that hole is a
> > VACUUM. It will not break anything, but will be huge waste of time and
> > resources.
> >
>
> We can prevent other commands (if required) by checking the
> rel_in_use_by_repack flag but I thought for the initial version it is
> better to do what is minimally required.
>
> > > And you need to wait for all pre-existing locks to have been released. That
> > > doesn't really get easier by what you propose.
> >
> > Do you mean locks from other sessions accessing the table? Is it done
> > automatically while waiting for AEL?
> >
>
> Right and this is already the case with the code where locks
> not-conflicting with ShareUpdateExclusiveLock could be present before
> we try to upgrade the lock. The point was we will not let DDL execute
> on the table after the new flag (rel_in_use_by_repack) is set and we
> released the ShareUpdateExclusiveLock.
>
> > > I don't think CheckTableNotInUse() would work anyway - don't we already hold
> > > locks by the point we call it?
> >
> > Yes, the DDL session already holds locks by the time
> > CheckTableNotInUse is called - but is that really the problem? They
> > will be released on error.
> >
>
> Yes, that is the key to solve this problem. Let me take an example to
> explain this a bit more. Right now, the problem can happen in the
> following kind of sequence.
>
> Session-1:
> REPACK (CONCURRENTLY) foo;
>
> -- now say after the above command has acquired
> ShareUpdateExclusiveLock and is doing the work of copying the table,
> Session-2 did following actions.
>
> Session-2:
> Select * from foo; -- this is allowed
> ALTER TABLE foo ADD COLUMN c2; -- this will blocked as Session-1
> already has acquired ShareUpdateExclusiveLock
>
> Session-1:
> -- continues and tries to upgrade the lock to AEL. This leads to
> deadlock ERROR in the current session doing REPACK (CONCURRENTLY).
>
> Now, with the solution I proposed, because we will release
> ShareUpdateExclusiveLock after setting a flag like
> rel_in_use_by_repack, the ALTER TABLE in session-2 will succeed but
> will error_out by CheckTableNotInUse(or a similar function that checks
> rel_in_use_by_repack). Both with and without this solution, acquiring
> AEL by REPACK (CONCURRENTLY) needs to wait concurrent locks like the
> one for SELECT in above example that could have been acquired during
> the time REPACKing had ShareUpdateExclusiveLock.
>
> > > And even if that were not the case, there are
> > > several paths to locking relations that don't ever go anywhere near
> > > CheckTableNotInUse().
> >
> > But those aren't DDL, so they shouldn't be the problem (CREATE TRIGGER
> > might be - it seems to ignore CheckTableNotInUse, but perhaps it's
> > fine).
> >
> > So, in my undeerstanding Amit's idea has two parts: "set flag and
> > release/re-acquire" + "use CheckTableNotInUse (or some place like
> > that) to check the flag and fail for DDL commands."
> >
>
> True, this is the core of the idea.
This approach LGTM when it comes to concurrent DDLs. However, consider REPACK
holding ShareUpdateExclusiveLock (SUEL) and VACUUM (w/o VACOPT_SKIP_LOCKED)
waiting for the same lock. Once REPACK releases its SUEL, VACUUM gets it and
processes the table, then REPACK finally gets AccessExclusiveLock (AEL) and
finishes too. Nothing went wrong from the data consistency POV, however the
VACUUM proably wasted a lot of resources, because REPACK does "more than
VACUUM". Furthermore, while REPACK was waiting for the AEL (and the VACUUM was
running), a *lot of* DMLs could have been executed on the table, and REPACK
will have to replay those while holding AEL. The point is that REPACK should
hold AEL for as short time as possible.
What Andres proposed (AFAIU) should help to avoid this problem because
REPACK's request for AEL would get in front of the VACUUM's request for SUEL
in the queue. Of course, VACUUM would eventually run too, but - if REPACK
succeeded - it'd be much cheaper because it would (supposedly) find very
little work to do.
Anti-wraparound (failsafe) VACUUM is a bit different case [1] (i.e. it should
possibly have higher priority than REPACK), but I think this prioritization
should be implemented in other way than just letting it get in the way of
REPACK (at the time REPACK is nearly finished).
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Mihail Nikalayeu | 2026-04-09 09:11:16 | Re: Adding REPACK [concurrently] |
| Previous Message | Amit Langote | 2026-04-09 08:40:49 | Re: Eliminating SPI / SQL from some RI triggers - take 3 |