Re: Add 64-bit XIDs into PostgreSQL 15

From: Chris Travers <chris(at)orioledata(dot)com>
To: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
Cc: Chris Travers <chris(dot)travers(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Fedor Sigaev <teodor(at)sigaev(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Aleksander Alekseev <afiskon(at)gmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Maxim Orlov <orlovmg(at)gmail(dot)com>
Subject: Re: Add 64-bit XIDs into PostgreSQL 15
Date: 2022-11-22 02:50:07
Message-ID: CAEq-hvsYDXzk4fHR3W7jbaXpqOQaor028tMiJvz955Xcp=Z89Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 21, 2022 at 10:40 AM Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
wrote:

> > I have a very serious concern about the current patch set, as someone
> who has faced transaction id wraparound in the past.
> >
> > I can start by saying I think it would be helpful (if the other issues
> are approached reasonably) to have 64-bit xids, but there is an important
> piece of context in preventing xid wraparounds that seems missing from this
> patch unless I missed something.
> >
> > XID wraparound is a symptom, not an underlying problem. It usually
> occurs when autovacuum or other vacuum strategies have unexpected stalls
> and therefore fail to work as expected. Shifting to 64-bit XIDs
> dramatically changes the sorts of problems that these stalls are likely to
> pose to operational teams: you can find you are running out of storage
> rather than facing an imminent database shutdown. Worse, this patch delays
> the problem until some (possibly far later!) time, when vacuum will take
> far longer to finish, and options for resolving the problem are
> diminished. As a result I am concerned that merely changing xids from
> 32-bit to 64-bit will lead to a smaller number of far more serious outages.
> >
> > What would make a big difference from my perspective would be to combine
> this with an inverse system for warning that there is a problem, allowing
> the administrator to throw warnings about xids since last vacuum, with a
> configurable threshold. We could have this at two billion by default as
> that would pose operational warnings not much later than we have now.
> >
> > Otherwise I can imagine cases where instead of 30 hours to vacuum a
> table, it takes 300 hours on a database that is short on space. And I
> would not want to be facing such a situation.
>
> Hi, Chris!
> I had a similar stance when I started working on this patch. Of
> course, it seemed horrible just to postpone the consequences of
> inadequate monitoring, too long running transactions that prevent
> aggressive autovacuum etc. So I can understand your point.
>
> With time I've come to a somewhat different view of this feature, i.e.
>
> 1. It's important to correctly set monitoring, the cut-off of long
> transactions, etc. anyway. It's not the responsibility of vacuum
> before wraparound to report inadequate monitoring etc. Furthermore, in
> real life, it will already be too late by the time it prevents 32-bit
> wraparound, and it causes a lot of downtime at an unexpected moment
> when it does occur. (The rough analogy for that is the machine
> running at 120mph turns every control off and applies full brakes just
> because the cooling liquid is low (of course there might be a warning
> previously, but anyway))
>

So I disagree with you on a few critical points here.

Right now the way things work is:
1. Database starts throwing warnings that xid wraparound is approaching
2. Database-owning team initiates an emergency response, may take downtime
or degradation of services as a result
3. People get frustrated with PostgreSQL because this is a reliability
problem.

What I am worried about is:
1. Database is running out of space
2. Database-owning team initiates an emergency response and takes more
downtime to get into a good spot
3. People get frustrated with PostgreSQL because this is a reliability
problem.

If that's the way we go, I don't think we've solved that much. And as
humans we also bias our judgments towards newsworthy events, so rarer, more
severe problems are a larger perceived problem than the more routine, less
severe problems. So I think our image as a reliable database would suffer.

An ideal resolution from my perspective would be:
1. Database starts throwing warnings that xid lag has reached severely
abnormal levels
2. Database-owning team initiates an effort to correct this, and does not
take downtime or degradation of services as a result
3. People do not get frustrated because this is not a reliability problem
anymore.

Now, 64-bit xids are necessary to get us there but they are not
sufficient. One needs to fix the way we handle this sort of problem.
There is existing logic to warn if we are approaching xid wraparound. This
should be changed to check how many xids we have used rather than how many
remain, with a sensible default threshold (optionally configurable).
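
As a rough illustration of that kind of check (a sketch only, not code from
the patch set: the names xid_lag, should_warn and DEFAULT_XID_LAG_WARN are
hypothetical, and the two-billion default simply echoes the figure suggested
earlier in this thread), the idea is to warn on xids consumed since the
oldest unfrozen xid rather than on xids remaining before wraparound:

```python
# Hypothetical sketch: warn on xids *used* since the oldest unfrozen xid,
# instead of on xids *remaining* before wraparound.

# Assumed default, taken from the "two billion" suggestion in this thread.
DEFAULT_XID_LAG_WARN = 2_000_000_000


def xid_lag(next_xid: int, oldest_unfrozen_xid: int) -> int:
    """Return the number of xids consumed since the oldest unfrozen xid.

    With 64-bit xids the counter is treated as never wrapping, so plain
    subtraction is enough.
    """
    if oldest_unfrozen_xid > next_xid:
        raise ValueError("oldest unfrozen xid cannot be ahead of next xid")
    return next_xid - oldest_unfrozen_xid


def should_warn(next_xid: int, oldest_unfrozen_xid: int,
                threshold: int = DEFAULT_XID_LAG_WARN) -> bool:
    """True once the xid lag reaches the (configurable) warning threshold."""
    return xid_lag(next_xid, oldest_unfrozen_xid) >= threshold
```

With the assumed default, a lag of 2.5 billion xids would warn while a lag
of one billion would not, so the warning arrives at roughly the point where
operators see it today.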

I agree it is not vacuum's responsibility. It is the responsibility of the
warnings we already have to head off the more serious problems arising from
this change. They should be adjusted rather than dropped.

> 2. The checks and handlers for the event that is never expected in the
> cluster lifetime (~200 years at constant rate of 1e6 TPS) can be just
> dropped. Of course we still need to do automatic routine maintenance
> like cutting SLRU buffers (but with a much bigger interval if we have
> much disk space e.g.). But I considered that we need not care
> what will happen to a cluster after > 200 years (it will be migrated many
> times before this, for many reasons not related to Postgres, even for
> the most conservative owners). So the radical proposal is to drop
> 64-bit wraparound handling altogether. The more moderate one is just not to
> worry much that after 200 years we have more hassle than next
> month if we haven't set up everything correctly. Next month's pain
> will be more significant even if it teaches dba something.
>
> Big thanks for your view on the general implementation of this feature,
> anyway.
>
> Kind regards,
> Pavel Borisov.
> Supabase
>
>
>
