Re: should there be a hard-limit on the number of transactions pending undo?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: should there be a hard-limit on the number of transactions pending undo?
Date: 2019-07-20 04:27:09
Message-ID: CAA4eK1JkX3Kr1_0MnHZyzSice8ZHmWPo-rRu6doZNfC=o3dyGg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 19, 2019 at 10:58 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> One other thing that seems worth noting is that we have to consider
> what happens after a restart. After a crash, and depending on exactly
> how we design it perhaps also after a non-crash restart, we won't
> immediately know how many outstanding transactions need undo; we'll
> have to grovel through the undo logs to find out. If we've got a hard
> cap, we can't allow new undo-using transactions to start until we
> finish that work. It's possible that, at the moment of the crash, the
> maximum number of items had already been pushed into the background,
> and every foreground session was busy trying to undo an abort as well.
> If so, we're already up against the limit. We'll have to scan through
> all of the undo logs and examine each transaction to get a count on
> how many transactions are already in a needs-undo-work state; only
> once we have that value do we know whether it's OK to admit new
> transactions to using the undo machinery, and how many we can admit.
> In typical cases, that won't take long at all, because there won't be
> any pending undo work, or not much, and we'll very quickly read the
> handful of transaction headers that we need to consult and away we go.
> However, if the hard limit is pretty big, and we're pretty close to
> it, counting might take a long time. It seems bothersome to have this
> interval between when we start accepting transactions and when we can
> accept transactions that use undo. Instead of throwing an ERROR, we
> can probably just teach the system to wait for the background process
> to finish doing the counting; that's what Amit's patch does currently.
>

Yeah; however, we wait only for a threshold period (one minute) for
counting to finish and then error out. We could wait until the
counting is finished, but I am not sure that is a good idea, because
the user can anyway try again after some time.
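
As a rough illustration of the wait-then-error behavior described
above, here is a minimal sketch in Python. It is not the patch's
actual C code; the class, method, and exception names are all
hypothetical, and only the one-minute default comes from the text.

```python
import threading


class UndoCountingTimeout(Exception):
    """Raised when the pending-undo count is not available in time."""


class PendingUndoCounter:
    """Hypothetical stand-in for the background counting of pending undo."""

    def __init__(self):
        self._done = threading.Event()
        self.pending = None

    def finish(self, pending):
        # Called by the (hypothetical) background worker once it has
        # scanned the undo logs and knows how many transactions need undo.
        self.pending = pending
        self._done.set()

    def wait_for_count(self, timeout_s=60.0):
        # Wait up to timeout_s for counting to finish; error out otherwise
        # so that the caller can simply retry after some time.
        if not self._done.wait(timeout_s):
            raise UndoCountingTimeout(
                "counting of pending undo is still in progress; retry later"
            )
        return self.pending
```

Waiting indefinitely would correspond to calling wait_for_count with
timeout_s=None in this sketch.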

> Or, we could not even open for connections until the counting has been
> completed.
>
> When I first thought about this, I was really concerned about the idea
> of a hard limit, but the more I think about it the less problematic it
> seems. I think in the end it boils down to a question of: when things
> break, what behavior would users prefer?
>

One minor thing I would like to add here is that we are providing some
knobs so that systems with a larger number of rollbacks can configure
a much higher hard limit, such that it won't be hit on their systems.
I know it is not always easy to find the right value, but I guess they
can learn from the behavior and then change it to avoid the same
problem in the future.
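
To make the effect of such a knob concrete, here is a small Python
sketch of an admission check against a configurable limit. The names
(UndoAdmission, hard_limit) are hypothetical and are not the names
used in the patch; it only shows that the same workload is refused
under a low limit and admitted under a higher one.

```python
from dataclasses import dataclass


@dataclass
class UndoAdmission:
    hard_limit: int   # hypothetical knob: max transactions pending undo
    pending: int = 0  # transactions currently awaiting undo

    def can_admit(self) -> bool:
        # New undo-using transactions are allowed only below the limit.
        return self.pending < self.hard_limit

    def add_pending(self) -> None:
        # An aborted transaction's undo work has been queued.
        self.pending += 1

    def undo_applied(self) -> None:
        # Undo work for one transaction has completed.
        self.pending = max(0, self.pending - 1)


small = UndoAdmission(hard_limit=2)
large = UndoAdmission(hard_limit=1000)
for gate in (small, large):
    gate.add_pending()
    gate.add_pending()

print(small.can_admit())  # False: the limit of 2 has been reached
print(large.can_admit())  # True: the higher limit is not hit
```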

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
