Re: GIN data corruption bug(s) in 9.6devel

From: Noah Misch <noah(at)leadboat(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN data corruption bug(s) in 9.6devel
Date: 2016-04-12 06:43:22
Message-ID: 20160412064322.GA1818418@tornado.leadboat.com
Lists: pgsql-hackers

On Thu, Apr 07, 2016 at 05:53:54PM -0700, Jeff Janes wrote:
> On Thu, Apr 7, 2016 at 4:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> >> To summarize the behavior change:
> >
> >> In the released code, an inserting backend that violates the pending
> >> list limit will try to clean the list, even if it is already being
> >> cleaned. It won't accomplish anything useful, but will go through the
> >> motions until eventually it runs into a page the lead cleaner has
> >> deleted, at which point it realizes there is another cleaner and it
> >> stops. This acts as a natural throttle to how fast insertions can
> >> take place into an over-sized pending list.
> >
> > Right.
> >
> >> The proposed change removes that throttle, so that inserters will
> >> immediately see there is already a cleaner and just go back about
> >> their business. Due to that, unthrottled backends could add to the
> >> pending list faster than the cleaner can clean it, leading to
> >> unbounded growth in the pending list, and could cause a user backend to
> >> become apparently unresponsive to the user, indefinitely. That is
> >> scary to backpatch.
> >
> > It's scary to put into HEAD, either. What if we simply don't take
> > that specific behavioral change? It doesn't seem like this is an
> > essential part of fixing the bug as you described it. (Though I've
> > not read the patch, so maybe I'm just missing the connection.)
>
> There are only 3 fundamental options I see: the cleaner can wait,
> "help", or move on.
>
> "Helping" is what it does now and is dangerous.
>
> Moving on gives the above-discussed unthrottling problem.
>
> Waiting has two problems. The act of waiting will cause autovacuums
> to be canceled, unless ugly hacks are deployed to prevent that. If
> we deploy those ugly hacks, then we have the problem that a user
> backend will end up waiting on an autovacuum to finish the cleaning,
> and the autovacuum is taking its sweet time due to
> autovacuum_vacuum_cost_delay.

Teodor, this thread has been quiet for four days, and the deadline to fix this
open item expired 23 hours ago. Do you have a new plan for fixing it?
