Re: Potential GIN vacuum bug

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Potential GIN vacuum bug
Date: 2015-08-31 07:10:55
Message-ID: CAMkU=1w7Uu1GZ8N0bxMykRLgTh-uPH+GPHfhMNeryZPCv7fqdA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Aug 30, 2015 at 3:57 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> > On Sun, Aug 30, 2015 at 11:11 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Your earlier point about how the current design throttles insertions to
> >> keep the pending list from growing without bound seems like a bigger deal
> >> to worry about. I think we'd like to have some substitute for that.
> >> ...
>
> > If the goal is to not change existing behavior (like for back patching) the
> > margin should be 1, always wait.
>
> The current behavior is buggy, both as to performance and correctness,
> so I'm not sure how come "don't change the behavior" should be a
> requirement.
>

I'm not confident the new behavior is an absolute win with regard to
performance. We usually don't back-patch performance changes unless they
have little or no trade-off, and the only margin I can confidently say
that about is a margin of 1.0.

>
> > But we would still have to deal with the
> > fact that unconditional acquire attempt by the backends will cause a vacuum
> > to cancel itself, which is undesirable.
>
> Good point.
>
> > If we define a new namespace for
> > this lock (like the relation extension lock has its own namespace) then
> > perhaps the cancellation code could be made to not cancel on that
> > condition. But that too seems like a lot of work to backpatch.
>
> We could possibly teach the autocancel logic to distinguish this lock type
> from others without using a new namespace. That seems a bit klugy, but
> maybe better than adding a new namespace. (For example, there are
> probably only a couple of modes in which we take page-level locks at
> present. Choosing a currently unused, but self-exclusive, mode for taking
> such a lock might serve.)
>

Like the attached? (The conditional variant for user backends was
unceremoniously yanked out.)
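
To spell out the shape of the idea for the archives: this is just an
illustrative sketch, not the attached patch; the function name is made up
and the autocancel-side change is not shown. The point is only that the
pending-list cleanup serializes on a heavyweight page lock on the GIN
metapage, taken in a self-exclusive mode (ExclusiveLock) that page-level
locks don't otherwise use, so the autocancel logic could recognize it.

/*
 * Illustrative sketch only -- not the attached patch.  The helper name is
 * made up, and the matching change to the autocancel logic is not shown.
 */
#include "postgres.h"

#include "access/gin_private.h"		/* GIN_METAPAGE_BLKNO */
#include "storage/lmgr.h"		/* LockPage, UnlockPage */
#include "utils/rel.h"

static void
gin_cleanup_with_page_lock(Relation index)
{
	/*
	 * Only one process performs pending-list cleanup at a time; everyone
	 * else blocks here rather than stepping on the list concurrently.
	 */
	LockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);

	/* ... do the actual pending-list cleanup here ... */

	UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
}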

>
> > Would we bother to back-patch a theoretical bug which there is no evidence
> > is triggering in the field?
>
> What's theoretical about it? You seemed pretty sure that the issue in
>
> http://www.postgresql.org/message-id/flat/CA+bfosGVGVQhMAa=0-mUE6cOo7dBSgAYxb-XsnR5vm-S39hpNg(at)mail(dot)gmail(dot)com
> was exactly this.
>

I was adamant that the broken concurrency was not helping him
performance-wise, and that it also introduces correctness bugs. But I
don't think the unfortunate performance is itself a bug, just an issue
closely related to one that is. I don't think a margin of 2, or even 20,
would help him. It would just build a bigger time bomb with a longer
fuse. What he needs is to turn fastupdate off, or get an SSD, or get some
other improvement that isn't going to be back-patched. If we don't know
what setting would fix one person's performance problem, I'd rather set
it to something we at least know won't cause problems for other people
who didn't have them before.

>
> > If we want to improve the current behavior rather than fix a bug, then I
> > think that if the list is greater than threshold*2 and the cleaning lock is
> > unavailable, what it should do is proceed to insert the tuple's keys into
> > the index itself, as if fastupdate = off. That would require some major
> > surgery to the existing code, as by the time it invokes the clean up, it is
> > too late to not insert into the pending list.
>
> Meh. That's introducing a whole new behavioral regime, which quite aside
> from correctness bugs might introduce new performance bugs of its own.
> It certainly doesn't sound like a better back-patch candidate than the
> other thing.
>

Right, the start of that paragraph was meant to transition from
back-patching to forward-looking work.

Continuing the forward-looking part:

I've given up on fastupdate for 9.4, and turned it off for my indexes. As
implemented it seems like a rather niche solution. So it is kind of
unfortunate that it is on by default, and that there is no way to turn it
off except for each individual index separately. Hopefully we can make it
less niche. Or maybe the niche is what I (and apparently Mr. Kehlet)
are trying to do.

There seem to be two ways for fastupdate to help:

1) Let someone else do it, in the background.

That is pretty much useless from my perspective, because there is no way
to get someone else to actually do it as often as it needs to be done. I
can create an extension to expose the cleanup call to SQL, and then set up
a cron job (or a bgworker) to run it very frequently, or to poll
frequently and decide whether it should be run. That works, kind of, if
this type of thing is important enough for you to go through all that.
(It is not important enough to me, for production use, at this point; I'd
rather just set fastupdate to off, but at least the option is available if
I need it.) Except this is still a serial job, and there is no way around
that without turning fastupdate off. You can have a RAID of 30 disks, and
the cleanup process is still going to have one IO outstanding at a time.
With fastupdate off, if you have 10 backends all inserting their own keys
and needing to read the relevant leaf pages, you can have 10 outstanding
IO read requests at a time rather than 1. So I try to let someone else do
it in the background, and instead I end up with one process doing it in
the foreground, effectively blocking nine other foreground workers while
it's at it.
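
To be concrete about the extension idea, something like the following is
roughly all it would take. This is only a sketch under assumptions: the
SQL-callable name gin_clean_pending() is made up, and I'm assuming the
three-argument ginInsertCleanup() signature as it stands in 9.4/9.5.

/*
 * Hypothetical extension function exposing pending-list cleanup to SQL.
 * The function name is made up; ginInsertCleanup()'s three-argument
 * signature is assumed to match 9.4/9.5.
 *
 * The extension script would declare it roughly as:
 *   CREATE FUNCTION gin_clean_pending(regclass) RETURNS void
 *     AS 'MODULE_PATHNAME' LANGUAGE C STRICT;
 */
#include "postgres.h"

#include "access/genam.h"		/* index_open, index_close */
#include "access/gin_private.h"		/* GinState, initGinState, ginInsertCleanup */
#include "fmgr.h"
#include "storage/lmgr.h"		/* lock mode constants */

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(gin_clean_pending);

Datum
gin_clean_pending(PG_FUNCTION_ARGS)
{
	Oid			indexoid = PG_GETARG_OID(0);
	Relation	index;
	GinState	ginstate;

	/*
	 * RowExclusiveLock matches what ordinary insertion takes; a real
	 * version would also verify that this actually is a GIN index.
	 */
	index = index_open(indexoid, RowExclusiveLock);

	initGinState(&ginstate, index);

	/* Flush the pending list into the main index structure. */
	ginInsertCleanup(&ginstate, false, NULL);

	index_close(index, RowExclusiveLock);

	PG_RETURN_VOID();
}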

2) Bulk insertion is more efficient, because you accumulate multiple tids
for the same key value.

I don't think this is generally true for the pending list (it certainly
is true for the initial index creation, but not to the extent it could
be), at least not in the IO-limited case. If the pending list were in the
gigabytes, it would probably be true. But at 4MB or 10MB, the reduction
in the number of distinct keys is generally not going to be great. If it
were, that would have to mean each key is so popular that the index is
probably not all that useful anyway (unless all the documents inserted
around the same time have a strong affinity among them not shared by the
rest of the table). And to the extent they do share keys, consider that
effective_cache_size is almost certainly greater than work_mem, so if the
keys were inserted unaggregated, the leaf pages they need would probably
still be in memory from the previous time. Of course there probably is a
CPU win in not having to uncompress and recompress the list of tids
several times, but if you are IO-bound that doesn't do you much good.

Anyway, if the pending list overflows because bulk insertion is turning
out not to be such an efficiency boon, we should consider falling back on
fastupdate=off-like behavior, to see whether parallel IO does the job
that bulk insertion is failing to do. But maybe it was a mistake to bring
all of this up; I'd really just like to get the correctness issue
resolved so I can work on the other performance things without having to
worry about an outstanding correctness bug.

Cheers,

Jeff

Attachment: gin_pending_pagelock.patch (application/octet-stream, 2.7 KB)
