Re: avoid bloat from CREATE INDEX CONCURRENTLY

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: avoid bloat from CREATE INDEX CONCURRENTLY
Date: 2017-02-28 17:54:58
Message-ID: CANP8+jKtgd5qoJK6WoDy_1nSK0Uw_J4iDOGFi05X7mB6VV8qnQ@mail.gmail.com
Lists: pgsql-hackers

On 28 February 2017 at 13:30, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
>> On 28 February 2017 at 13:05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Um ... isn't there a transaction boundary there anyway?
>
>> Yes, the patch releases the snapshot early, so it does not hold it
>> once the build scan has completed. This allows the sort and build
>> phases to occur without holding back the xmin.
>
> Oh ... so Alvaro explained it badly. The reason this is specific to
> btree is that it's the only AM with any significant post-scan building
> time.
>
> However, now that I read the patch: this is a horribly ugly hack.
> I really don't like the API (if it even deserves the dignity of that
> name) that you've added to snapmgr. I suppose the zero documentation
> for it fits in nicely with the fact that it's a badly-thought-out kluge.

WTF. Frankly, knowing it would generate such a ridiculously negative
response was the reason it wasn't me that submitted it and why it's not
fully documented. Documentation in this case would be a short
paragraph in the index AM, explaining for the user what is already in
code comments.

You're right to point out that there is significant post-scan build
time and the reduction in bloat during that time is well worth the
trouble. I'm pleased to have thought of it and to have contributed it
to the community.

> I think it would be better to just move the responsibility for snapshot
> popping in this sequence to the index AMs, full stop.

There were two choices: a) leave the responsibility to the index AM,
giving a clean API, or b) don't trust that all index AMs would know or
implement this correctly. If the index AM doesn't implement this
correctly it becomes a crash bug, which seemed unacceptable in an
extensible server.

After implementing (a), I chose (b) and took extra time to implement
the ugly API in preference to the possibility of a crash bug. I am
open to following consensus on that and to resubmitting other patches as
required.
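
For readers following the thread, the trade-off between (a) and (b) can be
illustrated with a toy model. This is a minimal sketch and not PostgreSQL
code: SnapshotStack, careful_am_build, naive_am_build and the two
build_index_* functions are hypothetical names invented for illustration,
and the leaked snapshot below only stands in for the failure the message
calls a crash bug. The message does not describe the actual failure mode.

```python
# Toy model only: none of these names are PostgreSQL functions.

class SnapshotStack:
    """Stand-in for a backend's stack of active snapshots."""

    def __init__(self):
        self._stack = []

    def push(self, name):
        self._stack.append(name)

    def pop(self):
        if not self._stack:
            raise RuntimeError("pop with no active snapshot")
        return self._stack.pop()

    def depth(self):
        return len(self._stack)


def careful_am_build(snapshots):
    """An AM that releases the snapshot once its scan is done."""
    # ... scan the table here ...
    snapshots.pop()
    # ... sort and build here, no longer holding the snapshot ...
    return "built"


def naive_am_build(snapshots):
    """An AM that does not know about the convention and never pops."""
    return "built"


def build_index_option_a(snapshots, am_build):
    """Option (a): the AM alone is responsible for popping."""
    snapshots.push("build")
    # The caller trusts the AM; nothing is checked afterwards.
    return am_build(snapshots)


def build_index_option_b(snapshots, am_build):
    """Option (b): the caller checks and pops only if the AM did not."""
    depth_before = snapshots.depth()
    snapshots.push("build")
    result = am_build(snapshots)
    if snapshots.depth() > depth_before:
        snapshots.pop()  # the AM left the snapshot in place; clean up
    return result
```

In this model, option (a) leaves the stack unbalanced whenever an AM like
naive_am_build is plugged in, while option (b) tolerates both kinds of AM
at the cost of the caller inspecting state that the AM may have changed.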

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
