Re: pgsql: Compute XID horizon for page level index vacuum on primary.

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Compute XID horizon for page level index vacuum on primary.
Date: 2019-04-02 01:26:59
Message-ID: 20190402012659.4znhbisezbm7juvf@alap3.anarazel.de
Lists: pgsql-committers pgsql-hackers

Hi,

On 2019-03-30 11:44:36 -0400, Robert Haas wrote:
> On Sat, Mar 30, 2019 at 6:33 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > I didn't understand that last sentence.
> >
> > Here's an attempt to write a suitable comment for the quick fix. And
> > I suppose effective_io_concurrency is a reasonable default.
> >
> > It's pretty hard to think of a good way to get your hands on the real
> > value safely from here. I wondered if there was a way to narrow this
> > to just GLOBALTABLESPACE_OID since that's where pg_tablespace lives,
> > but that doesn't work; we access other catalogs too on that path.
> >
> > Hmm, it seems a bit odd that 0 is supposed to mean "disable issuance
> > of asynchronous I/O requests" according to config.sgml, but here 0
> > will prefetch 10 buffers.
>
> Mmmph. I'm starting to think we're not going to get a satisfactory
> result here unless we make this controlled by something other than
> effective_io_concurrency. There's just no reason to suppose that the
> same setting that we use to control prefetching for bitmap index scans
> is also going to be right for what's basically a bulk operation.
>
> Interestingly, Dilip Kumar ran into similar issues recently while
> working on bulk processing for undo records for zheap. In that case,
> you definitely want to prefetch the undo aggressively, because you're
> reading it front to back and backwards scans suck without prefetching.
> And you possibly also want to prefetch the data pages to which the
> undo that you are prefetching applies, but maybe not as aggressively
> because you're going to be doing a WAL write for each data page and
> flooding the system with too many reads could be counterproductive, at
> least if pg_wal and the rest of $PGDATA are not on separate spindles.
> And even if they are, it's possible that as you suck in undo pages and
> the zheap pages that they need to update, you might evict dirty pages,
> generating write activity against the data directory.

I'm not yet convinced it's necessary to create a new GUC, but also not
strongly opposed. I've created an open items issue for it, so we don't
forget.
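
For concreteness, the quick-fix behaviour Thomas describes amounts to something like the following. This is an illustrative sketch only, not the committed code; the function name and constant are hypothetical, and only the fallback value of 10 comes from the discussion above:

```c
#include <assert.h>

/* Hypothetical sketch of the prefetch-distance logic under discussion.
 * The wart Thomas points out: effective_io_concurrency = 0 is documented
 * as "disable issuance of asynchronous I/O requests", yet the fallback
 * path here still prefetches a fixed number of buffers. */
#define DEFAULT_XID_HORIZON_PREFETCH 10

static int
xid_horizon_prefetch_size(int io_concurrency)
{
	if (io_concurrency > 0)
		return io_concurrency;	/* use the configured value */

	/* value is 0, or unavailable from this code path: fall back to a
	 * fixed default, contradicting the documented meaning of 0 */
	return DEFAULT_XID_HORIZON_PREFETCH;
}
```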

> Overall I'm inclined to think that we're making the same mistake here
> that we did with work_mem, namely, assuming that you can control a
> bunch of different prefetching behaviors with a single GUC and things
> will be OK. Let's just create a new GUC for this and default it to 10
> or something and go home.

I agree that we needed to split work_mem, but a) that was far less clear
for many years, and b) there was no logic to use more work_mem in
maintenance-y cases...
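
If we did go the new-GUC route Robert suggests, it would amount to a guc.c table entry along these lines. This is a non-runnable fragment for illustration; the name "index_vacuum_io_concurrency" is hypothetical, and only the default of 10 comes from Robert's mail:

```c
/* Hypothetical guc.c entry; not a committed GUC. */
{
	{"index_vacuum_io_concurrency", PGC_USERSET, RESOURCES_ASYNCHRONOUS,
		gettext_noop("Number of concurrent prefetch requests issued while "
					 "computing the XID horizon for index vacuum."),
		NULL
	},
	&index_vacuum_io_concurrency,
	10, 0, MAX_IO_CONCURRENCY,
	NULL, NULL, NULL
},
```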

Greetings,

Andres Freund
