Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2023-03-20 12:33:56
Message-ID: CAFBsxsGwBF+vV+yLabRMPtDFq_p+zVwgHt65EFRuUX8tM4v+Ow@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 20, 2023 at 12:25 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Mar 17, 2023 at 4:49 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Mar 17, 2023 at 4:03 PM John Naylor
> > <john(dot)naylor(at)enterprisedb(dot)com> wrote:
> > >
> > > On Wed, Mar 15, 2023 at 9:32 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Tue, Mar 14, 2023 at 8:27 PM John Naylor
> > > > <john(dot)naylor(at)enterprisedb(dot)com> wrote:
> > > > >
> > > > > I wrote:
> > > > >
> > > > > > > > Since the block-level measurement is likely overestimating
> > > > > > > > quite a bit, I propose to simply reverse the order of the
> > > > > > > > actions here, effectively reporting progress for the *last
> > > > > > > > page* and not the current one: First update progress with the
> > > > > > > > current memory usage, then add tids for this page. If this
> > > > > > > > allocated a new block, only a small bit of that will be
> > > > > > > > written to. If this block pushes it over the limit, we will
> > > > > > > > detect that up at the top of the loop. It's kind of like our
> > > > > > > > earlier attempts at a "fudge factor", but simpler and less
> > > > > > > > brittle. And, as far as OS pages we have actually written to,
> > > > > > > > I think it'll effectively respect the memory limit, at least
> > > > > > > > in the local mem case. And the numbers will make sense.
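
The re-ordering described in the quoted paragraph above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the patch: DeadTupleStore, store_add_page and scan_pages are hypothetical names, the allocator is reduced to fixed-size blocks, and each page's TIDs are assumed to fit within one block. The point it tries to show is that when progress is recorded before the current page's TIDs are added, the reported figure stays at or below the limit even though the block-level total ends up above it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the TID store; not a PostgreSQL structure. */
typedef struct DeadTupleStore
{
	size_t		block_size;		/* size of each allocator block */
	size_t		used_in_block;	/* bytes handed out from the newest block */
	size_t		allocated;		/* total bytes in blocks (block-level measure) */
} DeadTupleStore;

/* Add one page's worth of TIDs, allocating a new block when needed. */
static void
store_add_page(DeadTupleStore *s, size_t bytes)
{
	if (s->allocated == 0 || s->used_in_block + bytes > s->block_size)
	{
		s->allocated += s->block_size;
		s->used_in_block = 0;
	}
	s->used_in_block += bytes;
}

/*
 * Scan up to npages pages. Returns the number of pages processed before the
 * limit check at the top of the loop stopped the scan, and stores the largest
 * memory figure that was reported as progress in *max_reported.
 */
static int
scan_pages(DeadTupleStore *s, int npages, size_t bytes_per_page,
		   size_t limit, size_t *max_reported)
{
	int			page;

	*max_reported = 0;
	for (page = 0; page < npages; page++)
	{
		size_t		reported;

		/* Top of loop: detect that the previous page pushed us over. */
		if (s->allocated > limit)
			break;

		/* First report progress, i.e. usage as of the last page ... */
		reported = s->allocated;
		if (reported > *max_reported)
			*max_reported = reported;

		/* ... then add the TIDs for this page. */
		store_add_page(s, bytes_per_page);
	}
	return page;
}
```

With 100-byte blocks, 30 bytes of TIDs per page and a limit of 250, this sketch stops after seven pages with 300 bytes allocated, while the largest value it ever reported is 200.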
> > >
> > > > > > I still like my idea at the top of the page -- at least for
> > > > > > vacuum and m_w_m. It's still not completely clear if it's right
> > > > > > but I've got nothing better. It also ignores the work_mem issue,
> > > > > > but I've given up anticipating all future cases at the moment.
> > >
> > > > > IIUC you suggested measuring memory usage by tracking how much
> > > > > memory chunks are allocated within a block. If your idea at the
> > > > > top of the page follows this method, it still doesn't deal with
> > > > > the point Andres mentioned.
> > >
> > > Right, but that idea was orthogonal to how we measure memory use, and
> > > in fact mentions blocks specifically. The re-ordering was just to make
> > > sure that progress reporting didn't show current-use > max-use.
> >
> > Right. I still like your re-ordering idea. It's true that most of
> > the last allocated block before heap scanning stops is not
> > actually used yet. I'm guessing we can just check if the context
> > memory has gone over the limit. But I'm concerned it might not work
> > well in systems where memory overcommit is disabled.
> >
> > >
> > > However, the big question remains DSA, since a new segment can be as
> > > large as the entire previous set of allocations. It seems it just
> > > wasn't designed for things where memory growth is unpredictable.
>
> aset.c also has a similar characteristic: it allocates an 8K block upon
> the first allocation in a context, and doubles that size for each
> successive block request. But we can specify the initial block size
> and max block size. This made me think of another idea: specify both
> to DSA, with both values calculated based on m_w_m. For example, we

That's an interesting idea, and the analogous behavior to aset could be a
good thing for readability and maintainability. Worth seeing if it's
workable.
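
To make the aset-style growth concrete, here is a minimal sketch of a doubling block size with a caller-supplied initial and maximum size. It is not taken from aset.c or dsa.c: next_block_size and total_after are hypothetical helpers, and how the two sizes would be derived from m_w_m is left to the caller because that part of the proposal is not spelled out above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: size of the next block given the previous one.
 * Pass prev == 0 for the first allocation. Sizes double until they reach
 * max_size and then stay there.
 */
static size_t
next_block_size(size_t prev, size_t init_size, size_t max_size)
{
	if (prev == 0)
		return init_size < max_size ? init_size : max_size;

	/* Compare against max_size / 2 so that prev * 2 cannot overflow. */
	if (prev >= max_size / 2)
		return max_size;

	return prev * 2;
}

/* Total bytes allocated after nblocks block requests. */
static size_t
total_after(int nblocks, size_t init_size, size_t max_size)
{
	size_t		total = 0;
	size_t		prev = 0;
	int			i;

	for (i = 0; i < nblocks; i++)
	{
		prev = next_block_size(prev, init_size, max_size);
		total += prev;
	}
	return total;
}
```

Without a cap, each doubled block is larger than everything allocated before it, which is the growth pattern raised as a concern for DSA segments earlier in the thread. With a cap, a single new block can exceed the limit by at most max_size, so choosing max_size relative to the memory limit bounds the overshoot.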

--
John Naylor
EDB: http://www.enterprisedb.com
