Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2023-03-13 14:55:29
Message-ID: CAD21AoB_OkFXCGF84USSQ3OXarQDeszODs__B8j0+-J8+BWh1w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 13, 2023 at 10:28 PM John Naylor
<john(dot)naylor(at)enterprisedb(dot)com> wrote:
>
> On Mon, Mar 13, 2023 at 8:41 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sun, Mar 12, 2023 at 12:54 AM John Naylor
> > <john(dot)naylor(at)enterprisedb(dot)com> wrote:
> > >
> > > On Fri, Mar 10, 2023 at 9:30 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > > > * Additional size classes. It's important for an alternative of path
> > > > compression as well as supporting our decoupling approach. Middle
> > > > priority.
> > >
> > > I'm going to push back a bit and claim this doesn't bring much gain, while it does have a complexity cost. The node1 from Andres's prototype is 32 bytes in size, same as our node3, so it's roughly equivalent as a way to ameliorate the lack of path compression.
> >
> > But does it mean that our node1 would help reduce the memory further
> > since our base node type (i.e. RT_NODE) is smaller than the base
> > node type of Andres's prototype? The result I shared before showed
> > 1.2GB vs. 1.9GB.
>
> The benefit is found in a synthetic benchmark with random integers. I highly doubt that anyone would be willing to force us to keep binary-searching the 1GB array for one more cycle on account of not adding a size class here. I'll repeat myself and say that there are also maintenance costs.
>
> In contrast, I'm fairly certain that our attempts thus far at memory accounting/limiting are not quite up to par, and lacking enough to jeopardize the feature. We're already discussing that, so I'll say no more.

I agree that memory accounting/limiting stuff is the highest priority.
So what kinds of size classes do you think we need? node3, 15, 32, 61
and 256?

>
> > > I say "roughly" because the loop in node3 is probably noticeably slower. A new size class will by definition still use that loop.
> >
> > I've evaluated the performance of node1 but the result seems to show
> > the opposite.
>
> As an aside, I meant the loop in our node3 might make your node1 slower than the prototype's node1, which was coded for 1 member only.

Agreed.
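
To make the point concrete, here is a rough sketch of the difference.
The struct layouts and function names are hypothetical and for
illustration only; they are not the definitions in the patch or in
Andres's prototype. A node3 holding a single member still goes through
the generic loop, whereas a node coded for exactly one member needs only
one comparison:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration only; not the structs in the patch. */
typedef struct sketch_node3
{
	uint8_t		count;			/* number of used slots, 0..3 */
	uint8_t		chunks[3];		/* key byte stored in each used slot */
	void	   *children[3];	/* child pointer for each used slot */
} sketch_node3;

typedef struct sketch_node1
{
	uint8_t		chunk;			/* the single key byte */
	void	   *child;			/* the single child pointer */
} sketch_node1;

/* Generic search: loops over the used slots, even when only one is used. */
static void *
sketch_node3_find(const sketch_node3 *node, uint8_t chunk)
{
	assert(node->count <= 3);

	for (int i = 0; i < node->count; i++)
	{
		if (node->chunks[i] == chunk)
			return node->children[i];
	}
	return NULL;
}

/* Specialized search: a single comparison and no loop. */
static void *
sketch_node1_find(const sketch_node1 *node, uint8_t chunk)
{
	return (node->chunk == chunk) ? node->child : NULL;
}
```

A smaller size class of the same node kind would keep calling the
looping variant, just with count = 1.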

>
> > > > * Node shrinking support. Low priority.
> > >
> > > This is an architectural wart that's been neglected since the tid store doesn't perform deletion. We'll need it sometime. If we're not going to make this work, why ship a deletion API at all?
> > >
> > > I took a look at this a couple weeks ago, and fixing it wouldn't be that hard. I even had an idea of how to detect when to shrink size class within a node kind, while keeping the header at 5 bytes. I'd be willing to put effort into that, but to have a chance of succeeding, I'm unwilling to make it more difficult by adding more size classes at this point.
> >
> > I think that the deletion (and locking support) doesn't have use cases
> > in the core (i.e. tidstore) but is implemented so that external
> > extensions can use it.
>
> I think these cases are a bit different: Doing anything with a data structure stored in shared memory without a synchronization scheme is completely unthinkable and insane.

Right.

> I'm not yet sure if deleting-without-shrinking is a showstopper, or if it's preferable in v16 to no deletion at all.
>
> Anything we don't implement now is a limit on future use cases, and thus a cause for objection. On the other hand, anything we implement also represents more stuff that will have to be rewritten for high-concurrency.

Okay. Given that adding shrinking support also incurs maintenance
costs (and probably needs new test cases?) and there are no use cases
in the core, I'm not sure it's worth supporting at this stage. So I
prefer either shipping the deletion API as it is or removing the
deletion API. I think this is a point on which we'd like to hear
feedback from other hackers.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
