From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unhappy about API changes in the no-fsm-for-small-rels patch
Date: 2019-04-25 03:20:40
Message-ID: CAA4eK1JFD=H0iPbmjYXKiek0x1gJEh7E_4rqPLW8rqkUp4_=og@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 24, 2019 at 9:49 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2019-04-24 11:28:32 +0530, Amit Kapila wrote:
> > On Tue, Apr 23, 2019 at 10:59 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > > > On 2019-04-22 18:49:44 +0530, Amit Kapila wrote:
> > > > I think we should first try to see in this new scheme (a) when to set
> > > > the map, (b) when to clear it, and (c) how to use it. I have tried to
> > > > summarize my thoughts about it; let me know what you think about
> > > > the same?
> > > >
> > > > When to set the map.
> > > > At the beginning (the first time the relation is used in the backend),
> > > > the map will be clear. The first time in the backend that we find the
> > > > FSM doesn't exist and the number of blocks is less than
> > > > HEAP_FSM_CREATION_THRESHOLD, we set the map for the total blocks that
> > > > exist at that time and mark all or alternate blocks as available.
> > >
> > > I think the alternate blocks scheme has to go. It's not defensible.
> > >
> >
Fair enough, I have changed it in the attached patch. However, I
think we should test it once the patch is ready, as we have previously
seen a small performance regression due to that approach.
>
> Sure, but that was because you re-scanned from scratch after every
> insertion, no?
>

Possible.

>
> > > And sure, leaving that aside we could store one byte per block
> >
> > Hmm, I think you mean to say one bit per block, right?
>
> No, I meant byte.
>

Upthread you have said: "Hm, given realistic
HEAP_FSM_CREATION_THRESHOLD, and the fact that we really only need one
bit per relation, it seems like map should really be just a uint32
with one bit per page. It'd allow different AMs to have different
numbers of dont-create-fsm thresholds without needing additional
memory (up to 32 blocks)."

I can understand the advantage of the one-bit-per-page suggestion, but
now you are saying one byte per page. I am confused between those two
options. Am I missing something?
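
To be sure I understand the one-bit option, here is a minimal sketch
of what I think you described upthread; the LocalFsmBitmap name and
the macros are mine, just for illustration, not from any patch:

#include "postgres.h"

/*
 * One bit per block: a single uint32 covers up to 32 blocks, so
 * different AMs can have different dont-create-fsm thresholds without
 * needing additional memory.
 */
typedef uint32 LocalFsmBitmap;

#define FSM_LOCAL_SET(map, blk)    ((map) |= ((uint32) 1 << (blk)))
#define FSM_LOCAL_CLEAR(map, blk)  ((map) &= ~((uint32) 1 << (blk)))
#define FSM_LOCAL_AVAIL(map, blk)  ((((map) >> (blk)) & 1) != 0)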

> The normal FSM saves how much space there is for a
> block, but the current local fsm doesn't. That means pages are marked as
> unavailable even though other tuples would possibly fit.
>

Sure, in the regular FSM, vacuum can update the available space, but
we don't have such a provision for the local map unless we decide to
keep it in shared memory.
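
If I follow the one-byte idea correctly, the local map would instead
store an FSM-style free-space category per block, roughly as below;
again, the struct and field names are only illustrative:

#include "postgres.h"
#include "storage/block.h"      /* BlockNumber */

#define HEAP_FSM_CREATION_THRESHOLD 4   /* the 4-page threshold */

/*
 * One byte per block: each slot stores a free-space category the way
 * the on-disk FSM does, instead of a plain available/unavailable
 * flag, so a page that couldn't fit one large tuple is not lost for
 * smaller tuples.
 */
typedef struct LocalFsmMap
{
    BlockNumber nblocks;                            /* blocks covered */
    uint8       avail[HEAP_FSM_CREATION_THRESHOLD]; /* category per block */
} LocalFsmMap;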

>
> > > It's possible that'd come with
> > > some overhead - I've not thought sufficiently about it: I assume we'd
> > > still start out in each backend assuming each page is empty, and we'd
> > > then rely on RelationGetBufferForTuple() to update that. What I wonder
> > > is if we'd need to check if an on-disk FSM has been created every time
> > > the space on a page is reduced? I think not, if we use invalidations to
> > > notify others of the existence of an on-disk FSM. There's a small race,
> > > but that seems ok.
>
> > Do you mean to say that vacuum or some backend should invalidate in
> > case it first time creates the FSM for a relation?
>
> Right.
>
>
> > I think it is quite possible that the vacuum takes time to trigger
> > such invalidation if there are fewer deletions. And we won't be able
> > to use the newly added page(s).
>
> I'm not sure I understand what you mean by that? If the backend that
> ends up creating the FSM - because it extended the relation beyond 4
> pages - issues an invalidation, the time till other backends pick that
> up should be minimal?
>

Consider that when backend-1 starts inserting into a relation, it has
just two pages, so we create a local map covering those two pages.
Now, if backend-2 extends the relation by one page, how and when will
backend-1 come to know about that? One possibility is that once all
the pages present in backend-1's local map become invalid (used up),
we again check the number of blocks and update the local map.
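
In code, the possibility I mean would look something like the sketch
below (reusing the illustrative LocalFsmMap from above;
fsm_local_map_refresh is a made-up name):

#include "postgres.h"
#include "storage/bufmgr.h"     /* RelationGetNumberOfBlocks() */
#include "utils/rel.h"

/*
 * Once every block in the local map has been tried and marked
 * used-up, re-read the relation size so that pages added by other
 * backends (e.g. backend-2's extension) become visible to us.
 */
static void
fsm_local_map_refresh(Relation rel, LocalFsmMap *map)
{
    BlockNumber blk;
    BlockNumber nblocks;

    /* Anything still marked available?  Then keep using the map. */
    for (blk = 0; blk < map->nblocks; blk++)
    {
        if (map->avail[blk] != 0)
            return;
    }

    /* All used up: rebuild the map from the current block count. */
    nblocks = RelationGetNumberOfBlocks(rel);
    map->nblocks = Min(nblocks, HEAP_FSM_CREATION_THRESHOLD);
    for (blk = 0; blk < map->nblocks; blk++)
        map->avail[blk] = 255;  /* optimistically assume free space */
}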

>
> > IIUC, you are suggesting to issue invalidations when (a) the vacuum
> > finds there is no FSM and a page has more space now, and (b) to notify
> > the existence of the FSM. It seems to me that issuing more
> > invalidations for the purpose of FSM can lead to an increased number
> > of relation builds in the overall system. I think this can have an
> > impact when there are many small relations in the system which in some
> > scenarios might not be uncommon.
>
> If this becomes an issue I'd just create a separate type of
> invalidation, one that just signals that the FSM is being invalidated.
>

Oh, clever idea, but I guess that will be some work unless we already
do something similar elsewhere. Anyway, we can look into it later.
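
For what it's worth, with the existing machinery, the "notify on first
FSM creation" part would presumably be just a relcache invalidation
from the backend that creates the FSM, roughly as below
(fsm_notify_created is a made-up name); your dedicated invalidation
type would replace the CacheInvalidateRelcache() call with something
lighter:

#include "postgres.h"
#include "utils/inval.h"        /* CacheInvalidateRelcache() */
#include "utils/rel.h"

/*
 * Called once by the backend that extends the relation beyond the
 * threshold and creates the on-disk FSM.  Other backends pick up the
 * invalidation and drop their local maps; there's a small race, but
 * that seems ok, as you say.
 */
static void
fsm_notify_created(Relation rel)
{
    CacheInvalidateRelcache(rel);
}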

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
