Re: WIP: Avoid creation of the free space map for small tables

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: jcnaylor(at)gmail(dot)com
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Avoid creation of the free space map for small tables
Date: 2018-11-19 05:06:02
Message-ID: CAA4eK1Jq4TFr-Fna61irAgR+DnWP=ghyH+VDEmh=4HhaaEFdaQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 19, 2018 at 7:30 AM John Naylor <jcnaylor(at)gmail(dot)com> wrote:
>
> On 11/16/18, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > +/* Status codes for the local map. */
> > +#define FSM_LOCAL_ZERO 0x00 /* Beyond the end of the relation */
> > +#define FSM_LOCAL_AVAIL 0x01 /* Available to try */
> > +#define FSM_LOCAL_TRIED 0x02 /* Already tried, not enough space */
> >
> > Instead of maintaining three states, can't we do with two states
> > (Available and Not Available), basically combine 0 and 2 in your case.
> > I think it will save some cycles in
> > fsm_local_set, where each time you need to initialize all the entries
> > in the map. I think we can argue that it is not much overhead, but I
> > think it is better code-wise also if we can make it happen with fewer
> > states.
>
> That'd work too, but let's consider this scenario: We have a 2-block
> table that has no free space. After trying each block, the local cache
> looks like
>
> 0123
> TT00
>
> Let's say we have to wait to acquire a relation extension lock,
> because another backend had already started extending the heap by 1
> block. We call GetPageWithFreeSpace() and now the local map looks like
>
> 0123
> TTA0
>
> By using bitwise OR to set availability, the already-tried blocks
> remain as they are. With only 2 states, the map would look like this
> instead:
>
> 0123
> AAAN
>

I expect below part of code to go-away.
+fsm_local_set(Relation rel, BlockNumber nblocks)
{
..
+ /*
+ * If the blkno is beyond the end of the relation, the status should
+ * be zero already, but make sure it is. If the blkno is within the
+ * relation, mark it available unless it's already been tried.
+ */
+ for (blkno = 0; blkno < HEAP_FSM_CREATION_THRESHOLD; blkno++)
+ {
+ if (blkno < nblocks)
+ FSMLocalMap[blkno] |= FSM_LOCAL_AVAIL;
+ else
+ FSMLocalMap[blkno] = FSM_LOCAL_ZERO;
+ }
..
}

In my mind for such a case it should look like below:
0123
NNAN

> If we assume that an insert into the newly-created block 2 will almost
> always succeed, we don't have to worry about wasting time re-checking
> the first 2 full blocks. Does that sound right to you?
>

As explained above, such a situation won't exist.

>
> > Some assorted comments:
> > 1.
> > <para>
> > -Each heap and index relation, except for hash indexes, has a Free Space
> > Map
> > +Each heap relation, unless it is very small, and each index relation,
> > +except for hash indexes, has a Free Space Map
> > (FSM) to keep track of available space in the relation. It's stored
> >
> > It appears that line has ended abruptly.
>
> Not sure what you're referring to here.
>

There is a space after "has a Free Space Map " so you can combine next line.

> > 2.
> > page = BufferGetPage(buffer);
> > + targetBlock = BufferGetBlockNumber(buffer);
> >
> > if (!PageIsNew(page))
> > elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
> > - BufferGetBlockNumber(buffer),
> > + targetBlock,
> > RelationGetRelationName(relation));
> >
> > PageInit(page, BufferGetPageSize(buffer), 0);
> > @@ -623,7 +641,18 @@ loop:
> > * current backend to make more insertions or not, which is probably a
> > * good bet most of the time. So for now, don't add it to FSM yet.
> > */
> > - RelationSetTargetBlock(relation, BufferGetBlockNumber(buffer));
> > + RelationSetTargetBlock(relation, targetBlock);
> >
> > Is this related to this patch? If not, I suggest let's do it
> > separately if required.
>
> I will separate this out.
>
> > 3.
> > static int fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
> > - uint8 newValue, uint8 minValue);
> > + uint8 newValue, uint8 minValue);
> >
> > This appears to be a spurious change.
>
> It was intentional, but I will include it separately as above.
>
> > 4.
> > @@ -378,24 +386,15 @@ RelationGetBufferForTuple(Relation relation, Size
> > len,
> > * target.
> > */
> > targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
> > +
> > + /*
> > + * In case we used an in-memory map of available blocks, reset
> > + * it for next use.
> > + */
> > + if (targetBlock < HEAP_FSM_CREATION_THRESHOLD)
> > + ClearLocalMap();
> >
> > How will you clear the local map during error? I think you need to
> > clear it in abort path and you can name the function as
> > FSMClearLocalMap or something like that.
>
> That sounds right, and I will rename the function that way. For the
> abort path, were you referring to this or somewhere else?
>
> if (!PageIsNew(page))
> elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
> targetBlock,
> RelationGetRelationName(relation));
>

I think it might come from any other place between when you set it and
before it got cleared (like any intermediate buffer and pin related
API's).

> > 5.
> > +/*#define TRACE_TARGETBLOCK */
> >
> > Debugging leftover, do you want to retain this and related stuff
> > during the development of patch?
>
> I modeled this after TRACE_VISIBILITYMAP in visibilitymap.c. It's
> useful for development, but I don't particularly care whether it's in
> the final verision.
>

Okay, so if you want to retain it for the period of development, then
I am fine with it. We can see at the end if it makes sense to retain
it.

> Also, I found an off-by-one error that caused an unnecessary
> smgrexists() call in tables with threshold + 1 pages. This will be
> fixed in the next version.
>

Thanks.

One other thing that slightly bothers me is the call to
RelationGetNumberOfBlocks via fsm_allow_writes. It seems that call
will happen quite frequently in this code-path and can have some
performance impact. As of now, I don't have any idea to avoid it or
reduce it more than what you already have in the patch, but I think we
should try some more to avoid it. Let me know if you have any ideas
around that?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Haribabu Kommi 2018-11-19 05:18:02 Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query
Previous Message Thomas Munro 2018-11-19 04:48:05 Re: Postgres, fsync, and OSs (specifically linux)