Re: Faster inserts with mostly-monotonically increasing values

From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(dot)riggs(at)2ndquadrant(dot)com>
Subject: Re: Faster inserts with mostly-monotonically increasing values
Date: 2018-03-23 04:49:38
Message-ID: CAA8=A78bTsWX1Agm7UB6tSsVbbU-_towWdxggLb0JzqpHubp5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 23, 2018 at 10:20 AM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Thu, Mar 22, 2018 at 3:29 AM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>>
>> On Thu, Mar 22, 2018 at 7:22 AM, Claudio Freire <klaussfreire(at)gmail(dot)com>
>> wrote:
>>>
>>>
>>> If you're not planning on making any further changes, would you mind
>>> posting a coalesced patch?
>>>
>>> Notice that I removed the last offset thing only halfway, so it would
>>> need some cleanup.
>>
>>
>> Here is an updated patch. I removed the last offset caching thing completely
>> and integrated your changes for conditional lock access. Some other
>> improvements to test cases too. I realised that we must truncate and
>> re-insert to test index fastpath correctly.

This patch looks in pretty good shape. I have been trying hard to
think of some failure mode but haven't been able to come up with one.

We can put off for another day consideration of if/when e can skip the
check for serialization conflict.

> Some comments
>
> + /*
> + * It's very common to have an index on an auto-incremented or
> + * monotonically increasing value. In such cases, every insertion happens
> + * towards the end of the index. We try to optimise that case by caching
> + * the right-most block of the index. If our cached block is still the
> + * rightmost block, has enough free space to accommodate a new entry and
> + * the insertion key is greater or equal to the first key in this page,
> + * then we can safely conclude that the new key will be inserted in the
> + * cached block. So we simply search within the cached page and insert the
> + * key at the appropriate location. We call it a fastpath.
>
> It should say "the insertion key is strictly greater than the first key"

right.

>
> Equal cases cannot be accepted since it would break uniqueness checks,
> so the comment should say that. The code already is correctly checking
> for strict inequality, it's just the comment that is out of sync.
>
> You might accept equal keys for non-unique indexes, but I don't
> believe making that distinction in the code is worth the hassle.
>
> Also, "rightmost block" != "rightmost leaf page" ("leaf" being the key
> difference). So it should say "rightmost leaf page".

right.

[...]
>
> Setting "offset = InvalidOffsetNumber" in that contorted way is
> unnecessary. You can remove the first assignment and instead
> initialize unconditionally right after the fastpath block (that
> doesn't make use of offset anyway):

Yes, I noticed that and it's confusing, Just set it at the top.

>
> In indexing.out:
>
> +explain select sum(a) from fastpath where a = 6456;
> + QUERY PLAN
> +------------------------------------------------------------------------------------
> + Aggregate (cost=4.31..4.32 rows=1 width=8)
> + -> Index Only Scan using fpindex1 on fastpath (cost=0.29..4.30
> rows=1 width=4)
> + Index Cond: (a = 6456)
> +(3 rows)
>
> Having costs in explain tests can be fragile. Better use "explain
> (costs off)". If you run "make check" continuously in a loop, you
> should get some failures related to that pretty quickly.
>

Agree about costs off, but I'm fairly dubious of the value of using
EXPLAIN at all here.Nothing in the patch should result in any change
in EXPLAIN output.

I would probably just have a few regression lines that should be sure
to exercise the code path and leave it at that.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2018-03-23 04:55:22 Re: Re: csv format for psql
Previous Message Michael Paquier 2018-03-23 04:46:39 Re: [PoC PATCH] Parallel dump to /dev/null