Re: Deleting older versions in unique indexes to avoid page splits

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Victor Yegorov <vyegorov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2020-12-10 02:35:39
Message-ID: CALNJ-vQONwmPM1-A7VN-175aer-eFgBVJge-dFemzfgC7PA45g@mail.gmail.com
Lists: pgsql-hackers

Hi,
In v11-0001-Pass-down-logically-unchanged-index-hint.patch:

+ if (hasexpression)
+ return false;
+
+ return true;

The above can be written as 'return !hasexpression;'
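
For illustration, a standalone sketch of the suggested simplification (the
wrapper function and main() here are hypothetical; only the quoted tail comes
from the patch):

#include <stdbool.h>

/* Stand-in for the patch function that ends in the quoted lines */
static bool
index_unchanged(bool hasexpression)
{
    /*
     * Equivalent to:
     *     if (hasexpression)
     *         return false;
     *     return true;
     */
    return !hasexpression;
}

int
main(void)
{
    /* trivially exercise both branches */
    return (index_unchanged(false) && !index_unchanged(true)) ? 0 : 1;
}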

For +index_unchanged_by_update_var_walker:

+ * Returns true when Var that appears within allUpdatedCols located.

The sentence seems incomplete.

Currently the return value of index_unchanged_by_update_var_walker() is the
reverse of that of index_unchanged_by_update().
The code might be easier to read if their return values had the same
meaning.

Cheers

On Wed, Dec 9, 2020 at 5:13 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Tue, Dec 1, 2020 at 2:18 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > On Tue, Dec 1, 2020 at 1:50 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
> wrote:
> > > This is a wholly new concept with a lot of heuristics. It's a lot to
> > > swallow.
>
> Attached is v11, which cleans everything up around the tableam
> interface. There are only two patches in v11, since the tableam
> refactoring made it impossible to split the second patch into a heapam
> patch and nbtree patch (there is no reduction in functionality
> compared to v10).
>
> Most of the real changes in v11 (compared to v10) are in heapam.c.
> I've completely replaced the table_compute_xid_horizon_for_tuples()
> interface with a new interface that supports all existing requirements
> (from index deletions that support LP_DEAD deletion), while also
> supporting these new requirements (e.g. bottom-up index deletion). So
> now heap_compute_xid_horizon_for_tuples() becomes
> heap_compute_delete_for_tuples(), which has different arguments but
> the same overall structure. All of the new requirements can now be
> thought of as additive things that we happen to use for nbtree
> callers, that could easily also be used in other index AMs later on.
> This means that there is a lot less new code in heapam.c.
>
> Prefetching of heap blocks for the new bottom-up index deletion caller
> now works in the same way as it has worked in
> heap_compute_xid_horizon_for_tuples() since Postgres 12 (more or
> less). This is a significant improvement compared to my original
> approach.
>
> Changing heap_compute_xid_horizon_for_tuples() rather than adding a
> sibling function started to make sense when v10 of the patch taught
> regular nbtree LP_DEAD deletion (the thing that has been around since
> PostgreSQL 8.2) to add "extra" TIDs to check in passing, just in case
> we find that they're also deletable -- why not just have one function?
> It also means that hash indexes and GiST indexes now use the
> heap_compute_delete_for_tuples() function, though they get the same
> behavior as before. I imagine that it would be pretty straightforward
> to bring that same benefit to those other index AMs, since their
> implementations are already derived from the nbtree implementation of
> LP_DEAD deletion (and because adding extra TIDs to check in passing in
> these other index AMs should be fairly easy).
>
> > > > +} TM_IndexDelete;
> >
> > > > +} TM_IndexStatus;
> >
> > > Is it really significantly faster to have two arrays? If you had just
> > > one array, each element would be only 12 bytes long. That's not much
> > > bigger than TM_IndexDeletes, which is 8 bytes.
>
> I kept this facet of the design in v11, following some deliberation. I
> found that the TID sort operation appeared quite prominently in
> profiles, and so the optimizations mostly seemed to still make sense.
> I also kept one shellsort specialization. However, I removed the
> second specialized sort implementation, so at least there is only one
> specialization now (which is small anyway). I found that the second
> sort specialization (added to heapam.c in v10) really wasn't pulling
> its weight, even in more extreme cases of the kind that justified the
> optimizations in the first place.
>
> --
> Peter Geoghegan
>
