Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2019-11-12 23:21:53
Message-ID: CAH2-Wz=_XUu4j=vqGLmU=Re=caPy49yG4kwvyaeQiWKug4djKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 8, 2019 at 10:35 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> There is more bitrot, so I attach v22.

The patch has stopped applying once again, so I attach v23.

One reason for the bitrot is that I pushed preparatory commits,
including today's "Make _bt_keep_natts_fast() use datum_image_eq()"
commit. Good to get that out of the way.

Other changes:

* Decided to go back to turning deduplication on by default with
non-unique indexes, and off by default using unique indexes.

The unique index stuff was regressed enough with INSERT-heavy
workloads that I was put off, despite my initial enthusiasm for
enabling deduplication everywhere.

* Disabled deduplication in system catalog indexes by deeming it
generally unsafe.

I realized that it would be impossible to provide a way to disable
deduplication in system catalog indexes if it was enabled at all. The
reason for this is simple: in general, it's not possible to set
storage parameters for system catalog indexes.

While I think that deduplication should work with system catalog
indexes on general principle, this is about an existing limitation.
Deduplication in catalog indexes can be revisited if and when somebody
figures out a way to make storage parameters work with system catalog
indexes.

* Basic user documentation -- this still needs work, but the basic
shape is now in place. I think that we should outline how the feature
works by describing the internals, including details of the data
structures. This provides guidance to users on when they should
disable or enable the feature.

This is discussed in the existing chapter on B-Tree internals. This
felt natural because it's similar to how GIN explains its compression
related features -- the discussion of the storage parameters in the
CREATE INDEX page of the docs links to a description of GIN internals
from "66.4. Implementation [of GIN]".

* nbtdedup.c "single value" strategy stuff now considers the
contribution of the page high key when considering how to deduplicate
such that nbtsplitloc.c's "single value" strategy has a usable split
point that helps it to hit its target free space. Not a very important
detail. It's nice to be consistent with the corresponding code within
nbtsplitloc.c.

* Worked through all remaining XXX/TODO/FIXME comments, except one:
The one that talks about the need for opclass infrastructure to deal
with cases like btree/numeric_ops, or text with a nondeterministic
collation.

The user docs now reference the BITWISE opclass stuff that we're
discussing over on the other thread. That's the only really notable
open item now IMV.

--
Peter Geoghegan

Attachment Content-Type Size
v23-0002-DEBUG-Add-pageinspect-instrumentation.patch application/octet-stream 8.6 KB
v23-0001-Add-deduplication-to-nbtree.patch application/octet-stream 176.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-11-12 23:26:39 Re: make pg_attribute_noreturn() work for msvc?
Previous Message Tom Lane 2019-11-12 23:20:56 Re: checking my understanding of TupleDesc