Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Юрий Соколов <funny(dot)falcon(at)gmail(dot)com>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2020-02-14 02:57:47
Message-ID: CAH2-WzmQGYDDoAETGhpGtJQRv_uFHMjvQZ6JdLV-sxGoCgLBNg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 6, 2020 at 6:18 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Attached is v32, which is even closer to being committable.

Attached is v33, which adds the last piece we need: opclass
infrastructure that tells nbtree whether or not deduplication can be
applied safely. This is based on work by Anastasia that was shared
with me privately.

I may not end up committing 0001-* as a separate patch, but it makes
sense to post it that way to make review easier -- this is supposed to
be infrastructure that isn't just useful for the deduplication patch.
0001-* adds a new C function, _bt_allequalimage(), which only actually
gets called within code added by 0002-* (i.e. the patch that adds the
deduplication feature). At this point, my main concern is that I might
not have the API exactly right in a world where these new support
functions are used by more than just the nbtree deduplication feature.
I would like to get detailed review of the new opclass infrastructure
stuff, and have asked for it directly, but I don't think that
committing the patch needs to block on that.

I've now written a fair amount of documentation for both the feature
and the underlying opclass infrastructure. It probably needs a bit
more copy-editing, but I think that it's generally in fairly good
shape. It might be a good idea for those who would like to review the
opclass stuff to start with some of my btree.sgml changes, and work
backwards -- the shape of the API itself is the important thing within
the 0001-* patch.

New opclass proc
================

In general, supporting deduplication is the rule for B-Tree opclasses,
rather than the exception. Most can use the generic
btequalimagedatum() routine as their support function 4, which
unconditionally indicates that deduplication is safe. There is a new
test that tries to catch opclasses that neglected to do this. Here are
the opr_sanity.out changes added by the first patch:

-- Almost all Btree opclasses can use the generic btequalimagedatum function
-- as their equalimage proc (support function 4). Look for opclasses that
-- don't do so; newly added Btree opclasses will usually be able to support
-- deduplication with little trouble.
SELECT amproc::regproc AS proc, opf.opfname AS opfamily_name,
       opc.opcname AS opclass_name, opc.opcintype::regtype AS opcintype
FROM pg_am am
JOIN pg_opclass opc ON opc.opcmethod = am.oid
JOIN pg_opfamily opf ON opc.opcfamily = opf.oid
LEFT JOIN pg_amproc ON amprocfamily = opf.oid AND
    amproclefttype = opcintype AND
    amprocnum = 4
WHERE am.amname = 'btree' AND
    amproc IS DISTINCT FROM 'btequalimagedatum'::regproc
ORDER BY amproc::regproc::text, opfamily_name, opclass_name;
       proc        |  opfamily_name   |   opclass_name   |    opcintype
-------------------+------------------+------------------+------------------
 bpchar_equalimage | bpchar_ops       | bpchar_ops       | character
 btnameequalimage  | text_ops         | name_ops         | name
 bttextequalimage  | text_ops         | text_ops         | text
 bttextequalimage  | text_ops         | varchar_ops      | text
                   | array_ops        | array_ops        | anyarray
                   | enum_ops         | enum_ops         | anyenum
                   | float_ops        | float4_ops       | real
                   | float_ops        | float8_ops       | double precision
                   | jsonb_ops        | jsonb_ops        | jsonb
                   | money_ops        | money_ops        | money
                   | numeric_ops      | numeric_ops      | numeric
                   | range_ops        | range_ops        | anyrange
                   | record_image_ops | record_image_ops | record
                   | record_ops       | record_ops       | record
                   | tsquery_ops      | tsquery_ops      | tsquery
                   | tsvector_ops     | tsvector_ops     | tsvector
(16 rows)

Those types/opclasses that you see here with a "proc" that is NULL
cannot use deduplication under any circumstances -- they have no
pg_amproc entry for B-Tree support function 4. The other four rows at
the start (those with a non-NULL "proc") are for collatable types,
where using deduplication is conditioned on not using a
nondeterministic collation. The details are in the sgml docs for the
second patch, where I go into the issue with numeric display scale,
why nondeterministic collations disable the use of deduplication, etc.
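
To make the "equal but not image-equal" problem concrete, here is a
small Python model. It is only an illustration, not code from either
patch. The encoders are stand-ins that I made up for the example:
encode_decimal() keeps the display scale the way a numeric datum does,
and encode_float8() is the raw IEEE 754 representation. The helper
name is hypothetical too.

```python
from decimal import Decimal
import struct


def encode_decimal(d: Decimal) -> bytes:
    # Stand-in for an on-disk representation that preserves display scale.
    return str(d).encode("ascii")


def encode_float8(x: float) -> bytes:
    # Raw little-endian IEEE 754 double.
    return struct.pack("<d", x)


def equal_but_not_image_equal(a, b, encode) -> bool:
    # True when the comparison operator says "equal" but the stored bytes
    # differ. Merging such values into one posting list would silently
    # drop one of the two representations.
    return a == b and encode(a) != encode(b)


# Display scale: 1.0 and 1.00 compare equal but are stored differently.
print(equal_but_not_image_equal(Decimal("1.0"), Decimal("1.00"), encode_decimal))

# Signed zero: 0.0 and -0.0 compare equal but have different bit patterns.
print(equal_but_not_image_equal(0.0, -0.0, encode_float8))
```

Both calls report True. A type can only be deduplicated when this
situation can never arise for it, which is what support function 4 is
there to assert.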

Note that these "equalimage" procs don't take any arguments, which is
a first for an index AM support function. Even so, they can still
obtain the collation at CREATE INDEX time using the standard
PG_GET_COLLATION() mechanism. I suppose that it's a little bit odd to
have no arguments but still call PG_GET_COLLATION() in certain support
functions. Still, it works just fine, at least as far as the needs of
deduplication are concerned.
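
The overall decision can be modelled roughly as follows. This is a toy
Python sketch, not the C code in the patch. I am assuming that every
key column must pass the check, which is what the name
_bt_allequalimage() suggests. The lookup table, the collation names,
and the function names are all made up for the example. The collation
is passed as a parameter here only to stand in for the out-of-band
PG_GET_COLLATION() mechanism.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical catalog: collation name -> is it deterministic?
COLLATION_IS_DETERMINISTIC: Dict[str, bool] = {
    "C": True,
    "default": True,
    "case_insensitive": False,  # made-up nondeterministic collation
}


def generic_equalimage(collation: Optional[str]) -> bool:
    # Models btequalimagedatum: deduplication is unconditionally safe.
    return True


def text_equalimage(collation: Optional[str]) -> bool:
    # Models a collation-aware proc: safe only for deterministic collations.
    return COLLATION_IS_DETERMINISTIC[collation]


# Hypothetical stand-in for the pg_amproc lookup of support function 4.
# None models an opclass with no entry at all.
SUPPORT_FUNCTION_4: Dict[str, Optional[Callable[[Optional[str]], bool]]] = {
    "int4_ops": generic_equalimage,
    "text_ops": text_equalimage,
    "numeric_ops": None,
}


def all_equalimage(key_columns: List[Tuple[str, Optional[str]]]) -> bool:
    # key_columns holds one (opclass name, collation name) pair per key column.
    for opclass, collation in key_columns:
        proc = SUPPORT_FUNCTION_4.get(opclass)
        if proc is None or not proc(collation):
            return False
    return True


print(all_equalimage([("int4_ops", None), ("text_ops", "C")]))
print(all_equalimage([("numeric_ops", None)]))
```

In this model an index on (int4, text COLLATE "C") passes, while a
single numeric column, or a text column with the made-up
nondeterministic collation, fails.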

Since using deduplication is supposed to pretty much be the norm from
now on, it seemed like it might make sense to add a NOTICE about it
during CREATE INDEX -- a notice letting the user know that it isn't
being used due to a lack of opclass support:

regression=# create table foo(bar numeric);
CREATE TABLE
regression=# create index on foo(bar);
NOTICE: index "foo_bar_idx" cannot use deduplication
CREATE INDEX

Note that this NOTICE isn't shown for an INCLUDE index, since an
INCLUDE index isn't expected to support deduplication anyway.

I have a feeling that not everybody will like this, which is why I'm
pointing it out.

Thoughts?

--
Peter Geoghegan

Attachment                                                      Content-Type         Size
v33-0001-Add-equalimage-B-Tree-opclass-support-functions.patch  application/x-patch  39.5 KB
v33-0003-Teach-pageinspect-about-nbtree-posting-lists.patch     application/x-patch  18.5 KB
v33-0004-DEBUG-Show-index-values-in-pageinspect.patch           application/x-patch  4.3 KB
v33-0002-Add-deduplication-to-nbtree.patch                      application/x-patch  234.7 KB
