Re: PG 13 release notes, first draft

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PG 13 release notes, first draft
Date: 2020-05-12 00:14:01
Message-ID: 20200512001401.GG4666@momjian.us
Lists: pgsql-hackers

On Mon, May 11, 2020 at 05:05:29PM -0700, Peter Geoghegan wrote:
> On Mon, May 11, 2020 at 4:10 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > > think that you should point out that deduplication works by storing
> > > the duplicates in the obvious way: Only storing the key once per
> > > distinct value (or once per distinct combination of values in the case
> > > of multi-column indexes), followed by an array of TIDs (i.e. a posting
> > > list). Each TID points to a separate row in the table.
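[Editor's note: the layout Peter describes above — the key stored once per distinct value, followed by an array of TIDs — can be sketched in a few lines of Python. This is an illustrative model only, not PostgreSQL's actual C implementation; the function name and tuple shapes are invented for the example.]

```python
def deduplicate(index_tuples):
    """Collapse (key, tid) pairs into (key, [tid, ...]) posting-list tuples.

    index_tuples must already be in index order (sorted by key, then TID),
    as they would appear on a B-tree leaf page. Each TID is a
    (block number, offset) pair pointing at one heap (table) row.
    """
    posting = {}
    for key, tid in index_tuples:
        # Store the key once; append each duplicate's TID to its posting list.
        posting.setdefault(key, []).append(tid)
    return [(key, tids) for key, tids in posting.items()]

# Three duplicates of 'apple' become one tuple with a three-entry posting list.
tuples = [("apple", (0, 1)), ("apple", (0, 2)), ("apple", (1, 5)),
          ("banana", (0, 3))]
print(deduplicate(tuples))
# -> [('apple', [(0, 1), (0, 2), (1, 5)]), ('banana', [(0, 3)])]
```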
> >
> > These are not details that should be in the release notes since the
> > internal representation is not important for its use.
>
> I am not concerned about describing the specifics of the on-disk
> representation, and I don't feel too strongly about the storage
> parameter (leave it out). I only ask that the wording convey the fact
> that the deduplication feature is not just a quantitative improvement
> -- it's a qualitative behavioral change, that will help data
> warehousing in particular. This wasn't the case with the v12 work on
> B-Tree duplicates (as I said last year, I thought of the v12 stuff as
> fixing a problem more than an enhancement).
>
> With the deduplication feature added to Postgres v13, the B-Tree code
> can now gracefully deal with low cardinality data by compressing the
> duplicates as needed. This is comparable to bitmap indexes in
> proprietary database systems, but without most of their disadvantages
> (in particular around heavyweight locking, deadlocks that abort
> transactions, etc). It's easy to imagine this making a big difference
> with analytics workloads. The v12 work made indexes with lots of
> duplicates 15%-30% smaller (compared to v11), but the v13 work can
> make them 60%-80% smaller in many common cases (compared to v12). In
> extreme cases indexes might even be ~12x smaller (though that will be
> rare).
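[Editor's note: a back-of-the-envelope model makes the quoted 60%-80% figure plausible. The constants below are the real fixed costs (8-byte index tuple header, 4-byte per-item line pointer, 6-byte TID, 8-byte alignment), but the model itself is a simplification for illustration — it ignores page headers, suffix truncation, and the cap on posting-list size — so treat it as a sketch, not exact on-disk accounting.]

```python
def maxalign(n, align=8):
    """Round n up to the next 8-byte boundary, as Postgres does on-page."""
    return -(-n // align) * align

TUPLE_HEADER = 8   # IndexTupleData header
LINE_POINTER = 4   # per-item ItemId in the page's line pointer array
TID = 6            # heap tuple identifier

def naive_bytes(n_dups, key_bytes):
    # Without deduplication: one full index tuple per duplicate.
    return n_dups * (maxalign(TUPLE_HEADER + key_bytes) + LINE_POINTER)

def dedup_bytes(n_dups, key_bytes):
    # With deduplication: one tuple carrying the key once plus a
    # posting list of n_dups TIDs.
    return maxalign(TUPLE_HEADER + key_bytes + TID * n_dups) + LINE_POINTER

# 100 duplicates of an 8-byte key (e.g. a bigint):
n, k = 100, 8
print(naive_bytes(n, k), dedup_bytes(n, k))
# -> 2000 620  (roughly a 69% reduction, in line with the 60%-80% claim)
```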

Agreed. How is this?

This allows efficient B-tree indexing of low-cardinality columns.
Users upgrading with pg_upgrade will need to use REINDEX to take
advantage of this feature.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EnterpriseDB https://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
