Re: Enabling B-Tree deduplication by default

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Enabling B-Tree deduplication by default
Date: 2020-01-30 20:57:37
Message-ID: CA+TgmoYpqEVP-hTx0Ut4c+16ynMakDy5MCprPHAx61vJvXuogA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 30, 2020 at 2:40 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Thu, Jan 30, 2020 at 11:16 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > I prefer to think of the patch as being about improving the stability
> > and predictability of Postgres with certain workloads, rather than
> > being about overall throughput. Postgres has an ongoing need to VACUUM
> > indexes, so making indexes smaller is generally more compelling than
> > it would be with another system. That said, there are certainly quite
> > a few cases that have big improvements in throughput and latency.
>
> I also reran TPC-C/benchmarksql with the patch (v30). TPC-C has hardly
> any non-unique indexes, which is a little unrealistic. I found that
> the patch was up to 7% faster in the first few hours, since it can
> control the bloat from certain non-HOT updates. This isn't a
> particularly relevant workload, since almost all UPDATEs don't affect
> indexed columns. The incoming-item-is-duplicate heuristic works well
> with TPC-C, so there is probably hardly any possible downside there.
>
> I think that I should tentatively commit the patch without the GUC.
> Just have the storage parameter, so that everyone gets the
> optimization without asking for it. We can then review the decision to
> enable deduplication generally after the feature has been in the tree
> for several months.
>
> There is no need to make a final decision about whether or not the
> optimization gets enabled before committing the patch.

That seems reasonable.

I suspect that you're right that the worst-case downside is not big
enough to really be a problem given all the upsides. But the advantage
of getting things committed is that we can find out what users think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company