Re: Enabling B-Tree deduplication by default

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Enabling B-Tree deduplication by default
Date: 2020-01-30 19:40:24
Message-ID: CAH2-Wz=eHxkh0TDQwPTnaVbcBNV3GYh0xDSWPrJiSKntnQ6ehg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 30, 2020 at 11:16 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I prefer to think of the patch as being about improving the stability
> and predictability of Postgres with certain workloads, rather than
> being about overall throughput. Postgres has an ongoing need to VACUUM
> indexes, so making indexes smaller is generally more compelling than
> it would be with another system. That said, there are certainly quite
> a few cases that have big improvements in throughput and latency.

I also reran TPC-C/benchmarksql with the patch (v30). TPC-C has hardly
any non-unique indexes, which is a little unrealistic. I found that
the patch was up to 7% faster in the first few hours, since it can
control the bloat from certain non-HOT updates. This isn't a
particularly relevant workload, since almost no UPDATEs affect
indexed columns. The incoming-item-is-duplicate heuristic works well
with TPC-C, so there is probably hardly any possible downside there.

I think that I should tentatively commit the patch without the GUC,
and just have the storage parameter, so that everyone gets the
optimization without asking for it. We can then review the decision to
enable deduplication generally after the feature has been in the tree
for several months.

There is no need to make a final decision about whether or not the
optimization gets enabled before committing the patch.

--
Peter Geoghegan
