Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2019-07-11 04:53:04
Message-ID: CAH2-WznZ-bLpf32sL-GK2qyE1XLa+rcp5x0Xf6Nd8tVbTZY6hg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jul 6, 2019 at 4:08 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I took a closer look at this patch, and have some general thoughts on
> its design, and specific feedback on the implementation.

I have some high-level concerns about how the patch might increase
contention, which could make queries slower. Apparently that is a real
problem in other MVCC-based systems when their bitmap index feature is
used -- bitmap indexes are never really supposed to be used with OLTP
apps. This patch makes nbtree behave rather a lot like a bitmap index.
That's not exactly accurate, since you're not storing a bitmap or
compressing the TID lists, but the two structures are definitely quite
similar. It's easy to imagine a hybrid approach that starts as a B-Tree
with deduplication/TID lists and eventually becomes a bitmap index as
more duplicates are added [1].
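
To make the comparison concrete, here's a minimal standalone C sketch
of the general posting list idea -- the struct names and the fixed-size
layout are invented for illustration, and don't reflect how the patch
actually represents tuples on the page:

/*
 * Sketch only, not code from the patch: store the key once, followed
 * by a sorted array of heap TIDs, instead of one (key, TID) index
 * tuple per duplicate.  ItemPointerData here is a simplified stand-in
 * for the real Postgres struct.
 */
#include <stdio.h>

typedef struct
{
    unsigned int   blkno;   /* heap block number */
    unsigned short offno;   /* tuple offset within the block */
} ItemPointerData;

typedef struct
{
    int             key;     /* indexed value, stored only once */
    int             ntids;   /* number of heap tuples with this key */
    ItemPointerData tids[8]; /* sorted TID list (fixed-size here) */
} PostingTuple;

int
main(void)
{
    /* Six duplicates of key 42 collapse into a single posting tuple. */
    PostingTuple pt = {
        .key = 42,
        .ntids = 6,
        .tids = {{1, 1}, {1, 2}, {1, 3}, {2, 1}, {2, 2}, {3, 7}}
    };

    /*
     * Without deduplication, the key would be repeated in each of the
     * six index tuples; here its storage cost is amortized across all
     * of the TIDs that share it.
     */
    printf("one posting tuple covers %d heap TIDs for key %d\n",
           pt.ntids, pt.key);
    return 0;
}

A bitmap index takes the same idea further by replacing the TID array
with a compressed bitmap, which is what makes the hybrid design in [1]
plausible.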

It doesn't seem like it would be practical for these other MVCC
database systems to have standard B-Tree secondary indexes that
compress duplicates gracefully in the way you propose with this patch,
because lock contention would presumably be a big problem for the same
reason as it is with their bitmap indexes (whatever the true reason
actually is). Is it really possible to have something that's adaptive,
offering the best of both worlds?

Having dug into it some more, I think that the answer for us might
actually be "yes, we can have it both ways". Other database systems
that are also based on MVCC still probably use a limited form of index
locking, even in READ COMMITTED mode, though this isn't very widely
known. They need this for unique indexes, but they also need it for
transaction rollback, to remove old entries from the index when the
transaction must abort. The section "6.7 Standard Practice" from the
paper "Architecture of a Database System" [2] goes into this, saying:

"All production databases today support ACID transactions. As a rule,
they use write-ahead logging for durability, and two-phase locking for
concurrency control. An exception is PostgreSQL, which uses
multiversion concurrency control throughout."

I suggest reading "6.7 Standard Practice" in full.

Anyway, I think that *hundreds* or even *thousands* of rows are
effectively locked all at once when a bitmap index needs to be updated
in these other systems -- and I mean a heavyweight lock that lasts
until the xact commits or aborts, like a Postgres row lock. As I said,
this is necessary simply because the transaction might need to roll
back. Of course, your patch never needs to do anything like that --
the only risk is that buffer lock contention will be increased. Maybe
VACUUM isn't so bad after all!

Doing deduplication adaptively and automatically in nbtree seems like
it might play to the strengths of Postgres, while also ameliorating
its weaknesses. As the same paper goes on to say, it's actually quite
unusual that PostgreSQL has *transactional* full text search built in
(using GIN), and offers transactional, high concurrency spatial
indexing (using GiST). This is an additional advantage of our "pure"
approach to MVCC -- we can add new high concurrency, transactional
transactional access methods relatively easily.

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.3159&rep=rep1&type=pdf
[2] http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf
--
Peter Geoghegan
