Re: Minmax indexes

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minmax indexes
Date: 2014-08-20 22:51:34
Message-ID: 20140820225133.GB6343@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Heikki Linnakangas wrote:

> So, the other design change I've been advocating is to store the
> revmap in the first N blocks, instead of having the two-level
> structure with array pages and revmap pages.
>
> Attached is a patch for that, to be applied after v15. When the
> revmap needs to be expanded, all the tuples on it are moved
> elsewhere one-by-one. That adds some latency to the unfortunate guy
> who needs to do that, but as the patch stands, the revmap is only
> ever extended by VACUUM or CREATE INDEX, so I think that's fine.
> Like with my previous patch, the point is to demonstrate how much
> simpler the code becomes this way; I'm sure there are bugs and
> cleanup still necessary.

Thanks for the prodding. I didn't like this too much initially, but
after going over it a few times I agree that having less code and a less
complex physical representation is better. Your proposed approach is to
just call the update routine on every tuple in the page we're
evacuating. There are optimizations possible (such as doing bulk
updates; and instead of updating the revmap, keep a redirection pointer
in the page we just evacuated, so that the revmap can be updated lazily
later), but I have spent way too long on this already that I am fine
with keeping what we have here. If somebody later wants to contribute
improvements to this, it'd be welcome. But on the other hand the
operation is not that frequent and as you say it's not executed by
user-facing queries, so perhaps it's okay.

I cleaned it up some: mainly I created a separate file (mmpageops.c)
that now hosts the routines related to page operations: mm_doinsert,
mm_doupdate, mm_start_evacuating_page, mm_evacuate_page. There are
other rather very minor changes here and there; also added
CHECK_FOR_INTERRUPTS in all relevant loops.

This bit in mm_doupdate I just couldn't understand:

/* If both tuples are in fact equal, there is nothing to do */
if (!minmax_tuples_equal(oldtup, oldsz, origtup, origsz))
{
LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
return false;
}

Isn't the test exactly reversed? I don't see how this would work.
I updated it to

/*
* If both tuples are identical, there is nothing to do; except that if we
* were requested to move the tuple across pages, we do it even if they are
* equal.
*/
if (samepage && minmax_tuples_equal(oldtup, oldsz, origtup, origsz))
{
LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
return false;
}

> PS. Spotted one oversight in patch v15: callers of mm_doupdate must
> check the return value, and retry the operation if it returns false.

Right, thanks. Fixed.

So here's v16, rebased on top of 9bac66020. As far as I am concerned,
this is the last version before I start renaming everything to BRIN and
then commit.

contrib/pageinspect/Makefile | 2 +-
contrib/pageinspect/mmfuncs.c | 407 +++++++++++++
contrib/pageinspect/pageinspect--1.2.sql | 36 ++
contrib/pg_xlogdump/rmgrdesc.c | 1 +
doc/src/sgml/brin.sgml | 248 ++++++++
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/indices.sgml | 36 +-
doc/src/sgml/postgres.sgml | 1 +
minmax-proposal | 306 ++++++++++
src/backend/access/Makefile | 2 +-
src/backend/access/common/reloptions.c | 7 +
src/backend/access/heap/heapam.c | 22 +-
src/backend/access/minmax/Makefile | 17 +
src/backend/access/minmax/minmax.c | 942 +++++++++++++++++++++++++++++++
src/backend/access/minmax/mmpageops.c | 638 +++++++++++++++++++++
src/backend/access/minmax/mmrevmap.c | 451 +++++++++++++++
src/backend/access/minmax/mmsortable.c | 287 ++++++++++
src/backend/access/minmax/mmtuple.c | 478 ++++++++++++++++
src/backend/access/minmax/mmxlog.c | 323 +++++++++++
src/backend/access/rmgrdesc/Makefile | 3 +-
src/backend/access/rmgrdesc/minmaxdesc.c | 89 +++
src/backend/access/transam/rmgr.c | 1 +
src/backend/catalog/index.c | 24 +
src/backend/replication/logical/decode.c | 1 +
src/backend/storage/page/bufpage.c | 179 +++++-
src/backend/utils/adt/selfuncs.c | 24 +
src/include/access/heapam.h | 2 +
src/include/access/minmax.h | 52 ++
src/include/access/minmax_internal.h | 86 +++
src/include/access/minmax_page.h | 70 +++
src/include/access/minmax_pageops.h | 29 +
src/include/access/minmax_revmap.h | 36 ++
src/include/access/minmax_tuple.h | 90 +++
src/include/access/minmax_xlog.h | 106 ++++
src/include/access/reloptions.h | 3 +-
src/include/access/relscan.h | 4 +-
src/include/access/rmgrlist.h | 1 +
src/include/catalog/index.h | 8 +
src/include/catalog/pg_am.h | 2 +
src/include/catalog/pg_amop.h | 81 +++
src/include/catalog/pg_amproc.h | 73 +++
src/include/catalog/pg_opclass.h | 9 +
src/include/catalog/pg_opfamily.h | 10 +
src/include/catalog/pg_proc.h | 52 ++
src/include/storage/bufpage.h | 2 +
src/include/utils/selfuncs.h | 1 +
src/test/regress/expected/opr_sanity.out | 14 +-
src/test/regress/sql/opr_sanity.sql | 7 +-
48 files changed, 5248 insertions(+), 16 deletions(-)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
minmax-16.patch text/x-diff 185.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2014-08-20 22:58:05 Re: [patch] pg_copy - a command for reliable WAL archiving
Previous Message Arthur Silva 2014-08-20 22:42:28 Re: jsonb format is pessimal for toast compression