Re: Minmax indexes

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minmax indexes
Date: 2014-08-12 22:52:44
Message-ID: 20140812225244.GJ5728@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> I couldn't resist starting to hack on this, and implemented the
> scheme I've been having in mind:
>
> 1. MMTuple contains the block number of the heap page (range) that
> the tuple represents. Vacuum is no longer needed to clean up old
> tuples; when an index tuples is updated, the old tuple is deleted
> atomically with the insertion of a new tuple and updating the
> revmap, so no garbage is left behind.
>
> 2. LockTuple is gone. When following the pointer from revmap to
> MMTuple, the block number is used to check that you land on the
> right tuple. If not, the search is started over, looking at the
> revmap again.

Thanks, looks good, yeah. Did you just forget to attach the
access/rmgrdesc/minmaxdesc.c file, or did you ignore it altogether?
Anyway I hacked one up, and cleaned up some other things.

> I'm sure this still needs some cleanup, but here's the patch, based
> on your v14. Now that I know what this approach looks like, I still
> like it much better. The insert and update code is somewhat more
> complicated, because you have to be careful to lock the old page,
> new page, and revmap page in the right order. But it's not too bad,
> and it gets rid of all the complexity in vacuum.

It seems there is some issue here, because pageinspect tells me the
index is not growing properly for some reason. minmax_revmap_data gives
me this array of TIDs after a bunch of insert/vacuum/delete/ etc:

"(2,1)","(2,2)","(2,3)","(2,4)","(2,5)","(4,1)","(5,1)","(6,1)","(7,1)","(8,1)","(9,1)","(10,1)","(11,1)","(12,1)","(13,1)","(14,1)","(15,1)","(16,1)","(17,1)","(18,1)","(19,1)","(20,1)","(21,1)","(22,1)","(23,1)","(24,1)","(25,1)","(26,1)","(27,1)","(28,1)","(29,1)","(30,1)","(31,1)","(32,1)","(33,1)","(34,1)","(35,1)","(36,1)","(37,1)","(38,1)","(39,1)","(40,1)","(41,1)","(42,1)","(43,1)","(44,1)","(45,1)","(46,1)","(47,1)","(48,1)","(49,1)","(50,1)","(51,1)","(52,1)","(53,1)","(54,1)","(55,1)","(56,1)","(57,1)","(58,1)","(59,1)","(60,1)","(61,1)","(62,1)","(63,1)","(64,1)","(65,1)","(66,1)","(67,1)","(68,1)","(69,1)","(70,1)","(71,1)","(72,1)","(73,1)","(74,1)","(75,1)","(76,1)","(77,1)","(78,1)","(79,1)","(80,1)","(81,1)","(82,1)","(83,1)","(84,1)","(85,1)","(86,1)","(87,1)","(88,1)","(89,1)","(90,1)","(91,1)","(92,1)","(93,1)","(94,1)","(95,1)","(96,1)","(97,1)","(98,1)","(99,1)","(100,1)","(101,1)","(102,1)","(103,1)","(104,1)","(105,1)","(106,1)","(107,1)","(108,1)","(109,1)","(110,1)","(111,1)","(112,1)","(113,1)","(114,1)","(115,1)","(116,1)","(117,1)","(118,1)","(119,1)","(120,1)","(121,1)","(122,1)","(123,1)","(124,1)","(125,1)","(126,1)","(127,1)","(128,1)","(129,1)","(130,1)","(131,1)","(132,1)","(133,1)","(134,1)"

There are some who would think that getting one item per page is
suboptimal. (Maybe it's just a missing FSM update somewhere.)

I've been hacking away a bit more at this; will post updated patch
probably tomorrow (was about to post but just found a memory stomp in
pageinspect.)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2014-08-12 22:54:04 Re: replication commands and log_statements
Previous Message Andres Freund 2014-08-12 22:04:22 Re: WAL format and API changes (9.5)