Better tracking of free space during SP-GiST index build

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: emre(at)hasegeli(dot)com
Subject: Better tracking of free space during SP-GiST index build
Date: 2016-08-24 23:45:50
Message-ID: 19621.1472082350@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Over in the thread about the SP-GiST inet opclass, I threatened to post
a patch like this, and here it is.

The basic idea is to track more than just the very latest page we've used
in each of the page categories that SP-GiST works with. I started with an
arrangement that gave an equal number of cache slots to each category, but
soon realized that that was dumb, because there are usually way more leaf
pages than anything else. So this version has a little table of how many
slots to give to each category. The constants could maybe use a bit more
fiddling, if we have some more test data sets to try this on.

On the IRRExplorer data set we discussed in the other thread, this reduces
the index size from 132MB to 120MB. Poking into that more closely with
pg_filedump, the total free space within the index drops from 42MB to
28MB. If you think those numbers don't add up, you're right --- this
seems to result in more non-leaf tuples than before. I'm not sure why;
maybe more aggressive sucking up of free space results in more splits.
(Maybe adjustment of the default spgist fillfactor would be in order
to counteract that?) But the index search time doesn't seem to be hurt,
so perhaps there's nothing to worry about.

As coded, this makes no attempt to preferentially select pages with the
most or least free space. I don't know if it'd be worth any cycles to
do that.

I'll put this in the commitfest queue. It could use review from someone
with the time and motivation to do performance testing/tuning.

regards, tom lane

Attachment Content-Type Size
spgist-cache-more-pages-1.patch text/x-diff 13.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tsunakawa, Takayuki 2016-08-25 00:30:58 Re: pg_stat_lwlock wait time view
Previous Message Martín Marqués 2016-08-24 22:30:06 Re: pg_dump with tables created in schemas created by extensions