Fast GiST index build - further improvements

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Fast GiST index build - further improvements
Date: 2011-09-08 16:35:37
Message-ID: 4E68EED9.3010205@enterprisedb.com
Lists: pgsql-hackers

Now that the main GiST index build patch has been committed, there are a
few further improvements that could make it much faster still:

Better management of the buffer pages on disk. At the moment, the
temporary file is used as a heap of pages belonging to all the buffers
in random order. I think we could speed up the algorithm considerably by
reading/writing the buffer pages sequentially. For example, when an
internal page is split, and all the tuples in its buffer are relocated,
that would be a great chance to write the new pages of the new buffers
in sequential order, instead of writing them back to the pages freed up
by the original buffer, which can be spread all around the temp file. I
wonder if we could use a separate file for each buffer? Or at least, a
separate file for each buffer that is larger than, say, 100 MB in size.
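
To illustrate that, here's a rough sketch in plain C of what a per-buffer
temp file with strictly sequential appends could look like. This is not
the actual gistbuildbuffers.c code; NodeBufferFile and the function names
are made up for the example:

#include <stdio.h>
#include <stdlib.h>

#define BLCKSZ 8192            /* page size, matching PostgreSQL's default */

/* One private temporary file per node buffer (hypothetical layout). */
typedef struct NodeBufferFile
{
    FILE   *file;              /* this buffer's own temp file */
    long    npages;            /* pages written so far */
} NodeBufferFile;

static NodeBufferFile *
nodebuffer_create(void)
{
    NodeBufferFile *nbf = malloc(sizeof(NodeBufferFile));

    if (nbf == NULL)
        return NULL;
    nbf->file = tmpfile();     /* each buffer gets its own temp file */
    if (nbf->file == NULL)
    {
        free(nbf);
        return NULL;
    }
    nbf->npages = 0;
    return nbf;
}

/*
 * Append a page at the tail of the buffer's own file.  Because every
 * write goes to the end of the file, the pages of a relocated buffer
 * end up physically sequential on disk, instead of being scattered
 * over whatever slots happened to be free in a single shared file.
 */
static int
nodebuffer_append_page(NodeBufferFile *nbf, const char *page)
{
    if (fseek(nbf->file, nbf->npages * (long) BLCKSZ, SEEK_SET) != 0)
        return -1;
    if (fwrite(page, 1, BLCKSZ, nbf->file) != BLCKSZ)
        return -1;
    nbf->npages++;
    return 0;
}

/* Read page 'pageno' back when the buffer is emptied. */
static int
nodebuffer_read_page(NodeBufferFile *nbf, long pageno, char *page)
{
    if (fseek(nbf->file, pageno * (long) BLCKSZ, SEEK_SET) != 0)
        return -1;
    if (fread(page, 1, BLCKSZ, nbf->file) != BLCKSZ)
        return -1;
    return 0;
}

The point is simply that when the buffer is later emptied, we get one
sequential read of its pages instead of seeking all over a shared temp
file.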

Better management of in-memory buffer pages. When we start emptying a
buffer, we currently flush all the buffer pages in memory to the
temporary file, to make room for new buffer pages. But that's a waste of
time if some of the pages we had in memory belong to the buffer we're
about to empty next, or to one of the buffers we're emptying tuples
into. Also, if while emptying a buffer all the tuples go to just one or
two lower-level buffers, it would be beneficial to keep more than one
page in memory for those buffers.
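
Roughly what I have in mind, as a sketch (again not the real code;
InMemoryPage and the helpers are invented for the example):

#include <stdbool.h>

/* Hypothetical descriptor for a page currently held in memory. */
typedef struct InMemoryPage
{
    int     owner_buffer;      /* which node buffer this page belongs to */
    char   *data;              /* BLCKSZ bytes of buffered index tuples */
} InMemoryPage;

/*
 * A page is "hot" if it belongs to the buffer we're about to empty, or
 * to one of the destination buffers its tuples will be pushed down to.
 */
static bool
page_is_hot(const InMemoryPage *page,
            int emptying_buffer,
            const int *dest_buffers, int ndest)
{
    int     i;

    if (page->owner_buffer == emptying_buffer)
        return true;
    for (i = 0; i < ndest; i++)
    {
        if (page->owner_buffer == dest_buffers[i])
            return true;
    }
    return false;
}

/*
 * Smarter flush: instead of unconditionally writing every in-memory page
 * to the temp file, keep the hot pages in memory, since we'll need them
 * again almost immediately.
 */
static void
flush_cold_pages(InMemoryPage **pages, int npages,
                 int emptying_buffer,
                 const int *dest_buffers, int ndest)
{
    int     i;

    for (i = 0; i < npages; i++)
    {
        if (page_is_hot(pages[i], emptying_buffer, dest_buffers, ndest))
            continue;          /* keep it in memory */
        /* write_page_to_tempfile(pages[i]);  -- hypothetical */
    }
}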

Buffering leaf pages. This I posted on a separate thread already:
http://archives.postgresql.org/message-id/4E5350DB.3060209@enterprisedb.com

Also, at the moment there is one issue with the algorithm that we have
glossed over thus far: for each buffer, we keep some information in
memory, in the hash table and in the auxiliary lists. That means that
the amount of memory needed for the build scales with the size of the
index. If you're dealing with very large indexes, hopefully you also
have a lot of RAM in your system, so I don't think this is a problem in
practice. Still, it would be nice to do something about it. A
straightforward idea would be to swap some of the information to disk.
Another idea, simpler to implement, would be to completely destroy a
buffer, freeing all the memory it uses, when it becomes completely
empty. Then, if you're about to run out of memory (as defined by
maintenance_work_mem), you can empty some low-level buffers to disk to
free up some memory.
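
As a rough sketch of that last idea (NodeBuffer and the helpers here are
made up, not the actual data structures):

#include <stdlib.h>

/* Hypothetical in-memory state kept for one node buffer. */
typedef struct NodeBuffer
{
    int     ntuples;           /* tuples currently queued in this buffer */
    void   *pages;             /* in-memory pages, hash entry, etc. */
    size_t  mem_used;          /* memory attributed to this buffer */
} NodeBuffer;

static size_t total_mem_used = 0;   /* across all buffers */
static size_t mem_limit = 0;        /* maintenance_work_mem, in bytes */

/*
 * When a buffer becomes completely empty, throw away everything we keep
 * for it in memory; it can be recreated from scratch if tuples are ever
 * routed to that page again.
 */
static void
maybe_destroy_buffer(NodeBuffer *buf)
{
    if (buf->ntuples == 0)
    {
        total_mem_used -= buf->mem_used;
        free(buf->pages);
        free(buf);
    }
}

/*
 * If we're about to exceed the memory budget, empty some low-level
 * buffers to disk, so that maybe_destroy_buffer() can reclaim the
 * bookkeeping memory they use.
 */
static void
enforce_memory_limit(void)
{
    if (total_mem_used > mem_limit)
    {
        /* empty_some_low_level_buffers();  -- hypothetical */
    }
}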

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
