Re: Use simplehash.h instead of dynahash in SMgr

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use simplehash.h instead of dynahash in SMgr
Date: 2021-06-23 00:17:17
Message-ID: CA+hUKGJVs32f68JXnQrdTrfbpmh5-V9p_76Knf6vVzoVTUuQtg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 22, 2021 at 6:51 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> I guess we could also ask ourselves how many join algorithms we need.

David and I discussed this a bit off-list, and I just wanted to share
how I understand the idea so far in case it helps someone else. There
are essentially three subcomponents working together:

1. A data structure similar in some ways to a C++ std::deque<T>,
which gives O(1) access to elements by index, is densely packed to
enable cache-friendly scanning of all elements, has stable addresses
(as long as you only add new elements at the end or overwrite existing
slots), and is internally backed by an array of pointers to a set of
chunks.

2. A bitmapset that tracks unused elements in 1, making it easy to
find the lowest-index hole when looking for a place to put a new one
by linear search for a 1 bit, so that we tend towards maximum density
despite having random frees from time to time (seems good, the same
idea is used in kernels to allocate the lowest unused file descriptor
number).

3. A hash table that has as elements indexes into 1. It somehow hides
the difference between keys (what callers look things up with) and
keys reachable by following an index into 1 (where elements' keys
live).

One thought is that you could do 1 as a separate component as the
"primary" data structure, and use a plain old simplehash for 3 as a
kind of index into it, but use pointers (rather than indexes) to
objects in 1 as elements. I don't know if it's better.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo NAGATA 2021-06-23 00:31:25 Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Previous Message Peter Geoghegan 2021-06-23 00:11:41 Re: Maintaining a list of pgindent commits for "git blame" to ignore