Re: Hash tables in dynamic shared memory

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: John Gorman <johngorman2(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash tables in dynamic shared memory
Date: 2016-12-18 23:33:24
Message-ID: CAEepm=1toYCwXE3z-CCQqXE4gWCrJ3BKsMUo1048BZ7r+byeow@mail.gmail.com
Lists: pgsql-hackers

On Sun, Nov 20, 2016 at 11:54 AM, John Gorman <johngorman2(at)gmail(dot)com> wrote:
> I reviewed the dht-v2.patch and found that it is in excellent shape.

Thanks for reviewing! And sorry for the late reply.

> Benchmarking shows that this performs somewhat faster than
> dynahash, which is surprising because it is doing DSA address
> translations on the fly.

That is indeed surprising and I think warrants more investigation.

> One area where this could run faster is to reduce the amount of
> time when the hash is in the RESIZE_IN_PROGRESS state.
>
> When a hash partition reaches 75% full, a resize begins by
> allocating an empty new hash bucket array with double the number of
> buckets. This starts the RESIZE_IN_PROGRESS state, during which
> there are both an old and a new hash bucket array.
>
> During the RESIZE state, all operations such as find, insert,
> delete and iterate run slower because they have to look items up
> in two hash bucket arrays.
>
> Whenever a new item is inserted while the hash is in the RESIZE
> state, the current code takes the time to move one item from the
> old hash bucket array to the new one. During insertion an exclusive
> lock is already held on the partition, so this is an efficient time
> to do the resize cleanup work.
>
> However, since we only clean up one item per insertion, we do not
> complete the cleanup and free the old hash bucket array until we
> are due to start yet another resize operation.
>
> So we are effectively always in the RESIZE state, which slows down
> operations and wastes some space.

Right, that is the case in a workload that inserts a bunch of data but
then becomes read-only forever. A workload that constantly does a mix
of writing and reading should settle down at a reasonable size and
stop resizing.
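
To make that concrete, here is a minimal sketch of the incremental
resize scheme described above. The names (dht_table, dht_entry,
dht_find, dht_cleanup_step) are made up for illustration and are not
the identifiers used in dht.c; locking and allocation are elided.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct dht_entry
{
    struct dht_entry *next;
    uint64_t    hash;
    /* key and value would follow */
} dht_entry;

typedef struct dht_table
{
    dht_entry **new_buckets;    /* current bucket array */
    dht_entry **old_buckets;    /* only valid while resizing */
    size_t      new_nbuckets;   /* double old_nbuckets after a resize */
    size_t      old_nbuckets;
    size_t      clean_bucket;   /* next old bucket to drain */
    bool        resize_in_progress;
} dht_table;

/* While resizing, a lookup has to probe both bucket arrays, which is
   why find_resize is slower than find_clean in the numbers above. */
static dht_entry *
dht_find(dht_table *t, uint64_t hash)
{
    dht_entry  *e;

    for (e = t->new_buckets[hash % t->new_nbuckets]; e != NULL; e = e->next)
        if (e->hash == hash)
            return e;
    if (t->resize_in_progress)
        for (e = t->old_buckets[hash % t->old_nbuckets]; e != NULL; e = e->next)
            if (e->hash == hash)
                return e;
    return NULL;
}

/* One unit of cleanup work: move a single entry from the old bucket
   array into the new one.  The insert path calls this while it already
   holds the partition's exclusive lock; once the old array is fully
   drained it can be freed and the table leaves RESIZE_IN_PROGRESS. */
static void
dht_cleanup_step(dht_table *t)
{
    while (t->clean_bucket < t->old_nbuckets &&
           t->old_buckets[t->clean_bucket] == NULL)
        t->clean_bucket++;

    if (t->clean_bucket < t->old_nbuckets)
    {
        dht_entry  *e = t->old_buckets[t->clean_bucket];
        size_t      b = e->hash % t->new_nbuckets;

        t->old_buckets[t->clean_bucket] = e->next;
        e->next = t->new_buckets[b];
        t->new_buckets[b] = e;
    }
    else
        t->resize_in_progress = false;  /* old array fully drained */
}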

> Here are the timings for insert and find in nanoseconds on a
> MacBook Pro. The insert_resize figure includes the resize work to
> move one item from the old to the new hash bucket array.
>
> insert_resize: 945.98
> insert_clean: 450.39
> find_resize: 348.90
> find_clean: 293.16
>
> The attached dht-v2-resize-cleanup.patch applies on top of
> dht-v2.patch and speeds up the resize cleanup process so that
> hashes spend most of their time in the clean state.
>
> It does this by cleaning up more than one old item during
> inserts. This patch cleans up three old items.
>
> There is also the case where a hash receives all of its inserts
> at the beginning and the rest of the workload is all finds. This
> patch also cleans up two items for each find.
>
> The downside of doing extra cleanup early is some additional
> latency. Here are the experimental figures and the approximate
> formulas for different numbers of cleanups for inserts. Note that
> we cannot go lower than one cleanup per insert.
>
> Resize work in inserts: 729.79 insert + 216.19 * cleanups
> 1 945.98
> 2 1178.13
> 3 1388.73
> 4 1617.04
> 5 1796.91
>
> Here are the timings for different numbers of cleanups for finds.
> Note that since we do not already hold an exclusive partition lock
> on a find, there is the additional overhead of taking that lock.
>
> Resize work in finds: 348.90 dirty_find + 233.45 lock + 275.78 * cleanups
> 0 348.90
> 1 872.88
> 2 1138.98
> 3 1448.94
> 4 1665.31
> 5 1975.91
>
> The new effective latencies during the resize vs. the clean states.
>
> #define DHT_INSERT_CLEANS 3
> #define DHT_SEARCH_CLEANS 2
>
> insert_resize: 1388.73 -- 3 cleaned
> insert_clean: 450.39
> find_resize: 1138.98 -- 2 cleaned
> find_clean: 293.16
>
> Overall performance improves because, most of the time, we no
> longer have to examine more than one hash bucket array.

Thanks for doing all these experiments. Yeah, I think it makes sense
to merge this change to improve that case.
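
For anyone following along, the shape of the change is roughly as
follows, continuing the made-up names from the sketch earlier in this
message (the real patch operates on dht.c's own structures).
DHT_INSERT_CLEANS and DHT_SEARCH_CLEANS are the constants quoted
above. The linear model also holds up: with three cleanups per insert
it predicts about 729.79 + 3 * 216.19 = ~1378 ns, close to the
measured 1388.73 ns.

#define DHT_INSERT_CLEANS 3
#define DHT_SEARCH_CLEANS 2

/* Insert path: the partition's exclusive lock is already held, so do
   up to three units of cleanup work instead of one. */
static void
dht_insert_cleanup(dht_table *t)
{
    for (int i = 0; i < DHT_INSERT_CLEANS && t->resize_in_progress; i++)
        dht_cleanup_step(t);
}

/* Find path: a find does not normally hold the exclusive partition
   lock, so the cleanup work has to acquire it first -- the extra
   ~233 ns in the measurements above. */
static void
dht_find_cleanup(dht_table *t)
{
    if (!t->resize_in_progress)
        return;
    /* acquire exclusive partition lock here (elided in this sketch) */
    for (int i = 0; i < DHT_SEARCH_CLEANS && t->resize_in_progress; i++)
        dht_cleanup_step(t);
    /* release the lock */
}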

Since Dilip Kumar's Parallel Bitmap Heap Scan project is no longer
using this, I think I should park it here unless/until another
potential use case pops up. Some interesting candidates have been
mentioned already, and I'm fairly sure there are other uses too, but I
don't expect anyone to be interested in committing this patch until
something concrete shows up, so I'll go work on other things until
then.

--
Thomas Munro
http://www.enterprisedb.com
