Experimenting with hash tables inside pg_dump

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Experimenting with hash tables inside pg_dump
Date: 2021-10-21 22:27:25
Message-ID: 2595220.1634855245@sss.pgh.pa.us
Lists: pgsql-hackers

Today, pg_dump does a lot of internal lookups via binary search
in presorted arrays. I thought it might improve matters
to replace those binary searches with hash tables, theoretically
converting O(log N) searches into O(1) searches. So I tried making
a hash table indexed by CatalogId (tableoid+oid) with simplehash.h,
and replacing as many data structures as I could with that.

This makes the code shorter and (IMO anyway) cleaner, but

(a) the executable size increases by a few KB --- apparently, even
the minimum subset of simplehash.h's functionality is code-wasteful.

(b) I couldn't measure any change in performance at all. I tried
it on the regression database and on a toy DB with 10000 simple
tables. Maybe on a really large DB you'd notice some difference,
but I'm not very optimistic now.

So this experiment feels like a failure, but I thought I'd post
the patch and results for the archives' sake. Maybe somebody
will think of a way to improve matters. Or maybe it's worth
doing just to shorten the code?

regards, tom lane

Attachment Content-Type Size
use-simplehash-in-pg-dump-1.patch text/x-diff 24.6 KB
