From: | Ivan Voras <ivoras(at)gmail(dot)com> |
---|---|
To: | "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu> |
Cc: | postgres performance list <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Indexes for hashes |
Date: | 2016-06-15 13:09:04 |
Message-ID: | CAF-QHFVq9XFAqA2GhT15btmvQBErNjLpaChPrx3BGmTRpu320w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On 15 June 2016 at 15:03, ktm(at)rice(dot)edu <ktm(at)rice(dot)edu> wrote:
> On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote:
> > Hi,
> >
> > I have an application which stores a large amounts of hex-encoded hash
> > strings (nearly 100 GB of them), which means:
> >
> > - The number of distinct characters (alphabet) is limited to 16
> > - Each string is of the same length, 64 characters
> > - The strings are essentially random
> >
> > Creating a B-Tree index on this results in the index size being larger
> than
> > the table itself, and there are disk space constraints.
> >
> > I've found the SP-GIST radix tree index, and thought it could be a good
> > match for the data because of the above constraints. An attempt to create
> > it (as in CREATE INDEX ON t USING spgist(field_name)) apparently takes
> more
> > than 12 hours (while a similar B-tree index takes a few hours at most),
> so
> > I've interrupted it because "it probably is not going to finish in a
> > reasonable time". Some slides I found on the spgist index allude that
> both
> > build time and size are not really suitable for this purpose.
> >
> > My question is: what would be the most size-efficient index for this
> > situation?
>
> Hi Ivan,
>
> If the strings are really random, then maybe a function index on the first
> 4, 8, or 16 characters could be used to narrow the search space and not
> need
> to index all 64. If they are not "good" random numbers, you could use a
> hash
> index on the strings. It will be much smaller since it currently uses a
> 32-bit
> hash. It has a number of caveats and is not currently crash-safe, but it
> seems
> like it might work in your environment. You can also use a functional
> index on
> a hash-function applied to your values with a btree to give you crash
> safety.
>
>
Hi,
I figured the hash index might be helpful and I've tried it in the
meantime: on one of the smaller tables (which is 51 GB in size), a btree
index is 32 GB, while the hash index is 22 GB (so btree is around 45%
larger).
I don't suppose there's an effort in progress to make hash indexes use WAL?
:D
From | Date | Subject | |
---|---|---|---|
Next Message | ktm@rice.edu | 2016-06-15 13:16:14 | Re: Indexes for hashes |
Previous Message | ktm@rice.edu | 2016-06-15 13:03:07 | Re: Indexes for hashes |