Re: GIN improvements part 1: additional information

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part 1: additional information
Date: 2014-01-08 21:58:39
Message-ID: CAPpHfduBDxQ7Tw5bZCCgxmJzNHDRG6eELAuseuCEvKt=dw0Yfw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 6, 2014 at 12:35 PM, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:

> On Sat, Dec 21, 2013 at 4:36 AM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
> >
> > Yet another version. The encoding/decoding code is now quite isolated in
> > ginpostinglist.c, so it's easy to experiment with different encodings.
> > This patch uses varbyte encoding again.
> >
> > I got a bit carried away, experimented with a bunch of different
> > encodings. I tried rice encoding, rice encoding with block and offset
> > number delta stored separately, the simple9 variant, and varbyte
> > encoding.
> >
> > The compressed size obviously depends a lot on the distribution of the
> > items, but in the test set I used, the differences between different
> > encodings were quite small.
> >
> > One fatal problem with many encodings is VACUUM. If a page is completely
> > full and you remove one item, the result must still fit. In other words,
> > removing an item must never enlarge the space needed. Otherwise we have
> > to be able to split on vacuum, which adds a lot of code, and also makes
> > it possible for VACUUM to fail if there is no disk space left. That's
> > unpleasant if you're trying to run VACUUM to release disk space. (gin
> > fast updates already has that problem BTW, but let's not make it worse)
> >
> > I believe that eliminates all encodings in the Simple family, as well as
> > PForDelta, and surprisingly also Rice encoding. For example, if you have
> > three items in consecutive offsets, the differences between them are
> > encoded as 11 in rice encoding. If you remove the middle item, the
> > encoding for the next item becomes 010, which takes more space than the
> > original.
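
An illustrative aside on the example quoted above, not taken from the patch: the bit strings "1" for a delta of 1 and "010" for a delta of 2 coincide with the Elias gamma code, so the sketch below assumes that code as a stand-in for whichever Rice variant was actually tested. The helper names and the use of plain integers instead of real item pointers are hypothetical. It only shows that deleting the middle of three consecutive items can make a bit-packed delta encoding longer.

```python
def gamma_bits(n: int) -> str:
    """Elias gamma code of a positive integer, returned as a bit string.

    Assumption: used here only because it reproduces the "1" / "010"
    bit strings from the quoted example.
    """
    if n < 1:
        raise ValueError("gamma code is defined for positive integers only")
    binary = bin(n)[2:]                      # binary digits without '0b'
    return "0" * (len(binary) - 1) + binary  # length prefix in unary zeros


def encode_deltas(items: list[int]) -> str:
    """Encode a strictly increasing list of positive integers as gamma-coded deltas."""
    bits = []
    prev = 0
    for item in items:
        bits.append(gamma_bits(item - prev))
        prev = item
    return "".join(bits)


before = encode_deltas([1, 2, 3])  # deltas 1, 1, 1
after = encode_deltas([1, 3])      # middle item removed: deltas 1, 2
print(len(before), len(after))
```

With these inputs the list with the middle item removed needs one bit more than the full list, which is the failure mode a completely full page cannot absorb.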
> >
> > AFAICS varbyte encoding is safe from that. (a formal proof would be nice
> > though).
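
A minimal sketch of why varbyte looks safe here, under stated assumptions: items are treated as plain increasing integers rather than real item pointers, and the byte layout (7 payload bits per byte, high bit as continuation flag) is a common varbyte convention, not necessarily the exact layout in ginpostinglist.c. All function names are hypothetical. Removing an item replaces two deltas a and b with a + b; the sum has at most one more significant bit than the larger of the two, so it needs at most one more byte than the larger delta, and the smaller delta already occupied at least one byte. This is an informal argument plus a brute-force check, not the formal proof asked for above.

```python
def varbyte_encode(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, high bit = 'more bytes follow'."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # low 7 bits, continuation flag set
        n >>= 7
    out.append(n)                      # final byte, continuation flag clear
    return bytes(out)


def encode_list(items: list[int]) -> bytes:
    """Encode an increasing list of integers as varbyte-coded deltas."""
    out = bytearray()
    prev = 0
    for item in items:
        out += varbyte_encode(item - prev)
        prev = item
    return bytes(out)


def decode_list(data: bytes) -> list[int]:
    """Inverse of encode_list."""
    items = []
    prev = 0
    i = 0
    while i < len(data):
        delta = 0
        shift = 0
        while True:
            byte = data[i]
            i += 1
            delta |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        prev += delta
        items.append(prev)
    return items


def removal_never_grows(items: list[int]) -> bool:
    """Check that deleting any single item never lengthens the encoding."""
    full = len(encode_list(items))
    return all(
        len(encode_list(items[:i] + items[i + 1:])) <= full
        for i in range(len(items))
    )
```

For example, `removal_never_grows([127, 128, 129, 20000])` exercises deltas that sit right at the one-byte/two-byte boundary, which is where a growth would show up if the property failed.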
> >
> > So, I'm happy to go with varbyte encoding now, indeed I don't think we
> > have much choice unless someone can come up with an alternative that's
> > VACUUM-safe. I have to put this patch aside for a while now, I spent a
> > lot more time on these encoding experiments than I intended. If you
> > could take a look at this latest version, spend some time reviewing it
> > and cleaning up any obsolete comments, and re-run the performance tests
> > you did earlier, that would be great. One thing I'm slightly worried
> > about is the overhead of merging the compressed and uncompressed posting
> > lists in a scan. This patch will be in good shape for the final
> > commitfest, or even before that.
> >
>
>
> I just tried out the patch "gin-packed-postinglists-varbyte2.patch"
> (which looks like the latest one in this thread) as follows:
>
> 1) Applied patch to the HEAD (on commit
> 94b899b829657332bda856ac3f06153d09077bd1)
> 2) Created a test table and index
>
> create table test (a text);
> copy test from '/usr/share/dict/words';
> create index test_trgm_idx on test using gin (a gin_trgm_ops);
>
> 3) Got the following error on a wildcard query:
>
> postgres=# explain (buffers, analyze) select count(*) from test where
> a like '%tio%';
> ERROR: lock 9447 is not held
> STATEMENT: explain (buffers, analyze) select count(*) from test where
> a like '%tio%';
> ERROR: lock 9447 is not held
>

Thanks for reporting. Fixed version is attached.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gin-packed-postinglists-varbyte3.patch.gz application/x-gzip 29.6 KB
