Re: [PATCHES] GIN improvements

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] GIN improvements
Date: 2009-01-19 16:53:22
Message-ID: 4974B002.3040202@sigaev.ru
Lists: pgsql-hackers pgsql-patches

Changes:
Results of the pending list's scan are now placed directly into the resulting
tidbitmap. This saves the cycles previously spent filtering results and reduces
memory usage. It also removes the need to check the tbm for lossiness.
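To illustrate the idea, a toy model (not the patch's code; all names below are
made up stand-ins for the real ItemPointerData/TIDBitmap): when both the
pending-list scan and the regular-index scan feed the same bitmap, a tuple seen
twice simply collapses into one entry, so no separate filtering pass is needed.

#include <stdio.h>

#define MAX_TIDS 64

typedef struct
{
    int block;
    int offset;
} ItemPointer;          /* stand-in for PostgreSQL's ItemPointerData */

typedef struct
{
    ItemPointer tids[MAX_TIDS];
    int         ntids;
} TidBitmap;            /* stand-in for a non-lossy TIDBitmap */

/* Add a TID to the bitmap; like the real TIDBitmap, duplicates collapse. */
static void
tbm_add(TidBitmap *tbm, ItemPointer tid)
{
    int i;

    for (i = 0; i < tbm->ntids; i++)
        if (tbm->tids[i].block == tid.block &&
            tbm->tids[i].offset == tid.offset)
            return;             /* already present */
    tbm->tids[tbm->ntids++] = tid;
}

int main(void)
{
    TidBitmap   result = { .ntids = 0 };
    /* pending-list scan: matches go straight into the result bitmap */
    ItemPointer pending[] = {{10, 1}, {10, 2}};
    /* regular-index scan into the SAME bitmap; (10,2) was meanwhile
     * moved into the regular structure by a concurrent cleanup */
    ItemPointer regular[] = {{10, 2}, {11, 5}};
    int i;

    for (i = 0; i < 2; i++)
        tbm_add(&result, pending[i]);
    for (i = 0; i < 2; i++)
        tbm_add(&result, regular[i]);

    printf("distinct matches: %d\n", result.ntids);   /* prints 3, not 4 */
    return 0;
}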

> Is this a 100% bulletproof solution, or is it still possible for a query
> to fail due to the pending list? It relies on the stats collector, so
> perhaps in rare cases it could still fail?
Yes :( — in rare cases it could still fail.

> Can you explain why the tbm must not be lossy?

The problem with a lossy tbm has two aspects:
- The amgettuple interface has no way to work with a page-wide result instead
of an exact ItemPointer: amgettuple cannot return just a block number the way
amgetbitmap can.
- A concurrent vacuum process: while we scan the pending list, its contents
could be transferred into the regular structure of the index, and then we would
find the same tuple twice. Again, amgettuple has no protection against that;
only amgetbitmap does. So we need to filter the results from the regular GIN
structure by the results from the pending list, and for that filtering we
cannot use a lossy tbm (see the sketch after this list).
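A toy model of why (hypothetical names, not PostgreSQL's actual TIDBitmap API):
once a page goes lossy, the bitmap only remembers "some tuples on this page
match", so a membership test for an exact TID can only answer "maybe". We can
then neither keep the tuple (possible duplicate) nor drop it (possible lost
match).

#include <stdbool.h>
#include <stdio.h>

typedef enum { TBM_NO, TBM_YES, TBM_MAYBE } TbmResult;

typedef struct
{
    int  block;
    bool lossy;   /* true means "some tuples on this page match" */
    int  offset;  /* valid only when !lossy */
} TbmEntry;

/* Membership test of an exact TID against one bitmap entry. */
static TbmResult
tbm_contains(const TbmEntry *e, int block, int offset)
{
    if (e->block != block)
        return TBM_NO;
    if (e->lossy)
        return TBM_MAYBE;   /* whole page matched: no per-tuple answer */
    return (e->offset == offset) ? TBM_YES : TBM_NO;
}

int main(void)
{
    static const char *answer[] = { "no", "yes", "maybe" };
    TbmEntry exact = { .block = 10, .lossy = false, .offset = 2 };
    TbmEntry lossy = { .block = 10, .lossy = true };

    /* With an exact entry we can decide duplicate-or-not... */
    printf("exact entry, TID (10,2): %s\n", answer[tbm_contains(&exact, 10, 2)]);
    /* ...with a lossy one we cannot, so exact filtering is impossible. */
    printf("lossy entry, TID (10,2): %s\n", answer[tbm_contains(&lossy, 10, 2)]);
    return 0;
}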

v0.21 prevents that failure on calls to gingetbitmap, because all results are
now collected in a single resulting tidbitmap.

> Also, can you clarify why a large update can cause a problem? In the

If the query looks like
UPDATE tbl SET col=... WHERE col ...
and the planner chooses a GIN index scan over col, then each updated row adds
new entries to the pending list while that same scan is running, so there is a
real chance of the pending list growing past the non-lossy limit.
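Rough arithmetic with assumed numbers (the real per-entry cost and the limit
depend on work_mem and on TIDBitmap internals) showing how a whole-table UPDATE
can outgrow the non-lossy budget:

#include <stdio.h>

int main(void)
{
    long rows_updated    = 10L * 1000 * 1000;   /* UPDATE touches every row */
    long bytes_per_entry = 6;                   /* assumed cost per TID     */
    long budget          = 4L * 1024 * 1024;    /* assumed work_mem = 4MB   */

    long needed = rows_updated * bytes_per_entry;

    printf("exact representation needs ~%ld bytes, budget is %ld bytes\n",
           needed, budget);
    if (needed > budget)
        printf("-> the bitmap would have to become lossy\n");
    return 0;
}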

> previous discussion, you suggested that it force normal index inserts
> after a threshold based on work_mem:
>
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00065.php

I see only two guaranteed solutions to the problem (both sketched after this
list):
- After the limit is reached, force normal index inserts. One motivation for
the patch was the frequent question from users: why is updating a whole table
with a GIN index so slow? So this way would not resolve that question.
- After the limit is reached, force a cleanup of the pending list by calling
gininsertcleanup. Not very good, because users will sometimes see a huge
execution time for a simple insert, although users who run a huge update should
be satisfied.
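A sketch of the two policies (hypothetical names and threshold; not the patch's
actual code):

#include <stdio.h>

#define PENDING_LIST_LIMIT 1024     /* assumed threshold, e.g. from work_mem */

static long pending_list_size = 0;

static void normal_index_insert(void) { /* slow per-tuple GIN insert */ }
static void pending_list_append(void)  { pending_list_size++; }
static void gininsertcleanup(void)     { pending_list_size = 0; }

/* Way 1: past the limit, bypass the pending list; every insert is slow
 * again, so the original complaint about slow bulk updates remains. */
static void
insert_way1(void)
{
    if (pending_list_size >= PENDING_LIST_LIMIT)
        normal_index_insert();
    else
        pending_list_append();
}

/* Way 2: past the limit, the inserting backend pays for a full cleanup;
 * one unlucky insert sees a long stall, the rest stay fast. */
static void
insert_way2(void)
{
    pending_list_append();
    if (pending_list_size >= PENDING_LIST_LIMIT)
        gininsertcleanup();
}

int main(void)
{
    int i;

    for (i = 0; i < 2000; i++)
        insert_way1();
    printf("way 1: pending entries = %ld\n", pending_list_size);  /* 1024 */

    pending_list_size = 0;
    for (i = 0; i < 2000; i++)
        insert_way2();
    printf("way 2: pending entries = %ld\n", pending_list_size);  /* 976 */
    return 0;
}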

I have difficulty choosing between these. It seems to me the second way is
better: if a user sees a very long insertion time, then (auto)vacuum on their
installation should be tweaked.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
fast_insert_gin-0.21.gz application/x-tar 23.3 KB
