
Re: [PATCHES] GIN improvements

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] GIN improvements
Date: 2009-01-19 16:53:22
Message-ID: 4974B002.3040202@sigaev.ru
Lists: pgsql-hackers pgsql-patches
Changes:
  Results of the pending-list scan are now placed directly into the resulting 
tidbitmap. This saves the cycles spent filtering results and reduces memory 
usage. It also removes the need to check the tbm for lossiness.
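For illustration only, a minimal sketch of that change, not the patch itself: 
pendingListGetNext() and rowMatchesQuery() are hypothetical stand-ins for the 
pending-list machinery, while tbm_add_tuples() is the regular tidbitmap call 
(8.4-era API).

#include "postgres.h"
#include "access/relscan.h"
#include "nodes/tidbitmap.h"

/* Hypothetical stand-ins for the patch's pending-list machinery. */
extern bool pendingListGetNext(IndexScanDesc scan, ItemPointer heapPtr);
extern bool rowMatchesQuery(IndexScanDesc scan, ItemPointer heapPtr);

/*
 * Every matching heap TID found in the pending list is added straight
 * into the caller's TIDBitmap, so no private result structure (and no
 * later filtering pass) is needed.
 */
static void
scanPendingList(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
{
    ItemPointerData heapPtr;

    while (pendingListGetNext(scan, &heapPtr))
    {
        if (rowMatchesQuery(scan, &heapPtr))
        {
            /* exact TID; the recheck flag would follow the operator class */
            tbm_add_tuples(tbm, &heapPtr, 1, false);
            (*ntids)++;
        }
    }
}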


> Is this a 100% bulletproof solution, or is it still possible for a query
> to fail due to the pending list? It relies on the stats collector, so
> perhaps in rare cases it could still fail?
Yes :( — since it relies on the stats collector, a rare failure is still possible.

> Can you explain why the tbm must not be lossy?

The problem with a lossy tbm has two aspects:
  - The amgettuple interface has no way to work with a page-wide result instead
    of an exact ItemPointer: amgettuple cannot return just a block number the
    way amgetbitmap can.
  - A concurrent vacuum process: while we scan the pending list, its contents
    could be transferred into the regular index structure, and then we would
    find the same tuple twice. Again, amgettuple has no protection against
    that; only amgetbitmap does. So we need to filter the results from the
    regular GIN structure by the results from the pending list, and for that
    filtering we cannot use a lossy tbm (see the sketch below).
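To make the last point concrete, here is a minimal sketch of iterating a 
TIDBitmap with the regular 8.4-era API (the loop itself is illustrative, not 
code from the patch): once a page has gone lossy, only its block number 
survives, so there is nothing left to filter exact TIDs against.

#include "postgres.h"
#include "nodes/tidbitmap.h"

static void
showWhyLossyCannotFilter(TIDBitmap *pendingTbm)
{
    TBMIterateResult *res;

    tbm_begin_iterate(pendingTbm);
    while ((res = tbm_iterate(pendingTbm)) != NULL)
    {
        if (res->ntuples == -1)
        {
            /*
             * Lossy page: only res->blockno is known; res->offsets[] is
             * not filled in.  Whether a particular ItemPointer from the
             * regular index scan duplicates a pending-list result is
             * therefore undecidable.
             */
        }
        else
        {
            /* Exact page: res->offsets[0 .. res->ntuples - 1] are valid. */
        }
    }
}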

v0.21 prevents that failure in gingetbitmap, because now all results 
are collected in a single resulting tidbitmap.



> Also, can you clarify why a large update can cause a problem? In the

If the query looks like
UPDATE tbl SET col=... WHERE col ...
and the planner chooses a GIN index scan over col, then there is a chance that 
the pending list grows past the non-lossy limit.


> previous discussion, you suggested that it force normal index inserts
> after a threshold based on work_mem:
> 
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00065.php

I see only two guaranteed solutions to the problem:
- After the limit is reached, force normal index inserts. One of the 
motivations for the patch was the frequent question from users: why is updating 
a whole table with a GIN index so slow? So this way would not resolve that 
question.
- After the limit is reached, force a cleanup of the pending list by calling 
gininsertcleanup. Not very good, because users will sometimes see a huge 
execution time for a simple insert, although users who run a huge update should 
be satisfied.

I have difficulty choosing between them. The second way seems better to me: if 
a user sees very long insertion times, then (auto)vacuum on their installation 
should be tweaked.
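For what it is worth, a sketch of the shape of that second option, under 
assumed names: pendingListSizeExceeded() and forcePendingListCleanup() are 
hypothetical stand-ins for the threshold test and for a forced call of 
gininsertcleanup.

#include "postgres.h"
#include "utils/rel.h"

/* Hypothetical stand-ins for the threshold test and the forced cleanup. */
extern bool pendingListSizeExceeded(Relation index);
extern void forcePendingListCleanup(Relation index);

/*
 * The backend that pushes the pending list past the limit pays for the
 * cleanup itself, so the list can never grow past what a non-lossy
 * bitmap can represent.  This is the source of the occasional "huge"
 * execution time for a simple insert.
 */
static void
fastInsertWithForcedCleanup(Relation index)
{
    if (pendingListSizeExceeded(index))
        forcePendingListCleanup(index);

    /* ... then append the new entry to the pending list as usual ... */
}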


-- 
Teodor Sigaev                                   E-mail: teodor(at)sigaev(dot)ru
                                                    WWW: http://www.sigaev.ru/

Attachment: fast_insert_gin-0.21.gz
Description: application/x-tar (23.3 KB)
