Re: [PATCHES] GIN improvements

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] GIN improvements
Date: 2008-12-22 01:56:06
Message-ID: 1229910966.2285.76.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

On Fri, 2008-12-12 at 20:36 +0300, Teodor Sigaev wrote:
> Changes:
> - added vacuum_delay_point() in gininsertcleanup
> - add trigger to run vacuum by number of inserted tuples from
> last vacuum. Number of inserted tuples represents number
> of really inserted tuples in index and it is calculated as
> tuples_inserted + tuples_updated - tuples_hot_updated.
> Trigger fires on instuples > vac_base_thresh because search time is linear
> on number of pending pages (tuples)

Hi,

Comments:

1.

You use something like the following in a few places:

START_CRIT_SECTION();
...
l = PageAddItem(...);
if (l == InvalidOffsetNumber)
elog(ERROR, "failed to add item to index page in \"%s\"",
RelationGetRelationName(index));

It's no use using ERROR, because it will turn into PANIC, which is
obviously unacceptable. It looks to me like those conditions can't
happen anyway, so it's probably better to add a comment explaining why
it can't happen, and Assert().

2. It appears to be properly triggering autovacuum when only inserts
happen, so I think that issue is solved.

3. Simple performance result with autovacuum off:

create table random(i int[]);
insert into random select ARRAY[(random() * 10)::int, (random() *
10)::int, (random() * 10)::int, (random() * 10)::int, (random() *
10)::int, (random() * 10)::int, (random() * 10)::int, (random() *
10)::int, (random() * 10)::int, (random() * 10)::int] from
generate_series(1, 1000000);

\timing on
drop table foo;
create table foo(i int[]);
create index foogin on foo using gin (i);
insert into foo select i from random;
vacuum foo;

Without patch:
INSERT: 71s
VACUUM: 2s
total: 73s

With patch:
INSERT: 33s
VACUUM: 12s
total: 45s

So, there is a performance advantage. This was just a quick test to make
sure the numbers matched my expectations.

4. Heikki mentioned:
http://archives.postgresql.org/pgsql-hackers/2008-11/msg01832.php

"To make things worse, a query will fail if all the matching
fast-inserted tuples don't fit in the non-lossy tid bitmap."

That issue still remains, correct? Is there a resolution to that?

5. I attached a newer version merged with HEAD.

6. You defined:

#define GinPageHasFullRow(page) ( GinPageGetOpaque(page)->flags &
GIN_LIST_FULLROW )

But in many places you still do the same check without using that macro.
The macro has only one call site, so I suggest either removing the macro
entirely, or using it every place you check that flag.

7. I don't understand this chunk of code:

ItemPointerData item = pos->item;

if ( scanGetCandidate(scan, pos) == false || !
ItemPointerEquals(&pos->item, &item) )
elog(ERROR,"Could not process tuple"); /* XXX should not be here !
*/

How can (!ItemPointerEquals(&pos->item, &item)) ever happen?

And how can (scanGetCandidate(scan, pos) == false) ever happen? Should
that be an Assert() instead?

If those can happen during normal operation, then we need a better error
message there.

Regards,
Jeff Davis

Attachment Content-Type Size
gin-fast-insert-20081221.patch text/x-patch 85.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message ITAGAKI Takahiro 2008-12-22 02:36:55 Re: generic reloptions improvement
Previous Message Jaime Casanova 2008-12-22 00:22:29 Re: rules regression test failed on mingw

Browse pgsql-patches by date

  From Date Subject
Next Message Heikki Linnakangas 2008-12-22 20:18:57 Re: [PATCHES] Infrastructure changes for recovery (v8)
Previous Message Simon Riggs 2008-12-18 02:50:49 Re: [PATCHES] Infrastructure changes for recovery (v8)