Re: GIN fast insert

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN fast insert
Date: 2009-02-24 05:18:25
Message-ID: 603c8f070902232118u439bc042g4acadfcf78162968@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 23, 2009 at 1:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Feb 23, 2009 at 10:05 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Actually, I'm going to *insist* that we lose the index AM scan
>>> altogether.
>
>> Except that the "inessential" feature in question is a feature that
>> currently WORKS, and I don't believe that the testing you've done is
>> anywhere near sufficient to show that no one will be upset if it goes
>> away.
>
> What feature is that --- the ability to get an undefined subset of rows
> quickly by using LIMIT without ORDER BY?  Not much of a feature.

That's a huge over-generalization of the effect of removing index
scans, as I am sure you are well aware. It only took me about 5
minutes to come up with a test case against CVS HEAD where disabling
index scans resulted in a significant dropoff in performance. Here it
is:

create table foo (id serial, x int[], primary key (id));
create index foo_gin on foo using gin (x);
insert into foo (x) select array[random()*10000::integer,
random()*10000::integer, random()*10000::integer,
random()*10000::integer] from generate_series(1,10000);
analyze foo;

OK, now here's the query:

select sum(1) from generate_series(1,10000) g left join foo on
array[g] <@ x where x is null;

On my system this takes about 45 ms to execute with default settings
and about 90 ms to execute with index scans disabled.
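
For anyone wanting to reproduce the comparison, a minimal sketch follows.
It assumes that "index scans disabled" means turning off the
enable_indexscan planner setting for the session; the message does not say
exactly how the scans were disabled, so treat that as a guess. EXPLAIN
ANALYZE is used here only to print the plan and the measured run time.

```sql
-- Baseline: default planner settings.
explain analyze
select sum(1) from generate_series(1,10000) g left join foo on
array[g] <@ x where x is null;

-- Assumption: plain index scans are disabled via enable_indexscan.
set enable_indexscan = off;

explain analyze
select sum(1) from generate_series(1,10000) g left join foo on
array[g] <@ x where x is null;

-- Restore the default so later queries in the session are unaffected.
reset enable_indexscan;
```

Timings will vary with hardware and with the random data, so the useful
comparison is the ratio between the two runs, not the absolute numbers.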

>> Without some convincing evidence to support that proposition, I
>> think it would be better to postpone the whole patch to 8.5 and use
>> that time to fix the problem,
>
> Wouldn't bother me any.  We are way overdue for 8.4 already.

I completely agree, but there are four patches on the CommitFest wiki
that still need some committer attention before we close up shop:

B-Tree Emulation for GIN
Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Proposal of PITR performance improvement
SE-PostgreSQL Lite

I have already reviewed the second of these and believe it's in pretty
good shape. I may review one or more of the others as time permits,
especially if you or one of the other committers say that would be
helpful and indicate which patch would benefit most.

It is well past time to move "Reducing some DDL Locks to ShareLock" to
committed and leave the unapplied portions for 8.5. As much as I
would like to have the feature, it is probably also time to think
about punting "Hot Standby" unless it's going to get committed RSN.
At this point, we are definitely holding up both the release of 8.4
and development for 8.5.

...Robert
