Re: Specifying many rows in a table

From: Steve Atkins <steve(at)blighty(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Specifying many rows in a table
Date: 2004-01-29 00:14:21
Message-ID: 20040129001421.GB7707@gp.word-to-the-wise.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Jan 28, 2004 at 06:13:36PM -0500, Tom Lane wrote:
> "scott.marlowe" <scott(dot)marlowe(at)ihs(dot)com> writes:
> > You could try inserting those
> > numbers into a temp table and subselecting it.
>
> That's what I'd try too; it should work reasonably well in 7.4. Two
> thoughts:
>
> 1. Be sure to "analyze" the temp table after filling it. The planner
> must know the number of rows in the table to have any shot at choosing a
> good plan.
>
> 2. If you know that the set of numbers you have emitted are all
> different, don't use IN at all, but a straight join to the temp table:
>
> select table.* from table, tmp where id = tmp.id;
>
> With IN the system will have to work harder to eliminate duplicates from
> the temp table; why make it do that if you know it's not necessary?

Yup.

I'm trying a number of variants (in(list), in(select), exists, join) and
benchmarking all of them across a range of numbers, randomly chosen or
clustered with appropriate splashing of the DB and disk buffers between
runs.

Once the perl script has finished I'll post a pointer to the data.

An interesting thing, though is that for 10,000,000 rows in big and
one row in little, everything indexed and analyzed an exists query
is painfully bad...

select * from big where exists (select 1 from little where s=big.id)

...expands to nested seqscans over big and little, rather than the
seqscan over little with a nested index scan over big I was expecting.

I'm pretty sure that in(list) will be a clear win for a small number
of rows, and a simple join will win for a larger number, but I'm
benchmarking the others out of random interest.

Cheers,
Steve

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Joe Conway 2004-01-29 00:55:17 Re: [GENERAL] SELECT Question
Previous Message Kris Jurka 2004-01-28 23:35:20 Re: Is autocommit=true bad?