Re: temporary tables, indexes, and query plans

From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: temporary tables, indexes, and query plans
Date: 2010-10-27 19:29:08
Message-ID: AANLkTinSVjip_UkKG69LsiuQ0=epDvA-daXC-Ec+76Q7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Oct 27, 2010 at 1:52 PM, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net> wrote:
> On Wed, Oct 27, 2010 at 1:32 PM, Reid Thompson <Reid(dot)Thompson(at)ateb(dot)com> wrote:
>> On Wed, 2010-10-27 at 13:23 -0500, Jon Nelson wrote:
>>> set it to 500 and restarted postgres.
>>
>> did you re-analyze?
>
> Not recently. I tried that, initially, and there was no improvement.
> I'll try it again now that I've set the stats to 500.
> The most recent experiment shows me that, unless I create whatever
> indexes I would like to see used *before* the large (first) update,
> then they just don't get used. At all. Why would I need to ANALYZE the
> table immediately following index creation? Isn't that part of the
> index creation process?
>
> Currently executing is a test where I place an "ANALYZE foo" after the
> COPY, first UPDATE, and first index, but before the other (much
> smaller) updates.
>
> ..
>
> Nope. The ANALYZE made no difference. This is what I just ran:
>
> BEGIN;
> CREATE TEMPORARY TABLE foo
> COPY ...
> UPDATE ... -- 1/3 of table, approx
> CREATE INDEX foo_rowB_idx on foo (rowB);
> ANALYZE ...
> -- queries from here to 'killed' use WHERE rowB = 'someval'
> UPDATE ... -- 7 rows. seq scan!
> UPDATE ... -- 242 rows, seq scan!
> UPDATE .. -- 3700 rows, seq scan!
> UPDATE .. -- 3100 rows, seq scan!
> killed.
>

Even generating the index beforehand (sans ANALYZE) was no help.
If I generate *all* of the indexes ahead of time, before the COPY,
that's the only time index usage jives with my expectations.

Here is an example of the output from auto analyze (NOTE: the WHERE
clause in this statement specifies a single value in the same column
that has a UNIQUE index on it):

Seq Scan on foo_table (cost=0.00..289897.04 rows=37589 width=486)

and yet the actual row count is exactly 1.

If I change the order so that the index creation *and* analyze happen
*before* the first (large) update, then things appear to proceed
normally and the indexes are used when expected, although in some
cases the stats are still way off:

Bitmap Heap Scan on foo_table (cost=40.96..7420.39 rows=1999 width=158)

and yet there are only 7 rows that match. The others seem closer (only
off by 2x rather than 250x).

It seems as though creating an index is not enough. It seems as though
ANALYZE after index creation is not enough, either. I am theorizing
that I have to touch (or just scan?) some percentage of the table in
order for the index to be used? If that's true, then what is ANALYZE
for? I've got the stats cranked up to 500. Should I try 1000?

Jason Pitts:
RE: changing default_statistics_target (or via ALTER TABLE SET STATS)
not taking effect until ANALYZE is performed.

I did already know that, but it's probably good to put into this
thread. However, you'll note that this is a temporary table created at
the beginning of a transaction.

--
Jon

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2010-10-27 19:43:13 Re: temporary tables, indexes, and query plans
Previous Message Scott Carey 2010-10-27 19:25:42 Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?