Re: query plan question

From: "David Parker" <dparker(at)tazznetworks(dot)com>
To: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jeff" <threshar(at)torgo(dot)978(dot)org>, "Russell Smith" <mr-russ(at)pws(dot)com(dot)au>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query plan question
Date: 2004-11-18 16:12:12
Message-ID: 07FDEE0ED7455A48AC42AC2070EDFF7C26BAC3@corpsrv2.tazznetworks.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

What I think is happening with the missing pg_statistic entries:

The install of our application involves a lot of data importing (via
JDBC) in one large transaction, which can take up to 30 minutes. (I
realize I left out this key piece of info in my original post...)

The pg_autovacuum logic is relying on data from pg_stat_all_tables to
make the decision about running analyze. As far as I can tell, the data
in this view gets updated outside of the transaction, because I saw the
numbers growing while I was importing. I saw pg_autovacuum log messages
for running analyze on several tables, but no statistics data showed up
for these, I assume because the actual data in the table wasn't yet
visible to pg_autovacuum because the import transaction had not finished
yet.

When the import finished, not all of the tables affected by the import
were re-visited because they had not bumped up over the threshold again,
even though the analyze run for those tables had not generated any stats
because of the still-open transaction.

Am I making the correct assumptions about the way the various pieces
work? Does this scenario make sense?

It's easy enough for us to kick off a vacuum/analyze at the end of a
long import - but this "mysterious" behavior was bugging me!

Thanks.

- DAP

>-----Original Message-----
>From: Matthew T. O'Connor [mailto:matthew(at)zeut(dot)net]
>Sent: Wednesday, November 17, 2004 2:02 PM
>To: David Parker
>Cc: Tom Lane; Jeff; Russell Smith; pgsql-performance(at)postgresql(dot)org
>Subject: Re: [PERFORM] query plan question
>
>Well based on the autovacuum log that you attached, all of
>those tables
>are insert only (at least during the time period included in
>the log.
>Is that correct? If so, autovacuum will never do a vacuum
>(unless required by xid wraparound issues) on those tables.
>So this doesn't appear to be an autovacuum problem. I'm not
>sure about the missing pg_statistic entries anyone else care
>to field that one?
>
>Matthew
>
>
>David Parker wrote:
>
>>Thanks. The tables I'm concerned with are named: 'schema', 'usage',
>>'usageparameter', and 'flow'. It looks like autovacuum is performing
>>analyzes:
>>
>>% grep "Performing: " logs/.db.tazz.vacuum.log
>>[2004-11-17 12:05:58 PM] Performing: ANALYZE
>>"public"."scriptlibrary_library"
>>[2004-11-17 12:15:59 PM] Performing: ANALYZE
>>"public"."scriptlibraryparm"
>>[2004-11-17 12:15:59 PM] Performing: ANALYZE "public"."usageparameter"
>>[2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."usageproperty"
>>[2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."route"
>>[2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."usageparameter"
>>[2004-11-17 12:21:00 PM] Performing: ANALYZE
>>"public"."scriptlibrary_library"
>>[2004-11-17 12:26:01 PM] Performing: ANALYZE "public"."usage"
>>[2004-11-17 12:26:01 PM] Performing: ANALYZE "public"."usageparameter"
>>[2004-11-17 12:31:04 PM] Performing: ANALYZE "public"."usageproperty"
>>[2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."route"
>>[2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."service_usage"
>>[2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."usageparameter"
>>
>>But when I run the following:
>>
>>select * from pg_statistic where starelid in (select oid from
>pg_class
>>where relname in
>>('schema','usageparameter','flow','usage'))
>>
>>it returns no records. Shouldn't it? It doesn't appear to be doing a
>>vacuum anywhere, which makes sense because none of these tables have
>>over the default threshold of 1000. Are there statistics
>which only get
>>generated by vacuum?
>>
>>I've attached a gzip of the pg_autovacuum log file, with -d 3.
>>
>>Thanks again.
>>
>>- DAP
>>
>>
>>
>>
>>>-----Original Message-----
>>>From: Matthew T. O'Connor [mailto:matthew(at)zeut(dot)net]
>>>Sent: Wednesday, November 17, 2004 11:41 AM
>>>To: David Parker
>>>Cc: Tom Lane; Jeff; Russell Smith; pgsql-performance(at)postgresql(dot)org
>>>Subject: Re: [PERFORM] query plan question
>>>
>>>David Parker wrote:
>>>
>>>
>>>
>>>>We're using postgresql 7.4.5. I've only recently put
>pg_autovacuum in
>>>>place as part of our installation, and I'm basically taking the
>>>>defaults. I doubt it's a problem with autovacuum itself, but rather
>>>>with my configuration of it. I have some reading to do, so
>>>>
>>>>
>>>any pointers
>>>
>>>
>>>>to existing autovacuum threads would be greatly appreciated!
>>>>
>>>>
>>>>
>>>Well the first thing to do is increase the verbosity of the
>>>pg_autovacuum logging output. If you use -d2 or higher,
>pg_autovacuum
>>>will print out a lot of detail on what it thinks the thresholds are
>>>and
>>>why it is or isn't performing vacuums and analyzes. Attach
>>>some of the
>>>log and I'll take a look at it.
>>>
>>>
>>>
>
>

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2004-11-18 16:14:41 Re: plpgsql on 8.0b4 bug?
Previous Message Tom Lane 2004-11-18 15:45:53 Re: invalid page header

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2004-11-18 16:43:12 Timing of pgstats updates
Previous Message Andrew Janian 2004-11-18 15:57:17 Re: Query Performance and IOWait