Re: Trying to understand Stats/Query planner issue

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Strange, John W" <john(dot)w(dot)strange(at)jpmorgan(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Trying to understand Stats/Query planner issue
Date: 2011-11-13 16:17:22
Message-ID: 6469.1321201042@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

"Strange, John W" <john(dot)w(dot)strange(at)jpmorgan(dot)com> writes:
> I have a question on how the analyzer works in this type of scenario.
> We calculate some results and COPY INTO some partitioned tables, which we use some selects to aggregate the data back out into reports. Everyone once in a while the aggregation step picks a bad plan due to stats on the tables that were just populated. Updating the stats and rerunning the query seems to solve the problem, this only happens if we enable nested loop query plans.

Well, even if auto-analyze launches instantly after you commit the
insertions (which it won't), it's going to take time to scan the table
and then commit the updates to pg_statistic. So there is always going
to be some window where queries will get planned with obsolete
information. If you're inserting enough data to materially change the
statistics of a table, and you need to query that table right away,
doing a manual ANALYZE rather than waiting for auto-analyze is
recommended.

> The other option is just to analyze each table involved in the query after the insert, but that seems a bit counterproductive.

Why would you think that? This type of scenario is exactly why ANALYZE
isn't deprecated as a user command.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Venkat Balaji 2011-11-14 05:54:48 Re: : bg_writer overloaded ?
Previous Message Pavel Stehule 2011-11-13 08:57:31 Re: Heavy contgnous load