Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

From: Mark <mwchambers(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2
Date: 2019-01-02 18:04:19
Message-ID: CAFh58O-R=LYKHkMcNVpToa4oyXjPsrrZVKCYCks96q3agDycqQ@mail.gmail.com
Lists: pgsql-general pgsql-hackers

Hi Tom,

Thanks for your reply.

Am I correct in understanding that any row modified by a not-yet-committed
UPDATE has its new version in state HEAPTUPLE_INSERT_IN_PROGRESS, so it will
not be included in the sample?
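
If I'm reading it right, the relevant switch in acquire_sample_rows() in
src/backend/commands/analyze.c boils down to the sketch below. (This is my
own condensed paraphrase of the PostgreSQL 10 source, with my comments added
and the dead-tuple cases elided; it is not the verbatim code.)

    switch (HeapTupleSatisfiesVacuum(&targtuple, OldestXmin, targbuffer))
    {
        case HEAPTUPLE_LIVE:
            sample_it = true;           /* counted live and sampled */
            liverows += 1;
            break;

        case HEAPTUPLE_INSERT_IN_PROGRESS:
            /* Not counted and not sampled (rows inserted by our own
             * transaction are an exception); the inserting transaction
             * is expected to bump the counts when it ends.  The new row
             * version written by an uncommitted UPDATE lands here. */
            break;

        case HEAPTUPLE_DELETE_IN_PROGRESS:
            if (TransactionIdIsCurrentTransactionId(
                    HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
                deadrows += 1;          /* deleted by our own transaction */
            else
                liverows += 1;          /* counted live, but NOT sampled */
            break;

        /* HEAPTUPLE_DEAD and HEAPTUPLE_RECENTLY_DEAD elided: they
         * simply count as deadrows. */
    }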

I'm going to rework the application so that there is less time between the
DELETE and the COMMIT; then I will only see the problem if ANALYZE runs
during that smaller window. It looks like your patch will help if it does.
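
Concretely, the plan is to do the slow load into a staging table outside the
critical transaction. A rough libpq sketch of the idea (not the real
application code; common_student_staging is a hypothetical staging table, and
error handling is reduced to bailing out):

    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    static void exec_or_die(PGconn *conn, const char *sql)
    {
        PGresult *res = PQexec(conn, sql);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "%s: %s", sql, PQerrorMessage(conn));
            PQclear(res);
            PQfinish(conn);
            exit(1);
        }
        PQclear(res);
    }

    int main(void)
    {
        PGconn *conn = PQconnectdb("");  /* settings from environment */

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* Slow part: load the new data into the staging table outside
         * the critical transaction; an ANALYZE of common_student that
         * runs during this phase is unaffected. */
        exec_or_die(conn, "TRUNCATE common_student_staging");
        /* ... long-running load into common_student_staging here ... */

        /* Fast part: the DELETE-to-COMMIT window is now short, so an
         * ANALYZE is unlikely to see the delete-in-progress rows. */
        exec_or_die(conn, "BEGIN");
        exec_or_die(conn, "DELETE FROM common_student");
        exec_or_die(conn, "INSERT INTO common_student "
                          "SELECT * FROM common_student_staging");
        exec_or_die(conn, "COMMIT");

        PQfinish(conn);
        return 0;
    }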

Then again, it also seems no one has had a problem with its current
behaviour (for 11 years!).

Thanks,

Mark

On Wed, 2 Jan 2019 at 16:11 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Mark <mwchambers(at)gmail(dot)com> writes:
> > I have a long-running job which deletes all of the common_student table
> > and then repopulates it. It takes a long time to load all the other data
> > and commit the transaction. I didn't think the delete inside the
> > transaction would have any effect until it is committed or rolled back.
> > I will have to rewrite the application so it updates the existing rows
> > rather than deleting all and then inserting.
>
> Hmm ... I'm not sure if that will actually make things better. The root
> of the issue is what analyze.c does with DELETE_IN_PROGRESS tuples:
>
>     * We count delete-in-progress rows as still live, using
>     * the same reasoning given above; but we don't bother to
>     * include them in the sample.
>
> The "reasoning given above" is a reference to what happens for
> INSERT_IN_PROGRESS tuples:
>
>     * Insert-in-progress rows are not counted.  We assume
>     * that when the inserting transaction commits or aborts,
>     * it will send a stats message to increment the proper
>     * count.  This works right only if that transaction ends
>     * after we finish analyzing the table; if things happen
>     * in the other order, its stats update will be
>     * overwritten by ours.  However, the error will be large
>     * only if the other transaction runs long enough to
>     * insert many tuples, so assuming it will finish after us
>     * is the safer option.
>
> Now the problem with this, from your perspective, is that *neither*
> INSERT_IN_PROGRESS nor DELETE_IN_PROGRESS tuples are included in
> ANALYZE's data sample. So a long-running update transaction will
> cause all the rows it changes to be excluded from the sample until
> commit. This seems like a bad thing, and it definitely means that
> adjusting your app as you're contemplating won't help.
>
> In order to handle the bulk-update scenario sanely, it seems like
> we ought to allow one of INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples
> to be included. And it looks like, for consistency with the row-counting
> logic, the one that's included needs to be DELETE_IN_PROGRESS. This
> is a slightly annoying conclusion, because it means we're accumulating
> stats that we know are likely to soon be obsolete, but I think sampling
> INSERT_IN_PROGRESS tuples instead would lead to very strange results
> in some cases.
>
> In short, I think we want to do this to the DELETE_IN_PROGRESS logic:
>
>              if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
>                  deadrows += 1;
>              else
> +            {
> +                sample_it = true;
>                  liverows += 1;
> +            }
>
> with suitable adjustment of the adjacent comment.
>
> Thoughts?
>
> regards, tom lane
>
