Re: BUG #16280: dead tuples (probably) effect plan and query performance

From: James Coleman <jtc331(at)gmail(dot)com>
To: "Serbin, Ilya" <iserbin(at)bostonsd(dot)ru>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16280: dead tuples (probably) effect plan and query performance
Date: 2020-03-11 13:01:16
Message-ID: CAAaqYe96jM14OuvjYh2bb2EyR=bCjWGwB4sVEFFM_sVMhHfB6g@mail.gmail.com

On Wed, Mar 11, 2020 at 6:51 AM Serbin, Ilya <iserbin(at)bostonsd(dot)ru> wrote:
>
> Hello,
> Asking again in case someone can help figure out why this is happening in my case and whether it is a bug.
>
> Fri, 28 Feb 2020 at 12:25, Serbin, Ilya <iserbin(at)bostonsd(dot)ru>:
>>
>> Hello, Tom!
>> Thanks for your answer. My concern is that the plan changes after a relatively small number of dead tuples. The bad plan is generated when table1 contains 300-400 dead tuples, which is only 0.07%-0.08% of the whole table (409k+ entries).
>> In addition, the table is growing and currently holds 425k entries. However, even at 425k entries the plan stays good until the number of dead tuples reaches 300-400; after that the plan changes to the bad one.
>> As I said, I tried analyzing the table with various default_statistics_target values (100-1000 in steps of 100); the plan stayed bad. I also tried setting random_page_cost=0.1 with seq_page_cost=1 (2, 3, 4, etc.). The plan changed to the good one only from random_page_cost=0.1 and seq_page_cost=8 onwards. However, once I ran VACUUM, the plan changed to the good one and stayed good even after I set random_page_cost=30; set seq_page_cost=1;
>> I realize I can set per-table autovacuum thresholds to trigger it once dead tuples reach 300 (a sketch of this follows below the quote), but that doesn't seem right to me: this number of changes happens in something like 5 minutes, and the table is ~2GB in size as of now.
>>
>> Why does such a small fraction (0.07%) of dead tuples change the cost estimates so dramatically? Or am I missing something, and the dead tuples have nothing to do with it?
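
For reference, the per-table autovacuum tuning mentioned above would look
something like the following. The table name and numbers are illustrative
only, since the real schema isn't shown in the thread:

    -- Illustrative: fire autovacuum on table1 once ~300 dead tuples
    -- accumulate, regardless of the table-wide scale factor.
    ALTER TABLE table1 SET (
        autovacuum_vacuum_threshold = 300,
        autovacuum_vacuum_scale_factor = 0
    );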

That looks like an almost impossible query to plan correctly -- the
planner would need some kind of statistics on fields internal to the
JSON blob, but it can't have those, since the blob is necessarily
schemaless (at least from PG's perspective). That's why Tom is
wondering about making the representation map more closely to a
SQL/relational table model with discrete columns.
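
For illustration, a minimal sketch of that suggestion, assuming a
hypothetical table1 with a jsonb column "data" and a frequently queried
path "status" (none of these names appear in the thread):

    -- Promote a frequently queried JSON path to a real column so the
    -- planner can gather ordinary per-column statistics on it.
    ALTER TABLE table1
        ADD COLUMN status text
            GENERATED ALWAYS AS (data->>'status') STORED;  -- PostgreSQL 12+

    ANALYZE table1;  -- pg_stats now has an entry for table1.status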

Alternatively, have you tried adding expression indexes on the (it
looks like) ~2 paths you're querying in the JSON field?
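
A hedged sketch of that, again with hypothetical names (table1, the
"data" column, and the two paths are assumptions):

    -- Expression indexes on the two queried JSON paths. Besides making
    -- the lookups indexable, ANALYZE collects statistics on the indexed
    -- expressions themselves, giving the planner real selectivity
    -- estimates for these predicates.
    CREATE INDEX table1_status_idx ON table1 ((data ->> 'status'));
    CREATE INDEX table1_region_idx ON table1 ((data ->> 'region'));

    ANALYZE table1;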

It's not impossible that there's a bug in the planner here, but the
opacity of this query to the planner is why that's not anyone's first
assumption. To show otherwise, someone would have to step through the
planning and show that it's not just hitting some cutoff (one that
makes reasonable sense given how hard this query is to estimate), or
that the cutoff is being computed incorrectly.

James
