Re: huge discrepancy between EXPLAIN cost and actual time (but the table has just been ANALYZED)

From: Kirk Wolak <wolakk(at)gmail(dot)com>
To: Kent Tong <kent(dot)tong(dot)mo(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: huge discrepancy between EXPLAIN cost and actual time (but the table has just been ANALYZED)
Date: 2023-05-11 23:13:53
Message-ID: CACLU5mT37ZE4S9AJYwHs5aZ6rmRPobNERpAgzuHX1u1otJWqLQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, May 8, 2023 at 8:29 AM Kent Tong <kent(dot)tong(dot)mo(at)gmail(dot)com> wrote:

> Hi,
>
> I have a complex query involving over 15 joins and a CTE query and it
> takes over 17s to complete. The output of EXPLAIN ANALYZE includes
> (somewhere deep inside):
>
> Index Scan using document_pkey on document document0_ (cost=0.29..8.31
> rows=1 width=3027) (actual time=16243.959..16243.961 rows=1 loops=1)
>
>
> This shows an index scan with a very small cost but a very large actual
> time. The strange thing is, all the tables have just been analyzed with the
> ANALYZE command (it is not a foreign table). Furthermore, if I run a simple
> query using that index, both the cost and the actual time are small.
>
> Another snippet is:
>
>
>
> -> CTE Scan on all_related_document p (cost=1815513.32..3030511.77
> rows=241785 width=16) (actual time=203.969..203.976 rows=0 loops=1)
>
>
> I think the cost-actual time discrepancy is fine as it is a recursive CTE
> so postgresql can't estimate the cost well. It is materialized and a full
> table scan is performed. However, the actual time is not that bad. Also,
> the estimated rows and the actual rows are also vastly different, but I
> guess this is fine, isn't it?
>
> Any idea how I should check further?
>
> Many thanks in advance
>
> --
> Kent Tong
> IT author and consultant, child education coach
>

Kent,
I had a really slow CTE based query a while back. The problem went away
when I materialized the query.
In our case, we had carried this query from another big DB, and used the
CTE query to force it to be evaluated once.
It is known to be slow, and about 1% of the size of the other tables we
were operating on.

PG, by default, optimized this, and inlined it as a subquery. Causing a
massive slowdown. Materialize fixed it right up.
That would be a quick test.

Outside of that, I agree with Tom... it's really hard to help without
full details.

>>> FWIW, if I had my way, THIS would work:
EXPLAIN (ANALYZE, BUFFERS, SETTINGS, RELATIONS_DDL, SHOW_QUERY) <query>;

Would Dump the table structures of all involved tables/views/indexes. It
would repeat the query.
And in a perfect world, it would show you the "rewritten" query (now I am
dreaming).
All with the plan...

But it will take a while to get the level of detail to come out... But the
AIs/ML will go crazy with it!

This way you just copy that output and share it. (Eventually...)

Anyways, in the meantime, a query and the table structure/row counts would
be nice.

Kirk...

In response to

Browse pgsql-general by date

  From Date Subject
Next Message FOUTE K. Jaurès 2023-05-12 04:50:11 Re: Materialized Views - Way to refresh automatically (Incrementaly)
Previous Message Kirk Wolak 2023-05-11 23:00:03 Re: "PANIC: could not open critical system index 2662" - twice