Re: strange slow query - lost lot of time somewhere

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: strange slow query - lost lot of time somewhere
Date: 2022-05-04 18:38:25
Message-ID: CAFj8pRAmwD50zZH2CH=P2Os-gHx+1vk2hrROS8N=cziAnnwM5Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, 4 May 2022 at 2:15, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Tue, 3 May 2022 at 17:02, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
> > On Tue, 3 May 2022 at 6:57, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> You sure there's not something taking an exclusive lock on one of these
> >> tables every so often?
> >
> > I am almost sure. I can see this issue only when I set a higher
> > work_mem; I don't see this issue in other cases.
>
> hmm, I don't think the query being blocked on a table lock would cause
> this behaviour. As far as I know, all table locks should be obtained
> before the "Execution Time" timer in EXPLAIN ANALYZE starts.
> However, locks are obtained on indexes at executor startup,
> so if there was some delay in obtaining a lock on the index it would
> show up this way. I just don't know of anything that obtains a
> conflicting lock on an index without the same conflicting lock on the
> table that the index belongs to.
>
> I do agree that the perf report does indicate that the extra time is
> taken due to some large amount of memory being allocated. I just can't
> quite see how that would happen in Memoize given that
> estimate_num_groups() clamps the distinct estimate to the number of
> input rows, which is 91 in both cases in your problem query.
>
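
(A rough sketch of the sizing rule being described; the function and
variable names below are illustrative only, not the actual planner or
nodeMemoize.c code. The point is that the estimated number of distinct
cache keys can never exceed the number of rows feeding the Memoize node,
so with 91 input rows the requested initial hash table size should top
out at 91, not something huge.)

/* Illustrative sketch only -- not PostgreSQL source. */
#include <stdint.h>

#define Min(x, y) ((x) < (y) ? (x) : (y))

static uint32
memoize_requested_entries(double ndistinct, double input_rows)
{
    /* estimate_num_groups() already clamps its result to the row count */
    double  est = Min(ndistinct, input_rows);

    /* the executor later rounds this up to a power of two for the table */
    return (uint32) est;
}
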
> Are you able to run the Memoize query in psql with \watch 0.1 for a
> few seconds while you do:
>
> perf record --call-graph dwarf --pid <pid> sleep 2
>
> then send along the perf report.
>
> I locally hacked build_hash_table() in nodeMemoize.c to make the
> hashtable 100 million elements and I see my perf report for a trivial
> Memoize query come up as:
>
> +   99.98%  0.00%  postgres  postgres   [.] _start
> +   99.98%  0.00%  postgres  libc.so.6  [.] __libc_start_main_alias_2 (inlined)
> +   99.98%  0.00%  postgres  libc.so.6  [.] __libc_start_call_main
> +   99.98%  0.00%  postgres  postgres   [.] main
> +   99.98%  0.00%  postgres  postgres   [.] PostmasterMain
> +   99.98%  0.00%  postgres  postgres   [.] ServerLoop
> +   99.98%  0.00%  postgres  postgres   [.] BackendStartup (inlined)
> +   99.98%  0.00%  postgres  postgres   [.] BackendRun (inlined)
> +   99.98%  0.00%  postgres  postgres   [.] PostgresMain
> +   99.98%  0.00%  postgres  postgres   [.] exec_simple_query
> +   99.98%  0.00%  postgres  postgres   [.] PortalRun
> +   99.98%  0.00%  postgres  postgres   [.] FillPortalStore
> +   99.98%  0.00%  postgres  postgres   [.] PortalRunUtility
> +   99.98%  0.00%  postgres  postgres   [.] standard_ProcessUtility
> +   99.98%  0.00%  postgres  postgres   [.] ExplainQuery
> +   99.98%  0.00%  postgres  postgres   [.] ExplainOneQuery
> +   99.95%  0.00%  postgres  postgres   [.] ExplainOnePlan
> +   87.87%  0.00%  postgres  postgres   [.] standard_ExecutorStart
> +   87.87%  0.00%  postgres  postgres   [.] InitPlan (inlined)
> +   87.87%  0.00%  postgres  postgres   [.] ExecInitNode
> +   87.87%  0.00%  postgres  postgres   [.] ExecInitNestLoop
> +   87.87%  0.00%  postgres  postgres   [.] ExecInitMemoize
> +   87.87%  0.00%  postgres  postgres   [.] build_hash_table (inlined)  <----
> +   87.87%  0.00%  postgres  postgres   [.] memoize_create (inlined)
> +   87.87%  0.00%  postgres  postgres   [.] memoize_allocate (inlined)
> +   87.87%  0.00%  postgres  postgres   [.] MemoryContextAllocExtended
> +   87.87%  0.00%  postgres  postgres   [.] memset (inlined)
>
> Failing that, are you able to pg_dump these tables and load them into
> a PostgreSQL instance that you can play around with and patch?
> Provided you can actually recreate the problem on that instance.
>

+   71,98%  14,36%  postmaster  [kernel.kallsyms]      [k] page_fault
+   70,19%   6,59%  postmaster  libc-2.28.so           [.] __memset_avx2_erms
+   68,20%   0,00%  postmaster  postgres               [.] ExecInitNode
+   68,20%   0,00%  postmaster  postgres               [.] ExecInitNestLoop
+   68,13%   0,00%  postmaster  postgres               [.] ExecInitMemoize
+   68,13%   0,00%  postmaster  postgres               [.] MemoryContextAllocExtended
+   63,20%   0,00%  postmaster  postgres               [.] 0x0000000000776b89
+   63,20%   0,00%  postmaster  postgres               [.] PostgresMain
+   63,03%   0,00%  postmaster  postgres               [.] 0x00000000007f48ca
+   63,03%   0,00%  postmaster  postgres               [.] PortalRun
+   63,03%   0,00%  postmaster  postgres               [.] 0x00000000007f83ae
+   63,03%   0,00%  postmaster  postgres               [.] 0x00000000007f7fee
+   63,03%   0,00%  postmaster  pg_stat_statements.so  [.] 0x00007f5579b599c6
+   63,03%   0,00%  postmaster  postgres               [.] standard_ProcessUtility
+   63,03%   0,00%  postmaster  postgres               [.] ExplainQuery
+   62,83%   0,00%  postmaster  postgres               [.] 0x000000000062e83c
+   62,83%   0,00%  postmaster  postgres               [.] ExplainOnePlan
+   57,47%   0,14%  postmaster  [kernel.kallsyms]      [k] do_page_fault
+   57,23%   0,51%  postmaster  [kernel.kallsyms]      [k] __do_page_fault
+   55,61%   0,71%  postmaster  [kernel.kallsyms]      [k] handle_mm_fault
+   55,19%   0,00%  postmaster  pg_stat_statements.so  [.] 0x00007f5579b5ad2c
+   55,19%   0,00%  postmaster  postgres               [.] standard_ExecutorStart
+   54,78%   0,87%  postmaster  [kernel.kallsyms]      [k] __handle_mm_fault
+   53,54%   0,37%  postmaster  [kernel.kallsyms]      [k] do_anonymous_page
+   36,36%   0,21%  postmaster  [kernel.kallsyms]      [k] alloc_pages_vma
+   35,99%   0,31%  postmaster  [kernel.kallsyms]      [k] __alloc_pages_nodemask
+   35,40%   1,06%  postmaster  [kernel.kallsyms]      [k] get_page_from_freelist
+   27,71%   0,62%  postmaster  [kernel.kallsyms]      [k] prep_new_page
+   27,09%  26,99%  postmaster  [kernel.kallsyms]      [k] clear_page_erms
+   11,24%   2,29%  postmaster  [kernel.kallsyms]      [k] swapgs_restore_regs_and_return_to_usermode
+    8,95%   6,87%  postmaster  [kernel.kallsyms]      [k] prepare_exit_to_usermode
+    7,83%   1,01%  postmaster  [kernel.kallsyms]      [k] mem_cgroup_charge
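
For what it's worth, a standalone sketch of the pattern the profile points
at (illustrative only; the bucket layout and element count below are made
up, not the nodeMemoize.c structures): one large zero-initialized
allocation at executor startup spends almost all of its time in the kernel
faulting in and clearing fresh pages, which matches the page_fault /
do_anonymous_page / clear_page_erms and __memset_avx2_erms samples above
under ExecInitMemoize -> MemoryContextAllocExtended.

/* Standalone sketch, not PostgreSQL code. */
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

typedef struct bucket
{
    unsigned int hash;
    char         status;
    void        *entry;
} bucket;

int
main(void)
{
    size_t  nbuckets = 100 * 1000 * 1000;   /* mimic a 100M-element table */
    bucket *tab = malloc(nbuckets * sizeof(bucket));

    if (tab == NULL)
        return 1;

    /*
     * Zeroing the array touches every page for the first time, so the
     * kernel must fault in and clear each one; the work lands in the
     * kernel and in memset rather than in any one executor function.
     */
    memset(tab, 0, nbuckets * sizeof(bucket));

    printf("allocated and zeroed %zu buckets\n", nbuckets);
    free(tab);
    return 0;
}
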

>
> David
>
