Re: using memoize in in paralel query decreases performance

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: using memoize in in paralel query decreases performance
Date: 2023-03-07 09:08:55
Message-ID: CAFj8pRBnvSZBQrnyfp5ZPV6f5GrpwdTC+ACy7i_TKPa0fYV_wQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

út 7. 3. 2023 v 9:58 odesílatel David Rowley <dgrowleyml(at)gmail(dot)com> napsal:

> /On Tue, 7 Mar 2023 at 21:09, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
> >
> > po 6. 3. 2023 v 22:52 odesílatel David Rowley <dgrowleyml(at)gmail(dot)com>
> napsal:
> >> I wonder if the additional work_mem required for Memoize is just doing
> >> something like causing kernel page cache evictions and leading to
> >> fewer buffers for ixfk_ite_itemcategoryid and the item table being
> >> cached in the kernel page cache.
> >>
> >> Maybe you could get an idea of that if you SET track_io_timing = on;
> >> and EXPLAIN (ANALYZE, BUFFERS) both queries.
> >
> >
> > https://explain.depesz.com/s/vhk0
>
> This is the enable_memoize=on one. The I/O looks like:
>
> Buffers: shared hit=105661309 read=15274264 dirtied=15707 written=34863
> I/O Timings: shared/local read=2671836.341 write=1286.869
>
> 2671836.341 / 15274264 = ~0.175 ms per read.
>
> > https://explain.depesz.com/s/R5ju
>
> This is the faster enable_memoize = off one. The I/O looks like:
>
> Buffers: shared hit=44542473 read=18541899 dirtied=11988 written=18625
> I/O Timings: shared/local read=1554838.583 write=821.477
>
> 1554838.583 / 18541899 = ~0.084 ms per read.
>
> That indicates that the enable_memoize=off version is just finding
> more pages in the kernel's page cache than the slower query. The
> slower query just appears to be under more memory pressure causing the
> kernel to have less free memory to cache useful pages. I don't see
> anything here that indicates any problems with Memoize. Sure the
> statistics could be better as, ideally, the Memoize wouldn't have
> happened for the i_2 relation. I don't see anything that indicates any
> bugs with this, however. It's pretty well known that Memoize puts
> quite a bit of faith into ndistinct estimates. If it causes too many
> issues the enable_memoize switch can be turned to off.
>
> You might want to consider experimenting with smaller values of
> work_mem and/or hash_mem_multiplier for this query or just disabling
> memoize altogether.
>

I can live with it. This is an analytical query and the performance is not
too important for us. I was surprised that the performance was about 25%
worse, and so the hit ratio was almost zero. I am thinking, but I am not
sure if the estimation of the effectiveness of memoization can depend (or
should depend) on the number of workers? In this case the number of workers
is high.

> David
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2023-03-07 09:09:23 Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Previous Message Kyotaro Horiguchi 2023-03-07 09:07:58 Re: Add pg_walinspect function with block info columns