From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: using memoize in in paralel query decreases performance |
Date: | 2023-03-07 09:08:55 |
Message-ID: | CAFj8pRBnvSZBQrnyfp5ZPV6f5GrpwdTC+ACy7i_TKPa0fYV_wQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
út 7. 3. 2023 v 9:58 odesílatel David Rowley <dgrowleyml(at)gmail(dot)com> napsal:
> /On Tue, 7 Mar 2023 at 21:09, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
> >
> > po 6. 3. 2023 v 22:52 odesílatel David Rowley <dgrowleyml(at)gmail(dot)com>
> napsal:
> >> I wonder if the additional work_mem required for Memoize is just doing
> >> something like causing kernel page cache evictions and leading to
> >> fewer buffers for ixfk_ite_itemcategoryid and the item table being
> >> cached in the kernel page cache.
> >>
> >> Maybe you could get an idea of that if you SET track_io_timing = on;
> >> and EXPLAIN (ANALYZE, BUFFERS) both queries.
> >
> >
> > https://explain.depesz.com/s/vhk0
>
> This is the enable_memoize=on one. The I/O looks like:
>
> Buffers: shared hit=105661309 read=15274264 dirtied=15707 written=34863
> I/O Timings: shared/local read=2671836.341 write=1286.869
>
> 2671836.341 / 15274264 = ~0.175 ms per read.
>
> > https://explain.depesz.com/s/R5ju
>
> This is the faster enable_memoize = off one. The I/O looks like:
>
> Buffers: shared hit=44542473 read=18541899 dirtied=11988 written=18625
> I/O Timings: shared/local read=1554838.583 write=821.477
>
> 1554838.583 / 18541899 = ~0.084 ms per read.
>
> That indicates that the enable_memoize=off version is just finding
> more pages in the kernel's page cache than the slower query. The
> slower query just appears to be under more memory pressure causing the
> kernel to have less free memory to cache useful pages. I don't see
> anything here that indicates any problems with Memoize. Sure the
> statistics could be better as, ideally, the Memoize wouldn't have
> happened for the i_2 relation. I don't see anything that indicates any
> bugs with this, however. It's pretty well known that Memoize puts
> quite a bit of faith into ndistinct estimates. If it causes too many
> issues the enable_memoize switch can be turned to off.
>
> You might want to consider experimenting with smaller values of
> work_mem and/or hash_mem_multiplier for this query or just disabling
> memoize altogether.
>
I can live with it. This is an analytical query and the performance is not
too important for us. I was surprised that the performance was about 25%
worse, and so the hit ratio was almost zero. I am thinking, but I am not
sure if the estimation of the effectiveness of memoization can depend (or
should depend) on the number of workers? In this case the number of workers
is high.
> David
>
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2023-03-07 09:09:23 | Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) |
Previous Message | Kyotaro Horiguchi | 2023-03-07 09:07:58 | Re: Add pg_walinspect function with block info columns |