Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2020-12-08 02:54:22
Message-ID: CALNJ-vScdyVYQaYZEKT_=krBbQ3Zb+7c05AmB37NhOJXrU-E+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>> just removing the logic that has the
soft upper limit and just have it do cache evictions after each
allocation after the cache first fills

Yeah - having one fewer limit would simplify the code.

Cheers

On Mon, Dec 7, 2020 at 5:27 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Mon, 7 Dec 2020 at 14:25, Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
> >
> > > + /* Make a guess at a good size when we're not given a valid size.
> */
> > > + if (size == 0)
> > > + size = 1024;
> > >
> > > Should the default size be logged ?
> >
> > > I'm not too sure if I know what you mean here. Should it be a power of
> > > 2? It is. Or do you mean I shouldn't use a magic number?
> >
> > Using 1024 seems to be fine. I meant logging the default value of 1024
> so that user / dev can have better idea the actual value used (without
> looking at the code).
>
> Oh, right. In EXPLAIN ANALYZE. Good point. I wonder if that's going
> to be interesting enough to show.
>
> > >> Or do you think 98% is not a good number?
> >
> > Since you have played with Result Cache, I would trust your judgment in
> choosing the percentage.
> > It is fine not to expose this constant until the need arises.
>
> I did some experimentation with different values on a workload that
> never gets a cache hit. and just always evicts the oldest entry.
> There's only very slight changes in performance between 90%, 98% and
> 100% with 1MB work_mem.
>
> times in milliseconds measured over 60 seconds on each run.
>
> 90% 98% 100%
> run1 2318 2339 2344
> run2 2339 2333 2309
> run3 2357 2339 2346
> avg (ms) 2338 2337 2333
>
> Perhaps this is an argument for just removing the logic that has the
> soft upper limit and just have it do cache evictions after each
> allocation after the cache first fills.
>
> Setup: same tables as [1]
> alter table hundredk alter column hundredk set (n_distinct = 10);
> analyze hundredk;
> alter system set work_mem = '1MB';
> select pg_reload_conf();
>
> Query
> select count(*) from hundredk hk inner join lookup l on hk.hundredk = l.a;
>
> David
>
> [1]
> https://www.postgresql.org/message-id/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2020-12-08 02:55:37 Blocking I/O, async I/O and io_uring
Previous Message Amit Kapila 2020-12-08 02:38:25 Re: [Patch] Optimize dropping of relation buffers using dlist