Re: Avoid detoast overhead when possible

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: zhihuifan1213(at)163(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Avoid detoast overhead when possible
Date: 2023-12-04 14:41:24
Message-ID: CAEze2WiXLxu1MA9hNP7wVyoVnUHU3g++L2EGRGJCgvbanPadmg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 4 Dec 2023 at 14:23, <zhihuifan1213(at)163(dot)com> wrote:
>
>
> Hi,
>
> Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> writes:
>
> > On Mon, 4 Dec 2023 at 07:56, <zhihuifan1213(at)163(dot)com> wrote:
>
> > ..It would also add overhead when
> > we write results to disk, such as spilling merge sorts, hash join
> > spills, or CTE materializations.
> >
> > Could you find a way to reduce this memory and IO usage when the value
> > is not going to be used immediately? Using the toast pointer at such
> > points surely will be cheaper than storing the full value again and
> > again.
>
> I'm not sure I understand you correctly, I think the issue you raised
> here is covered by the below design (not implemented in the patch).
>
> "
> However this patch just throws away almost all the benefits of toast, so
> how can we draw a line between should vs should not do this code path?
> IMO, we should only run the 'eagerly detoast' when we know that we will
> have a FuncCall against the toast_col on **the current plan node**. I
> think this information can be get from Qual and TargetList. If so, we
> can set the slot->detoast_attrs accordingly.
> "
>
> Let's see an example of this:
>
> SELECT f(t1.toastable_col) FROM t1 join t2 using(c);
>
> Suppose it is using hash join and t1 should be hashed. With the above
> design, we will NOT detoast toastable_col at the scan of t1 or hash t1
> since there is no one "funcall" access it in either SeqScan of t1 or
> hash (t1). But when we do the projection on the joinrel, the detoast
> would happen.

I assume that you detoast the column only once, and not in a separate
per-node context? This would indicate to me that a query like the
following would detoast toastable_col and never "retoast" it.

SELECT toastable_col FROM t1
WHERE f(t1.toastable_col)
ORDER BY nonindexed;

or the equivalent in current PG catalogs:

SELECT ev_class
FROM pg_rewrite
WHERE octet_length(ev_action) > 1
ORDER BY ev_class;

whose plan is

Sort
Sort Key: ev_class
-> Seq Scan on pg_rewrite
Filter: (octet_length((ev_action)::text) > 1)

This would first apply the condition (because sort-then-filter is
generally more expensive than filter-then-sort), and thus permanently
detoast the column, which is thus detoasted when it is fed into the
sort, which made the sort much more expensive than without the
aggressive detoasting.

Or do I still misunderstand something here?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-12-04 14:53:44 Re: Use PGAlignedBlock instead of "char buf[BLCKSZ]" in more places
Previous Message Drouvot, Bertrand 2023-12-04 14:37:48 Re: Synchronizing slots from primary to standby