Re: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE
Date: 2021-04-19 17:22:20
Message-ID: CA+TgmoYPpG5_hDRGyO_PB--Mwbsr2WMjeDSN-xh_d8bjiBTcBw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 19, 2021 at 1:03 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> That doco is explaining the users-eye view of it. Places addressed
> to datatype developers, such as the CREATE TYPE reference page, see
> it a bit differently. CREATE TYPE for instance points out that
>
> All storage values other than plain imply that the functions of the
> data type can handle values that have been toasted, as described in ...

Interesting. It feels to me like SET STORAGE PLAIN is really trying
to be two different things: either you want to inhibit compression
and external storage for performance reasons, or your data type can't
support either one. Maybe we should separate those concepts, since
right now there's no mode that says "don't ever compress, and
externalize only if there's absolutely no other way," and there's no
way to disable compression and externalization without also killing
off short headers. :-(

> The notion that short header doesn't cost anything seems extremely Intel-centric to me.

I don't think so. It's true that Intel is very forgiving about
unaligned accesses compared to some other architectures, but if you
have a terabyte of data, I think you want it to fit into as few disk
pages as possible pretty much no matter what architecture you're
using. The dominant costs are going to be the I/O costs, not the CPU
costs of dealing with unaligned bytes. In fact, even if you have a
gigabyte of data, I bet it's *still* better to use a more compact
on-disk representation. At that size, the dominant cost is going to
be pumping the data through the L3 CPU cache, which is still, I
think, going to matter quite a lot more than the CPU costs of dealing
with unaligned bytes. The CPU bus is an I/O bottleneck not unlike the
disk itself, just running at a higher speed that is still far slower
than the CPU. Now, if you have a megabyte of data, or better yet a
kilobyte, then I think optimizing for CPU efficiency may well be the
right thing to do. I don't know how much 4-byte varlena headers
really save there, but if I were designing a storage representation
for very small data sets, I'd definitely be thinking about how I could
waste space to shave cycles.

--
Robert Haas
EDB: http://www.enterprisedb.com
