Re: Initdb-time block size specification

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Christensen <david(dot)christensen(at)crunchydata(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Initdb-time block size specification
Date: 2023-06-30 23:44:04
Message-ID: 20230630234404.ln26g67vatp2o65s@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-06-30 16:28:59 -0500, David Christensen wrote:
> On Fri, Jun 30, 2023 at 3:29 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >> And indeed. Comparing e.g. TPC-H, I see *massive* regressions. Some queries
> > are the same, sobut others regress by up to 70% (although more commonly around
> > 10-20%).
>
> Hmm, that is definitely not good.
>
> > That's larger than I thought, which makes me suspect that there's some bug in
> > the new code.
>
> Will do a little profiling here to see if I can figure out the
> regression. Which build optimization settings are you seeing this
> under?

gcc 12 with:

meson setup \
-Doptimization=3 -Ddebug=true \
-Dc_args="-ggdb -g3 -march=native -mtune=native -fno-plt -fno-semantic-interposition -Wno-array-bounds" \
-Dc_link_args="-fuse-ld=mold -Wl,--gdb-index,--Bsymbolic" \
...

Relevant postgres settings:
-c huge_pages=on -c shared_buffers=12GB -c max_connections=120
-c work_mem=32MB
-c autovacuum=0 # I always do that for comparative benchmarks, too much variance
-c track_io_timing=on

The later run where I saw the smaller regression was with work_mem=1GB. I
just had forgotten to adjust that.

I had loaded tpch scale 5 before, which is why I just used that.

FWIW, even just "SELECT count(*) FROM lineitem;" shows a substantial
regression.

I disabled parallelism, prewarmed the data and pinned postgres to a single
core to reduce noise. The result is the best of three (variance was low in all
cases).

HEAD patch
index only scan 1896.364 2242.288
seq scan 1586.990 1628.042

A profile shows that 20% of the runtime in the IOS case is in
visibilitymap_get_status():

+ 20.50% postgres.new postgres.new [.] visibilitymap_get_status
+ 19.54% postgres.new postgres.new [.] ExecInterpExpr
+ 14.47% postgres.new postgres.new [.] IndexOnlyNext
+ 6.47% postgres.new postgres.new [.] index_deform_tuple_internal
+ 4.67% postgres.new postgres.new [.] ExecScan
+ 4.12% postgres.new postgres.new [.] btgettuple
+ 3.97% postgres.new postgres.new [.] ExecAgg
+ 3.92% postgres.new postgres.new [.] _bt_next
+ 3.71% postgres.new postgres.new [.] _bt_readpage
+ 3.43% postgres.new postgres.new [.] fetch_input_tuple
+ 2.87% postgres.new postgres.new [.] index_getnext_tid
+ 2.45% postgres.new postgres.new [.] MemoryContextReset
+ 2.35% postgres.new postgres.new [.] tts_virtual_clear
+ 1.37% postgres.new postgres.new [.] index_deform_tuple
+ 1.14% postgres.new postgres.new [.] ExecStoreVirtualTuple
+ 1.13% postgres.new postgres.new [.] PredicateLockPage
+ 1.12% postgres.new postgres.new [.] int8inc
+ 1.04% postgres.new postgres.new [.] ExecIndexOnlyScan
+ 0.57% postgres.new postgres.new [.] BufferGetBlockNumber

mostly due to

│ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
2.46 │ lea -0x60(,%rdx,4),%rcx
│ xor %edx,%edx
59.79 │ div %rcx

You can't have have divisions for this kind of thing in the vicinity of a
peformance critical path. With compile time constants the compiler can turn
this into shifts, but that's not possible as-is after the change.

While not quite as bad as divisions, the paths with multiplications are also
not going to be ok. E.g.
return (Block) (BufferBlocks + ((Size) (buffer - 1)) * CLUSTER_BLOCK_SIZE);
is going to be noticeable.

You'd have to turn all of this into shifts (and enforce power of 2 sizes, if
you aren't yet).

I don't think pre-computed tables are a viable answer FWIW. Even just going
through a memory indirection is going to be noticable. This stuff is in a
crapton of hot code paths.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2023-06-30 23:57:09 RFC: pg_stat_logmsg
Previous Message Bruce Momjian 2023-06-30 23:26:12 Re: Initdb-time block size specification