Re: GenBKI emits useless open;close for catalogs without rows

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: GenBKI emits useless open;close for catalogs without rows
Date: 2023-09-22 15:50:08
Message-ID: CAEze2WidWxspc8LaQAQF3bURGjZm0yCN=eA_9cbq1=8zE9LvPQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, 22 Sept 2023 at 00:25, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2023-09-19 21:05:41 +0300, Heikki Linnakangas wrote:
> > On 18/09/2023 17:50, Matthias van de Meent wrote:
> > > (initdb takes about 73ms locally with syncing disabled)
> >
> > That's impressive. It takes about 600 ms on my laptop. Of which about 140 ms
> > goes into processing the BKI file. And that's with the "initdb --no-sync" option.
>
> I think there must be a digit missing in Matthias' numbers.

Yes, kind of. The run was over 50 iterations, not the assumed 250.
Also note that the improved measurements were recorded inside the
bootstrap-mode PostgreSQL instance, not inside the initdb that was
processing the postgres.bki file. So it may well be that I didn't
improve the total timing by much.

> > > Various methods of reducing the size of postgres.bki were applied, as
> > > detailed in the patch's commit message. I believe the current output
> > > is still quite human readable.
> >
> > Overall this does not seem very worthwhile to me.
>
> Because the wins are too small?
>
> FWIW, Making postgres.bki smaller and improving bootstrapping time does seem
> worthwhile to me. But it doesn't seem quite right to handle the batching in
> the file format, it should be on the server side, no?

The main reason I did the batching in the file format is to reduce the
storage overhead of the current one "INSERT" token per row. Batching
improved on that by replacing the per-row token with a different
construct, but it's not necessarily the only solution. The actual
parser still inserts the tuples into the relation one by one, as I
didn't spend time on making a simple_heap_insert analog for bulk
insertions.

> We really should stop emitting WAL during initdb...

I think it's quite elegant that we're able to bootstrap the relation
data of a new PostgreSQL cluster from the WAL generated in another
cluster, even if it is indeed a bit wasteful. I do see your point
though - the WAL shouldn't be needed if we're already fsyncing the
files to disk.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
