Re: [Patch] Make block and file size for WAL and relations defined at cluster creation

From: Remi Colinet <remi(dot)colinet(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Make block and file size for WAL and relations defined at cluster creation
Date: 2018-01-03 22:17:59
Message-ID: CADdR5nxKG6VSMzjwEauzxEvf-28zKbnT-_xg-2q3aA4o9hjjtQ@mail.gmail.com
Lists: pgsql-hackers

Hello,

2018-01-03 21:51 GMT+01:00 Andres Freund <andres(at)anarazel(dot)de>:

> Hi,
>
> On 2018-01-03 21:43:51 +0100, Remi Colinet wrote:
> > - we may want to test different combinations of file and block sizes for
> > the relations and the WAL, in order to get the best server performance.
> > Avoiding a compilation for each combination of values seems to make
> > sense.
>
> That's something you need to prove to be beneficial *before* we make this
> change.
>

Performance is only one argument in favor of choosing block and file sizes
at run time.

A DBA may simply want larger files for relations and WAL in order to reduce
the number of files. Why would this be an unacceptable wish? Just because a
developer chose a value for the whole world?

What about the fact that storage devices are getting larger every year? OK,
at some point a developer may change the default value in the source code
and rebuild, but this is not very convenient. For instance, we do not need
to rebuild a kernel when we want to change just one parameter.

By the way, someone who installs PostgreSQL may not want to rebuild it, only
to use it.

>
> > - Selecting the correct values for file and block sizes is a DBA task,
> > not a developer task.
> > For instance, when someone wants to create a Linux filesystem with a given
> > block size, he is not forced to accept a value chosen by the developer of
> > the filesystem driver when the latter was compiled.
>
> I'm unconvinced there's as much value syncing up fs in pg as some
> conventional wisdom says.
>

My point is that user-visible parameters should be set by users or DBAs:
this is an administrative task. For instance, someone using storage with 4K
sectors may need to set the block size to 4K for both WAL and relations,
without having to rebuild the binaries. Building binaries is not an easy
task for everybody.

>
> > - The file and block sizes should depend mostly on the physical server
> > and physical storage,
> > not on the database software itself.
>
> Citation needed.
>

Someone using a large database will probably want larger files. This is a
matter of personal preference. Some companies may also have defined
policies regarding databases in order to avoid having too many files.

When using storage with 4K blocks, it may be better to use a 4K block size
for PostgreSQL. But then, what about storage with 16K blocks? Rebuild
again? You then need one build for each combination of block and file
sizes, and you may end up with a lot of builds to manage.

>
> > Regarding the cost of making the file and block sizes of the WAL and
> > relations configurable at run time, this cost is low:
> >
> > - from a developer point of view: the source code changes are spread over
> > many files, but only a few of them have significant changes.
> > The change mainly concerns tidbitmap.c. The other changes are minor.
> >
> > - from a run-time point of view: the overhead is only at the start of the
> > database instance.
> > Moreover, that overhead is still very low at server start,
> > with only a few more dynamic memory allocations.
>
> That's to some degree because you rely on stack allocation of variable
> sized amounts of data - we can't rely on that. E.g. you allocate stack
> variables sized by rel_block_size, that's unfortunately not
> ok. Additionally some of the size calculations will have some
> performance impact.
>

Data structures that depend on BLCKSZ and were allocated on the stack are
moved to palloc/pfree management in the patch. A few files are affected by
this change, the most noticeable one being tidbitmap.c. The latter is a bit
more difficult to change because it directly includes the header file
simplehash.h (not nice for gdb). Anyway, I was able to convert to run-time
values with minimal changes, even in tidbitmap.c.

Regards
Remi

>
> - Andres
>
