Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Neha Sharma <neha(dot)sharma(at)enterprisedb(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date: 2021-12-21 05:40:17
Message-ID: CAE9k0PmLY7qBM7s9FU-o_B1Ln3Z3htQyfi2H4CnJTGRV8PBMog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I am getting the below error when running the same test-case that Neha
shared in her previous email.

ERROR: 55000: some relations of database "test1" are already in tablespace
"tab1"
HINT: You must move them back to the database's default tablespace before
using this command.
LOCATION: movedb, dbcommands.c:1555

test-case:
========
create tablespace tab1 location '/home/ashu/test1';
create tablespace tab location '/home/ashu/test';

create database test tablespace tab;
\c test

create table t(a int primary key, b text);

create or replace function large_val() returns text language sql as 'select
array_agg(md5(g::text))::text from generate_series(1, 256) g';

insert into t values (generate_series(1,100000), large_val());

alter table t set tablespace tab1 ;

\c postgres
create database test1 template test;

\c test1
alter table t set tablespace tab;

\c postgres
alter database test1 set tablespace tab1; -- this fails with the given
error.

Observations:
===========
Please note that before running above alter database statement, the table
't' is moved to tablespace 'tab' from 'tab1' so not sure why ReadDir() is
returning true when searching for table 't' in tablespace 'tab1'. It should
have returned NULL here:

while ((xlde = ReadDir(dstdir, dst_dbpath)) != NULL)
{
if (strcmp(xlde->d_name, ".") == 0 ||
strcmp(xlde->d_name, "..") == 0)
continue;

ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("some relations of database \"%s\" are already
in tablespace \"%s\"",
dbname, tblspcname),
errhint("You must move them back to the database's
default tablespace before using this command.")));
}

Also, if I run the checkpoint explicitly before executing the above alter
database statement, this error doesn't appear which means it only happens
with the new changes because earlier we were doing the force checkpoint at
the end of createdb statement.

--
With Regards,
Ashutosh Sharma.

On Thu, Dec 16, 2021 at 9:26 PM Neha Sharma <neha(dot)sharma(at)enterprisedb(dot)com>
wrote:

> Hi,
>
> While testing the v8 patches in a hot-standby setup, it was observed the
> master is crashing with the below error;
>
> 2021-12-16 19:32:47.757 +04 [101483] PANIC: could not fsync file
> "pg_tblspc/16385/PG_15_202112111/16386/16391": No such file or directory
> 2021-12-16 19:32:48.917 +04 [101482] LOG: checkpointer process (PID
> 101483) was terminated by signal 6: Aborted
>
> Parameters configured at master:
> wal_level = hot_standby
> max_wal_senders = 3
> hot_standby = on
> max_standby_streaming_delay= -1
> wal_consistency_checking='all'
> max_wal_size= 10GB
> checkpoint_timeout= 1d
> log_min_messages=debug1
>
> Test Case:
> create tablespace tab1 location
> '/home/edb/PGsources/postgresql/inst/bin/test1';
> create tablespace tab location
> '/home/edb/PGsources/postgresql/inst/bin/test';
> create database test tablespace tab;
> \c test
> create table t( a int PRIMARY KEY,b text);
> CREATE OR REPLACE FUNCTION large_val() RETURNS TEXT LANGUAGE SQL AS
> 'select array_agg(md5(g::text))::text from generate_series(1, 256) g';
> insert into t values (generate_series(1,100000), large_val());
> alter table t set tablespace tab1 ;
> \c postgres
> create database test1 template test;
> \c test1
> alter table t set tablespace tab;
> \c postgres
> alter database test1 set tablespace tab1;
>
> --cancel the below command
> alter database test1 set tablespace pg_default; --press ctrl+c
> \c test1
> alter table t set tablespace tab1;
>
>
> Log file attached for reference.
>
> Thanks.
> --
> Regards,
> Neha Sharma
>
>
> On Thu, Dec 16, 2021 at 4:17 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
>> On Thu, Dec 16, 2021 at 12:15 AM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> >
>> > On Thu, Dec 2, 2021 at 07:19:50PM +0530, Dilip Kumar wrote:
>> > From the patch:
>> >
>> > > Currently, CREATE DATABASE forces a checkpoint, then copies all the
>> files,
>> > > then forces another checkpoint. The comments in the createdb()
>> function
>> > > explain the reasons for this. The attached patch fixes this problem
>> by making
>> > > create database completely WAL logged so that we can avoid the
>> checkpoints.
>> > >
>> > > This can also be useful for supporting the TDE. For example, if we
>> need different
>> > > encryption for the source and the target database then we can not
>> re-encrypt the
>> > > page data if we copy the whole directory. But with this patch, we
>> are copying
>> > > page by page so we have an opportunity to re-encrypt the page before
>> copying that
>> > > to the target database.
>> >
>> > Uh, why is this true? Why can't we just copy the heap/index files 8k at
>> > a time and reencrypt them during the file copy, rather than using shared
>> > buffers?
>>
>> Hi Bruce,
>>
>> Yeah, you are right that if we copy in 8k block then we can re-encrypt
>> the page, but in the current system, we are not copying block by
>> block. So the main effort for this patch is not only for TDE but to
>> get rid of the checkpoint we are forced to do before and after create
>> database. So my point is that in this patch since we are copying page
>> by page we get an opportunity to re-encrypt the page. I agree that if
>> the re-encryption would have been the main goal of this patch then
>> true we can copy files in 8k blocks and re-encrypt those blocks, that
>> time even if we have to access some page data for re-encryption (like
>> nonce) then also we can do it, but that is not the main objective.
>>
>> --
>> Regards,
>> Dilip Kumar
>> EnterpriseDB: http://www.enterprisedb.com
>>
>>
>>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2021-12-21 05:46:51 Re: Multi-Column List Partitioning
Previous Message Peter Geoghegan 2021-12-21 05:35:08 Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations