Re: Implement UNLOGGED clause for COPY FROM

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implement UNLOGGED clause for COPY FROM
Date: 2020-07-24 08:14:42
Message-ID: CA+fd4k4ErA8mnQcmf=X9x-6KWZvPYb4b2QeG_qCw=Gvz2yworw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 17 Jul 2020 at 13:23, osumi(dot)takamichi(at)fujitsu(dot)com
<osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
>
> Hi,
>
> > AFAICS, we can already accomplish basically the same thing as what you want to
> > do like this:
> >
> > alter table foo set unlogged;
> > copy foo from ...;
> > alter table foo set logged;
> This didn't satisfy what I wanted.
> In case that 'foo' has huge amount of rows at the beginning,
> this example would spend much time to copy
> the contents of 'foo' twice to swap relfilenodes atomically.
> When that loaded data by COPY is big too, its execution time becomes much longer.
>
> > You keep on ignoring the indexes... not to mention replication.
> Sorry for having made you think like this.
>
> When the server crash occurs during data loading of COPY UNLOGGED,
> it's a must to keep index consistent of course.
> I'm thinking that to rebuild the indexes on the target table would work.
>
> In my opinion, UNLOGGED clause must be designed to guarantee that
> where the data loaded by this clause is written starts from the end of all other data blocks.
> Plus, those blocks needs to be protected by any write of other transactions during the copy.
> Apart from that, the server must be aware of which block is the first block,
> or the range about where it started or ended in preparation for the crash.
>
> During the crash recovery, those points are helpful to recognize and detach such blocks
> in order to solve a situation that the loaded data is partially synced to the disk and the rest isn't.

How do online backup and archive recovery work?

Suppose that the user executes pg_basebackup during COPY UNLOGGED
running, the physical backup might have the portion of tuples loaded
by COPY UNLOGGED but these data are not recovered. It might not be a
problem because the operation is performed without WAL records. But
what if an insertion happens after COPY UNLOGGED but before
pg_stop_backup()? I think that a new tuple could be inserted at the
end of the table, following the data loaded by COPY UNLOGGED. With
your approach described above, the newly inserted tuple will be
recovered during archive recovery, but it either will be removed if we
replay the insertion WAL then truncate the table or won’t be inserted
due to missing block if we truncate the table then replay the
insertion WAL, resulting in losing the tuple although the user got
successful of insertion.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ajin Cherian 2020-07-24 08:15:58 Re: deferred primary key and logical replication
Previous Message Ajin Cherian 2020-07-24 08:13:26 Re: logical replication empty transactions