RE: Implement UNLOGGED clause for COPY FROM

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Fujii Masao' <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "'pgsql-hackers(at)postgresql(dot)org'" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Implement UNLOGGED clause for COPY FROM
Date: 2020-07-09 06:17:12
Message-ID: OSBPR01MB488838F80A90CEEFD2C76294ED640@OSBPR01MB4888.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Fujii-san

Thank you for your interest in this idea.

> This feature can work safely with wal_level=replica or logical?
> Or it can work only with wal_level=minimal?
>If yes, what is the main difference
> between this method and wal_skip_threshold?
I'm thinking this feature can be used
when you set any parameters of wal_level.
Besides that, data loading into a table *with some data*
should be allowed. This means I don't want to limit
the usage of this feature only for initial load
for empty table or under condition of 'minimal' wal_level in other words.

Let me explain more detail of the background.

I got a report that one of my customers says that
multiple COPY from multiple sessions have a bottleneck to write WAL.
Her use case was DWH system using postgres mainly to load dozens of GB (or more) log data
from multiple data sources to execute night batch processing everyday.

Her scenario included both initial load to empty table
and load to table that already has records.

In passing, she also used our company's product of parallel loader,
to load data with dozens of, nearly 100, BGWs at the same time.
Through investigation of iostat,
they found the same problem that CPU worked for WAL write intensively.

This could happen after the implementation
of Parallel copy that is really hotly discussed and reviewed in the mailing lists now.
So I thought it's good to discuss this in advance.

> > 4. Execute the data loading, bypassing WAL generation for data.
> > 5. Sync the data to disk by performing checkpoint.
>
> What happens if the server crashes before #5? Since no WAL for data-loading can
> be replayed, the target table should be truncated?
My answer for this is just to load that COPY data again.
It's because the application itself knows what kind of data was loaded
from the command.

Lastly, let me add some functional specifications of this clause.
The syntax is "COPY tbl FROM ‘/path/to/input/file’ UNLOGGED".

In terms of streaming replication,
I'd like to ask for advice of other members in this community.
Now, I think this feature requires to re-create standby
immediately after the COPY UNLOGGED like Oracle's clause
but I wanna make postgres more attractive than Oracle to users.
Does someone have any ideas ?

Regards,
Takamichi Osumi

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message movead.li@highgo.ca 2020-07-09 06:19:46 Re: A patch for get origin from commit_ts.
Previous Message Michael Paquier 2020-07-09 05:37:46 Re: A patch for get origin from commit_ts.