Re: Importing a Windows database (in en_GB.CP1252) to linux

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Jean-Christophe BOGGIO <postgresql(at)thefreecat(dot)org>, "pgsql-admin(at)lists(dot)postgresql(dot)org" <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: Importing a Windows database (in en_GB.CP1252) to linux
Date: 2025-12-01 15:07:30
Message-ID: f6306ad9fe30e09ceb06900c856dfab57afe8c85.camel@cybertec.at
Lists: pgsql-admin

On Mon, 2025-12-01 at 14:37 +0100, Jean-Christophe BOGGIO wrote:
> I have a (custom) backup created on a Windows machine in en_GB.CP1252 encoding.
> And of course, some characters can't be imported because they don't exist in UTF-8.

Hm? Which character can be encoded in WINDOWS-1252, but not in UTF-8?
I don't think that can be the problem.
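
For what it is worth, the byte from the error message below (0x92) can be checked with a few lines of Python. This is only a sketch, and it assumes that Python's built-in "cp1252" codec agrees with PostgreSQL's WIN1252 for this byte. It suggests the character itself converts to UTF-8 without trouble, and that the raw byte is only rejected when it is read as if it were already UTF-8:

```python
# Byte reported in the pg_restore error message.
raw = b"\x92"

# Interpreted as WINDOWS-1252, the byte is a typographic apostrophe (U+2019).
ch = raw.decode("cp1252")

# The same character has a perfectly valid three-byte UTF-8 encoding.
utf8_bytes = ch.encode("utf-8")

# Interpreted as UTF-8 without conversion, the lone byte is invalid.
try:
    raw.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False

print(repr(ch), utf8_bytes, valid_as_utf8)
```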

> So I created a new cluster on PG18 port 5433 initialized in WIN1252 encoding:
>
> $ \l imlocal
>                                             List of databases
>   Name   | Owner | Encoding | Locale Provider | Collate | Ctype | Locale | ICU Rules | Access privileges  
> ---------+-------+----------+-----------------+---------+-------+--------+-----------+-------------------
>  imlocal | cat   | WIN1252  | libc            | C       | C     | ∅      | ∅         | ∅
>  (1 row)
>
>   I am now trying to import the data in that database but I keep getting this error:
>
> $ pg_restore -p 5433 -t csakafl -d imlocal imlocal20251127.backup
>  pg_restore: error: COPY failed for table "csakafl": ERROR:  invalid byte sequence for encoding "UTF8": 0x92
>  CONTEXT:  COPY csakafl, line 298
>
>  So pg_restore still thinks I want to use UTF8.

That looks like pg_restore sets a wrong client_encoding, which is weird.

What do you get for

pg_restore -p 5433 -t csakafl -s -f - imlocal20251127.backup | grep client_encoding

If the dump was taken from a WINDOWS-1252 encoded database, that line should
read

SET client_encoding = 'WIN1252';

and everything should work fine. But apparently, the client_encoding is set to
UTF-8 in your case.
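
If grep is not at hand, the same check can be sketched in Python, assuming the output of the pg_restore command above was saved as plain text. The function name find_client_encoding and the sample string are hypothetical and only serve the illustration:

```python
import re

def find_client_encoding(sql_text):
    """Return the encoding named in the first SET client_encoding line, or None."""
    pattern = re.compile(
        r"^\s*SET\s+client_encoding\s*=\s*'([^']+)'\s*;",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(sql_text)
    return match.group(1) if match else None

# Hypothetical excerpt of a dump header; a real one contains more SET lines.
sample = "SET statement_timeout = 0;\nSET client_encoding = 'WIN1252';\n"
print(find_client_encoding(sample))
```

A result other than WIN1252 for a dump that was supposedly taken from a WINDOWS-1252 database would point to the mismatch described above.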

How did that happen? How exactly did you take that dump?
Did you do anything (like an encoding conversion) with the dump after you took it?

Yours,
Laurenz Albe
