Re: pg_dump's "--exclude-table" and "--exclude-table-data" options are ignored and/or cause the dump to fail entirely unless both the schema and table name use 1950s-era identifiers.

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: tutiluren(at)tutanota(dot)com
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_dump's "--exclude-table" and "--exclude-table-data" options are ignored and/or cause the dump to fail entirely unless both the schema and table name use 1950s-era identifiers.
Date: 2020-07-21 06:43:54
Message-ID: CAFj8pRC1brQpeic_5xkwaRXF+yboaiTo7dCyxR_fMhLFaqrewA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

út 21. 7. 2020 v 8:30 odesílatel <tutiluren(at)tutanota(dot)com> napsal:

> After wasting yet another full day on this what should be a non-problem, I
> have come to the equally sad and maddening conclusion that the following
> crucial features of `pg_dump` are utterly broken, possibly "only" on
> Windows (only tested there):
>
> --exclude-table
> --exclude-table-data
>
> The problem: the options *refuse* to work if your schema or table name
> contains any non-lowercase, non-ASII character. I've tested it heavily
> today (with PostgreSQL 11.7 on Windows 10). I'm 99% sure by now that I'm
> not making some kind of "obvious mistake", including setting the encoding
> in every possible manner...
>
> Pardon me if it's actually been fixed recently, but it seems **extremely**
> unlikely. I couldn't find any mention of anything related to
> "--exclude-table" having been fixed in any changelog.
>
> PostGIS did the same thing: it stubbornly ignored PostgreSQL's own rules
> of how you can name a schema/table, eventually bullying me into renaming
> its schema "postgis" because I *needed* it to work, which now looks awfully
> inconsistent with my other, properly named schemas.
>
> I *need* this to work as well, because I can't keep making full backups of
> gigantic tables full of longterm-useless debug data. However, I will not
> let `pg_dump` bully me into renaming my entire database structure. I'm
> never, ever going through that again in my life. Plus I'm actually
> following its long-established rules, so I objectively am not doing
> anything wrong -- `pg_dump` is. Yet I'm the one who has to suffer... And
> yes, I know that it's volunteers and so on. I've paid to the PostgreSQL
> project with a large part of my life instead of money, with endless nights
> struggling with things like this.
>
> PostGIS is a third-party(-ish) extension, so it's *slightly* more
> understandable there. `pg_dump`, on the other hand, is the **official tool
> for backing up a PostgreSQL database**, yet it still somehow doesn't
> understand PostgreSQL's own syntax! The PostgreSQL manual very clearly
> states that you can name your schemas and tables to (almost) anything you
> want, as long as you double-quote them when referring to them, as
> expected/natural.
>
> Only its own `pg_dump` tool doesn't follow this rule. It demands --
> *assumes*! -- that everything is in all-lowercase, only ASCII, with no
> spaces. It doesn't matter if you add quotes, which always works in PG SQL
> queries and which is just "how you do it".
>
> Try it out yourself, by creating a test schema called "Personal stöff" and
> a table in it called "My däiary". Then create a text column and make it PK
> and then add the text "This is supposed to be ignored.". Then try to run
> this command:
>
> pg_dump --format plain --verbose --file "C:\test.txt"
> --exclude-table-data="Personal stöff"."My däiary" --host="localhost"
> --port="5432" --username="postgres" --dbname="TestDB"
>
> I've also tried a million variations of those quotes as well, including
> nested ones, and also tried with and without the --encoding option as well
> as setting the environment variable for encoding. It doesn't seem to matter
> -- it either fails to run the command or runs it but just ignores the
> exclude rules.
>
> The resulting `test.txt` will contain the `This is supposed to be
> ignored.` text, even though we have told `pg_dump` to exclude the data in
> the specified table.
>
> If you rename the schema name and table name to `personal_stuff.my_diary`,
> it will suddenly work.
>
> This feature is apparently nine years old, if a blog post I found is
> accurate, yet is still at what can only be described as a "pre-alpha test
> stage". What I wonder, other than what to do now, is what all-lowercase PG
> users used to do a decade ago when they needed to exclude certain tables'
> data...? Surely that is a very basic feature for a backup tool?
>
> Just to avoid wasting time, when the command doesn't work at all, it
> outputs things like this:
>
> pg_dump: [archiver (db)] query failed: ERROR: invalid byte sequence for
> encoding "UTF8": 0xf6 0x72 0x66 0x72
> pg_dump: [archiver (db)] query was: SELECT c.oid
> FROM pg_catalog.pg_class c
> LEFT JOIN pg_catalog.pg_namespace n
> ON n.oid OPERATOR(pg_catalog.=) c.relnamespace
> WHERE c.relkind OPERATOR(pg_catalog.=) ANY
> (array['r', 'S', 'v', 'm', 'f', 'p'])
> AND c.relname OPERATOR(pg_catalog.~) '^(table name)$'
> AND n.nspname OPERATOR(pg_catalog.~) '^(schema name)$'
>
> Note how "table name" and "schema name" are both in all-lowercase even
> though they were inputted as "Table name" and "Schema name". This indicates
> to me that it uses some extremely crude and inaccurate logic for its
> internal mechanism/queries.
>
> PS: The thousand-step ordeal I had to go through to finally get to send
> this e-mail to your e-mail list did not exactly improve my already bad
> mood... You must have got *extreme* amounts of abuse here to actually
> implement such a system... I hope that my e-mail is taken seriously,
> because I find this (and what PostGIS does) very serious.
>

It is working on Linux, but you can be careful when you are writing case
sensitive identifiers:

postgres=# create table "Foo"(a int);
CREATE TABLE
postgres=# insert into "Foo" values(10);
INSERT 0 1

[pavel(at)nemesis tvision]$ pg_dump -T '"Foo"'
--
-- PostgreSQL database dump
--

postgres=# create schema "ŽlutýPes";
CREATE SCHEMA
postgres=# create table "ŽlutýPes"."ŽlutáKočka"(a int);
CREATE TABLE
postgres=# insert into "ŽlutýPes"."ŽlutáKočka" values(10);
INSERT 0 1

pg_dump -T '"ŽlutýPes"."ŽlutáKočka"'
pg_dump -N '"ŽlutýPes"'

Regards

Pavel

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message David G. Johnston 2020-07-21 06:59:27 Re: pg_dump's "--exclude-table" and "--exclude-table-data" options are ignored and/or cause the dump to fail entirely unless both the schema and table name use 1950s-era identifiers.
Previous Message tutiluren 2020-07-21 06:30:05 pg_dump's "--exclude-table" and "--exclude-table-data" options are ignored and/or cause the dump to fail entirely unless both the schema and table name use 1950s-era identifiers.