Re: [PATCH] Add --ordered option to pg_dump

From: Thomas Sibley <tom+pg(at)zulutango(dot)org>
To: Bob Lunney <bob_lunney(at)yahoo(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Add --ordered option to pg_dump
Date: 2014-05-21 23:25:58
Message-ID: 20140521232558.GA25842@orac
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I came across this thread from a web search when looking for prior art.
I'm reviving it to alert other interested folks to a flaw in the
provided patch should they try to use it. The feature request seems
somewhat common on the web, and the patch here would cause some fun
debugging sessions otherwise.

On Thu, Apr 15, 2010 at 10:48:25AM -0700, Bob Lunney wrote:
> Using --ordered will order the data by primary key or
> unique index, if one exists, and use the "smallest" ordering
> (i.e. least number of columns required for a unique
> order). 

The provided patch for pg_dump.c fails on a corner case with deleted
columns. It's likely rare, but tripping over it will either cause an
unexpected order (at best) or a failed dump (at worst).

> +       if (ordered)
> +       {
> +               appendPQExpBuffer(p, "SELECT array_to_string(indkey, ','), array_length(i.indkey, 1) "
> +                                    "  FROM pg_catalog.pg_index i "
> +                                    " WHERE (i.indisprimary = true or i.indisunique = true) "
> +                                    "   AND i.indisvalid = true "
> +                                    "   AND i.indrelid = '%s'::regclass "
> +                                    " ORDER BY 2, 1 LIMIT 1",

In this stanza the internal column numbers of the selected index keys
are joined with commas to produce a string.

> +                       const char *s = PQgetvalue(res, 0, 0);
> +                       if (s != NULL)
> +                       {
> +                           appendPQExpBuffer(q, " ORDER BY %s", s);
> +                       }

Then that string is used when constructing the ORDER BY clause for the
data retrieval query. The query is a SELECT * FROM ... which means the
positional ORDER BY takes internal column numbers and uses them for the
implicit select-list columns. That's fine if * is guaranteed to match
the internal column order, but that's not true in cases where the table
has a deleted column between two columns in the index. The query will
either order by an unexpected column not part of the index (with
potentially very slow results), or it will fail if it overruns the
length of the select-list. A SQL script demonstrating the failure mode
is attached.

To work around this corner case, the patch should map the internal
column numbers to column names (via pg_attribute or somesuch) and then
use those in the ORDER BY.

Hopefully this'll save someone else a bit of debugging in the future.

Thomas

Attachment Content-Type Size
naive-order-by-attnum-bug.sql text/plain 790 bytes

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2014-05-22 02:22:01 9.4 btree index corruption
Previous Message Tom Lane 2014-05-21 20:37:46 Re: [9.3] Should we mention "set_config(...)" in 18.1.3 in Server Configuration?