Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)
Date: 2018-11-13 21:29:25
Message-ID: CAH2-Wz=WRpJeej2xpd+FbzJXYAZZPiTgebTGv1P2NMcvgzvZGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 5, 2018 at 7:46 PM Andrey Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> Problem No. 1 will be amplified with new asynchronous operations,
> background workers and distributing query execution. It is not problem
> of DBMS. The solution is change the tests: include sorting of query
> results, sorting of system messages before diff operation.
> If steady order of messages is critical for users we can sort
> targetObjects array in the begin of reportDependentObjects() routine by
> classId, objectId and objectSubId fields.

Yeah, maybe. I'm not that worried about the order of objects; we can
probably continue to get away with suppressing the list of objects by
placing a "\set VERBOSITY terse" where needed -- that's something
we've already been doing for some time [1]. I accept that it would be
better to sort the output, but I'm concerned that that would be a
difficult, risky project. What if there was a huge number of dependent
objects? What if a memory allocation fails?

> Problem No. 2: we can suppress some output optimizations in
> object_address_present_add_flags() routine and print all deleted objects.

If there is anything that makes it necessary to sort, it's this -- the
order of visitation can affect whether or not
object_address_present_add_flags() suppresses redundant entries. But I
still prefer to fix the problem by changing the scan order to be what
we actually want it to be.

> Problem No. 3: I suppose we can go one of two ways:
> a) print all depended objects returned by scan of DependDependerIndexId
> relation, not only the first.
> b) search a root of dependence and print only it.

A solution does occur to me that I'm kind of embarrassed to suggest,
but that would nonetheless probably do the job:

The general problem here is that the scan order of a scan that uses
pg_depend_depender_index doesn't reliably give us the order we
actually want among duplicate index entries (at least, you could
choose to characterize it in that narrow way). The index
pg_depend_depender_index looks like this:

Column │ Type │ Key? │ Definition
──────────┼─────────┼──────┼────────────
classid │ oid │ yes │ classid
objid │ oid │ yes │ objid
objsubid │ integer │ yes │ objsubid
btree, for table "pg_catalog.pg_depend"

Note that this isn't a unique index -- there are no unique indexes on
pg_depend at all, in fact. (Note also that the same observations apply
to pg_shdepend.)

The objsubid (not refobjsubid) is 0 in all cases that we have problems
with -- we're never going to have a problem with multiple
dependent-object-is-column entries that have the same '(classid,
objid, objsubid)' value, as far as I can tell:

pg(at)regression[2201]=# SELECT count(*), classid, objid, objsubid FROM
pg_depend WHERE objsubid != 0 GROUP BY classid, objid, objsubid HAVING
count(*) > 1;
count │ classid │ objid │ objsubid
───────┼─────────┼───────┼──────────
(0 rows)

(i.e. We're only having problems with some of the entries/values that
you'll see when the above query is changed to have "bjsubid = 0" in
its WHERE clause.)

Why not vary the objsubid value among entries that don't use it
anyway, so that they have a negative integer objsubid values rather
than 0? That would have the effect of forcing the scan order of
pg_depend_depender_index scans to be whatever we want it to be.
dependent-object-is-column entries would not have their sort order
affected anyway, because in practice there is only one entry with the
same '(classid, objid, objsubid)' value when objsubid > 0.

We could invent some scheme that made pg_depend/pg_shdepend entries
use fixed objsubid negative integer codes according to the
*referenced* object type (refobjid), and/or the entry's deptype.
Places that look at ObjectAddress.objectSubId would look for entries
that were > 0 rather than != 0, so we wouldn't really be changing the
meaning of objsubid (though it would need to be documented in the user
docs, and have a release note compatibility entry).
findDependentObjects() would still use "nkeys = 2" for these entries;
it would be one of the places that would be taught to check
objectSubId > 0 rather than objectSubId != 0. But it would get the
scan order it actually needs.

I haven't attempted to put this together just yet, because I want to
see if it passes the laugh test. My sense is that the best way to fix
the problem is to force the scan order to be the one that we
accidentally depend on using some mechanism or other. Though not
necessarily the one I've sketched out.

[1] https://postgr.es/m/24279.1525401104@sss.pgh.pa.us
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-11-13 21:57:16 Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well
Previous Message Tom Lane 2018-11-13 21:08:04 Re: Restore CurrentUserId only if 'prevUser' is valid when abort transaction