Re: (Re)building index using itself or another index of the same table

From: Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: (Re)building index using itself or another index of the same table
Date: 2019-09-16 13:24:19
Message-ID: 87blvkcr24.fsf@ars-thinkpad
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:

> On Thu, Sep 12, 2019 at 11:08:28AM -0400, Tom Lane wrote:

>>I have exactly no faith that this fixes things enough to make such
>>cases supportable. And I have no interest in opening that can of
>>worms anyway. I'd rather put in some code to reject database
>>accesses in immutable functions.
>>
>
> Same here. My hunch is a non-trivaial fraction of applications using
> this "trick" is silently broken in various subtle ways.

Ok, I see the point. However, "could not read block" error might seem
quite scary to the users; it looks like data corruption. How about
ERRORing out then in get_relation_info instead of skipping reindexing
indexes, like in attached? Even if this doesn't cover all cases, at
least one scenario observed in the field would have better error
message.

Rejecting database access completely in immutable functions would be
unfortunate for our particular case, because this GIN index on
expression joining the very indexed table multiple times (and using thus
btree index) is, well, useful. Here is a brief description of the
case. Indexed table stores postal addresses, which are of hierarchical
nature (e.g. country-region-city-street-house). Single row is one element
of any depth (e.g. region or house); each row stores link to its parent
in parent_guid column, establishing thus the hierarchy
(e.g. house has link to the street).

The task it to get the full address by typing random parts of it
(imagine typing hints in Google Maps). For that, FTS is used. GIN index
is built on full addresses, and to get the full address table is climbed
up about six times (hierarchy depth) by following parent_guid chain.

We could materialize full addresses in the table and eliminate the need
to form them in the index expression, but that would seriously increase
amount of required storage -- GIN doesn't store indexed columns fully,
and thus it is cheaper to 'materialize' full addresses inside it only.

Surely this is a hack which cheats the system. We might imagine creating
some functionality (kinda index referring to multiple rows of the table
-- or even rows of different tables) making it unneccessary, but such
functionality doesn't exist today, and the hack is useful, if you
understand the risk.

>>> One might argue that function selecting from table can hardly be called
>>> immutable, and immutability is required for index expressions. However,
>>> if user is sure table contents doesn't change, why not?
>>
>>If the table contents never change, why are you doing VACUUM FULL on it?
>>
>
> It's possible the columns referenced by the index expression are not
> changing, but some additional columns are updated.

Yeah. Also table can be CLUSTERed without VACUUM FULL.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-ERROR-out-early-on-attempt-to-touch-user-indexes-whi.patch text/x-diff 5.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-09-16 13:30:06 Re: block-level incremental backup
Previous Message Nikita Glukhov 2019-09-16 12:44:07 Re: Bug in GiST paring heap comparator