Re: Covering GiST indexes

From: Darafei "Komяpa" Praliaskouski <me(at)komzpa(dot)net>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Covering GiST indexes
Date: 2018-04-12 20:20:32
Message-ID: CAC8Q8tKNnRfnfFaiMP2aLfCb47c-G3-rvp5fa6UHsrVx4p2j4g@mail.gmail.com
Lists: pgsql-hackers

> Another thing that could be done for PostGIS geometries is just another
> opclass which stores geometries "as is" in leafs. As I know, geometries
> contain MBRs inside their own, so there is no need to store extra MBR. I
> think the reason why PostGIS doesn't have such opclass yet is that
> geometries are frequently large and easily can exceed maximum size of
> index tuple.
>

The geometry datatype layout was designed with TOASTing in mind: most of the
descriptive data is stored in the header, including type, SRID, box and some
other flags, so fetching just the first several bytes tells you a lot.
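
To illustrate why a header-first layout helps, here is a minimal Python
sketch. The header format below is a toy one made up for this example (it is
NOT the actual PostGIS serialization), and pack_geometry / peek_header are
hypothetical names. It only shows that type, SRID and box can be decoded from
a short prefix without touching the coordinates.

```python
import struct

# Hypothetical fixed-size header, NOT the real PostGIS on-disk format:
# type (uint8), flags (uint8), SRID (int32), xmin, ymin, xmax, ymax (float32).
HEADER = struct.Struct("<BBi4f")

def pack_geometry(geom_type, flags, srid, box, coords):
    """Serialize the toy header followed by a flat list of float64 coordinates."""
    body = struct.pack(f"<{len(coords)}d", *coords)
    return HEADER.pack(geom_type, flags, srid, *box) + body

def peek_header(prefix):
    """Decode type, flags, SRID and box from only the leading bytes."""
    geom_type, flags, srid, *box = HEADER.unpack(prefix[:HEADER.size])
    return {"type": geom_type, "flags": flags, "srid": srid, "box": tuple(box)}

# A 720-point polygon: 1440 coordinates, but the header is all we read.
blob = pack_geometry(3, 0, 4326, (0.0, 0.0, 10.0, 5.0), [0.0] * 1440)
info = peek_header(blob[:HEADER.size])
```

In this toy layout the prefix is HEADER.size bytes long, however large the
coordinate array behind it is.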

PostGIS datasets often mix geometries of very different binary lengths: in
buildings, for example, it is quite common to have a lot of four-corner
houses and just one mapped as a circle, which the digitizing software decided
to represent as a 720-point polygon. Since reading it from TOAST for an index
would require a seek of some kind, it may be just as efficient to look it up
in the table.

This way a lossy decompress function can help with index only scans: up to
some binary length, try to store the original geometry in the index. Beyond
that, store a shape that has fewer points in it but covers a slightly larger
area, plus a flag that it's not precise. There are a lot of ways to
generate a covering shape with fewer points: the obvious one is a box; less
obvious are a non-axis-aligned box, a collection of boxes for a multipart
shape, an outer ring for an area with lots of holes, a convex hull or a
smallest enclosing k-gon.
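
A rough sketch of that size-dependent choice, in Python rather than as a real
GiST support function. The names IndexEntry, compress and MAX_EXACT_POINTS
are hypothetical, the cutoff counts vertices instead of bytes for simplicity,
and the covering shape is just the plain box, the simplest of the options
listed above.

```python
from dataclasses import dataclass

MAX_EXACT_POINTS = 16  # hypothetical cutoff; a real one would be in bytes

@dataclass
class IndexEntry:
    ring: list   # (x, y) vertices stored in the index
    lossy: bool  # True when ring only covers the original shape

def bounding_box_ring(ring):
    """Return the axis-aligned box of a ring as a four-vertex ring."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return [(min(xs), min(ys)), (max(xs), min(ys)),
            (max(xs), max(ys)), (min(xs), max(ys))]

def compress(ring):
    """Keep small shapes exactly; replace large ones with a covering box."""
    if len(ring) <= MAX_EXACT_POINTS:
        return IndexEntry(list(ring), lossy=False)
    return IndexEntry(bounding_box_ring(ring), lossy=True)
```

A four-corner house comes back unchanged with lossy=False, while a 720-point
ring comes back as four vertices with lossy=True.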

In GIS there is the problem of the border of Russia: the country crosses the
180th meridian and has a complex border shape. If you take its bounding box,
it spans from -180 to 180. If you query any spot in the US or in Europe, the
box will intersect your area and require a full recheck, complete detoast and
decompression, only to conclude "no, it's not a thing we need, skip".
Allowing anything better than a box would help. And if we're allowing a
complex shape, we've already got the container for it: geometry.
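
A small Python illustration of the effect, using made-up, roughly plausible
extents rather than real border data. The two parts, the helper names and the
query point are all assumptions for the example. It compares one box over the
whole shape with a collection of per-part boxes.

```python
def box_of(points):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of lon/lat points."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (min(lons), min(lats), max(lons), max(lats))

def contains(box, point):
    """True when the lon/lat point lies inside the box."""
    xmin, ymin, xmax, ymax = box
    lon, lat = point
    return xmin <= lon <= xmax and ymin <= lat <= ymax

# Illustrative extents for a shape split at the 180th meridian.
eastern_part = [(19.6, 41.2), (180.0, 41.2), (180.0, 81.9), (19.6, 81.9)]
western_part = [(-180.0, 64.2), (-169.0, 64.2), (-169.0, 71.6), (-180.0, 71.6)]

single_box = box_of(eastern_part + western_part)
part_boxes = [box_of(eastern_part), box_of(western_part)]

paris = (2.35, 48.85)
print(contains(single_box, paris))                   # True: needs a recheck
print(any(contains(b, paris) for b in part_boxes))   # False: rejected early
```

With a single box, the point in Europe becomes a candidate and forces the
expensive recheck; with one box per part, it is rejected from the index alone.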

If we're storing a geometry in the index and the original is small, why not
allow a complete Index Only Scan on it, and let it skip the recheck? :)
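
To make the question concrete, here is a toy Python model of such a scan. It
is not how the PostgreSQL executor works; the dictionaries standing in for
the index and the heap, and the scan function itself, are hypothetical. Each
index entry carries a ring and a lossy flag, and only lossy entries that pass
the index test cost a heap fetch.

```python
def point_in_ring(point, ring):
    """Ray-casting point-in-polygon test for a simple ring of (x, y) vertices."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def scan(index, heap, point):
    """Return (matching ids, number of heap fetches) for a point query."""
    hits, heap_fetches = [], 0
    for row_id, (ring, lossy) in index.items():
        if not point_in_ring(point, ring):
            continue                  # stored shape misses: no match possible
        if lossy:
            heap_fetches += 1         # covering shape hit: recheck the original
            if not point_in_ring(point, heap[row_id]):
                continue
        hits.append(row_id)           # exact entry, or lossy one that passed
    return hits, heap_fetches

heap = {
    1: [(0, 0), (10, 0), (10, 10), (0, 10)],   # small square, stored exactly
    2: [(20, 0), (30, 0), (20, 10)],           # triangle, stored as its box
}
index = {
    1: (heap[1], False),
    2: ([(20, 0), (30, 0), (30, 10), (20, 10)], True),
}
```

A query that lands in the exactly stored square is answered with zero heap
fetches; a query inside the triangle's box pays one fetch whether or not it
finally matches.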

Darafei Praliaskouski,
GIS Engineer / Juno Minsk
