Tid scan improvements

From:	Edmund Horner <ejrh00(at)gmail(dot)com>
To:	PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Tid scan improvements
Date:	2018-08-12 02:29:05
Message-ID:	CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
Lists:	pgsql-hackers

Hello,

To scratch an itch, I have been working on teaching TidScan how to do
range queries, i.e. those using >=, <, BETWEEN, etc. This means we
can write, for instance,

SELECT * FROM t WHERE ctid >= '(1000,0)' AND ctid < '(2000,0)';

instead of resorting to the old trick:

SELECT * FROM t WHERE ctid = ANY (ARRAY(
    SELECT format('(%s,%s)', i, j)::tid
    FROM generate_series(1000,1999) AS gs(i),
         generate_series(1,200) AS gs2(j)));

where "200" is a guess at the maximum number of tuples that fit on one page of that table.

There's some previous discussion about this at:
https://www.postgresql.org/message-id/flat/CAHyXU0zJhg_5RtxKnNbAK%3D4ZzQEFUFi%2B52RjpLrxtkRTD6CDFw%40mail.gmail.com#3ba2c3a6be217f40130655a3250d80a4

Since range scan execution is rather different from the existing
TidScan execution, I ended up making a new plan type, TidRangeScan.
There is still only one TidPath, but it has an additional member that
describes which method to use.

As part of the work I also taught TidScan that its results are ordered
by ctid, i.e. to set a pathkey on a TidPath. The benefit of this is
that queries such as

SELECT MAX(ctid) FROM t;
SELECT * FROM t WHERE ctid IN (...) ORDER BY ctid;

are now planned a bit more efficiently. Execution was already
returning tuples in ascending ctid order; I just had to add support
for descending order.

Attached are a couple of patches:
- 01_tid_scan_ordering.patch
- 02_tid_range_scan.patch, to be applied on top of 01.

Can I add this to the next CommitFest?

Obviously the whole thing needs thorough review, and I expect there to
be numerous problems. (I had to make this prototype to demonstrate to
myself that it wasn't completely beyond me. I know from experience
how easy it is to enthusiastically volunteer something for an open
source project, discover that one does not have the time or skill
required, and be too embarrassed to show one's face again!)

As well as actual correctness, some aspects that I am particularly
unsure about include:

- Is it messy to use TidPath for both types of scan?
- What is the planning cost for plans that don't end up being a
TidScan or TidRangeScan?
- Have I put the various helper functions in the right files?
- Is there a less brittle way to create tables of a specific number
of blocks/tuples in the regression tests?
- Have I got the ScanDirection right during execution?
- Are my changes to heapam ok?

Cheers,
Edmund

Attachment	Content-Type	Size
01_tid_scan_ordering.patch	application/octet-stream	16.1 KB
02_tid_range_scan.patch	application/octet-stream	63.0 KB

Responses

Re: Tid scan improvements at 2018-08-12 08:07:50 from David Rowley