Re: Tid scan improvements

From: Edmund Horner <ejrh00(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Tid scan improvements
Date: 2018-08-14 23:11:54
Message-ID: CAMyN-kCknpsgHt8eP=jeYMGMU=suRAoZ38CoRJB2fxKTKa52gw@mail.gmail.com
Lists: pgsql-hackers

On 12 August 2018 at 20:07, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>> Since range scan execution is rather different from the existing
>> TidScan execution, I ended up making a new plan type, TidRangeScan.
>> There is still only one TidPath, but it has an additional member that
>> describes which method to use.
>
> I always thought that this would be implemented by overloading
> TidScan. I thought that TidListEval() could be modified to remove
> duplicates accounting for range scans. For example:
>
> SELECT * FROM t WHERE ctid BETWEEN '(0,1)' AND '(0,10)' OR ctid
> IN('(0,5)','(0,30)');
>
> would first sort all the tids along with their operator and then make
> a pass over the sorted array to remove any equality ctids that are
> redundant because they're covered in a range.

Initially, I figured that 99% of the time, the user either wants to
filter by a specific set of ctids (such as those returned by a
subquery), or wants to process a table in blocks. Picking up an
OR-set of ctid-conditions and determining which parts should be picked
up row-wise (as in the existing code) versus which parts are blocks
that should be scanned -- and ensuring that any overlaps were removed
-- seemed more complicated than it was worth.

Having thought about it, I think what you propose might be worth it;
at least it limits us to a single TidScan plan to maintain.

The existing code:
- Looks for a qual that's an OR-list of (ctid = ?) or (ctid IN (?))
- Costs it by assuming each matching tuple is a separate page.
- When beginning the scan, evaluates all the ?s and builds an array
of tids to fetch.
- Sorts and removes duplicates.
- Iterates over the array, fetching tuples.

So we'd extend that to:
- Include in the OR-list "range" subquals of the form (ctid > ? AND
ctid < ?) (either bound may be absent, and we have to handle >= and
<=, ctid on the RHS, etc.).
- Cost the range subquals by assuming they don't overlap, and
estimating how many blocks and tuples they span.
- When beginning the scan, evaluate all the ?s and build an array of
"tid ranges" to fetch. A tid range is a struct with a starting tid,
and an ending tid, and might just be a single tid item.
- Sort and remove duplicates.
- Iterate over the array, using a single fetch for single-item tid
ranges, and starting/ending a heap scan for multi-item tid ranges
(rough sketch below).
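
Concretely, I'm picturing a struct and a merge pass something like the
sketch below -- just to fix ideas; apart from ItemPointerData and
ItemPointerCompare(), the names are placeholders rather than actual
code:

    #include "postgres.h"
    #include "storage/itemptr.h"

    /*
     * Sketch only: a "tid range" covering first..last inclusive.  A
     * plain equality qual becomes a range with first == last.
     */
    typedef struct TidRange
    {
        ItemPointerData first;
        ItemPointerData last;
    } TidRange;

    /*
     * After sorting by first tid, merge overlapping or contained ranges
     * in place, so redundant equality tids disappear.  Returns the new
     * number of ranges.
     */
    static int
    MergeTidRanges(TidRange *ranges, int num)
    {
        TidRange   *cur = ranges;
        int         i;

        if (num == 0)
            return 0;

        for (i = 1; i < num; i++)
        {
            TidRange   *next = &ranges[i];

            if (ItemPointerCompare(&next->first, &cur->last) <= 0)
            {
                /* Overlaps or is contained: extend cur if needed. */
                if (ItemPointerCompare(&next->last, &cur->last) > 0)
                    cur->last = next->last;
            }
            else
                *++cur = *next;
        }
        return (cur - ranges) + 1;
    }

The costing would then just count the blocks between first and last for
each entry, assuming the ranges don't overlap after the merge.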

I think I'll try implementing this.

>> As part of the work I also taught TidScan that its results are ordered
>> by ctid, i.e. to set a pathkey on a TidPath. The benefit of this is
>> that queries such as
>>
>> SELECT MAX(ctid) FROM t;
>> SELECT * FROM t WHERE ctid IN (...) ORDER BY ctid;
>
> I think that can be done as I see you're passing allow_sync as false
> in heap_beginscan_strat(), so the scan will start at the beginning of
> the heap.

I found that the heap scan code caters for parallel scans,
synchronised scans, and block range indexing, but it didn't quite work
for my case of scanning forward or backward over a specified subset of
a table. Hence my changes. I'm not overly familiar with the heap scan
code, though.
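
For what it's worth, what I was trying to do amounts to roughly the
sketch below, using the existing heapam calls as I understand them (the
wrapper function is hypothetical, and I may well be holding the API
wrong):

    #include "postgres.h"
    #include "access/heapam.h"

    /*
     * Sketch only: scan just blocks [start_blk, start_blk + nblocks) of
     * rel, starting where we say rather than joining a synchronized scan.
     */
    static void
    scan_block_range(Relation rel, Snapshot snapshot,
                     BlockNumber start_blk, BlockNumber nblocks)
    {
        HeapScanDesc scan;
        HeapTuple    tuple;

        scan = heap_beginscan_strat(rel, snapshot, 0, NULL,
                                    true,    /* allow_strat */
                                    false);  /* allow_sync */
        heap_setscanlimits(scan, start_blk, nblocks);

        /* Going backward over a limited range is where it got awkward. */
        while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
        {
            /* ... check the tid is within the requested range, emit ... */
        }

        heap_endscan(scan);
    }

If I've read it right, with allow_sync false the scan begins at the
block we ask for, which is what makes the ctid ordering reliable.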

>> - Is there a less brittle way to create tables of a specific number
>> of blocks/tuples in the regression tests?
>
> Perhaps you could just populate a table with some number of records
> then DELETE the ones above ctid (x,100) on each page, where 100 is
> whatever you can be certain will fit on a page on any platform. I'm
> not quite sure if our regress test would pass with a very small block
> size anyway, but probably worth verifying that before you write the
> first test that will break it.

I don't think I've tested with extreme block sizes.

> I'll try to look in a bit more detail during the commitfest.
>
> It's perhaps a minor detail at this stage, but generally, we don't
> have code lines over 80 chars in length. There are some exceptions,
> e.g. not breaking error message strings so that they're easily
> greppable. src/tools/pgindent has a tool that you can run to fix the
> whitespace so it's in line with project standard.

I'll try to get pgindent running before my next patch.

Thanks for the comments!
