Re: Tid scan improvements

From: Edmund Horner <ejrh00(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Tid scan improvements
Date: 2018-09-28 05:02:04
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Wed, 19 Sep 2018 at 18:56, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> On 19 September 2018 at 18:04, Edmund Horner <ejrh00(at)gmail(dot)com> wrote:
> > I have been generally following this approach (handling more kinds of
> > TID comparisons), and have found myself doing things like pairing up >
> > with <, estimating how much of a table is covered by some set of >, <,
> > or "> AND <" quals, etc. Things that I'm sure are handled in an
> > advanced way by index paths; unfortunately I didn't see any easily
> > reusable code in the index path code. So I've ended up writing
> > special-case code for TID scans. Hopefully it will be worth it.
> I don't think it would need to be as complex as the index matching
> code. Just looping over the quals and gathering up all compatible ctid
> quals should be fine. I imagine the complex handling of sorting the
> quals by ctid and removal of redundant quals that are covered by some
> range would be done in the executor.

I've got the path creation and execution pretty much working, though
with some inefficiencies:
- Each individual TID is treated as a range of size 1 (but CURRENT
OF is handled as a single fetch)
- Range scans have to scan whole blocks, and skip over the tuples
that are out of range.
But it's enough to get the tests passing.

Right now I'm looking at costing:

> Probably the costing will get more complex. At the moment it seems we
> add a random_page_cost per ctid, but you'd probably need to make that
> better and loop over the quals in each implicitly ANDed set and find
> the max ctid for the > / >= quals and the the min < / <= ctid, then
> get the page number from each and assume max - min seq_page_cost, then
> add random_page_cost for any remaining equality quals. The costs from
> other OR branches can likely just be added on. This would double
> count if someone did WHERE ctid BETWEEN '(0,0') AND '(100,300)' OR
> ctid BETWEEN '(0,0') AND '(100,300)'; The current code seems to
> double count now for duplicate ctids anyway. It even double counts if
> the ctid being compared to is on the same page as another ctid, so I
> don't think that would be unacceptable.

There are two stages of costing:
1. Estimating the number of rows that the relation will return. This
happens before path generation.
2. Estimating the cost of the path.

In the existing code, (1) goes through the normal clausesel.c
machinery, eventually getting to the restriction function defined in
pg_operator. For range quals, e.g. >, it looks for a stats entry for
the variable, but since it's a system variable with no stats, it
returns DEFAULT_INEQ_SEL (in function scalarineqsel). For equality
quals, it does have some special-case code (in function
get_variable_numdistinct) to use stadistinct=-1 for the CTID variable,
resulting in a selectivity estimate of 1/ntuples.

(2), on the other hand, has special-case code in costsize.c (function
cost_tidscan), which estimates each TID as being a separate tuple
fetch from a different page. (The existing code only has to support
=, IN, and CURRENT OF as quals for a TID path.)

In my work, I have been adding support for range quals to (2), which
includes estimating the selectivity of expressions like (CTID > a AND
CTID < b). I got tired of handling all the various ways of ordering
the quals, so I thought I would try re-using the clausesel.c
machinery. In selfuncs.c, I've added special case code for
scalarineqsel and nulltestsel to handle CTID variables. (This also
improves the row count estimates.)

I'm not 100% sure what the costs of each range should be. I think the
first block should incur random_page_cost, with subsequent blocks
being seq_page_cost. Simple "CTID = ?" quals are still estimated as 1
tuple + 1 random block.

Have a look at the attached WIP if you like and tell me if you think
it's going in the right direction. I'm sorry for the size of the
patch; I couldn't find a nice way to cut it up. I did run pgindent
over it though. :)


Attachment Content-Type Size
tid_scan_improvements-v1.patch application/octet-stream 75.1 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-09-28 05:02:49 Re: pgbench's expression parsing & negative numbers
Previous Message Masahiko Sawada 2018-09-28 04:53:14 Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru