Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: James Coleman <jtc331(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-03-15 17:35:32
Message-ID: CAAaqYe9BsrW=DRBOd9yW0s2djofXTM9mRpO=LhHrCu4qdGgrVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 14, 2020 at 10:55 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>
> On Fri, Mar 13, 2020 at 1:06 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
> >
> > On Thu, Mar 12, 2020 at 5:53 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> > >
> > > I gave this a very quick look; I don't claim to understand it or
> > > anything, but I thought these trivial cleanups worthwhile. The only
> > > non-cosmetic thing is changing order of arguments to the SOn_printf()
> > > calls in 0008; I think they are contrary to what the comment says.
> >
> > Yes, I think you're correct (re: 0008).
> >
> > They all look generally good to me, and are included in the attached
> > patch series.
>
> I just realized something about this (unsure if in Alvaro's or in my
> applying that) broke make check pretty decently (3 test files broken,
> also much slower, and the incremental sort test returns a lot of
> obviously broken results).
>
> I'll take a look tomorrow and hopefully get a fix (probably will reply
> to the more recent subthread's though).

This took a bit of manually just excluding changes until I got a
red/green set because nothing in the patch set looked all incorrect.
But it turns out this change breaks things:

- if (tuplesort_gettupleslot(read_sortstate, ScanDirectionIsForward(dir),
- false, slot,
NULL) || node->finished)
+ if (node->finished ||
+ tuplesort_gettupleslot(read_sortstate, ScanDirectionIsForward(dir),
+ false, slot, NULL))

I believe what's happening here is that we need the
tuplesort_gettupleslot to set the slot to a NULL tuple (if there
aren't any left) before we return the slot, but that returns false, so
the node->finished check is to ensure that the first time that method
nulls out the slot and returns false we still return the value in
slot.

Since this isn't obvious, I'll add a comment. I think I'm also going
to rename node->finished to be more clear.

In this debugging I also noticed that we don't set node->finished back
to false in rescan, which I assume is a bug, but I don't really
understand a whole lot about rescan. IIRC from some previous
discussions rescan exists for things like cursors, where you can move
back and forth over the result set. Assuming that's the case, do we
need explicit tests for cursors using incremental sort? Is there a
good strategy for how much to do there (since I don't want to
duplicate every non-cursor functional test).

Thanks,
James

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message legrand legrand 2020-03-15 17:35:55 pg_stat_statements: rows not updated for CREATE TABLE AS SELECT statements
Previous Message Fabien COELHO 2020-03-15 17:15:02 Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)