Re: Progress on fast path sorting, btree index creation time

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Progress on fast path sorting, btree index creation time
Date: 2012-02-01 15:38:05
Message-ID: CAEYLb_Uw+GiNpt9zAC7_rEX-Uj_6kdaCYARPqg+ptKQfB96TvA@mail.gmail.com
Lists: pgsql-hackers

On 31 January 2012 19:47, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jan 27, 2012 at 3:33 PM, Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
>> Patch is attached. I have not changed the duplicate functions. This is
>> because I concluded that it was the lesser of two evils to have to get
>> the template to generate both declarations in the header file, and
>> definitions in the .c file - that seemed particularly obscure. We're
>> never going to have to expose/duplicate any more comparators anyway.
>> Do you agree?
>
> Not really.  You don't really need macros to generate the prototypes;
> you could just write them out longhand.

Well, since that would have the advantage of forcing the client of
those macros to explicitly acknowledge what specialisations they were
getting in order to get a warning-free build, that might be a better
approach (though uncalled static functions are discarded anyway).
However, I don't really see how you expect me to make the
comparator-proper functions within nbtcompare.c inline without some
duplication. How can I have inline functions in that .c file, while
still having those functions callable from other TUs, without either
duplicating or moving them? I suppose that I could do some trick with
macros, but once again I believe that the cure is worse than the
disease - is it really such a big deal to duplicate a handful of
trivial functions, given that we're never going to have to expand that
number?

> I think there's a mess of naming confusion in here, though, as perhaps
> best illustrated by this macro definition:
>
> #define TEMPLATE_QSORT_ARG_HEAP(TYPE, COMPAR)                               \
> DO_TEMPLATE_QSORT_ARG(TYPE, COMPAR, inlheap,                                \
>        SING_ADDITIONAL_CODE, TYPE##inlheapcomparetup_inline)               \
> DO_TEMPLATE_QSORT_ARG(TYPE, COMPAR, regheap,                                \
>        MULT_ADDITIONAL_CODE(TYPE##regheapAppFunc),                         \
>            TYPE##regheapcomparetup_inline)
>
> The idea here is that when we have only a single sort key, we include
> SING_ADDITIONAL_CODE in the tuple comparison function, whereas when we
> have more than one, we instead include MULT_ADDITIONAL_CODE.  Right
> there, I think we have a naming problem, because abbreviating "single"
> to "sing" and multiple to "mult" is less than entirely clear.  For a
> minute or two I was trying to figure out whether our sorting code was
> musically inclined, and I'm a native English speaker.  But then we
> switch to another set of terminology completely for the generated
> functions: inlheap for the single-key case, and regheap for the
> multiple-key case.   I find that even more confusing.

I'm happy to clean that up, though I think that when you try to
browbeat C into supporting generic programming, confusing code is more
or less inevitable.
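
For example (a sketch of the renaming only - I haven't settled on
final names), the macro you quoted could spell both cases out, so that
the suffix on each generated function matches the macro that generated
its body:

    #define TEMPLATE_QSORT_ARG_HEAP(TYPE, COMPAR)                     \
    DO_TEMPLATE_QSORT_ARG(TYPE, COMPAR, singlekey,                    \
            SINGLE_KEY_ADDITIONAL_CODE,                               \
            TYPE##singlekeycomparetup_inline)                         \
    DO_TEMPLATE_QSORT_ARG(TYPE, COMPAR, multikey,                     \
            MULTI_KEY_ADDITIONAL_CODE(TYPE##multikeyAppFunc),         \
            TYPE##multikeycomparetup_inline)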

> I think we ought to get rid of this:
>
> +typedef enum TypeCompar
> +{
> +       TYPE_COMP_OTHER,
> +       TYPE_COMP_INT4,
> +       TYPE_COMP_INT8,
> +       TYPE_COMP_FLOAT4,
> +       TYPE_COMP_FLOAT8,
> +       TYPE_COMP_FULL_SPECIALISATION
> +} TypeCompar;
>
> Instead, just modify SortSupportData to have two function pointers as
> members, one for the single-key case and another for the multiple-key
> case, and have the sortsupport functions initialize them to the actual
> functions that should be called.  The layer of indirection, AFAICS,
> serves no purpose.

It anticipates a time when the system will have both btree and heap
variants of the specialisations. It also serves to usefully document
the intent of the optimisation - the most direct interface isn't
necessarily the best. That said, I really am looking for the path of
least resistance towards getting this committed at this stage, so if
you still think it's a bad idea, I won't complain if you modify the
patch to have multiple function pointers as you described.
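
If you do go that way, I assume you have something like this in mind
(a sketch, not the patch as it stands - the member names and the
specialised qsort_arg signature are illustrative):

    typedef struct SortSupportData
    {
        /* ... existing members ... */

        /*
         * Set by the type's sortsupport function to point directly
         * at the specialised sort routines, in place of the
         * TypeCompar tag that the sort code currently switches on.
         */
        void    (*qsort_arg_single) (void *base, size_t nel,
                                     size_t elsize, void *arg);
        void    (*qsort_arg_multi) (void *base, size_t nel,
                                    size_t elsize, void *arg);
    } SortSupportData;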

>> It's pretty easy to remove a specialisation at any time - just remove
>> less than 10 lines of code. It's also pretty difficult to determine,
>> with absolute confidence, where the right balance lies.
>> Perhaps the sensible thing to do is to not be so conservative in what
>> we initially commit, while clearly acknowledging that we may not have
>> the balance right, and that it may have to change. We then have the
>> entire beta part of the cycle in which to decide to roll back from
>> that position, without any plausible downside. If, on the other hand,
>> we conservatively lean towards fewer specialisations in the initial
>> commit, no one will complain about the lack of an improvement in
>> performance that they never had.
>
> Eh, really?  Typically when we do something good, the wolves are
> howling at the door to make it work in more cases.

Well, if anyone was complaining about a regression because the
optimisation represented a net loss for their reasonable use-case,
then clearly we wouldn't be doing well, and we'd have to reconsider
our position - no sensible wolf is going to argue with that. Obviously
it's not possible to prove, in a water-tight fashion, that there
couldn't be some kind of regression under some set of circumstances.
However, I believe I have made a good case that, on balance, it's
worth maintaining the full set of specialisations. This is one case
where committing code and working out whether there are problems makes
sense, because: 1) there very probably aren't (you yourself were not
surprised that a regression couldn't be demonstrated), and 2) if there
are, we can withdraw from having so many specialisations in small
increments, rather than having to throw everything out, as might have
been the case in the past with other proposed performance patches.

>> I think that possibly the one remaining blocker to tentatively
>> committing this with all specialisations intact is that I haven't
>> tested this on Windows, as I don't currently have access to a Windows
>> development environment. I have set one up before, but it's a huge
>> pain. Can anyone help me out?
>
> This doesn't strike me as terribly OS-dependent, unless by that we
> mean compiler-dependent.

What concerns me is how the pg_always_inline macro behaves when
building with MSVC. While I have no reason to doubt that the effect
should be similar there, I would like to have that verified. Building
with MSVC is simple if you're only building release source packages,
but when you bring Flex + Bison into the equation, it becomes rather
difficult, which is why I asked for help there. You are generally
right that it is not terribly OS-dependent, though, since, as I've
already shown, very portable software like the GNU C library uses
tricks like pg_always_inline extensively, both in code intended for a
particular architecture and in code intended for all. Even where
pg_always_inline has no underlying compiler-specific way of compelling
inlining (it just falls back on plain inline, which is probably a
no-op on such an obscure platform anyway), all that is lost is a
little of the performance improvement. The usefulness of this
optimisation does not hinge on pg_always_inline in any way.
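
For what it's worth, the macro has the usual fallback structure (a
sketch - the exact #if tests in the patch may differ):

    #if defined(__GNUC__)
    /* GCC and compatible compilers can compel inlining */
    #define pg_always_inline inline __attribute__((always_inline))
    #elif defined(_MSC_VER)
    /* MSVC's equivalent, which is what I'd like to see verified */
    #define pg_always_inline __forceinline
    #else
    /* plain inline is a mere hint, perhaps a no-op, but harmless */
    #define pg_always_inline inline
    #endif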

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
