Re: Patch for SortSupport implementation on inet/cdir

From: Edmund Horner <ejrh00(at)gmail(dot)com>
To: Brandur Leach <brandur(at)mutelight(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for SortSupport implementation on inet/cdir
Date: 2019-02-09 07:12:53
Message-ID: CAMyN-kAOw-LY1ZKJ9-R4uYdX3d2=_2SiYntpKCHhrCe1Md1NLw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 9 Feb 2019 at 04:48, Brandur Leach <brandur(at)mutelight(dot)org> wrote:
> I've attached a patch that implements SortSupport for the
> inet/cidr types. It has the effect of typically reducing
> the time taken to sort these types by ~50-60% (as measured
> by `SELECT COUNT(DISTINCT ...)` which will carry over to
> common operations like index creation, `ORDER BY`, and
> `DISTINCT`.

Hi Brandur,

I had a look at this. Your V2 patch applies cleanly, and the code was
straightforward and well commented. I appreciate the big comment at the
top of network_abbrev_convert explaining how you encode the data.

The tests pass. I ran a couple of large scale tests myself and didn't find
any problems. Sorting a million random inets in work_mem = 256MB goes from
roughty 3670ms to 1620ms with the SortSupport, which is pretty impressive.
(But that's in my debug build, so not a serious benchmark.)

An interesting thing about sorting IPv4 inets on 64-bit machines is that
when the inets are the same, the abbreviated comparitor will return 0 which
is taken by the sorting machinery to mean "the datums are the same up to
this point, so you need to call the full comparitor" -- but, in this case,
0 means "the datums truly are the same, no need to call the full
comparitor". Since the full comparitor assumes its arguments to be the
original (typically pass-by-reference) datums, you can't do it there.
You'd need to add another optional comparitor to be called after the
abbreviated one. In inet's case on a 64-bit machine, it would look at the
abbreviated datums and if they're both in the IPv4 family, would return 0
(knowing that the abbreviated comparitor has already done the real work).
I have no reason to believe this particular optimisation is worth anything
much, though; it's outside the scope of this patch, besides.

I have some comments on the comments:

network.c:552
* SortSupport conversion routine. Converts original inet/cidr
representations
* to abbreviated keys . The inet/cidr types are pass-by-reference, so is an
* optimization so that sorting routines don't have to pull full values from
* the heap to compare.

Looks like you have an extra space before the "." on line 553. And
abbreviated keys being an optimisation for pass-by-reference types can be
taken for granted, so I think the last sentence is redundant.

network.c::567
* IPv4 and IPv6 are identical in this makeup, with the difference being that
* IPv4 addresses have a maximum of 32 bits compared to IPv6's 64 bits, so in
* IPv6 each part may be larger.

IPv6's addresses are 128 bit. I'm not sure sure if "maximum" is accurate,
or whether you should just say "IPv4 addresses have 32 bits".

network.c::571
* inet/cdir types compare using these sorting rules. If inequality is
detected
* at any step, comparison is done. If any rule is a tie, the algorithm drops
* through to the next to break it:

When you say "comparison is done" it sounds like more comparing is going to
be done, but what I think you mean is that comparison is finished.

> [...]

> My benchmarking methodology and script is available here
> [1], and involves gathering statistics for 100
> `count(distinct ...)` queries at various data sizes. I've
> saved the results I got on my machine here [2].

I didn't see any links for [1], [2] and [3] in your email.

Finally, there's a duplicate CF entry:
https://commitfest.postgresql.org/22/1990/ .

Since you're updating https://commitfest.postgresql.org/22/1991/ , I
suggest you mark 1990 as Withdrawn to avoid confusion. If there's a way to
remove it from the CF list, that would be even better.

Edmund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-02-09 07:14:41 Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)
Previous Message Amit Kapila 2019-02-09 05:07:13 Re: Transaction commits VS Transaction commits (with parallel) VS query mean time