Re: A worst case for qsort

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-07 17:16:15
Message-ID: CAM3SWZRkSUYe9bpY9ZVxCH7ogEwHfR6OCH2E1KEMjFemSciQQw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 7, 2014 at 8:07 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> So here. You may not agree that the mitigation strategies for which
> others are asking for are worthwhile, but you can't expect everyone
> else to agree with your assessment of which cases are likely to occur
> in practice. The case of a cohort of strings to be sorted which share
> a long fixed prefix and have different stuff at the end does not seem
> particularly pathological to me. It doesn't, in other words, require
> an adversary: some real-world data sets will look like that. I will
> forebear repeating examples I've given before, but I've seen that kind
> of thing more than once in real data sets that people (well, me,
> anyway) actually wanted to put into a PostgreSQL database. So I'm
> personally of the opinion that the time you've put into trying to
> protect against those cases is worthwhile. I understand that you may
> disagree with that, and that's fine: we're not all going to agree on
> everything.

I actually agree with you here. Sorting text is important, so we
should spend a lot of time avoiding regressions. Your example is
reasonable - I believe that people do that to a non-trivial degree.
The fact is, I probably can greatly ameliorate that case. However, to
give an example of a case I have less sympathy for, I'll name the case
you mention, *plus* the strings are already in logical order, and so
our qsort() can (dubiously) take advantage of that, so that there is a
1:1 ratio of wasted strxfrm() calls and strcoll() tie-breaker calls.
There might be other cases like that that crop up.

I also thought the paper was pretty cool, and thought it might be
interesting because of the fact that is was authored by McIlroy, and
he refers to weaknesses in his and our very own implementation
specifically.
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gabriele Bartolini 2014-08-07 17:29:53 Re: Proposal: Incremental Backup
Previous Message John Cochran 2014-08-07 17:00:19 Re: A worst case for qsort