Re: A worst case for qsort

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-07 15:07:23
Message-ID: CA+TgmoaFYNL0N3QJK=Tj9nbe3Usg=jYAvrziJZy298R=tC7gFA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 5, 2014 at 8:15 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> "The adversarial method works for almost any polymorphic program
> recognizable as quicksort. The subject quicksort may copy values at
> will, or work with lists rather than arrays. It may even pick the
> pivot at random. The quicksort will be vulnerable provided only that
> it satisfies some mild assumptions that are met by every
> implementation I have seen".
>
> IMHO, while worst-case analysis is a very useful tool for evaluating
> algorithms (particularly their worst-case time complexity), a worst
> case should be put in its practical context. For example, if we had
> reason to be concerned about *adversarial* inputs, I think that there
> is a good chance that our qsort() actually would be problematic to the
> point of driving us to prefer some generally slower alternative.
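
(For concreteness, the adversarial method in the passage Peter quotes
boils down to a lazy comparator along the lines below. This sketch is
adapted from McIlroy's "A Killer Adversary for Quicksort"; the names
val, gas and candidate follow the paper, but treat the code as
illustrative rather than the exact published version.)

/*
 * Every item starts out as "gas": an unresolved value larger than any
 * frozen ("solid") value.  Whenever the sort compares two gas items,
 * the one that is not the presumed pivot gets frozen to the next solid
 * value, so the pivot keeps comparing low and a vulnerable quicksort
 * is driven toward quadratic behavior.
 */
#include <stdio.h>
#include <stdlib.h>

static int *val;        /* current value of item i */
static int  gas;        /* sentinel: compares above all solid values */
static int  nsolid;     /* number of items frozen so far */
static int  candidate;  /* index the sort appears to be pivoting on */
static long ncmp;       /* comparisons performed */

static int
cmp(const void *px, const void *py)
{
    int x = *(const int *) px;
    int y = *(const int *) py;

    ncmp++;
    if (val[x] == gas && val[y] == gas)
    {
        /* freeze whichever item is not the presumed pivot */
        if (x == candidate)
            val[x] = nsolid++;
        else
            val[y] = nsolid++;
    }
    if (val[x] == gas)
        candidate = x;
    else if (val[y] == gas)
        candidate = y;
    return val[x] - val[y];
}

int
main(void)
{
    int     n = 10000;
    int    *ptr = malloc(n * sizeof(int));

    val = malloc(n * sizeof(int));
    gas = n;                    /* exceeds every solid value 0..n-1 */
    for (int i = 0; i < n; i++)
    {
        ptr[i] = i;             /* sort indices; cmp consults val[] */
        val[i] = gas;
    }
    qsort(ptr, n, sizeof(int), cmp);
    printf("%d items, %ld comparisons\n", n, ncmp);
    return 0;
}

Against a quicksort that satisfies the paper's "mild assumptions",
ncmp grows quadratically rather than as n log n; a qsort() that is
really a merge sort underneath largely shrugs this off.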

I completely agree with Peter here, and I think everyone else does, too.
Where we don't all necessarily agree is which worst cases are
realistic and which worst cases are simply pathological cases with
which we need not be concerned in practice. For example, when I
proposed the patch to use MVCC catalog snapshots, Andres invented a
test case which I thought was far more brutal than anything likely to
be encountered in the real world. But Andres didn't agree; he thought
the test case was realistic. So, I worked on the patch some more and
came up with a solution that performs acceptably even if those brutal
conditions are encountered in the real world. As a result, the patch as
committed is better than the one originally submitted. I could
certainly have spent more time debating whether that effort was
worthwhile, and maybe I would have won the argument, but it was a
better use of that time to instead try to improve the patch.

So here. You may not agree that the mitigation strategies others are
asking for are worthwhile, but you can't expect everyone
else to agree with your assessment of which cases are likely to occur
in practice. The case of a cohort of strings to be sorted which share
a long fixed prefix and have different stuff at the end does not seem
particularly pathological to me. It doesn't, in other words, require
an adversary: some real-world data sets will look like that. I will
forbear repeating examples I've given before, but I've seen that kind
of thing more than once in real data sets that people (well, me,
anyway) actually wanted to put into a PostgreSQL database. So I'm
personally of the opinion that the time you've put into trying to
protect against those cases is worthwhile. I understand that you may
disagree with that, and that's fine: we're not all going to agree on
everything.
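
(To make that data shape concrete, here is a minimal sketch; the path
prefix is made up for illustration. strcmp() has to walk the entire
shared prefix before it reaches the bytes that actually distinguish two
keys, so every comparison costs on the order of the prefix length.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NKEYS  100000
/* long shared prefix; only the trailing digits differ between keys */
#define PREFIX "/var/data/customer/2014/08/invoices/batch-0001/item-"

static int
cmpstr(const void *a, const void *b)
{
    return strcmp(*(char *const *) a, *(char *const *) b);
}

int
main(void)
{
    static char *keys[NKEYS];
    char        buf[128];

    for (int i = 0; i < NKEYS; i++)
    {
        /* keys are distinct only in the last six characters */
        snprintf(buf, sizeof(buf), "%s%06d", PREFIX, rand() % 1000000);
        keys[i] = strdup(buf);
    }
    qsort(keys, NKEYS, sizeof(char *), cmpstr);
    printf("first: %s\nlast:  %s\n", keys[0], keys[NKEYS - 1]);
    return 0;
}

Nothing here requires an adversary; it is just what URL, file-path, or
composite-key data tends to look like in practice.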

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
