Re: Distinct oddity

From: Glenn Maynard <glennfmaynard(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Distinct oddity
Date: 2009-05-13 18:48:44
Message-ID: bd36f99e0905131148k5255d63ai69c76072efa078c5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

For purposes of DISTINCT, I'd expect any sort order should do; all it
needs is for equal values to be grouped together. If strcoll() ever
fails to do that, I'd call it a critical bug--even throwing total
garbage at it should result in a consistent ordering, even if the
ordering itself is totally meaningless. Many sort functions depend on
this.

On Wed, May 13, 2009 at 8:37 AM, Maximilian Tyrtania
<maximilian(dot)tyrtania(at)onlinehome(dot)de> wrote:
> Opened those files (with textwrangler, as I 've never used sed), stripped
> off the '$', sorted and looked at the differences (using textwranglers
> "compare documents"-feature).

Can you narrow down what triggers this? Try copying off the table,
and running this:

******
CREATE FUNCTION is_inconsistent() RETURNS BOOLEAN LANGUAGE SQL AS $$
select (select count(distinct(f.bezeichnung)) from firmen_copy f) <>
(select count(distinct(f.bezeichnung||'1')) from firmen_copy f)
$$;

-- If deleting [first,last] leaves the results inconsistent, return
true; otherwise
-- roll back the deletion and return false.
CREATE FUNCTION test_delete_range(first BIGINT, last BIGINT)
RETURNS BOOLEAN LANGUAGE plpgsql AS $$
BEGIN
DELETE FROM firmen_copy WHERE id BETWEEN first AND last;
IF is_inconsistent() THEN RETURN true; END IF;
SELECT 1/0;
EXCEPTION
WHEN division_by_zero THEN
RETURN false;
END;
$$;

CREATE FUNCTION test_func()
RETURNS INTEGER LANGUAGE plpgsql AS $$
DECLARE
total bigint;
BEGIN
IF NOT is_inconsistent() THEN RETURN -1; END IF;

LOOP
total := (SELECT MAX(id) FROM firmen_copy);

IF test_delete_range(0, total/2) THEN CONTINUE; END IF;
IF test_delete_range(total*1/4, total*3/4) THEN
CONTINUE; END IF;
IF test_delete_range(total/2, total) THEN CONTINUE; END IF;

RETURN 0;
END LOOP;
END;
$$;

SELECT test_func();
******

This assumes you have a primary key named "id", and that your IDs
start around 0 and are vaguely monotonic (renumber them in the copy if
necessary). I can't easily test this code, of course, but it's a
simple binary search. Depending on what's triggering this, it may or
may not be able to narrow in on a test case.

Tangentally, is there a better way of rolling back a function than a
dumb hack like "SELECT 1/0"?

--
Glenn Maynard

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2009-05-13 19:03:33 Re: Distinct oddity
Previous Message Tom Lane 2009-05-13 15:47:56 Re: Distinct oddity