Re: finding columns that have three or fewer distinct characters

From: Colin Wetherbee <cww(at)denterprises(dot)org>
To: Jeff Frost <jeff(at)frostconsultingllc(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: finding columns that have three or fewer distinct characters
Date: 2008-03-05 17:56:01
Message-ID: 47CEDEB1.3050902@denterprises.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Jeff Frost wrote:
> I've got an interesting one...I'm trying to find columns that have three
> or fewer distinct characters (for example, "aaaaaaaaaa"). Wondering if
> I need to write a function or if someone has an idea how to do it with
> built in functions and/or pattern matching?
>
> I think the thing to do would be to lowercase everything, then remove
> all duplicate chars and spaces, then use length() on that, but it's not
> obvious to me how I might remove the duplicate chars with the pattern
> matching support in the docs.

It's interesting, indeed.

Here's how you might do it with a PL/Perl function. :)

CREATE OR REPLACE FUNCTION remove_duplicates(TEXT) RETURNS TEXT AS
$$
my ($text) = @_;
while ($text =~ s/(.)(.*)\1/$1$2/g != 0) {};
return $text;
$$ LANGUAGE plperl;

cww=# SELECT
remove_duplicates('ffffoooooooobbbbaaaarrrr(dot)!(at)#$(dot)foobar-baz-qux');
remove_duplicates
-------------------
fobar(dot)!(at)#$-zqux
(1 row)

Colin

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Osvaldo Rosario Kussama 2008-03-05 18:46:59 Re: finding columns that have three or fewer distinct characters
Previous Message Steve Midgley 2008-03-05 17:38:20 Re: using copy from in function