Quick Links

Re: finding columns that have three or fewer distinct characters

From:	Colin Wetherbee <cww(at)denterprises(dot)org>
To:	Jeff Frost <jeff(at)frostconsultingllc(dot)com>
Cc:	pgsql-sql(at)postgresql(dot)org
Subject:	Re: finding columns that have three or fewer distinct characters
Date:	2008-03-05 17:56:01
Message-ID:	47CEDEB1.3050902@denterprises.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-sql

Jeff Frost wrote:
> I've got an interesting one...I'm trying to find columns that have three
> or fewer distinct characters (for example, "aaaaaaaaaa"). Wondering if
> I need to write a function or if someone has an idea how to do it with
> built in functions and/or pattern matching?
>
> I think the thing to do would be to lowercase everything, then remove
> all duplicate chars and spaces, then use length() on that, but it's not
> obvious to me how I might remove the duplicate chars with the pattern
> matching support in the docs.

It's interesting, indeed.

Here's how you might do it with a PL/Perl function. :)

CREATE OR REPLACE FUNCTION remove_duplicates(TEXT) RETURNS TEXT AS
$$
my ($text) = @_;
while ($text =~ s/(.)(.*)\1/$1$2/g != 0) {};
return $text;
$$ LANGUAGE plperl;

cww=# SELECT
remove_duplicates('ffffoooooooobbbbaaaarrrr(dot)!(at)#$(dot)foobar-baz-qux');
remove_duplicates
-------------------
fobar(dot)!(at)#$-zqux
(1 row)

Colin

In response to

finding columns that have three or fewer distinct characters at 2008-03-05 17:33:53 from Jeff Frost

Browse pgsql-sql by date

	From	Date	Subject
Next Message	Osvaldo Rosario Kussama	2008-03-05 18:46:59	Re: finding columns that have three or fewer distinct characters
Previous Message	Steve Midgley	2008-03-05 17:38:20	Re: using copy from in function