Re: Need some help with a query (uniq -c)

From: A B <gentosaker(at)gmail(dot)com>
To: Kenichiro Tanaka <ketanaka(at)ashisuto(dot)co(dot)jp>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Need some help with a query (uniq -c)
Date: 2010-04-13 19:12:59
Message-ID: m2wdbbf25901004131212i7ada65ccy40f20791a42ec23f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thank you all who has replied. I will study your suggestions and see
what will work best in my case.

2010/4/13 Kenichiro Tanaka <ketanaka(at)ashisuto(dot)co(dot)jp>:
> Hello.
>
> I try with "With Query".
> http://www.postgresql.org/docs/8.4/static/queries-with.html
>
> #We can use "With Queries" >  v8.4
> #That'll only work if the time values are contiguous, but there's probably a
> #similar trick for non-contiguous ,too.
>
> --create data
> drop table foo;
> create table foo( time int,message text);
> insert into foo values(1,'a');
> insert into foo values(2,'b');
> insert into foo values(3,'b');
> insert into foo values(4,'c');
> insert into foo values(5,'a');
> insert into foo values(6,'c');
> insert into foo values(7,'c');
> insert into foo values(8,'a');
> insert into foo values(9,'a');
> insert into foo values(10,'a');
>
> --begin Answer
> with recursive r as (
> select foo.time,foo.message,1  as dummy from foo
> union all
> select foo.time,foo.message,r.dummy+1 from foo , r
> where foo.time=r.time-1 and foo.message=r.message
> )
> ,rr as (
> select foo.time,foo.message,'OLID' as flag  from foo
> union all
> select foo.time,foo.message,'DUP' as flag from foo , rr
> where foo.time-1=rr.time-2 and foo.message=rr.message
> )
> select time min,time+max(dummy)-1 max,message,max(dummy) counts
> from r where time not in (select distinct (time+1) times from rr
> where flag='DUP') group by time,message order by time;
>
> --result
> postgres(# where flag='DUP') group by time,message order by time;
>  min | max | message | counts
> -----+-----+---------+--------
>   1 |   1 | a       |      1
>   2 |   3 | b       |      2
>   4 |   4 | c       |      1
>   5 |   5 | a       |      1
>   6 |   7 | c       |      2
>   8 |  10 | a       |      3
> (6 rows)
>
> --end
>
> But I think some one can provide more simple SQL.
>
> Thank you.
>>
>> On Apr 12, 2010, at 11:31 AM, Scott Marlowe wrote:
>>
>>
>>>
>>> On Mon, Apr 12, 2010 at 12:22 PM, A B<gentosaker(at)gmail(dot)com>  wrote:
>>>
>>>>
>>>> Hello!
>>>>
>>>> I have a table (think of it as a table of log messages)
>>>>
>>>> time |  message
>>>> -----------------------
>>>> 1      |   a
>>>> 2      |   b
>>>> 3      |   b
>>>> 4      |  b
>>>> 5      |  a
>>>>
>>>> the three 'b' are the same message, so I would like to write a query
>>>> that would give me a result that is similar to what the unix command
>>>> "uniq -c" would give:
>>>>
>>>> first |  message | last | count
>>>> --------------------------------------
>>>> 1     |     a              |   1   |     1
>>>> 2     |     b              |   4   |     3<--- here it squeezes
>>>> similar consecutive messages into a single row
>>>> 5     |     a              |   5   |     1
>>>>
>>>> How do I write such a command?
>>>>
>>>
>>> Pretty straight ahead:
>>>
>>> select min(t), message, max(t), count(*) from table group by message.
>>>
>>
>> That was my first though too, but it combines everything not just adjacent
>> messages.
>>
>> Something like this, maybe
>>
>> select t1.message, t1.time as first, t2.time as last, t2.time-t1.time+1 as
>> count
>>         from foo as t1, foo as t2
>>     where t1.time<= t2.time and t1.message = t2.message
>>         and not exists
>>             (select * from foo as t3
>>              where (t3.time between t1.time and t2.time and t3.message<>
>>  t1.message)
>>              or (t3.time = t2.time + 1 and t3.message = t1.message)
>>              or (t3.time = t1.time - 1 and t3.message = t1.message));
>>
>>  message | first | last | count
>> ---------+-------+------+-------
>>  a       |     1 |    1 |     1
>>  b       |     2 |    4 |     3
>>  a       |     5 |    5 |     1
>>
>> That'll only work if the time values are contiguous, but there's probably
>> a
>> similar trick for non-contiguous.
>>
>> Cheers,
>>   Steve
>>
>>
>>
>
>
> --
> ================================================
> Kenichiro Tanaka
> K.K.Ashisuto
> http://www.ashisuto.co.jp/english/index.html
> ================================================
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2010-04-13 19:16:06 Re: When is an explicit cast necessary?
Previous Message Tom Lane 2010-04-13 18:37:25 Re: C-language functions: SRF question