Re: grouping subsets

From: "Oliveiros d'Azevedo Cristina" <oliveiros(dot)cristina(at)marktest(dot)pt>
To: "Rainer Stengele" <rainer(dot)stengele(at)diplan(dot)de>
Cc: <pgsql-sql(at)postgresql(dot)org>
Subject: Re: grouping subsets
Date: 2010-07-30 15:07:49
Message-ID: 5BE4369386904E178DD4B537C759B2B0@marktestcr.marktest.pt
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I See.

And the analysis you need to do, the sum of the rows with the same keys
(until they change) will have to be done over all
table?
Or just over some predefined interval ?

Best,
Oliveiros
----- Original Message -----
From: "Rainer Stengele" <rainer(dot)stengele(at)diplan(dot)de>
Newsgroups: gmane.comp.db.postgresql.sql
To: "Oliveiros d'Azevedo Cristina" <oliveiros(dot)cristina(at)marktest(dot)pt>
Cc: <pgsql-sql(at)postgresql(dot)org>
Sent: Friday, July 30, 2010 10:35 AM
Subject: Re: grouping subsets

> the table may include up to maybe 30 entries per day, average maybe 10-15
> After a year this makes about 10.000 entries - maximum, average about 5000
> entries.
>
> For the problem described I have to use a Microsoft SQL database and would
> like to use pure SQL.
> As I use postgres on my Linux servers I found this newsgroup and thought I
> ask here.
>
> Thanks!
> Rainer
>
> Am 29.07.2010 15:31, schrieb Oliveiros d'Azevedo Cristina:
>> Fine.
>>
>> Please advice me,
>>
>> How long can
>> your table be? Thousands? Millions of records?
>>
>> Do you really need it in pure SQL
>> ?
>>
>> It seems to me that it might be possible, I'm just affraid that the query
>> would become too complex and thus slow...
>>
>> Best,
>> Oliveiros
>>
>> ----- Original Message ----- From: "Rainer Stengele"
>> <rainer(dot)stengele(at)diplan(dot)de>
>> To: "Oliveiros d'Azevedo Cristina" <oliveiros(dot)cristina(at)marktest(dot)pt>
>> Cc: <pgsql-sql(at)postgresql(dot)org>
>> Sent: Thursday, July 29, 2010 1:10 PM
>> Subject: Re: grouping subsets
>>
>>
>>> No. This is by accident.
>>> We have to assume that the combinations do change anytime, and many
>>> times per day.
>>>
>>> So
>>>
>>> "Or is it possible to have the same combination on one day with several
>>> sets?"
>>>
>>> YES!
>>>
>>> Rainer
>>>
>>>
>>>
>>> Am 29.07.2010 13:47, schrieb Oliveiros d'Azevedo Cristina:
>>>> Yes. This is somewhat more complicated because it has more constraints.
>>>> I've noticed that a given combination doesn't appear with holes on a
>>>> certain day.
>>>>
>>>> For ex, on a daily basis, we have every three key combinations
>>>> together.
>>>>
>>>> We dont have things like
>>>> 2010-7-01 1726 3212 1428
>>>> 2010-7-01 1726 3212 1428
>>>> ... 318 1846 1012
>>>> 2010-7-01 1726 3212 1428
>>>>
>>>> Can I assume that, for a certain day , the records for the same three
>>>> combination are all together? There is just one set per day for a given
>>>> combination?
>>>>
>>>> Or is it possible to have the same combination on one day with several
>>>> sets?
>>>>
>>>> Best,
>>>> Oliveiros
>>>>
>>>>
>>>> ----- Original Message ----- From: "Rainer Stengele"
>>>> <rainer(dot)stengele(at)diplan(dot)de>
>>>> Newsgroups: gmane.comp.db.postgresql.sql
>>>> To: "Oliveiros d'Azevedo Cristina" <oliveiros(dot)cristina(at)marktest(dot)pt>
>>>> Cc: <>
>>>> Sent: Thursday, July 29, 2010 10:41 AM
>>>> Subject: Re: grouping subsets
>>>>
>>>>
>>>>> Howdy Cristina,
>>>>>
>>>>> unfortunately things are more complicated. I have inserted an excerpt
>>>>> of the real data here:
>>>>>
>>>>> ================================================================================
>>>>> TableID MasterID dtBegin dtEnd idR
>>>>> idL idB consumption
>>>>> 4057312 295530 2010-07-01 00:59:21.077 2010-07-01
>>>>> 01:32:59.670 1726 3212 1428 279
>>>>> 4061043 295574 2010-07-01 01:59:31.137 2010-07-01
>>>>> 02:32:09.373 1726 3212 1428 183
>>>>> 4083397 295838 2010-07-01 07:57:51.327 2010-07-01
>>>>> 08:28:28.117 318 1846 1012 30
>>>>> 4090858 295920 2010-07-01 09:52:33.777 2010-07-01
>>>>> 10:31:34.393 318 1846 1012 487
>>>>> 4094589 295961 2010-07-01 10:47:59.370 2010-07-01
>>>>> 11:32:20.903 318 1846 1012 472
>>>>> 4098330 296013 2010-07-01 11:58:53.890 2010-07-01
>>>>> 12:31:35.730 318 1846 1012 195
>>>>> 4102069 296058 2010-07-01 12:36:19.170 2010-07-01
>>>>> 13:32:13.950 318 1846 1012 338
>>>>> 4105809 296102 2010-07-01 13:58:53.170 2010-07-01
>>>>> 14:02:57.710 318 1846 1012 105
>>>>> 4109555 296150 2010-07-01 14:59:11.663 2010-07-01
>>>>> 15:32:33.810 318 1846 1012 187
>>>>> 4113305 296194 2010-07-01 15:59:01.797 2010-07-01
>>>>> 16:02:27.260 318 1846 1012 108
>>>>> 4117048 296238 2010-07-01 16:20:47.997 2010-07-01
>>>>> 17:32:49.367 318 1846 1012 179
>>>>> 4120791 296282 2010-07-01 17:58:27.657 2010-07-01
>>>>> 18:29:01.733 318 1846 1012 256
>>>>> 4128291 296370 2010-07-01 19:54:17.687 2010-07-01
>>>>> 20:32:53.850 318 1846 1012 239
>>>>> 4132044 296413 2010-07-01 20:31:37.653 2010-07-01
>>>>> 21:29:13.497 318 1846 1012 39
>>>>> 4135797 296458 2010-07-01 21:59:13.983 2010-07-01
>>>>> 22:32:46.503 318 1846 1012 157
>>>>> 4139572 296506 2010-07-01 22:58:49.530 2010-07-01
>>>>> 23:32:22.543 318 1846 1012 218
>>>>> 4142941 296554 2010-07-01 23:59:13.857 2010-07-02
>>>>> 00:32:30.390 318 1846 1012 248
>>>>> 4146289 296598 2010-07-02 00:58:55.763 2010-07-02
>>>>> 01:32:41.983 318 1846 1012 204
>>>>> 4149616 296642 2010-07-02 01:46:57.357 2010-07-02
>>>>> 02:32:56.983 318 1846 1012 42
>>>>> 4152952 296686 2010-07-02 02:55:19.653 2010-07-02
>>>>> 03:32:28.013 318 1846 1012 135
>>>>> 4156289 296730 2010-07-02 03:43:52.777 2010-07-02
>>>>> 04:32:55.250 318 1846 1012 743
>>>>> 4159624 296774 2010-07-02 04:43:15.310 2010-07-02
>>>>> 05:32:44.547 318 1846 1012 277
>>>>> 4162961 296817 2010-07-02 05:58:59.483 2010-07-02
>>>>> 06:32:37.340 318 1846 1012 121
>>>>> 4166303 296862 2010-07-02 06:58:50.733 2010-07-02
>>>>> 07:32:39.113 318 1846 1012 239
>>>>> 4172981 296950 2010-07-02 07:28:55.293 2010-07-02
>>>>> 09:33:01.200 318 1846 1012 512
>>>>> 4176322 296993 2010-07-02 09:59:04.607 2010-07-02
>>>>> 10:33:01.903 318 1846 1012 139
>>>>> 4179667 297038 2010-07-02 10:55:27.760 2010-07-02
>>>>> 11:32:56.560 318 1846 1012 722
>>>>> 4183012 297082 2010-07-02 11:59:33.650 2010-07-02
>>>>> 12:32:14.700 318 1846 1012 163
>>>>> 4186351 297126 2010-07-02 12:23:45.997 2010-07-02
>>>>> 13:32:59.500 318 1846 1012 284
>>>>> 4189689 297169 2010-07-02 13:44:21.253 2010-07-02
>>>>> 14:18:05.080 318 1846 1012 254
>>>>> 4196371 297258 2010-07-02 16:16:19.123 2010-07-02
>>>>> 16:32:53.437 1706 3541 1511 161
>>>>> 4199720 297301 2010-07-02 16:59:35.127 2010-07-02
>>>>> 17:32:57.950 1706 3541 1511 250
>>>>> 4203068 297346 2010-07-02 17:59:34.027 2010-07-02
>>>>> 18:32:54.337 1706 3541 1511 302
>>>>> 4206413 297389 2010-07-02 18:59:28.730 2010-07-02
>>>>> 19:32:37.950 1706 3541 1511 276
>>>>> 4209758 297434 2010-07-02 19:54:00.243 2010-07-02
>>>>> 20:32:57.433 1706 3541 1511 209
>>>>> 4213102 297473 2010-07-02 20:49:10.963 2010-07-02
>>>>> 21:30:44.540 1706 3541 1511 76
>>>>> 4216447 297511 2010-07-02 21:59:34.810 2010-07-02
>>>>> 22:33:00.603 1706 3541 1511 287
>>>>> 4219818 297569 2010-07-02 22:56:52.750 2010-07-02
>>>>> 23:59:31.607 1706 3541 1511 1877
>>>>> 4219819 297570 2010-07-02 23:59:21.577 2010-07-03
>>>>> 00:54:40.153 1706 3541 1511 1798
>>>>> 4219821 297572 2010-07-03 00:48:03.310 2010-07-03
>>>>> 01:59:37.920 1706 3541 1511 1125
>>>>> 4219823 297574 2010-07-03 01:51:01.057 2010-07-03
>>>>> 02:59:45.433 1706 3541 1511 1629
>>>>> 4219820 297571 2010-07-03 02:59:29.393 2010-07-03
>>>>> 03:59:54.920 1706 3541 1511 2462
>>>>> 4219822 297573 2010-07-03 03:59:18.663 2010-07-03
>>>>> 04:01:48.810 1706 3541 1511 70
>>>>> 4225738 297656 2010-07-03 06:13:34.980 2010-07-03
>>>>> 06:28:09.697 1726 3212 1428 46
>>>>> 4228694 297695 2010-07-03 06:59:15.560 2010-07-03
>>>>> 07:32:45.653 1726 3212 1428 251
>>>>> 4231649 297733 2010-07-03 07:59:11.937 2010-07-03
>>>>> 08:32:57.217 1726 3212 1428 284
>>>>> 4234604 297771 2010-07-03 08:57:00.357 2010-07-03
>>>>> 09:32:47.903 1726 3212 1428 227
>>>>> 4237559 297809 2010-07-03 09:59:19.813 2010-07-03
>>>>> 10:33:02.063 1726 3212 1428 285
>>>>> 4261156 298596 2010-07-04 22:59:09.863 2010-07-04
>>>>> 23:33:45.530 1726 3212 1428 1286
>>>>> 4264114 298646 2010-07-04 23:59:16.967 2010-07-05
>>>>> 00:33:08.107 1726 3212 1428 297
>>>>> 4267067 298690 2010-07-05 00:59:15.187 2010-07-05
>>>>> 01:32:48.300 1726 3212 1428 333
>>>>> 4270023 298734 2010-07-05 01:59:02.497 2010-07-05
>>>>> 02:32:48.780 1726 3212 1428 270
>>>>> 4272977 298778 2010-07-05 02:41:43.737 2010-07-05
>>>>> 03:32:56.043 1726 3212 1428 317
>>>>> 4275927 298822 2010-07-05 03:59:17.027 2010-07-05
>>>>> 04:33:14.947 1726 3212 1428 1623
>>>>> ================================================================================
>>>>>
>>>>> Description:
>>>>> 1. Column: some ID
>>>>> 2. Column: reference to another table
>>>>> 3. and 4. column: timestamp from/to of the item
>>>>> 5. Column: ID R
>>>>> 6. Column: ID L
>>>>> 7. Column: ID B
>>>>> 8. Column: Sum of components
>>>>>
>>>>> Requirement:
>>>>> Sum over all components (from column 8) for each combination of ID R,
>>>>> ID L, ID B, but (!)
>>>>> rows with same keys (R,L,B) should be summed up only until the keys
>>>>> change.
>>>>> Do not sum up the components for identical keys, if there are other
>>>>> keys between them.
>>>>>
>>>>> Example result:
>>>>>
>>>>> idR idL idB SUM
>>>>> 1726 3212 1428 462
>>>>> 318 1846 1012 ...
>>>>> 1706 3541 1511 ...
>>>>> 1726 3212 1428 ...
>>>>>
>>>>>
>>>>> Note that the first and last entry here has the same keys
>>>>>
>>>>> Maybe you find a similar monster SQL solving such a requirement.
>>>>> Thanks for considering!
>>>>>
>>>>> Rainer
>>>>>
>>>>>
>>>>>
>>>>> Am 27.07.2010 12:37, schrieb Oliveiros d'Azevedo Cristina:
>>>>>> Howdy, Rainer.
>>>>>>
>>>>>> It's been a while, so I don't know if you are still interested in
>>>>>> this problem or if you, in the meantime, found yourself a solution,
>>>>>> but I've tried this on a local copy of the example you provided and
>>>>>> it seems to work.
>>>>>>
>>>>>> The problem is that I suspect that if you have several thousands of
>>>>>> records on your table it will become slow...
>>>>>>
>>>>>> Best,
>>>>>> Oliveiros
>>>>>>
>>>>>> SELECT SUM(tudo.parcela),tudo.a
>>>>>> FROM
>>>>>> (
>>>>>> SELECT fo.parcela,fo.a,fo.b,fo.c,MIN(th.c) as d
>>>>>> FROM
>>>>>> (
>>>>>> SELECT se.a as parcela,se.b as a,se.c as b, MAX(pr.c) as c
>>>>>> FROM
>>>>>> yourTable se
>>>>>> LEFT JOIN
>>>>>> (
>>>>>> SELECT a.*
>>>>>> FROM yourTable a
>>>>>> JOIN yourTable b
>>>>>> ON (b.b <> a.b)
>>>>>> AND ((age(a.c,b.c) = '1 day'::interval)
>>>>>>
>>>>>> )
>>>>>> ) pr
>>>>>> ON pr.b = se.b
>>>>>> AND se.c >= pr.c
>>>>>> GROUP BY se.a,se.b,se.c
>>>>>> ) fo
>>>>>> LEFT JOIN
>>>>>> (
>>>>>> SELECT a.*
>>>>>> FROM yourTable a
>>>>>> JOIN yourTable b
>>>>>> ON (b.b <> a.b)
>>>>>> AND ((age(a.c,b.c) = '-1 day'::interval)
>>>>>> )
>>>>>> ) th
>>>>>> ON fo.a = th.b
>>>>>> AND fo.b <= th.c
>>>>>> GROUP BY fo.parcela,fo.a,fo.b,fo.c
>>>>>> ) tudo
>>>>>> GROUP BY tudo.a,tudo.c,tudo.d
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> To: <pgsql-sql(at)postgresql(dot)org>
>>>>>> Sent: Thursday, July 22, 2010 9:09 AM
>>>>>> Subject: [SQL] grouping subsets
>>>>>>
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> having a table similar to
>>>>>>>
>>>>>>> | 1 | B | [2010-07-15 Do] |
>>>>>>> | 1 | B | [2010-07-16 Fr] |
>>>>>>> |---+---+-----------------|
>>>>>>> | 2 | C | [2010-07-17 Sa] |
>>>>>>> | 2 | C | [2010-07-18 So] |
>>>>>>> |---+---+-----------------|
>>>>>>> | 1 | B | [2010-07-19 Mo] |
>>>>>>> | 1 | B | [2010-07-20 Di] |
>>>>>>> | 1 | B | [2010-07-21 Mi] |
>>>>>>> | 1 | B | [2010-07-22 Do] |
>>>>>>> |---+---+-----------------|
>>>>>>> | 3 | D | [2010-07-23 Fr] |
>>>>>>>
>>>>>>> a simple group by gives me:
>>>>>>>
>>>>>>> | 6 | B |
>>>>>>> | 4 | C |
>>>>>>> | 3 | D |
>>>>>>>
>>>>>>>
>>>>>>> What I want to get is the values grouped by "subset", where a subset
>>>>>>> is a set of rows with identical column until the colum changes.
>>>>>>> Is there a way to get
>>>>>>>
>>>>>>> | 2 | B |
>>>>>>> | 4 | C |
>>>>>>> | 4 | B |
>>>>>>> | 3 | D |
>>>>>>>
>>>>>>> by SQL only?
>>>>>>>
>>>>>>> - Rainer
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sent via pgsql-sql mailing list (pgsql-sql(at)postgresql(dot)org)
>>>>>>> To make changes to your subscription:
>>>>>>> http://www.postgresql.org/mailpref/pgsql-sql
>>>>>>
>>>>>>
>>>>
>>
>>

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Joshua Tolley 2010-07-30 15:58:32 Re: grouping subsets
Previous Message Rainer Stengele 2010-07-30 09:35:50 Re: grouping subsets