Re: grouping subsets

From: Rainer Stengele <rainer(dot)stengele(at)diplan(dot)de>
To: Oliveiros d'Azevedo Cristina <oliveiros(dot)cristina(at)marktest(dot)pt>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: grouping subsets
Date: 2010-07-31 19:48:58
Message-ID: 4C547E2A.2040402@diplan.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

The analysis will have to be done over a calendar range which the user
selects via web interface. So - yes - normally not the whole table, but
maybe he chooses one month and thats already a lot of rows in the table ...

Best, too,
Rainer

Am 30.07.2010 17:07, schrieb Oliveiros d'Azevedo Cristina:
> I See.
>
> And the analysis you need to do, the sum of the rows with the same
> keys (until they change) will have to be done over all
> table?
> Or just over some predefined interval ?
>
> Best,
> Oliveiros
> ----- Original Message ----- From: "Rainer Stengele"
> <rainer(dot)stengele(at)diplan(dot)de>
> Newsgroups: gmane.comp.db.postgresql.sql
> To: "Oliveiros d'Azevedo Cristina" <oliveiros(dot)cristina(at)marktest(dot)pt>
> Cc: <pgsql-sql(at)postgresql(dot)org>
> Sent: Friday, July 30, 2010 10:35 AM
> Subject: Re: grouping subsets
>
>
>> the table may include up to maybe 30 entries per day, average maybe
>> 10-15
>> After a year this makes about 10.000 entries - maximum, average about
>> 5000 entries.
>>
>> For the problem described I have to use a Microsoft SQL database and
>> would like to use pure SQL.
>> As I use postgres on my Linux servers I found this newsgroup and
>> thought I ask here.
>>
>> Thanks!
>> Rainer
>>
>> Am 29.07.2010 15:31, schrieb Oliveiros d'Azevedo Cristina:
>>> Fine.
>>>
>>> Please advice me,
>>>
>>> How long can
>>> your table be? Thousands? Millions of records?
>>>
>>> Do you really need it in pure SQL
>>> ?
>>>
>>> It seems to me that it might be possible, I'm just affraid that the
>>> query would become too complex and thus slow...
>>>
>>> Best,
>>> Oliveiros
>>>
>>> ----- Original Message ----- From: "Rainer Stengele"
>>> <rainer(dot)stengele(at)diplan(dot)de>
>>> To: "Oliveiros d'Azevedo Cristina" <oliveiros(dot)cristina(at)marktest(dot)pt>
>>> Cc: <pgsql-sql(at)postgresql(dot)org>
>>> Sent: Thursday, July 29, 2010 1:10 PM
>>> Subject: Re: grouping subsets
>>>
>>>
>>>> No. This is by accident.
>>>> We have to assume that the combinations do change anytime, and many
>>>> times per day.
>>>>
>>>> So
>>>>
>>>> "Or is it possible to have the same combination on one day with
>>>> several sets?"
>>>>
>>>> YES!
>>>>
>>>> Rainer
>>>>
>>>>
>>>>
>>>> Am 29.07.2010 13:47, schrieb Oliveiros d'Azevedo Cristina:
>>>>> Yes. This is somewhat more complicated because it has more
>>>>> constraints.
>>>>> I've noticed that a given combination doesn't appear with holes on
>>>>> a certain day.
>>>>>
>>>>> For ex, on a daily basis, we have every three key combinations
>>>>> together.
>>>>>
>>>>> We dont have things like
>>>>> 2010-7-01 1726 3212 1428
>>>>> 2010-7-01 1726 3212 1428
>>>>> ... 318 1846 1012
>>>>> 2010-7-01 1726 3212 1428
>>>>>
>>>>> Can I assume that, for a certain day , the records for the same
>>>>> three combination are all together? There is just one set per day
>>>>> for a given combination?
>>>>>
>>>>> Or is it possible to have the same combination on one day with
>>>>> several sets?
>>>>>
>>>>> Best,
>>>>> Oliveiros
>>>>>
>>>>>
>>>>> ----- Original Message ----- From: "Rainer Stengele"
>>>>> <rainer(dot)stengele(at)diplan(dot)de>
>>>>> Newsgroups: gmane.comp.db.postgresql.sql
>>>>> To: "Oliveiros d'Azevedo Cristina" <oliveiros(dot)cristina(at)marktest(dot)pt>
>>>>> Cc: <>
>>>>> Sent: Thursday, July 29, 2010 10:41 AM
>>>>> Subject: Re: grouping subsets
>>>>>
>>>>>
>>>>>> Howdy Cristina,
>>>>>>
>>>>>> unfortunately things are more complicated. I have inserted an
>>>>>> excerpt of the real data here:
>>>>>>
>>>>>> ================================================================================
>>>>>>
>>>>>> TableID MasterID dtBegin dtEnd
>>>>>> idR idL idB consumption
>>>>>> 4057312 295530 2010-07-01 00:59:21.077
>>>>>> 2010-07-01 01:32:59.670 1726 3212 1428 279
>>>>>> 4061043 295574 2010-07-01 01:59:31.137
>>>>>> 2010-07-01 02:32:09.373 1726 3212 1428 183
>>>>>> 4083397 295838 2010-07-01 07:57:51.327
>>>>>> 2010-07-01 08:28:28.117 318 1846 1012 30
>>>>>> 4090858 295920 2010-07-01 09:52:33.777
>>>>>> 2010-07-01 10:31:34.393 318 1846 1012 487
>>>>>> 4094589 295961 2010-07-01 10:47:59.370
>>>>>> 2010-07-01 11:32:20.903 318 1846 1012 472
>>>>>> 4098330 296013 2010-07-01 11:58:53.890
>>>>>> 2010-07-01 12:31:35.730 318 1846 1012 195
>>>>>> 4102069 296058 2010-07-01 12:36:19.170
>>>>>> 2010-07-01 13:32:13.950 318 1846 1012 338
>>>>>> 4105809 296102 2010-07-01 13:58:53.170
>>>>>> 2010-07-01 14:02:57.710 318 1846 1012 105
>>>>>> 4109555 296150 2010-07-01 14:59:11.663
>>>>>> 2010-07-01 15:32:33.810 318 1846 1012 187
>>>>>> 4113305 296194 2010-07-01 15:59:01.797
>>>>>> 2010-07-01 16:02:27.260 318 1846 1012 108
>>>>>> 4117048 296238 2010-07-01 16:20:47.997
>>>>>> 2010-07-01 17:32:49.367 318 1846 1012 179
>>>>>> 4120791 296282 2010-07-01 17:58:27.657
>>>>>> 2010-07-01 18:29:01.733 318 1846 1012 256
>>>>>> 4128291 296370 2010-07-01 19:54:17.687
>>>>>> 2010-07-01 20:32:53.850 318 1846 1012 239
>>>>>> 4132044 296413 2010-07-01 20:31:37.653
>>>>>> 2010-07-01 21:29:13.497 318 1846 1012 39
>>>>>> 4135797 296458 2010-07-01 21:59:13.983
>>>>>> 2010-07-01 22:32:46.503 318 1846 1012 157
>>>>>> 4139572 296506 2010-07-01 22:58:49.530
>>>>>> 2010-07-01 23:32:22.543 318 1846 1012 218
>>>>>> 4142941 296554 2010-07-01 23:59:13.857
>>>>>> 2010-07-02 00:32:30.390 318 1846 1012 248
>>>>>> 4146289 296598 2010-07-02 00:58:55.763
>>>>>> 2010-07-02 01:32:41.983 318 1846 1012 204
>>>>>> 4149616 296642 2010-07-02 01:46:57.357
>>>>>> 2010-07-02 02:32:56.983 318 1846 1012 42
>>>>>> 4152952 296686 2010-07-02 02:55:19.653
>>>>>> 2010-07-02 03:32:28.013 318 1846 1012 135
>>>>>> 4156289 296730 2010-07-02 03:43:52.777
>>>>>> 2010-07-02 04:32:55.250 318 1846 1012 743
>>>>>> 4159624 296774 2010-07-02 04:43:15.310
>>>>>> 2010-07-02 05:32:44.547 318 1846 1012 277
>>>>>> 4162961 296817 2010-07-02 05:58:59.483
>>>>>> 2010-07-02 06:32:37.340 318 1846 1012 121
>>>>>> 4166303 296862 2010-07-02 06:58:50.733
>>>>>> 2010-07-02 07:32:39.113 318 1846 1012 239
>>>>>> 4172981 296950 2010-07-02 07:28:55.293
>>>>>> 2010-07-02 09:33:01.200 318 1846 1012 512
>>>>>> 4176322 296993 2010-07-02 09:59:04.607
>>>>>> 2010-07-02 10:33:01.903 318 1846 1012 139
>>>>>> 4179667 297038 2010-07-02 10:55:27.760
>>>>>> 2010-07-02 11:32:56.560 318 1846 1012 722
>>>>>> 4183012 297082 2010-07-02 11:59:33.650
>>>>>> 2010-07-02 12:32:14.700 318 1846 1012 163
>>>>>> 4186351 297126 2010-07-02 12:23:45.997
>>>>>> 2010-07-02 13:32:59.500 318 1846 1012 284
>>>>>> 4189689 297169 2010-07-02 13:44:21.253
>>>>>> 2010-07-02 14:18:05.080 318 1846 1012 254
>>>>>> 4196371 297258 2010-07-02 16:16:19.123
>>>>>> 2010-07-02 16:32:53.437 1706 3541 1511 161
>>>>>> 4199720 297301 2010-07-02 16:59:35.127
>>>>>> 2010-07-02 17:32:57.950 1706 3541 1511 250
>>>>>> 4203068 297346 2010-07-02 17:59:34.027
>>>>>> 2010-07-02 18:32:54.337 1706 3541 1511 302
>>>>>> 4206413 297389 2010-07-02 18:59:28.730
>>>>>> 2010-07-02 19:32:37.950 1706 3541 1511 276
>>>>>> 4209758 297434 2010-07-02 19:54:00.243
>>>>>> 2010-07-02 20:32:57.433 1706 3541 1511 209
>>>>>> 4213102 297473 2010-07-02 20:49:10.963
>>>>>> 2010-07-02 21:30:44.540 1706 3541 1511 76
>>>>>> 4216447 297511 2010-07-02 21:59:34.810
>>>>>> 2010-07-02 22:33:00.603 1706 3541 1511 287
>>>>>> 4219818 297569 2010-07-02 22:56:52.750
>>>>>> 2010-07-02 23:59:31.607 1706 3541 1511 1877
>>>>>> 4219819 297570 2010-07-02 23:59:21.577
>>>>>> 2010-07-03 00:54:40.153 1706 3541 1511 1798
>>>>>> 4219821 297572 2010-07-03 00:48:03.310
>>>>>> 2010-07-03 01:59:37.920 1706 3541 1511 1125
>>>>>> 4219823 297574 2010-07-03 01:51:01.057
>>>>>> 2010-07-03 02:59:45.433 1706 3541 1511 1629
>>>>>> 4219820 297571 2010-07-03 02:59:29.393
>>>>>> 2010-07-03 03:59:54.920 1706 3541 1511 2462
>>>>>> 4219822 297573 2010-07-03 03:59:18.663
>>>>>> 2010-07-03 04:01:48.810 1706 3541 1511 70
>>>>>> 4225738 297656 2010-07-03 06:13:34.980
>>>>>> 2010-07-03 06:28:09.697 1726 3212 1428 46
>>>>>> 4228694 297695 2010-07-03 06:59:15.560
>>>>>> 2010-07-03 07:32:45.653 1726 3212 1428 251
>>>>>> 4231649 297733 2010-07-03 07:59:11.937
>>>>>> 2010-07-03 08:32:57.217 1726 3212 1428 284
>>>>>> 4234604 297771 2010-07-03 08:57:00.357
>>>>>> 2010-07-03 09:32:47.903 1726 3212 1428 227
>>>>>> 4237559 297809 2010-07-03 09:59:19.813
>>>>>> 2010-07-03 10:33:02.063 1726 3212 1428 285
>>>>>> 4261156 298596 2010-07-04 22:59:09.863
>>>>>> 2010-07-04 23:33:45.530 1726 3212 1428 1286
>>>>>> 4264114 298646 2010-07-04 23:59:16.967
>>>>>> 2010-07-05 00:33:08.107 1726 3212 1428 297
>>>>>> 4267067 298690 2010-07-05 00:59:15.187
>>>>>> 2010-07-05 01:32:48.300 1726 3212 1428 333
>>>>>> 4270023 298734 2010-07-05 01:59:02.497
>>>>>> 2010-07-05 02:32:48.780 1726 3212 1428 270
>>>>>> 4272977 298778 2010-07-05 02:41:43.737
>>>>>> 2010-07-05 03:32:56.043 1726 3212 1428 317
>>>>>> 4275927 298822 2010-07-05 03:59:17.027
>>>>>> 2010-07-05 04:33:14.947 1726 3212 1428 1623
>>>>>> ================================================================================
>>>>>>
>>>>>>
>>>>>> Description:
>>>>>> 1. Column: some ID
>>>>>> 2. Column: reference to another table
>>>>>> 3. and 4. column: timestamp from/to of the item
>>>>>> 5. Column: ID R
>>>>>> 6. Column: ID L
>>>>>> 7. Column: ID B
>>>>>> 8. Column: Sum of components
>>>>>>
>>>>>> Requirement:
>>>>>> Sum over all components (from column 8) for each combination of
>>>>>> ID R, ID L, ID B, but (!)
>>>>>> rows with same keys (R,L,B) should be summed up only until the
>>>>>> keys change.
>>>>>> Do not sum up the components for identical keys, if there are
>>>>>> other keys between them.
>>>>>>
>>>>>> Example result:
>>>>>>
>>>>>> idR idL idB SUM
>>>>>> 1726 3212 1428 462
>>>>>> 318 1846 1012 ...
>>>>>> 1706 3541 1511 ...
>>>>>> 1726 3212 1428 ...
>>>>>>
>>>>>>
>>>>>> Note that the first and last entry here has the same keys
>>>>>>
>>>>>> Maybe you find a similar monster SQL solving such a requirement.
>>>>>> Thanks for considering!
>>>>>>
>>>>>> Rainer
>>>>>>
>>>>>>
>>>>>>
>>>>>> Am 27.07.2010 12:37, schrieb Oliveiros d'Azevedo Cristina:
>>>>>>> Howdy, Rainer.
>>>>>>>
>>>>>>> It's been a while, so I don't know if you are still interested
>>>>>>> in this problem or if you, in the meantime, found yourself a
>>>>>>> solution,
>>>>>>> but I've tried this on a local copy of the example you
>>>>>>> provided and it seems to work.
>>>>>>>
>>>>>>> The problem is that I suspect that if you have several thousands
>>>>>>> of records on your table it will become slow...
>>>>>>>
>>>>>>> Best,
>>>>>>> Oliveiros
>>>>>>>
>>>>>>> SELECT SUM(tudo.parcela),tudo.a
>>>>>>> FROM
>>>>>>> (
>>>>>>> SELECT fo.parcela,fo.a,fo.b,fo.c,MIN(th.c) as d
>>>>>>> FROM
>>>>>>> (
>>>>>>> SELECT se.a as parcela,se.b as a,se.c as b, MAX(pr.c) as c
>>>>>>> FROM
>>>>>>> yourTable se
>>>>>>> LEFT JOIN
>>>>>>> (
>>>>>>> SELECT a.*
>>>>>>> FROM yourTable a
>>>>>>> JOIN yourTable b
>>>>>>> ON (b.b <> a.b)
>>>>>>> AND ((age(a.c,b.c) = '1 day'::interval)
>>>>>>>
>>>>>>> )
>>>>>>> ) pr
>>>>>>> ON pr.b = se.b
>>>>>>> AND se.c >= pr.c
>>>>>>> GROUP BY se.a,se.b,se.c
>>>>>>> ) fo
>>>>>>> LEFT JOIN
>>>>>>> (
>>>>>>> SELECT a.*
>>>>>>> FROM yourTable a
>>>>>>> JOIN yourTable b
>>>>>>> ON (b.b <> a.b)
>>>>>>> AND ((age(a.c,b.c) = '-1 day'::interval)
>>>>>>> )
>>>>>>> ) th
>>>>>>> ON fo.a = th.b
>>>>>>> AND fo.b <= th.c
>>>>>>> GROUP BY fo.parcela,fo.a,fo.b,fo.c
>>>>>>> ) tudo
>>>>>>> GROUP BY tudo.a,tudo.c,tudo.d
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> To: <pgsql-sql(at)postgresql(dot)org>
>>>>>>> Sent: Thursday, July 22, 2010 9:09 AM
>>>>>>> Subject: [SQL] grouping subsets
>>>>>>>
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> having a table similar to
>>>>>>>>
>>>>>>>> | 1 | B | [2010-07-15 Do] |
>>>>>>>> | 1 | B | [2010-07-16 Fr] |
>>>>>>>> |---+---+-----------------|
>>>>>>>> | 2 | C | [2010-07-17 Sa] |
>>>>>>>> | 2 | C | [2010-07-18 So] |
>>>>>>>> |---+---+-----------------|
>>>>>>>> | 1 | B | [2010-07-19 Mo] |
>>>>>>>> | 1 | B | [2010-07-20 Di] |
>>>>>>>> | 1 | B | [2010-07-21 Mi] |
>>>>>>>> | 1 | B | [2010-07-22 Do] |
>>>>>>>> |---+---+-----------------|
>>>>>>>> | 3 | D | [2010-07-23 Fr] |
>>>>>>>>
>>>>>>>> a simple group by gives me:
>>>>>>>>
>>>>>>>> | 6 | B |
>>>>>>>> | 4 | C |
>>>>>>>> | 3 | D |
>>>>>>>>
>>>>>>>>
>>>>>>>> What I want to get is the values grouped by "subset", where a
>>>>>>>> subset is a set of rows with identical column until the colum
>>>>>>>> changes.
>>>>>>>> Is there a way to get
>>>>>>>>
>>>>>>>> | 2 | B |
>>>>>>>> | 4 | C |
>>>>>>>> | 4 | B |
>>>>>>>> | 3 | D |
>>>>>>>>
>>>>>>>> by SQL only?
>>>>>>>>
>>>>>>>> - Rainer
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sent via pgsql-sql mailing list (pgsql-sql(at)postgresql(dot)org)
>>>>>>>> To make changes to your subscription:
>>>>>>>> http://www.postgresql.org/mailpref/pgsql-sql
>>>>>>>
>>>>>>>
>>>>>
>>>
>>>
>

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message John Hasler 2010-08-03 00:03:53 What does PostgreSQL do when time goes backward?
Previous Message Joshua Tolley 2010-07-30 15:58:32 Re: grouping subsets