Re: Given a set of daterange, finding the continuous range that includes a particular date (aggregates)

From: Ken Tanzer <ken(dot)tanzer(at)gmail(dot)com>
To: mariusz <marius(at)mtvk(dot)pl>
Cc: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Given a set of daterange, finding the continuous range that includes a particular date (aggregates)
Date: 2018-02-24 02:11:51
Message-ID: CAD3a31X_swvaODUt=yShCvN1fBvnEq3RPsto-p1-GB6q6UqKjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>
> On Fri, Feb 23, 2018 at 6:10 AM, mariusz <marius(at)mtvk(dot)pl> wrote:

> i guess, you can easily get max continuous range for each row with
> something like this:
>
> CREATE OR REPLACE FUNCTION
> append_daterange(d1 daterange, d2 daterange)
> RETURNS daterange
> LANGUAGE sql
> AS
> $$
> SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2 ELSE d2 END;
> $$;
>
> CREATE AGGREGATE agg_daterange (daterange) (
> sfunc = append_daterange,
> stype = daterange
> );
>
> SELECT dr,
> lower(agg_daterange(dr) OVER (ORDER BY dr ASC)),
> upper(agg_daterange(dr) OVER (ORDER BY dr DESC))
> FROM ...
>
> above example is simplified to selecting only daterange column "dr" for
> readability, which in your case should be something like
>
> daterange(staff_assign_date,staff_assign_date_end,'[)')
>
> please note that daterange would be normalized to [) format so upper()
> above returns exactly your max "staff_assign_date_end" for each
> continuous range when dateranges are created with '[)' format.
>
> the key point is ... ELSE d2 in append_daterange() which starts with new
> value each time that new value is discontinuous with agg's state value
> and order in which rows are processed (ASC for lower of daterange, DESC
> for upper of daterange).
>
> unfortunately this involves reading all rows for "client_id" and
> additional sorting for each window.
> i recall reading that you already pay the price of reading all rows for
> client_id anyway, so the only question is the cost of two additional
> sorts (maybe extracting dateranges to subset on which to do windows and
> rejoining result of continuous ranges to original set would help to
> lower the cost).
>
>
Thank you, and I wanted to follow up on this. I couldn't quite get your
example working as described, but I also ended up trying something very
similar that got me very close but not quite there. Basically, I can see
that it is correctly calculating the ranges (in the notices), but it is
only returning the last range for each client. (Because I said PARTITION
BY client_id).

So I'm not sure if I should be calling this differently, or if the function
needs to work differently, or if this just isn't possible. Do I need to
partition by something else, and if so what? I don't see what I could
specify that would indicate a new episode.

Also, it's not clear to me how an aggregate might define/return different
values within a partition. Although this must be possible, since functions
like rank() and row_number() seem to do it.

Hoping there is something easy that can be tweaked here. See below for
copy/pastable test stuff. It includes output from both functions. Both
look to be returning the same results, which makes me wonder if my passing
in a start date was a waste of time, though it seems to me it would be
necessary.

Cheers,
Ken

BEGIN;
CREATE TEMP TABLE sample_data (
client_id INTEGER,
start_date DATE,
end_date DATE,
episode INTEGER -- Just a label, for clarity
);

INSERT INTO sample_data VALUES
(1,'1990-01-01','1990-12-31',1),
(1,'1991-01-01','1991-12-31',1),

(1,'2000-01-01','2000-12-31',2),
(1,'2001-01-01','2001-12-31',2),
(1,'2002-01-01','2002-12-31',2),
(1,'2003-01-01','2003-12-31',2),
(1,'2004-01-01','2004-12-31',2),
(1,'2005-01-01','2005-12-31',2),
(1,'2006-01-01','2006-12-31',2),

(1,'2014-01-01','2014-12-31',3),
(1,'2015-01-01','2015-12-31',3),
(1,'2017-06-30','2017-12-31',4),
(1,'2018-01-01',NULL,4),

(2,'2014-02-01','2015-01-31',1),
(2,'2015-02-01','2015-12-31',1),
(2,'2017-09-30','2018-01-31',2),
(2,'2018-02-01','2018-02-14',2)
;

CREATE OR REPLACE FUNCTION
append_daterange(d1 daterange, d2 daterange)
RETURNS daterange
LANGUAGE sql
AS
$$
SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2 ELSE d2 END;
$$;

CREATE AGGREGATE agg_daterange (daterange) (
sfunc = append_daterange,
stype = daterange
);

CREATE OR REPLACE FUNCTION range_continuous_merge( daterange, daterange,
date ) RETURNS daterange AS $$

DECLARE
res daterange;

BEGIN
res:= CASE
WHEN $1 IS NULL AND NOT $2 @> $3 THEN NULL
WHEN $1 IS NULL AND $2 @> $3 THEN $2
WHEN ($1 && $2) OR ($1 -|- $2) THEN $1 + $2
WHEN NOT $1 @> $3 THEN $2
ELSE $1
END;
RAISE NOTICE 'Inputs: %,%,%. Returning %',$1::text,$2::text,$3::text,res;
RETURN res;
END;
$$ LANGUAGE plpgsql STABLE;

CREATE AGGREGATE range_continuous( daterange, date ) (

sfunc = range_continuous_merge,
stype = daterange
-- initcond = '{0,0,0}'

);

SELECT
client_id,episode,start_date,end_date,range_continuous(daterange(start_date,end_date,'[]'),start_date)
OVER (PARTITION by client_id) FROM sample_data ;

SELECT
client_id,episode,start_date,end_date,agg_daterange(daterange(start_date,end_date,'[]'))
OVER (PARTITION by client_id) FROM sample_data ;

-- RANGE_CONTINUOUS_MERGE

NOTICE: Inputs: <NULL>,[1990-01-01,1991-01-01),1990-01-01. Returning
[1990-01-01,1991-01-01)
NOTICE: Inputs:
[1990-01-01,1991-01-01),[1991-01-01,1992-01-01),1991-01-01. Returning
[1990-01-01,1992-01-01)
NOTICE: Inputs:
[1990-01-01,1992-01-01),[2000-01-01,2001-01-01),2000-01-01. Returning
[2000-01-01,2001-01-01)
NOTICE: Inputs:
[2000-01-01,2001-01-01),[2001-01-01,2002-01-01),2001-01-01. Returning
[2000-01-01,2002-01-01)
NOTICE: Inputs:
[2000-01-01,2002-01-01),[2002-01-01,2003-01-01),2002-01-01. Returning
[2000-01-01,2003-01-01)
NOTICE: Inputs:
[2000-01-01,2003-01-01),[2003-01-01,2004-01-01),2003-01-01. Returning
[2000-01-01,2004-01-01)
NOTICE: Inputs:
[2000-01-01,2004-01-01),[2004-01-01,2005-01-01),2004-01-01. Returning
[2000-01-01,2005-01-01)
NOTICE: Inputs:
[2000-01-01,2005-01-01),[2005-01-01,2006-01-01),2005-01-01. Returning
[2000-01-01,2006-01-01)
NOTICE: Inputs:
[2000-01-01,2006-01-01),[2006-01-01,2007-01-01),2006-01-01. Returning
[2000-01-01,2007-01-01)
NOTICE: Inputs:
[2000-01-01,2007-01-01),[2014-01-01,2015-01-01),2014-01-01. Returning
[2014-01-01,2015-01-01)
NOTICE: Inputs:
[2014-01-01,2015-01-01),[2015-01-01,2016-01-01),2015-01-01. Returning
[2014-01-01,2016-01-01)
NOTICE: Inputs:
[2014-01-01,2016-01-01),[2017-06-30,2018-01-01),2017-06-30. Returning
[2017-06-30,2018-01-01)
NOTICE: Inputs: [2017-06-30,2018-01-01),[2018-01-01,),2018-01-01.
Returning [2017-06-30,)
NOTICE: Inputs: <NULL>,[2014-02-01,2015-02-01),2014-02-01. Returning
[2014-02-01,2015-02-01)
NOTICE: Inputs:
[2014-02-01,2015-02-01),[2015-02-01,2016-01-01),2015-02-01. Returning
[2014-02-01,2016-01-01)
NOTICE: Inputs:
[2014-02-01,2016-01-01),[2017-09-30,2018-02-01),2017-09-30. Returning
[2017-09-30,2018-02-01)
NOTICE: Inputs:
[2017-09-30,2018-02-01),[2018-02-01,2018-02-15),2018-02-01. Returning
[2017-09-30,2018-02-15)

client_id | episode | start_date | end_date | range_continuous
-----------+---------+------------+------------+-------------------------
1 | 1 | 1990-01-01 | 1990-12-31 | [2017-06-30,)
1 | 1 | 1991-01-01 | 1991-12-31 | [2017-06-30,)
1 | 2 | 2000-01-01 | 2000-12-31 | [2017-06-30,)
1 | 2 | 2001-01-01 | 2001-12-31 | [2017-06-30,)
1 | 2 | 2002-01-01 | 2002-12-31 | [2017-06-30,)
1 | 2 | 2003-01-01 | 2003-12-31 | [2017-06-30,)
1 | 2 | 2004-01-01 | 2004-12-31 | [2017-06-30,)
1 | 2 | 2005-01-01 | 2005-12-31 | [2017-06-30,)
1 | 2 | 2006-01-01 | 2006-12-31 | [2017-06-30,)
1 | 3 | 2014-01-01 | 2014-12-31 | [2017-06-30,)
1 | 3 | 2015-01-01 | 2015-12-31 | [2017-06-30,)
1 | 4 | 2017-06-30 | 2017-12-31 | [2017-06-30,)
1 | 4 | 2018-01-01 | | [2017-06-30,)
2 | 1 | 2014-02-01 | 2015-01-31 | [2017-09-30,2018-02-15)
2 | 1 | 2015-02-01 | 2015-12-31 | [2017-09-30,2018-02-15)
2 | 2 | 2017-09-30 | 2018-01-31 | [2017-09-30,2018-02-15)
2 | 2 | 2018-02-01 | 2018-02-14 | [2017-09-30,2018-02-15)

(17 rows)

-- AGG_DATERANGE

client_id | episode | start_date | end_date | agg_daterange
-----------+---------+------------+------------+-------------------------
1 | 1 | 1990-01-01 | 1990-12-31 | [2017-06-30,)
1 | 1 | 1991-01-01 | 1991-12-31 | [2017-06-30,)
1 | 2 | 2000-01-01 | 2000-12-31 | [2017-06-30,)
1 | 2 | 2001-01-01 | 2001-12-31 | [2017-06-30,)
1 | 2 | 2002-01-01 | 2002-12-31 | [2017-06-30,)
1 | 2 | 2003-01-01 | 2003-12-31 | [2017-06-30,)
1 | 2 | 2004-01-01 | 2004-12-31 | [2017-06-30,)
1 | 2 | 2005-01-01 | 2005-12-31 | [2017-06-30,)
1 | 2 | 2006-01-01 | 2006-12-31 | [2017-06-30,)
1 | 3 | 2014-01-01 | 2014-12-31 | [2017-06-30,)
1 | 3 | 2015-01-01 | 2015-12-31 | [2017-06-30,)
1 | 4 | 2017-06-30 | 2017-12-31 | [2017-06-30,)
1 | 4 | 2018-01-01 | | [2017-06-30,)
2 | 1 | 2014-02-01 | 2015-01-31 | [2017-09-30,2018-02-15)
2 | 1 | 2015-02-01 | 2015-12-31 | [2017-09-30,2018-02-15)
2 | 2 | 2017-09-30 | 2018-01-31 | [2017-09-30,2018-02-15)
2 | 2 | 2018-02-01 | 2018-02-14 | [2017-09-30,2018-02-15)
(17 rows)

--
AGENCY Software
A Free Software data system
By and for non-profits
*http://agency-software.org/ <http://agency-software.org/>*
*https://demo.agency-software.org/client
<https://demo.agency-software.org/client>*
ken(dot)tanzer(at)agency-software(dot)org
(253) 245-3801

Subscribe to the mailing list
<agency-general-request(at)lists(dot)sourceforge(dot)net?body=subscribe> to
learn more about AGENCY or
follow the discussion.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Kazimiers 2018-02-24 03:47:48 Unexpected behavior with transition tables in update statement trigger
Previous Message Ken Tanzer 2018-02-24 01:07:19 Re: Given a set of daterange, finding the continuous range that includes a particular date