Re: Re: Add generate_series(date,date) and generate_series(date,date,integer)

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Add generate_series(date,date) and generate_series(date,date,integer)
Date: 2016-03-17 14:57:36
Message-ID: CADkLM=dY0aeS2QvoB7+QzE4_p=WMgR2A_Xexro-PY_u8UXo+3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 17, 2016 at 10:00 AM, David Steele <david(at)pgmasters(dot)net> wrote:

> On 3/17/16 4:49 AM, Dean Rasheed wrote:
>
> > On 16 March 2016 at 23:32, David Steele <david(at)pgmasters(dot)net> wrote:
> >
> >>
> >> I think in this case it comes down to a committer's judgement so I have
> >> marked this "ready for committer" and passed the buck on to Álvaro.
> >
> > So I was pretty much "meh" on this patch too, because I'm not
> > convinced it actually saves much typing, if any.
> >
> > However, I now realise that it introduces a backwards-compatibility
> > breakage. Today it is possible to type
> >
> > SELECT * FROM generate_series('01-01-2000'::date, '01-04-2000', '7
> days');
>
> It can also be broken as below and this is even scarier to me:
>
> postgres=# SELECT * FROM generate_series('01-01-2000'::date,
> '01-04-2000'::date, '7 days');
> ERROR: invalid input syntax for integer: "7 days"
> LINE 1: ...te_series('01-01-2000'::date, '01-04-2000'::date, '7 days');
>
> And only works when:
>
> postgres=# SELECT * FROM generate_series('01-01-2000'::date,
> '01-04-2000'::date, '7 days'::interval);
> generate_series
> ------------------------
> 2000-01-01 00:00:00+00
> (1 row)
>
> One might argue that string constants for dates in actual code might be
> rare but I think it's safe to say that string constants for intervals
> are pretty common. I also think it unlikely that they are all
> explicitly cast.
>
> I marked this "waiting for author" so Corey can respond. I actually
> tested with the v3 patch since the v4 patch seems to be broken with git
> apply or patch:
>
> $ patch -p1 < ../other/generate_series_date.v4.diff
> patching file doc/src/sgml/func.sgml
> Hunk #1 succeeded at 14787 (offset 87 lines).
> Hunk #2 succeeded at 14871 (offset 87 lines).
> patching file src/backend/utils/adt/date.c
> patch: **** malformed patch at line 163: diff --git
> a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
>
> $ git apply -3 ../other/generate_series_date.v4.diff
> fatal: corrupt patch at line 163
>
> --
> -David
> david(at)pgmasters(dot)net
>

Not sure what's wrong with the patch, but I get a clean one to you pending
the outcome of the design discussion. v4 just ripped out the infinity
tests, so v3 is valid for the issue you found..

So the function clobbers the situation where the user meant a timestamp
range but instead gave two dates, and meant an interval but gave a string.
I'm curious how generate_series() for numeric didn't have the same issue
with floats.

I see two ways around this:

1. Drop the step parameter entirely. My own use cases only ever require the
step values 1 and -1, and which one the user wants can be inferred from
(start < end). This would still change the output type where a person
wanted timestamps, but instead input two dates.
2. Rename the function date_series() or generate_series_date()

I still think this is an important function because at the last several
places I've worked, I've found myself manufacturing a query where some
event data is likely to have date gaps, but we want to see the date gaps or
at least have the 0 values contribute to a later average aggregate.

Like this:

select d.dt, sum(s.v1), sum(s2.v2), sum(t.v1), sum(t2.v2)
from date_series_function(input1,input2) as d(dt),
some_dimensional_characteristic c
left join sparse_table s on s.event_date = d.dt and s.c1 = c.cval
left join sparse_table s2 on s2.event_date = d.dt and s2.c2 = c.cval
left join other_sparse t on t.event_date = d.dt and t.c1 = c.cval
left join other_sparse t2 on t2.event_date = d.dt and t2.c2 = c.cval
group by d.dt
order by d.dt

This gets cumbersome when you have to cast every usage of your date column.

Furthermore, this is 90%+ of my usage of generate_series(), the remaining
10% being the integer case to populate sample data for tests.

Any thoughts on how to proceed?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-03-17 15:03:12 Re: silent data loss with ext4 / all current versions
Previous Message Andres Freund 2016-03-17 14:57:04 Re: Performance degradation in commit ac1d794