From: | Jason Long <mailing(dot)lists(at)octgsoftware(dot)com> |
---|---|
To: | Rob Sargent <robjsargent(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Overlapping ranges |
Date: | 2014-06-19 00:45:45 |
Message-ID: | 1403138745.32369.5.camel@localhost.localdomain |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Wed, 2014-06-18 at 18:08 -0600, Rob Sargent wrote:
> On 06/18/2014 05:47 PM, Jason Long wrote:
>
>
> > I have a large table of access logs to an application.
> >
> > I want is to find all rows that overlap startdate and enddate with any
> > other rows.
> >
> > The query below seems to work, but does not finish unless I specify a
> > single id.
> >
> > select distinct a1.id
> > from t_access a1,
> > t_access a2
> > where tstzrange(a1.startdate, a1.enddate) &&
> > tstzrange(a2.startdate, a2.enddate)
> >
> >
> >
> >
>
> I'm sure you're best bet is a windowing function, but your
> descriptions suggests there is no index on start/end date columns.
> Probably want those in any event.
There are indexs on startdate and enddate.
If I specify a known a1.id=1234 then the query returns all records that
overlap it, but this takes 1.7 seconds.
There are about 2 million records in the table.
I will see what I come up with on the window function.
If anyone else has some suggestions let me know.
I get with for EXPLAIN ANALYZE the id specified.
Nested Loop (cost=0.43..107950.50 rows=8825 width=84) (actual
time=2803.932..2804.558 rows=11 loops=1)
Join Filter: (tstzrange(a1.startdate, a1.enddate) &&
tstzrange(a2.startdate, a2.enddate))
Rows Removed by Join Filter: 1767741
-> Index Scan using t_access_pkey on t_access a1 (cost=0.43..8.45
rows=1 width=24) (actual time=0.016..0.019 rows=1 loops=1)
Index Cond: (id = 1928761)
-> Seq Scan on t_access a2 (cost=0.00..77056.22 rows=1764905
width=60) (actual time=0.006..1200.657 rows=1767752 loops=1)
Filter: (enddate IS NOT NULL)
Rows Removed by Filter: 159270
Total runtime: 2804.599 ms
and for EXPLAIN without the id specified. EXPLAIN ANALYZE will not
complete without the id specified.
Nested Loop (cost=0.00..87949681448.20 rows=17005053815 width=84)
Join Filter: (tstzrange(a1.startdate, a1.enddate) &&
tstzrange(a2.startdate, a2.enddate))
-> Seq Scan on t_access a2 (cost=0.00..77056.22 rows=1764905
width=60)
Filter: (enddate IS NOT NULL)
-> Materialize (cost=0.00..97983.33 rows=1927022 width=24)
-> Seq Scan on t_access a1 (cost=0.00..77056.22 rows=1927022
width=24)
From | Date | Subject | |
---|---|---|---|
Next Message | Edson Richter | 2014-06-19 02:50:11 | Global value/global variable? |
Previous Message | Rob Sargent | 2014-06-19 00:08:33 | Re: Overlapping ranges |