Re: Finding overlapping records

From: Jon Erdman <postgresql(at)thewickedtribe(dot)net>
To: Frank Sheiness <frank(at)dough(dot)net>
Cc: austinpug(at)postgresql(dot)org
Subject: Re: Finding overlapping records
Date: 2009-12-10 17:12:32
Message-ID: 4B212C00.4010608@thewickedtribe.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: austinpug

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Frank,

First of all, you've got a big hole in your overlap check. You're only
checking for (new span is --- existing is +++):

*+++++++*
*-------*

when you really need to check for:

*++++++++*
*---------*
*-------*
*--------------*
*---*

/me pulls out Celko's SQL For Smarties...

So what you would naturally write is perhaps (s1 and e1 are start and
end of existing span, s2 and e2 are the new span):

WHERE
s2 between s1 and e1
OR e2 between s1 and e1
OR s1 between s2 and e2
OR e1 between s2 and e2;

which is a bit long and ugly. There's a shortcut you can take, here's
how you would search for things that *don't* overlap:

*+++++*
*----*
*-----*

so you can write it as:

WHERE NOT ((e2 < s1) OR (s2 > e1));

which is *much* cleaner, no? ;)

Credit goes to Joe Celko, SQL for Smarties, Chapter 13: Between and
Overlaps Predicate, 13.2 Overlaps Predicate, page 279.

Postgres actually has OVERLAPS, so you can just say:

WHERE (s2, e2) OVERLAPS (s1, e1);

however, at least in 8.1, that doesn't use the indexes on the start_date
and end_date. The shortcut above does use those indexes and is nice and
fast.

You should test and see if 8.3 or 8.4 will use the indexes for OVERLAPS
though...
- --

Jon T Erdman

Chief Information Officer voice: (210) 400-5717
Progressive Practice, Inc. jon(at)progressivepractice(dot)com
P.O. Box 17288 www.progressivepractice.com
Rochester, NY 14617

Frank Sheiness wrote:
> Hello,
>
> I had a question that I meant to bring up during the meeting but forgot
> until the end..
>
> I have these tables representing DHCP leases:
>
> CREATE TABLE ips (
> ip serial PRIMARY KEY,
> address inet NOT NULL
> );
>
> CREATE TABLE leases (
> lease serial PRIMARY KEY,
> ip integer NOT NULL REFERENCES ips (ip) ON UPDATE CASCADE ON DELETE RESTRICT,
> mac macaddr NOT NULL,
> starts timestamp NOT NULL,
> ends timestamp NOT NULL,
> server integer NOT NULL,
> UNIQUE (ip, mac, starts, ends)
> );
>
> To optimize the number of records, I would like to make it so when I insert a
> new lease, it checks to see if there's a relevant overlapping lease for that
> mac/ip combination. If so, I want to update the end time of the old record
> with the end time of the new record instead of doing the insert.
>
> Currently, I'm accomplishing this with a trigger that calls this function:
>
> CREATE FUNCTION optimize_lease_fx() RETURNS trigger AS $optimize_lease_fx$
> DECLARE
> id integer;
> BEGIN
> SELECT lease INTO id FROM leases WHERE mac = NEW.mac and
> ip = NEW.ip and NEW.starts between starts AND ends AND
> NEW.ends >= ends;
> IF NOT FOUND THEN
> RETURN NEW;
> ELSE
> EXECUTE 'UPDATE leases SET ends = '
> || quote_literal(NEW.ends)
> || ' where lease = '
> || quote_literal(id);
> RETURN NULL;
> END IF;
> END
> $optimize_lease_fx$ LANGUAGE plpgsql;
>
> It works, but I'm wondering if there's a better way to do this. I'll be
> flushing data when it's older than say.. 3 months. Also, keep in mind that
> some pre-optimized data may look like this (below). The new rows will
> overlap with each other and/or existing rows.
>
>
> lease | ip | mac | starts | ends | server
> --------+-------+-------------------+---------------------+---------------------+---------
> 255646 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 04:21:02 | 2009-12-10 06:21:02 | 400001
> 256948 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:13:38 | 2009-12-10 08:13:38 | 400001
> 256951 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:13:42 | 2009-12-10 08:13:42 | 400001
> 256954 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:13:50 | 2009-12-10 08:13:50 | 400001
> 257073 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:17:34 | 2009-12-10 08:17:34 | 400001
> 257077 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:17:38 | 2009-12-10 08:17:38 | 400001
> 257084 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:17:47 | 2009-12-10 08:17:47 | 400001
> 257157 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:19:33 | 2009-12-10 08:19:33 | 400001
> 257158 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:19:37 | 2009-12-10 08:19:37 | 400001
> 257168 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:19:44 | 2009-12-10 08:19:44 | 400001
> 257212 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:20:31 | 2009-12-10 08:20:31 | 400001
> 257215 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:20:34 | 2009-12-10 08:20:34 | 400001
> 257217 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:20:43 | 2009-12-10 08:20:43 | 400001
> 257230 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:20:59 | 2009-12-10 08:20:59 | 400001
> 257234 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:21:02 | 2009-12-10 08:21:02 | 400001
> 257236 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:21:03 | 2009-12-10 08:21:03 | 400001
>
>
>
> Thanks,
> Frank
>

-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAkshLAAACgkQRAk1+p0GhSHjEACfRVbMyUIOxAQCfFpsjwe+Yp5f
XREAn1GEUsrriKu9z3giBmxGL/5SxtOH
=rm2l
-----END PGP SIGNATURE-----

In response to

Responses

Browse austinpug by date

  From Date Subject
Next Message Jon Erdman 2009-12-10 17:15:38 Re: Finding overlapping records
Previous Message Frank Sheiness 2009-12-10 08:30:14 Finding overlapping records