Re: Why LIMIT and OFFSET are commutative

From: Marco Colombo <pgsql(at)esiway(dot)net>
To: Andrus <kobruleht2(at)hot(dot)ee>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Why LIMIT and OFFSET are commutative
Date: 2007-12-03 11:24:11
Message-ID: 4753E75B.9080000@esiway.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Andrus wrote:
>> Under what interpretation would the results differ?
>
> Results must differ for easy creation of LinQ-PostgreSQL driver.
> If results are always the same , PostgreSQL should not allow to use both
> order of clauses.
>
> Nicholas explains:
>
> Assuming the ordering is the same on each of them (because Skip and Take
> make no sense without ordering, LINQ to SQL will create an order for you,
> which irritates me to no end, but that's a separate thread), they will
> produce different results.
>
> Say your query will produce the ordered set {1, 2, 3}. Let n = 1, m =
> 2.
>
> The first query:
>
> var query = query.Skip(n).Take(m);
>
> converted to SELECT ... OFFSET n LIMIT m
>
> Will return the ordered set {2, 3}, while the second query:
>
> var query = query.Take(m).Skip(n);
>
> converted to SELECT ... LIMIT m OFFSET n
>
> Will return the ordered set {2}.
>
> The reason for this is that in the first query, the Skip method skips
> one element, then takes the remaining two, while in the second query, the
> first two elements are taken, and then the first one is skipped.

This semantics implies subqueries.

In SQL LIMIT and OFFSET refer to the whole query (that's why in PG you
can swap them). If you want to think OO, both are _attributes_ for the
query object, whose default values OFFSET 1 and LIMIT *inf*.

They are not _operators_ on the query result, as you seem to imply. The
Take() and Skip() should not be real methods, they should just set
internal instance variables.

(Pardon my pythonic syntax - I know zero of LinQ)

q = Query("SELECT ...") # create a new query object
q.limit = 2 # object properties
q.offset = 1
result = q.execute() # perform the query

using wrapper methods:

q = Query("SELECT ...") # create a new query object
q.limit(2)
q.offset(1)
result = q.execute()

Methods allow a more pythonic way (the same you use):

result = Query("SELECT ...").offset(1).limit(2).execute()

which matches quite closely the SQL syntax, if you have those methods
return "self".

@Erik Jones
There's no real object-relational impedance mismatch here.
The above is pure OO, yet:

result = Query("SELECT ...").limit(2).offset(1).execute()
^ methods swapped
is perfectly equivalent.

Setting properties on an instance object is "commutative" in general,
unless the set_property operation has side-effects, which I wouldn't
call good programming style.

The OP just maps LIMIT and OFFSET into operations ("OFFSET and LIMIT
_operations_ are NOT commutative in general") instead of object
properties. Once you do the correct mapping, objects behave like queries
in SQL.

As Gregory Stark pointed out, if you want LIMIT and OFFSET to work as
"operators", you need to nest queries.

Let's drop the idea there's SQL behind the scene, and let's think of a
more abstract DB model:

q = Query("SELECT ...").execute()
# this executes the query, and returs an object you can perform other
queries on

q = q.limit(2).execute()
# again, the query is executed, and the result set is in turn querable

q = q.offset(1).execute()

Now, we can just make the execute() method implied. With this new
semantics, we have:

q1 = Query("SELECT ...")
q1 = q.limit(2)
q1 = q.offset(1)

q2 = Query("SELECT ...")
q2 = q.offset(1)
q2 = q.limit(2)

and the results differ. I think that's what the OP meant.

But we needed to drop the idea of SQL behind the scene because there's
no way in SQL to directly query the result set from a previous query.
Here there's "impedance mismatch". The closest thing is subqueries, but
you don't store intermediate results anywhere, like we do above with
objects.

One could implement the above by executing the query every time, but
that's a nightmare for performance on big tables. The only way to have
decent performance is to do "lazy" execution of the query, and use
subqueries as Gregory suggested.

Please note that there are ORM tools that do that. SQLAlchemy even
allows you to build a query (much of the above is valid sqlalchemy) and
then treat it as a list, even using array slices which would generate
the convenient LIMIT/OFFSET clauses automagically:

q = session.query(Table)
q1 = q[1:5] # adds LIMIT 4 OFFSET 2

of course the query is actually executed only when you start using the
results.

.TM.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Christian Rengstl 2007-12-03 11:29:39 Archiving problem on Windows
Previous Message Norberto Delle 2007-12-03 11:23:48 Re: Dump/Restore Large Object OID