Re: BUG #14020: row_number() over(partition by order by) - weird behavior

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: b(dot)yordanov2(at)gmail(dot)com
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14020: row_number() over(partition by order by) - weird behavior
Date: 2016-03-15 04:00:07
Message-ID: CAKFQuwbWiSwSf42hd-h9MgU9CSicnurxv7sY607Uw1D6dLMW9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, Mar 14, 2016 at 1:43 PM, <b(dot)yordanov2(at)gmail(dot)com> wrote:

>
> db=# update offers_past_data a set position = b.position from (select id,
> feed, row_number() over(partition by product order by grossprice asc) as
> position from offers_testing) b where a.id = b.id and a.feed = b.feed and
> a.position <> b.position;
> UPDATE 0
>

​Updating offers_past_data

>
> This should update every row in offers_past_data when its “position”
> changes. In the example above no changes were introduced since the last run
> so nothing is updated (expected).
>
> db=# select count(*) from offers_testing where product = 2;
> count
> -------
> 99
> (1 row)
>
So there are 99 offers for product 2.
>

​Counting offers_testing​

> Getting a single offer:
>
> db=# select id,grossprice from offers_testing where product = 2 limit 1;
> id | grossprice
> ---------+------------
> 4127918 | 5000.00
> (1 row)
>
>
​Counting offers_testing​

Updating its grossprice:
>
> db=# update offers_testing set grossprice = 20 where id = 4127918;
> UPDATE 1
>
>
​Updating offers_testing​

Now when executing the first query again I expect that no more than 99 rows
> get updated in offers_past_data since this is the maximum amount of
> positions that would be affected by offer 4127918 grossprice change.
>

You haven't proven to us that a single row in offers_testing cannot match
more than one row in offers_past_data. Assuming a 1-to-many situation the
update count for offers_past_data can definitely be more than the number of
rows returned by the sub-query.
​​

> db=# update offers_past_data a set position = b.position from (select id,
> feed, row_number() over(partition by product order by grossprice asc) as
> position from offers_testing) b where a.id = b.id and a.feed = b.feed and
> a.position <> b.position;
> UPDATE 104
>
> 104 rows get updated.
>
> Executing the same query again a few minutes later (no changes meanwhile in
> either table):
>
> db=# update offers_past_data a set position = b.position from (select id,
> feed, row_number() over(partition by product order by grossprice asc) as
> position from offers_testing) b where a.id = b.id and a.feed = b.feed and
> a.position <> b.position;
> UPDATE 28058
>
> This time it updates 28058 rows.
>
> This is a test environment and nothing reads or writes to these tables.
>
> Is this a bug or am I missing something obvious?

​Its likely data related, not a bug.

Using the "UPDATE ... RETURNING *" form should provide good insight.
Specifically, look for all rows having the same (id, feed) pair.

Also, "ORDER BY grossprice" seems inadequate. The potential for duplicates
here - which would then make the assignment of row numbers within the
product partition random - is non-zero and is a quite likely source of your
problem - along with the probable one-to-many relationship between
offers_testing and offers_past_data.

David J.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Boyko Yordanov 2016-03-15 08:20:25 Re: BUG #14020: row_number() over(partition by order by) - weird behavior
Previous Message Tom Lane 2016-03-15 02:57:15 Re: BUG #14009: pg_restore not restoring privilegs when one login/group role is missing