Re: BUG #14020: row_number() over(partition by order by) - weird behavior

From: Boyko Yordanov <b(dot)yordanov2(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14020: row_number() over(partition by order by) - weird behavior
Date: 2016-03-15 09:04:35
Message-ID: 5BD57C5B-14E8-4472-96B5-EB5CBB35854F@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Thinking further on this, I now got your point on the “duplicate grossprices is ordered randomly” suggestion.

What I missed to realize is that the update query updates *every* product partition that has reordered due to duplicate grossprice being ordered randomly, resulting in thousands of updates instead of just < 148 (or < 99 in the case of product = 2 partition).

Is there a way to ensure persistence of “over(order by duplicate_columns)” ordering, except for ordering by a second (or even third) column?

Overall this now makes sense and is obviously not a bug. I apologize for bothering this list (instead of ‘general’) w/ this issue.

Thanks once again!

Regards,
Boyko

> On Mar 15, 2016, at 10:20 AM, Boyko Yordanov <b(dot)yordanov2(at)gmail(dot)com> wrote:
>
> Hi and thanks for your time on this.
>
> You haven't proven to us that a single row in offers_testing cannot match more than one row in offers_past_data. Assuming a 1-to-many situation the update count for offers_past_data can definitely be more than the number of rows returned by the sub-query.
>
> It is a one-to-one relationship between the tables as there is a primary key on (id, feed) on both tables (which I missed to point out):
>
> Indexes:
> "offers_past_data_id_feed" PRIMARY KEY, btree (id, feed)
>
> Indexes:
> "offers_testing_id_feed" PRIMARY KEY, btree (id, feed)
>
> I assume that this guarantees that a single grossprice change in offers_testing where product = 2 translates to up to (count(id,feed) where product = 2) position updates in both offers_testing and offers_past_data.
>
> Adding "returning *" to the questionable query, it seems to update rows that are not related to product 2 (and on my opinion should not have changed positions).
>
> Also, "ORDER BY grossprice" seems inadequate. The potential for duplicates here - which would then make the assignment of row numbers within the product partition random - is non-zero and is a quite likely source of your problem - along with the probable one-to-many relationship between offers_testing and offers_past_data.
>
> Dismissing the one-to-many relationship suggestion as it isn't the case.
>
> Your point on duplicate grossprices is valid, but I believe that if I update a single grossprice, even in the case of duplicate grossprices, this should not translate in more position updates than the rows in the modified product partition. And in offers_testing there are no more than 148 rows per product partition:
>
> db=# select max(partition_count) from (select count(*) over (partition by product) as partition_count from offers_testing) sq;
> max
> -----
> 148
> (1 row)
>
> And yet the update query updates 28k records for some reason, most of which are outside the modified product partition.
>
> Boyko
>
> --
>
> Boyko
>
> 2016-03-15 6:00 GMT+02:00 David G. Johnston <david(dot)g(dot)johnston(at)gmail(dot)com <mailto:david(dot)g(dot)johnston(at)gmail(dot)com>>:
> On Mon, Mar 14, 2016 at 1:43 PM, <b(dot)yordanov2(at)gmail(dot)com <mailto:b(dot)yordanov2(at)gmail(dot)com>> wrote:
>
> db=# update offers_past_data a set position = b.position from (select id,
> feed, row_number() over(partition by product order by grossprice asc) as
> position from offers_testing) b where a.id <http://a.id/> = b.id <http://b.id/> and a.feed = b.feed and
> a.position <> b.position;
> UPDATE 0
>
> ​Updating offers_past_data
> ​
>
> This should update every row in offers_past_data when its “position”
> changes. In the example above no changes were introduced since the last run
> so nothing is updated (expected).
>
> db=# select count(*) from offers_testing where product = 2;
> count
> -------
> 99
> (1 row)
> So there are 99 offers for product 2.
>
> ​Counting offers_testing​
>
>
> Getting a single offer:
>
> db=# select id,grossprice from offers_testing where product = 2 limit 1;
> id | grossprice
> ---------+------------
> 4127918 | 5000.00
> (1 row)
>
>
> ​Counting offers_testing​
>
> Updating its grossprice:
>
> db=# update offers_testing set grossprice = 20 where id = 4127918;
> UPDATE 1
>
>
> ​Updating offers_testing​
>
> Now when executing the first query again I expect that no more than 99 rows
> get updated in offers_past_data since this is the maximum amount of
> positions that would be affected by offer 4127918 grossprice change.
>
> You haven't proven to us that a single row in offers_testing cannot match more than one row in offers_past_data. Assuming a 1-to-many situation the update count for offers_past_data can definitely be more than the number of rows returned by the sub-query.
> ​​
>
>
> db=# update offers_past_data a set position = b.position from (select id,
> feed, row_number() over(partition by product order by grossprice asc) as
> position from offers_testing) b where a.id <http://a.id/> = b.id <http://b.id/> and a.feed = b.feed and
> a.position <> b.position;
> UPDATE 104
>
> 104 rows get updated.
>
> Executing the same query again a few minutes later (no changes meanwhile in
> either table):
>
> db=# update offers_past_data a set position = b.position from (select id,
> feed, row_number() over(partition by product order by grossprice asc) as
> position from offers_testing) b where a.id <http://a.id/> = b.id <http://b.id/> and a.feed = b.feed and
> a.position <> b.position;
> UPDATE 28058
>
> This time it updates 28058 rows.
>
> This is a test environment and nothing reads or writes to these tables.
>
> Is this a bug or am I missing something obvious?
>
> ​Its likely data related, not a bug.
>
> Using the "UPDATE ... RETURNING *" form should provide good insight. Specifically, look for all rows having the same (id, feed) pair.
>
> Also, "ORDER BY grossprice" seems inadequate. The potential for duplicates here - which would then make the assignment of row numbers within the product partition random - is non-zero and is a quite likely source of your problem - along with the probable one-to-many relationship between offers_testing and offers_past_data.
>
> David J.
> ​
>
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message suzhengchun 2016-03-15 10:24:03 BUG #14023: pq odbc driver crashed while get data from boolean column
Previous Message shasanoglu 2016-03-15 08:55:51 BUG #14022: Installer fails complaining about COMSPEC but COMSPEC is correct.