Re: UPDATE many records

From: Israel Brewster <ijbrewster(at)alaska(dot)edu>
To: Mark Zellers <markz(at)adaptiveinsights(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: UPDATE many records
Date: 2020-01-07 20:37:51
Message-ID: 04B4EC66-804A-4D88-ABC6-02FCDD72B779@alaska.edu
Lists: pgsql-general

Nice approach! Using that method, I can do the SELECT to generate the new table in only 6 minutes! I’m sure it helps that, through a slightly creative use of joins, I can rewrite my function to be part of the SELECT itself, using only built-in trig functions rather than calling a stored function that I wrote for each row. And at only 6 minutes, I don’t need to worry about things like committing in batches or parallelizing.

I haven’t actually tried writing the results out to a new table yet; presumably that step will add some time, but the concept seems like it should work well. Thanks!
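For illustration, the general shape is something like the following (the table, column, and join names here are placeholders, not my actual schema):

    CREATE TABLE data_new AS
    SELECT d.id,
           d.measured_at,
           -- recompute the value inline with built-in trig functions
           degrees(atan2(s.northing - d.northing, s.easting - d.easting)) AS changed_field
    FROM data_old AS d
    JOIN stations AS s ON s.id = d.station_id;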
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145

> On Jan 7, 2020, at 10:09 AM, Mark Zellers <markz(at)adaptiveinsights(dot)com> wrote:
>
> You don’t tell us if other users will be concurrently changing any of the records involved. If you could guarantee that the table won’t be changed, you might be better off doing a CREATE TABLE table_new as SELECT … FROM table_old, dropping table_old, and finally renaming table_new. Given the way Postgres handles updates, I would think that might perform significantly better. Even if you did the work in batches (create a new table, insert/select from the old table, drop, rename), that could well be better. Especially if you re-create the indexes after all the data is moved.
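> Roughly, the sequence would look like this (the table names and the recomputed expression are placeholders):
>
>     CREATE TABLE table_new AS
>         SELECT t.id,
>                t.some_other_column,
>                new_value() AS changed_field
>         FROM table_old AS t;
>     -- re-create indexes, constraints, grants, triggers, etc. on table_new here
>     DROP TABLE table_old;
>     ALTER TABLE table_new RENAME TO table_old;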
>
>
>
> From: Israel Brewster <ijbrewster(at)alaska(dot)edu>
> Sent: Monday, January 6, 2020 10:36 AM
> To: pgsql-general(at)lists(dot)postgresql(dot)org
> Subject: UPDATE many records
>
> Thanks to a change in historical data, I have a need to update a large number of records (around 50 million). The update itself is straightforward, as I can just issue an "UPDATE table_name SET changed_field=new_value();" (yes, new_value is the result of a stored procedure, if that makes a difference) command via psql, and it should work. However, due to the large number of records this command will obviously take a while, and if anything goes wrong during the update (one bad value in row 45 million, lost connection, etc), all the work that has been done already will be lost due to the transactional nature of such commands (unless I am missing something).
>
> Given that each row update is completely independent of any other row, I have the following questions:
>
> 1) Is there any way to set the command such that each row change is committed as it is calculated?
> 2) Is there some way to run this command in parallel in order to better utilize multiple processor cores, other than manually breaking the data into chunks and running a separate psql/update process for each chunk? Honestly, manual parallelizing wouldn’t be too bad (there are a number of logical segregations I can apply); I’m just wondering if there is a more automatic option.
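> For example (assuming an integer primary key named id and hypothetical range boundaries), each statement below could be run in its own psql session, so the chunks commit independently and run in parallel:
>
>     UPDATE table_name SET changed_field = new_value() WHERE id BETWEEN 1 AND 10000000;
>     UPDATE table_name SET changed_field = new_value() WHERE id BETWEEN 10000001 AND 20000000;
>     UPDATE table_name SET changed_field = new_value() WHERE id BETWEEN 20000001 AND 30000000;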
> ---
> Israel Brewster
> Software Engineer
> Alaska Volcano Observatory
> Geophysical Institute - UAF
> 2156 Koyukuk Drive
> Fairbanks AK 99775-7320
> Work: 907-474-5172
> cell: 907-328-9145
