Re: Document efficient self-joins / UPDATE LIMIT techniques.

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Document efficient self-joins / UPDATE LIMIT techniques.
Date: 2024-01-14 11:44:38
Message-ID: CALDaNm1UjDO6H9DO7rGxekASzRLv9_3NKubjZRAK1kXbHYc5JQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 31 Oct 2023 at 23:42, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
>>
>>
>> I think the SQL statements should end with semicolons. Our SQL examples
>> are usually written like that.
>
>
> ok
>
>
>>
>>
>> Our general style with CTEs seems to be (according to
>> https://www.postgresql.org/docs/current/queries-with.html):
>>
>> WITH quaxi AS (
>> SELECT ...
>> )
>> SELECT ...;
>
>
> done
>
>>
>>
>> About the DELETE example:
>> -------------------------
>>
>> The text suggests that a single, big DELETE operation can consume
>> too many resources. That may be true, but the sum of your DELETEs
>> will consume even more resources.
>>
>> In my experience, the bigger problem with bulk deletes like that is
>> that you can run into deadlocks easily, so maybe that would be a
>> better rationale to give. You could say that with this technique,
>> you can force the lock to be taken in a certain order, which will
>> avoid the possibility of deadlock with other such DELETEs.
>
>
> I've changed the wording to address your concerns:
>
> While doing this will actually increase the total amount of work performed, it can break the work into chunks that have a more acceptable impact on other workloads.
>
>
>>
>>
>> About the SELECT example:
>> -------------------------
>>
>> That example belongs to UPDATE, I'd say, because that is the main
>> operation.
>
>
> I'm iffy on that suggestion. A big part of putting it in SELECT was the fact that it shows usage of SKIP LOCKED and FOR UPDATE.
>
>>
>>
>> The reason you give (avoid excessive locking) is good.
>> Perhaps you could mention that updating in batches also avoids
>> excessive bload (if you VACUUM between the batches).
>
>
> I went with:
>
> This technique has the additional benefit that it can reduce the overal bloat of the updated table if the table can be vacuumed in between batch updates.
>
>>
>>
>> About the UPDATE example:
>> -------------------------
>>
>> I think that could go, because it is pretty similar to the previous
>> one. You even use ctid in both examples.
>
>
> It is similar, but the idea here is to aid in discovery. A user might miss the technique for update if it's only documented in delete, and even if they did see it there, they might not realize that it works for both UPDATE and DELETE. We could make reference links from one to the other, but that seems like extra work for the reader.

I have changed the status of commitfest entry to "Returned with
Feedback" as Laurenz's comments have not yet been resolved. Please
handle the comments and update the commitfest entry accordingly.

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-01-14 11:49:55 Re: Improvements in pg_dump/pg_restore toc format and performances
Previous Message Matthias Kuhn 2024-01-14 11:37:00 Re: Build versionless .so for Android