Re: Help with slow table update

From: Pawel Veselov <pawel(dot)veselov(at)gmail(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Help with slow table update
Date: 2015-04-15 00:01:18
Message-ID: CAMnJ+BcmXQJMHx7FK3N-A0igXzScBqgxY9oHOqEFP+SUMC7hDg@mail.gmail.com
Lists: pgsql-general

On Tue, Apr 14, 2015 at 3:29 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:

> On 4/14/15 4:44 PM, Pawel Veselov wrote:
>
>> On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com
>> <mailto:Jim(dot)Nasby(at)bluetreble(dot)com>> wrote:
>>
>> On 4/14/15 1:28 PM, Pawel Veselov wrote:
>>
>>
>> I wonder if what I need to do, considering that I update a lot of "the
>> same" rows as I process this queue, is to create a temp table, update
>> the rows there, and then update the actual tables once at the end...
>>
>>
>> That's what I'd do.
>>
>>
>> Well, in short, I changed (repeat the body of loop for how many tables
>> are there)
>>
>> LOOP (item)
>> UPDATE table with item
>> IF not found INSERT item INTO table; END IF;
>> END LOOP;
>>
>> to:
>>
>> CREATE TEMP TABLE xq_table (like table) on commit drop;
>> LOOP (item)
>> LOOP
>> UPDATE xq_table with item;
>> exit when found;
>> INSERT INTO xq_table select * from table for update;
>> continue when found;
>> INSERT item INTO xq_table;
>> exit;
>> END LOOP;
>> END LOOP;
>> UPDATE table a set (rows) = (xq.rows)
>> FROM xq_table xq
>> WHERE (a.keys) = (xq.keys)
>>
>> That works significantly faster. The final update statement is very
>> fast. The process is somewhat slow in the beginning as it sucks in
>> records from "total" into "xq_total", but once all of that is moved into
>> the temp table, it rushes through the rest.
>>
>
> Databases like to think in sets. It will generally be more efficient to do
> set operations instead of a bunch of row-by-row stuff.
>
> Since you're pulling all of this from some other table your best bet is
> probably something like:
>
> CREATE TEMP TABLE raw AS DELETE FROM queue WHERE ... RETURNING *;
>
> CREATE TEMP VIEW hourly_v AS SELECT ... FROM raw GROUP BY;
> UPDATE ar_hourly SET ... FROM hourly_v JOIN ...;
> INSERT INTO ar_hourly SELECT FROM hourly_v LEFT JOIN ar_hourly ON ...;
>
> -- Same thing for daily
> -- Same thing for total

There was a problem with the pseudo-code in my previous post: it's missing
inserts into the final table at the end of the loop, for those records that
need to be inserted rather than updated.

This is where using sets becomes really tedious, as Postgres severely lacks
an upsert-like statement.
I don't think joins are allowed in an UPDATE statement, so I will need to
use a WITH query, right?
Also, I'm not sure how a LEFT JOIN will help me isolate and insert the
"missed" entries...
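(For what it's worth, UPDATE does accept a FROM clause in Postgres, so a
join-style update is possible without a WITH query, and the LEFT JOIN trick
is to keep only the rows whose join key comes back NULL. A rough sketch,
assuming made-up key/total columns on the ar_hourly and hourly_v names from
above:

```sql
-- update rows that already exist in the target
UPDATE ar_hourly a
   SET total = v.total
  FROM hourly_v v
 WHERE a.key = v.key;

-- insert the rows the update could not reach:
-- the LEFT JOIN leaves a.key NULL exactly for the "missed" entries
INSERT INTO ar_hourly (key, total)
SELECT v.key, v.total
  FROM hourly_v v
  LEFT JOIN ar_hourly a ON a.key = v.key
 WHERE a.key IS NULL;
```

Note this pair of statements is not atomic: a concurrent insert between the
two can still cause a duplicate-key error.)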

Would it be OK to replace the upsert part with merging into a temp table,
then deleting and inserting from the temp table? Is there any penalty for
insert/delete compared to update?
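(One upsert-like shape that does work on this vintage of Postgres is a
writable CTE: run the UPDATE inside WITH, RETURNING the keys it touched,
then insert whatever was not returned. A sketch under the same assumed
key/total column names; not safe against concurrent writers:

```sql
WITH updated AS (
    UPDATE ar_hourly a
       SET total = v.total
      FROM hourly_v v
     WHERE a.key = v.key
 RETURNING a.key
)
INSERT INTO ar_hourly (key, total)
SELECT v.key, v.total
  FROM hourly_v v
 WHERE v.key NOT IN (SELECT key FROM updated);
```

This does the same per-row work as update-then-insert but in a single
statement.)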

[skipped]

> But remember that if you update or delete a row, removing it from an
>> index, the data will stay in that index until vacuum comes along.
>>
>> Also, there's no point in doing a REINDEX after a VACUUM FULL;
>> vacuum full rebuilds all the indexes for you.
>>
>>
>> I was being desperate :)
>>
>> I still think there is something very wrong with this particular table.
>> First, I have production systems that employ this function on way larger
>> data set, and there is no problem (so far, but still). This machine is
>> part of a test deployment, there is no constant load, the only data that
>> is being written now is when I do these tests. Vacuuming should prune
>> all that dead stuff, and if it's absent, it's unclear where is the time
>> spent navigating/updating the table with 24 rows :)
>>
>
> I think you definitely have a problem with dead rows, as evidenced by the
> huge improvement VACUUM FULL made.
>

But it's not clear why it wouldn't improve past the current point (and,
IMHO, it's not reasonable that it doesn't).
