Re: Long Running Update - My Solution

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Harry Mantheakis <harry(dot)mantheakis(at)riskcontrollimited(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Long Running Update - My Solution
Date: 2011-06-28 00:37:08
Message-ID: 4E092234.30200@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Harry Mantheakis wrote:
> The mystery remains, for me: why updating 100,000 records could
> complete in as quickly as 5 seconds, whereas an attempt to update a
> million records was still running after 25 minutes before we killed it?

The way you were doing this originally, it was joining every record in
table A against every record in table B, finding the matches (note the
sequential scans on each in the query plan you showed). Having A * B
possible matches there was using up a bunch of resources to line those
two up for an efficient join, and it's possible that parts of that
required spilling working data over to disk and other expensive
operations. And you were guaranteeing that every row in each table was
being processed in some way.

Now, when you only took a small slice of A instead, and a small slice of
B to match, this was likely using an index and working with a lot less
rows in total--only ones in B that mattered were considered, not every
one in B. And each processing slice was working on less rows, making it
more likely to fit in memory, and thus avoiding both slow spill to disk
operation and work that was less likely to fit into the system cache.

I don't know exactly how much of each of these two components went into
your large run-time difference, but I suspect both were involved. The
way the optimizer switches to using a sequential scan when doing bulk
operations is often the right move. But if it happens in a way that
causes the set of data to be processed to become much larger than RAM,
it can be a bad decision. The performance drop when things stop fitting
in memory is not a slow one, it's like a giant cliff you fall off.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Harry Mantheakis 2011-06-28 09:48:57 Re: Long Running Update - My Solution
Previous Message Denis de Bernardy 2011-06-27 23:00:53 Fw: Getting rid of a seq scan in query on a large table