Re: A long-running transaction

From: John Summerfield <postgres(at)herakles(dot)homelinux(dot)org>
To:
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: A long-running transaction
Date: 2007-04-13 01:02:58
Message-ID: 461ED6C2.2010200@herakles.homelinux.org
Lists: pgsql-sql


Andrew Sullivan wrote:
> On Wed, Apr 11, 2007 at 10:23:34PM +0800, John Summerfield wrote:
>
>>Each row's identified by a key, and the key doesn't change.
>
>
> That doesn't matter.
>
>
>>ADABAS would put the updated record right back where it came from, it
>
>
> That isn't how PostgreSQL works.
>
> I'm having a hard time laying my hands on the bits of the docs that
> explain this (anyone?), but MVCC stands for Multi-Version Concurrency
> Control. When you write a row, it has a version number. That
> version number tells other transactions what version of the row they
> look at.
>
> Here's a _really over simplified_ explanation, at a conceptual level.
> [Note to experts: please don't correct details of how this is wrong.
> I know that. But John's problem is that he doesn't have this
> conceptually yet. The following explanation has worked for me in the
> past in getting the idea across first, so that the details can later
> make sense.]
>
> Suppose you have two transactions, t1 and t2. They start at
> approximately the same time. I'm assuming they're READ COMMITTED.
>
> Suppose t1 updates row R. That version of R has a number R(t1).
>
> Suppose at the same time, t2 is looking at R. The _version_ of R
> that t2 sees is R(tn), where n<1. This way, t2 does not have to
> wait on t1 in order for t2 to proceed (unless t2 actually modifies R.
> Ignore that for the sake of this explanation, please).
>
> The way this works is that, in marking R(t1), the system says "R(t1) is
> valid for transactions that committed after t1". Since t2 hasn't
> committed, it sees the old version. So when t1 commits, there are
> _two_ versions of R -- R(t1) and R(tn),n<1. When all transactions
> have committed such that there is no (tn),n<1, then the row is marked
> dead, and can be recovered by VACUUM (this is one of the main things
> VACUUM does. The idea is to move the housekeeping of concurrency
> control out of the main transaction. Oracle does something
> conceptually similar, except using rollback segments, which is why
> long-running transactions can exhaust the supply of rollback segments
> on Oracle).
>
> Now, you're thinking, "But this is all in one transaction!" Yes, but
> the implementation details make it impossible for t1 to rewrite the
> same row over and over again, so your rewriting of the same row again
> and again actually is creating huge numbers of dead rows. The
> unfortunate thing is that you have to follow the whole dead-row chain
> to find the currently live version of your row.

Why is this implementation preferable to not doing that?
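
If I've followed it, the point of the versioning is that readers never
wait for writers. A two-session experiment along these lines (table and
values invented, not tried here) ought to show it:

-- session 1 (t1)
BEGIN;                                   -- READ COMMITTED, the default
UPDATE r SET val = 'new' WHERE id = 1;   -- writes a new version of the row

-- session 2 (t2), at the same time
BEGIN;
SELECT val FROM r WHERE id = 1;          -- still returns the old value; t1 hasn't committed

-- session 1
COMMIT;

-- session 2
SELECT val FROM r WHERE id = 1;          -- a fresh statement now sees 'new'
COMMIT;
-- the superseded version of the row stays on disk until VACUUM reclaims it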

>
> The upshot of this is that updating the same row over and over in one
> transaction will make your transaction go slower on each round, until
> you are nearly stopped. That's why a lot of updates of the sort you
> describe are in fact the worst case under Postgres. EnterpriseDB

Where is EnterpriseDB? enterprisedb.com seems to be held by a squatter
in Kingston, about where they're playing an important cricket
competition just now.

> seems to have a proposal (which I believe is going to make 8.3) that
> will mitigate but not completely solve some of this.

I can see we need before and after, but why all those intermediate rows?
How can they possibly be used? If I roll back the transaction, they're
all discarded, and if I commit then only the last is required.

Also, I don't see why (apparently) a sequential search is used; surely
even if all of these rows might be required, a pointer to the last one
would do? Or to the last two, if you need the ability to check the order.
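
Whatever the reason, the effect Andrew describes should be easy to see;
a crude test like this (names invented, not tried here) ought to show
the table growing even though only one row is ever visible:

-- one row, updated repeatedly inside a single transaction
CREATE TABLE t (id integer PRIMARY KEY, val integer);
INSERT INTO t VALUES (1, 0);

BEGIN;
UPDATE t SET val = val + 1 WHERE id = 1;
UPDATE t SET val = val + 1 WHERE id = 1;
-- ... repeat a few thousand times ...
COMMIT;

SELECT pg_relation_size('t');  -- grows with every update; each one left a dead version behind
VACUUM t;                      -- only now can the dead versions be reclaimed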

Is there a development version I can try, with this improvement in it?

>
>
>>user : 1d 3:17:04.03 16.0% page in : 131097310 disk 1: 3079516r20087664w
>>nice : 0:05:39.64 0.1% page out: 197016649
>>system: 2d 20:38:37.13 40.1% page act: 87906251
>>
>>That line above. The way I figure it the kernel's doing an enormous
>>amount of work handling its buffers.
>
>
> No, I think it's doing an enormous amount of I/O, because it has to
> keep looking for these new rows (and scanning over the old areas
> while it's at it). This is system time, if I recall correctly
> through the haze that is now my understanding of Linux I/O, because
> it's down at the I/O subsystem and can't actually be done by the user
> program. (Feel free to correct me on this. But I still bet it's
> dead rows.)

Your argument is fairly convincing, and it does reflect that the
postmaster seems always to be doing an update, but I am still dubious.
1. For the first day or so, my observation was that the disk was not
particularly busy. It seemed to be running about half the time. It is
fairly noticeable because, for reasons I don't understand at all, it (on
the laptop) generates EMI and upsets the radio in the next room.
Unplugging the power cord stops the EMI - go figure, but of course I
only have a few hours before the battery runs out.

2. System time, unless I am mistaken (and that is possible), is time the
CPU is active in the kernel. It is separate from IOWait.

I'm reposting the whole lot:
Bootup: Tue Mar 27 18:50:19 2007 Load average: 2.21 2.65 2.69 2/243 19305

user : 1d 3:17:04.03 16.0% page in : 131097310 disk 1: 3079516r20087664w
nice : 0:05:39.64 0.1% page out: 197016649
system: 2d 20:38:37.13 40.1% page act: 87906251
IOwait: 2d 0:46:37.33 28.5% page dea: 16218135
hw irq: 0:44:46.71 0.4% page flt: 306255213
sw irq: 0:50:04.69 0.5% swap in : 4026
idle : 1d 0:36:29.73 14.4% swap out: 9552
uptime: 7d 2:59:20.97 context : 702502442

If I add user + system + iowait + idle I get to about the 7 days and
some of the total uptime; I expect if I did the arithmetic more
precisely, the result would match even better.
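
In fact psql can do the sum; adding the intervals straight from the
figures above (counting nice and the irq lines too):

SELECT interval '1 day 03:17:04.03'   -- user
     + interval '00:05:39.64'         -- nice
     + interval '2 days 20:38:37.13'  -- system
     + interval '2 days 00:46:37.33'  -- IOwait
     + interval '00:44:46.71'         -- hw irq
     + interval '00:50:04.69'         -- sw irq
     + interval '1 day 00:36:29.73';  -- idle
-- roughly 7 days 02:59:19, within a couple of seconds of the 7d 2:59:20.97 uptime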

>
>
>>Since all the work is within one transaction, there's no real need for
>>postgresql to write data at all, unless it's running out of memory.
>
>
> No, I'm sure this has to do with the multi-version row writing. Your
> alternatives are ok, but I'd be more tempted to commit a range of
> data in a timestamped series, and then join on a subselect picking
> the latest insert (and add another daemon to knock old values off the
> bottom of the list).
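
If I read that right, the suggestion amounts to something like this
(table and column names invented):

-- insert each batch with a timestamp instead of updating in place
INSERT INTO readings_hist (key, value, loaded_at)
VALUES (42, 'some value', now());

-- the "current" row for each key is simply the latest insert
SELECT h.key, h.value
FROM readings_hist h
WHERE h.loaded_at = (SELECT max(loaded_at)
                     FROM readings_hist
                     WHERE key = h.key);

-- a housekeeping job then deletes everything older than the newest row per key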

I don't plan on programming around this, because it's a conversion
problem, not an ongoing one, and in any event dumping the source
database into separate files for each date works well enough. But if I
did, I'd simply use a Vector in Java and cache the updates myself, and
write the updates before committing.

In the usual case, there's no significant difference.

>
>
>>In both cases, I am using default values. Unless someone thinks there's
>>clearly something to be gained by changing them, I won't.
>
>
> The PostgreSQL instal defaults are there to get things started, not
> to perform well. There is a long-standing debate in the community
> about whether that's a good idea (and the defaults are way more
> sensible than they used to be), but in general you won't get really
> good performance from the default configuration of a PostgreSQL
> installation.

and the distribution's defaults are not necessarily the same as yours.
That said, I didn't see anything in postgresql.conf that obviously
needed changing; with only one connexion I couldn't see a need to
increase shared memory, and I didn't see anything at all about buffer pools.

>
> Here is a possibly useful bit of reading to start with on tuning,
> although it's getting long in the tooth:
>
> http://www.powerpostgresql.com/Downloads/annotated_conf_80.html
>
> Also
>
> http://varlena.com/varlena/GeneralBits/Tidbits/perf.html

There's nothing that leaps out at me; the WAL stuff doesn't apply until
I commit, and I haven't. Work memory, maybe, but I'm not doing complex
queries.

Effective cache size might.
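
For the record, the knobs people seem to mention for this sort of bulk
load are the ones below; checking the current values is easy enough
(this is just how I'd look at them, not a recommendation):

SHOW shared_buffers;        -- this is the buffer pool
SHOW work_mem;              -- memory per sort or hash step
SHOW maintenance_work_mem;  -- used by VACUUM and index builds
SHOW checkpoint_segments;   -- how much WAL accumulates between checkpoints
SHOW effective_cache_size;  -- the planner's guess at how much the OS is caching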

>
> And the -performance list will probably help you more than this one.

At present I'm trying to clarify in my mind the nature of the problem.
What I'm trying to do seems to me reasonable. I have some data, and I
want it all in or none of it, so it fits the idea of a single transaction.

It might be that my demands exceed PostgreSQL's current capabilities,
but that by itself doesn't make what I'm trying to do unreasonable.

>
> Hope this helps.

I appreciate the time you're spending on this. I will be especially
pleased if the final result is an improvement in PostgreSQL.

>
> A
>
