Re: PostgreSQL design question

From: Jeffrey Tenny <jeffrey(dot)tenny(at)comcast(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: postgres jdbc <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: PostgreSQL design question
Date: 2004-01-02 15:15:51
Message-ID: 3FF58B27.5060006@comcast.net
Lists: pgsql-jdbc

Tom Lane wrote:

>Jeffrey Tenny <jeffrey(dot)tenny(at)comcast(dot)net> writes:
>
>
>>To determine the cache coherency of my java memory cache of database
>>table content,
>>I maintain an "update table" for all tables to be cached.
>>Whenever a transaction updates a table that is cached, it must also
>>update the 'update table'.
>>
>>
>
>Hm. I do not actually see what it buys you to store n_rows in the
>update table; seems like the update time is the only thing that is
>useful. (With concurrent inserts under MVCC, n_rows is in the eye
>of the beholder anyway, so if your caching code *is* relying on it
>then it is wrong.)
>
>
Since many of my tables are append-only, the logic is like this:

I start the transaction and query n_rows for the append-only tables.
If I have fewer rows in my cache, I know I need to read the rows
inserted after n_rows (basically those records whose primary key is
> n_rows). I do this based on primary key and not update time, since I
don't want to store update times in the append-only table rows just to
know which were added 'since' a given time.
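
Roughly, per cached append-only table, it looks like this (a sketch:
update_table(table_name, n_rows, update_time) and the dense integer
key "id" reflect my schema, and 1234 stands in for the cached count):

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    -- How many rows does the database say the table has?
    SELECT n_rows FROM update_table WHERE table_name = 'my_table';

    -- If that exceeds the cached count (say 1234), fetch only the
    -- new tail. This works because the table is append-only and the
    -- ids are dense -- which is why the gapless-sequence question
    -- below matters to me.
    SELECT * FROM my_table WHERE id > 1234 ORDER BY id;

    COMMIT;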

If someone updates the table, that's all well and good; I'm in
SERIALIZABLE mode anyway, so my view of the database won't see any
updates made after I queried the update table.

Or am I still missing the point?

>However, getting rid of n_rows doesn't eliminate the fundamental problem
>that rows of the update table are serialization bottlenecks.
>
I wasn't trying to get rid of n_rows, but yes, the table represents a
bottleneck.

>
>Recognizing that update_time is all you can usefully propagate anyway,
>consider this design:
>
>* Get rid of primary key --- there are allowed to be multiple rows in
>the update table for any given data table.
>
>* Updating transactions insert (not update) a row identifying the table
>they changed and giving their update_time.
>
>* Caching transactions look at the max update_time for each table they
>are interested in. This is cheap to do efficiently with the right query
>and index --- see past discussions about optimizing max() operations.
>(Hint: you do *not* write max().)
>
Yup, been there, done that.
And I can logically max() n_rows as well (unless I see the light as to
why that's worthless; I'm still missing that).
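
For anyone hitting this in the archives, the indexable form looks like
this ('update_idx' is a name I made up):

    -- Assumes: CREATE INDEX update_idx
    --            ON update_table (table_name, update_time);

    -- max() won't use the index, but a backwards index scan that
    -- stops after one row gives the same answer cheaply:
    SELECT update_time
      FROM update_table
     WHERE table_name = 'my_table'
     ORDER BY table_name DESC, update_time DESC
     LIMIT 1;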

>
>* Periodically, a maintenance process deletes all but the newest row for
>each table, and then does a VACUUM of the update table.
>
>This avoids serializing on any individual row of the update table.
>The maintenance process doesn't block anything either (it's okay if
>it happens to leave more than one valid row for a particular data
>table on any given run).
>
Couldn't I take it a step further and simply delete all existing rows
for the table (as the transaction sees them) and insert a new row every
time I touch the update table? Or is that going to fragment the
database somehow?
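
That is, per updating transaction, something like this (sketch):

    -- Just before commit, collapse this table's history to one row:
    DELETE FROM update_table WHERE table_name = 'my_table';
    INSERT INTO update_table (table_name, update_time)
        VALUES ('my_table', CURRENT_TIMESTAMP(6));

Though writing it out, I see the dead rows from the DELETE still need a
VACUUM to reclaim, and two concurrent writers of the same table would
collide on the DELETE under SERIALIZABLE, so maybe I've just
re-invented the single-row bottleneck.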

>
>However, this idea has a serious problem: it does not quite work because
>transactions may not commit in the same order they start. Consider this
>scenario:
>
>Transaction A starts
>
>A inserts some data
>
> Transaction B starts
>
> B inserts some data
>
> B inserts its update_time
>
> B commits
>
> C inspects max(update_time)
>
> C updates its cache
>
>A inserts its update_time
>
>A commits
>
> C inspects max(update_time)
>
> C updates its cache??
>
>
Unless C is two transactions, the update table is inspected only once at
the start of each transaction,
then updated at the end of the transaction if necessary.

Since I'm using the SERIALIZABLE transaction isolation level, I should
see B's commit, but not A's. It sounds like you're considering the READ
COMMITTED state of things, which is very interesting because I'd like
to be there for performance reasons, but right now I'm using
SERIALIZABLE.

>If update_time is the transaction's start time (now()), then C will in
>fact fail to update its cache for A's updates, because it will still see
>max(update_time) as being B's value. A's update_time is smaller than B's.
>
>
As long as C has a transaction-coherent snapshot of all tables as they
existed when C first inspected the update table, which I /believe/
(falsely?) is guaranteed by SERIALIZABLE transaction semantics, A's
updates don't matter. My database view and my memory-cache view for C
will all reflect a consistent data set. (That's trickier than it
sounds, since I share data in my memory cache across threads and had to
implement my own memory-cache transaction system to provide the right
semantics.)

I'm updating the row with CURRENT_TIMESTAMP(6) in all cases.

But perhaps you've illustrated a bug even given my assumptions about
SERIALIZABLE transactions. C could set the timestamp of its row to a
time that postdates A's even though C commits first, depending on each
transaction's delay between update and commit time.

But if this happens with serializable semantics, A would be rolled back
because of a single-row update conflict.

Using the idea of multiple rows per table, A would have a row with an
earlier timestamp than C's even though C commits first. I think that's
ok: any transaction that postdates one or both of A and C will be
looking for the max update time.

>We can narrow the window for this failure by using current_timestamp
>(wall clock time) taken as close as possible before the transaction
>commits --- but I don't think we can eliminate the window entirely.
>
>
Let me know if I've missed your point entirely or if I've explained why
it isn't a problem.

I think the truly glaring problem is that if I ever move to the READ
COMMITTED isolation level, what the update table says and what
subsequent queries on the data tables say will potentially be two
different things. I can handle that with my append-only tables, but for
update/delete-enabled tables I'd have to essentially lock the tables or
rows in a suitable lock-hierarchy order.
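
Something like this, I suppose (a sketch; the lock level is a guess at
what would suffice):

    BEGIN;  -- READ COMMITTED
    -- Take locks in one fixed order everywhere (update table first,
    -- then data tables alphabetically) to avoid deadlocks:
    LOCK TABLE update_table IN SHARE ROW EXCLUSIVE MODE;
    LOCK TABLE my_table IN SHARE ROW EXCLUSIVE MODE;
    -- ... check the update table, refresh the cache, do the work ...
    COMMIT;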

>BTW, I think your original design is vulnerable to this same problem.
>You may or may not be saved from it by the fact that you are serializing
>on update-table rows; it would depend on the details of your coding.
>
>I thought about using a sequence generator instead of now() to generate
>the update_time values, but that doesn't seem to help. Does anyone
>see a way to make this 100% reliable?
>
>
>
>>I've avoided using any notifications from the server (I'm not sure JDBC
>>even supports it, haven't tried),
>>since I'm unsure such mechanisms are reliable.
>>
>>
>
>LISTEN/NOTIFY would be more reliable than this, mainly because it
>ensures that notifications are not delivered until after the sending
>transaction commits (and therefore you cannot miss seeing its updates).
>However, I'm not sure JDBC supports it either :-(. Anyone know?
>
>
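
For what it's worth, the server side looks simple enough; a sketch (the
condition name is mine):

    -- Cache maintainer, once per connection:
    LISTEN my_table_changed;

    -- Writers, anywhere in the updating transaction; the notification
    -- isn't delivered until after the COMMIT:
    NOTIFY my_table_changed;

Whether the JDBC driver ever hands the notification to the client is
exactly the part I haven't verified.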
>
>>Furthermore, app servers in some hostile customer network environments
>>have idle network connections dropped after some number of hours. If
>>that happens, the app server will miss messages and have to
>>potentially completely reinitialize its cache.
>>
>>
>
>[ shrug... ] Use dummy keepalive transactions to prevent the connection
>from looking idle for long periods. In any case, I can't believe that
>infrequent cache reloads are a fatal objection.
>
>
Depends on the deployment topology. If the app servers have cached a
gigabyte of data
in Singapore and the database server is in Kansas, the cache becomes
very important.
The Singapore folks would be pissed if their first application
transaction of the morning took 30 minutes to execute
while loading lots of data from Kansas.

Since I'm not currently supporting any distributed database
deployments, the cache is important. Still, a cache invalidation
doesn't mean I need to re-read everything, especially for the
append-only tables.

I haven't even looked at the database replication or two-phase commit
capabilities of PostgreSQL. I'm trying to write code that is fairly
least-common-denominator with respect to database capability. There are
some requirements of the databases I use, though: MVCC is crucial to
app performance given my use of complex serializable transactions;
MySQL would just be stalled all the time since it doesn't have MVCC.

>
>
>>I'm also curious if anybody has developed a sequence generator for
>>PostgreSQL that doesn't permit
>>gaps in the sequence based on failed transactions.
>>
>>
>
>It's easy to see that any such thing *must* be a serialization
>bottleneck: you can't assign the next sequence ID till you see if
>the guy who took the previous one commits. Otherwise you have gaps.
>
>You can imagine mechanisms that will reissue the IDs taken by failed
>transactions to the next requestors. However, this just means that you
>have transient instead of permanent gaps in the ID sequence (since
>later IDs might be used by xacts that commit before those that
>successfully use reissued IDs). Most of the scenarios I can think of
>for wanting non-gapping ID sequences will fail just as miserably with
>transient gaps as with permanent ones.
>
Most of my ID uses would be fine with temporary gaps. I have only one
entity that absolutely has to have monotonically increasing primary
keys and strongly prefers them to be adjacent integer values (the
latter more for performance than absolute necessity).

>
>In fact, if your criterion for "no gaps" is "no one who looks at the
>current set of committed values ever sees a gap", then you really don't
>have any choice: you have to ensure that the xact that took ID n
>commits before the one that took ID n+1 commits; it's not enough they
>commit, the commits have to occur in the right order. So you'd have to
>have serialization at commit time even if you avoided serializing at the
>ID-issuance point.
>
>I would counsel taking a very hard look at why you think you need
>non-gapping IDs, rather than engaging in wishful thinking about finding
>an efficient way to have them.
>
Rewording my reply above: most of my non-gap requirements are only
long-term, to avoid performance problems. Where I really lost sleep,
and decided I had to go with my own sequence allocator, was the
nightmare scenario of my app's end-user doing something really stupid
in a script that caused an endless loop of failed transactions beyond
my control. Then the sequence-generator gaps would be prohibitive.
Depending on the damage done to the database, that could require one of
those nasty trips to the customer site to repair the database, not to
mention server downtime (anathema to my customers).
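
The textbook do-it-yourself allocator (a counter table) serializes
writers exactly the way you describe -- a sketch, with made-up names:

    CREATE TABLE gapless_counter (
        name       text PRIMARY KEY,
        last_value integer NOT NULL
    );

    BEGIN;
    -- The UPDATE takes a row lock, so concurrent allocators queue
    -- here until this transaction commits or rolls back; a rollback
    -- releases the value, so no gap appears.
    UPDATE gapless_counter
       SET last_value = last_value + 1
     WHERE name = 'important_entity';
    SELECT last_value FROM gapless_counter
     WHERE name = 'important_entity';
    -- ... insert the row that uses the new ID ...
    COMMIT;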

Better slow and steady than fast and unreliable, but there are limits to
how slow...
(Fast, cheap, good, pick two. Sigh).

Thanks,

Dave
