| From: | Vadim Mikheev <vadim(at)krs(dot)ru> | 
|---|---|
| To: | hackers(at)postgresql(dot)org | 
| Subject: | proposals for LLL, part 1 | 
| Date: | 1998-07-15 23:46:25 | 
| Message-ID: | 35AD3F51.35DF0E8B@krs.ru | 
| Lists: | pgsql-hackers | 
OK, I'm not sure that LLL will appear in 6.4, but it's a good time to
discuss it.
                                                                               
First, PostgreSQL is a multi-version system due to its
non-overwriting storage manager. So the first proposal is to use
this feature (multi-versioning) in the LLL implementation.
                                                                               
In multi-version systems, access methods don't use locks to read
consistent data, so readers don't block writers, writers don't
block readers, and only writers of the same row block each other. In such
systems, access methods return a snapshot of the data as it was at
_some_ point in time. For the read committed isolation level this
moment is the time when the statement began. For the serializable isolation
level it is the time when the current transaction began.
                                                                               
Oracle uses rollback segments to reconstruct blocks that were
changed after the statement/transaction began, so a statement sees
only data that was committed by then.
                                                                               
In our case, we have to analyze a tuple's xmin/xmax to determine _when_
the corresponding transaction committed relative to the last
transaction (LCX) that had committed when the statement/transaction
began.
                                                                               
If xmin/xmax committed no later than LCX (i.e. LCX itself or earlier),
the tuple insertion/deletion is visible to the statement; otherwise it is not visible.
                                                                               
To achieve this, the second proposal is to use a special SCN -
System Change Number (C) Oracle :) - that will be incremented by 1
on each transaction commit. Each committed transaction will have a
corresponding SCN (4 bytes, equal to sizeof XID).
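A minimal C sketch of the SCN idea, assuming a single global counter (the names `last_scn`, `commit_assign_scn`, `snapshot_scn` and `scn_visible` are hypothetical, not existing backend code). Each commit takes the next SCN, a statement/transaction remembers the SCN of LCX when it begins, and a committed change is visible if its SCN is no greater than that remembered value:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SCN;   /* 4 bytes, same size as an XID */

/* Hypothetical global commit counter; in a real server it would live in
 * shared memory and be updated under a lock. */
static SCN last_scn = 0;

/* Each commit takes the next SCN, so SCN order is exactly commit order. */
static SCN commit_assign_scn(void)
{
    assert(last_scn != UINT32_MAX);   /* 4-byte counter must not wrap here */
    return ++last_scn;
}

/* At statement (read committed) or transaction (serializable) start:
 * remember the SCN of LCX, the last committed transaction. */
static SCN snapshot_scn(void)
{
    return last_scn;
}

/* A committed change is visible iff it committed no later than LCX. */
static bool scn_visible(SCN change_scn, SCN snap)
{
    return change_scn <= snap;
}
```

For example, a transaction committing after `snapshot_scn()` was taken gets a larger SCN, so its changes stay invisible to that snapshot.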
                                                                               
We have to keep the XID --> SCN mapping as long as there is a running
transaction that is "interested" in that XID: when a transaction begins,
it will determine the XID of the first (oldest) running transaction,
and this will be the minimum XID whose SCN the transaction may need
to know.
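The "oldest running XID" step at transaction begin could look roughly like this. It is a sketch assuming the running XIDs are available as a plain array and ignoring XID wraparound; `oldest_running_xid` and `next_xid` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* At transaction begin: the smallest running XID is the minimum XID whose
 * SCN this transaction may later need. If nothing is running, only XIDs
 * from next_xid onward can matter. XID wraparound is ignored here. */
static TransactionId oldest_running_xid(const TransactionId *running, size_t n,
                                        TransactionId next_xid)
{
    assert(n == 0 || running != NULL);
    TransactionId oldest = next_xid;
    for (size_t i = 0; i < n; i++)
        if (running[i] < oldest)
            oldest = running[i];
    return oldest;
}
```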
                                                                               
Access methods will have to determine the SCN of xmin/xmax only if
FRX <= xmin/xmax <= LSX, where FRX is the XID of the first (oldest)
running transaction and LSX is the XID of the last started transaction,
both taken at the moment when the statement (for read committed) or
transaction (for serializable) began. For such xmin/xmax, their SCNs
will be compared with the SCN determined at the moment the
statement/transaction began...
                                                                               
Changes made by committed xmin/xmax < FRX are visible to the
statement/transaction, and changes made by xmin/xmax > LSX are not
visible; neither case needs an SCN lookup for xmin/xmax.
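The whole visibility rule can be sketched as follows, assuming a hypothetical `Snapshot` struct holding FRX, LSX and the SCN of LCX, and a queue array where `queue[xid - queue_base]` holds the commit SCN of `xid` (0 while not committed). Aborted transactions and XID wraparound are deliberately left out of this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t SCN;

/* Hypothetical snapshot, taken when the statement (read committed) or
 * the transaction (serializable) begins. */
typedef struct {
    TransactionId frx;  /* first (oldest) running XID at that moment */
    TransactionId lsx;  /* last started XID at that moment */
    SCN scn;            /* SCN of LCX, the last committed transaction */
} Snapshot;

/* Is the insertion/deletion made by xid (a tuple's xmin or xmax) visible?
 * Assumes xid < frx is already known to be committed. */
static bool change_visible(const Snapshot *s, TransactionId xid,
                           const SCN *queue, TransactionId queue_base)
{
    if (xid < s->frx)
        return true;            /* finished before the snapshot: no lookup */
    if (xid > s->lsx)
        return false;           /* started after the snapshot: no lookup */
    assert(xid >= queue_base);  /* the queue must still cover [frx, lsx] */
    SCN scn = queue[xid - queue_base];
    return scn != 0 && scn <= s->scn;  /* committed no later than LCX */
}
```

Only XIDs inside [FRX, LSX] touch the shared queue; everything else is decided by two integer comparisons.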
                                                                               
For the XID --> SCN mapping I propose to use the simplest scheme:
an ordered queue of SCNs (or something like it), i.e. keep SCNs
for all transactions from the first one whose SCN could be
required by some running transaction up to the last one started.
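One possible shape for such a queue is a fixed-size ring buffer indexed by `xid - base`. This is an assumption for illustration only (the proposal just says "ordered queue of SCNs"); `ScnQueue`, `QUEUE_SLOTS` and the function names are hypothetical, and the size is tiny so the wraparound of the ring is easy to follow:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t SCN;

#define QUEUE_SLOTS 8   /* tiny for illustration only */

/* Hypothetical XID --> SCN queue holding XIDs base .. base+count-1. */
typedef struct {
    TransactionId base;   /* oldest XID still kept */
    uint32_t count;       /* number of XIDs kept */
    uint32_t head;        /* ring index of base */
    SCN slots[QUEUE_SLOTS];
} ScnQueue;

/* Transaction start: append a slot for the new XID (0 = no SCN yet). */
static bool queue_add_xid(ScnQueue *q, TransactionId xid)
{
    assert(xid == q->base + q->count);   /* XIDs are assigned in order */
    if (q->count == QUEUE_SLOTS)
        return false;                    /* full: caller must wait or fail */
    q->slots[(q->head + q->count) % QUEUE_SLOTS] = 0;
    q->count++;
    return true;
}

/* Slot holding xid's SCN; commit writes it, access methods read it. */
static SCN *queue_slot(ScnQueue *q, TransactionId xid)
{
    assert(xid >= q->base && xid - q->base < q->count);
    return &q->slots[(q->head + (xid - q->base)) % QUEUE_SLOTS];
}

/* Drop XIDs below new_base once no running transaction needs them. */
static void queue_truncate(ScnQueue *q, TransactionId new_base)
{
    assert(new_base >= q->base && new_base - q->base <= q->count);
    uint32_t drop = new_base - q->base;
    q->head = (q->head + drop) % QUEUE_SLOTS;
    q->count -= drop;
    q->base = new_base;
}
```

Usage: start with `ScnQueue q = { .base = first_xid };`, call `queue_add_xid` as transactions start, store `*queue_slot(&q, xid) = scn` on commit, and `queue_truncate` as the oldest interested XID advances.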
                                                                               
This queue must be shared!                                                     
                                                                               
The size of this queue and the average number of commits/aborts per
second will define how long transactions will be able to run. At 30
xacts/sec, a 400K queue will allow transactions to run for 30 - 60
minutes...
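The estimate follows from 4 bytes per XID: 400K / 4 is about 100K queue entries, and at 30 transactions per second those are used up in roughly 3,300 - 3,400 seconds. A small helper (hypothetical name `queue_horizon_seconds`) makes the arithmetic explicit:

```c
#include <assert.h>
#include <stdint.h>

/* Seconds of transaction history a queue of queue_bytes can cover,
 * with one 4-byte SCN per XID and xacts_per_sec started transactions
 * (commits plus aborts) per second. */
static double queue_horizon_seconds(uint32_t queue_bytes, double xacts_per_sec)
{
    assert(xacts_per_sec > 0);
    uint32_t entries = queue_bytes / 4;
    return entries / xacts_per_sec;
}
```

With `400u * 1024u` bytes and 30 xacts/sec this gives 102,400 entries, about 3,413 seconds or roughly 57 minutes, which lands within the 30 - 60 minute range above.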
                                                                               
Keeping the queue in shared memory may be unacceptable in some
cases... mmap or the shared buffer pool could be used to access the queue.
We'll see...
                                                                               
Also note that Oracle has a special READ ONLY transaction mode.
READ ONLY transactions are not allowed to change anything in the
database. This is a good mode for long-running applications such as
pg_dump. Because no one will be "interested" in the SCN of
READ ONLY transactions, such transactions can make a private copy
of the part of the queue they need, and after that the queue can be truncated...
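A sketch of how READ ONLY transactions could stop pinning the shared queue, assuming each running transaction records the oldest XID it is interested in and a read-only flag (`RunningXact`, `queue_keep_from` and `copy_queue_part` are hypothetical names). A READ ONLY transaction copies the SCNs for its [FRX, LSX] range at start, so the truncation point only has to consider read-write transactions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;
typedef uint32_t SCN;

typedef struct {
    TransactionId frx;   /* oldest XID whose SCN this transaction may need */
    bool read_only;      /* READ ONLY: already holds a private copy */
} RunningXact;

/* Oldest XID the shared queue must still keep. READ ONLY transactions are
 * skipped because they copied the part of the queue they need. */
static TransactionId queue_keep_from(const RunningXact *xacts, size_t n,
                                     TransactionId next_xid)
{
    TransactionId keep = next_xid;
    for (size_t i = 0; i < n; i++)
        if (!xacts[i].read_only && xacts[i].frx < keep)
            keep = xacts[i].frx;
    return keep;
}

/* READ ONLY transaction start: copy SCNs for XIDs frx..lsx out of the
 * shared queue (which starts at queue_base) into a private buffer. */
static void copy_queue_part(SCN *dst, const SCN *queue, TransactionId queue_base,
                            TransactionId frx, TransactionId lsx)
{
    assert(frx >= queue_base && lsx >= frx);
    memcpy(dst, queue + (frx - queue_base), (size_t)(lsx - frx + 1) * sizeof(SCN));
}
```

A long-running pg_dump started as READ ONLY then no longer limits how far `queue_keep_from` lets the queue be truncated.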
                                                                               
Having 4 bytes per SCN lets us reserve special values to mark the
corresponding transaction as running or aborted, and so avoid a pg_log
lookup when we need both the SCN and the state of a transaction.
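As an illustration, one could reserve the two lowest values and start real commit SCNs at 2. The particular encoding (`SCN_RUNNING = 0`, `SCN_ABORTED = 1`) is an assumption, not something fixed by the proposal:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t SCN;

/* Hypothetical reserved values; real commit SCNs start at 2. */
#define SCN_RUNNING      ((SCN) 0)
#define SCN_ABORTED      ((SCN) 1)
#define SCN_FIRST_NORMAL ((SCN) 2)

typedef enum { XACT_RUNNING, XACT_ABORTED, XACT_COMMITTED } XactState;

/* One 4-byte queue entry answers both "what is the state?" and
 * "when did it commit?", so pg_log need not be consulted. */
static XactState xact_state_from_scn(SCN scn)
{
    if (scn == SCN_RUNNING)
        return XACT_RUNNING;
    if (scn == SCN_ABORTED)
        return XACT_ABORTED;
    return XACT_COMMITTED;   /* scn >= SCN_FIRST_NORMAL is a commit SCN */
}
```

The cost is two SCN values out of 2^32, which is negligible next to the savings of skipping a pg_log read on every in-range xmin/xmax check.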
                                                                               
...Well, it's time to sleep :)                                                 
                                                                               
To be continued...                                                             
                                                                               
Comments?
                                                                               
Vadim