8.3beta1 testing on Solaris

From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: pgsql-performance(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: 8.3beta1 testing on Solaris
Date: 2007-10-25 20:24:18
Message-ID: 4720FB72.8060800@sun.com
Lists: pgsql-hackers, pgsql-performance
An update on my testing of 8.3beta1 on Solaris, covering:

* CLOG reads
* Asynchronous Commit benefit
* Hot CPU Utilization

Regards,
Jignesh

_Background_:
We were testing PostgreSQL 8.3beta1 on our latest Sun SPARC Enterprise T5220 server and on a Sun Fire X4200, both running Solaris 10 8/07. Generally, for performance on Solaris, we put the file systems on forcedirectio: we bypass the file system cache and go directly to the disks.
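
For illustration, a forcedirectio mount on Solaris looks roughly like this (the device and mount point below are placeholders):

# mount a UFS file system with forcedirectio so I/O bypasses the page cache
mount -F ufs -o forcedirectio /dev/dsk/c1t1d0s6 /dbdata

# or persistently via an /etc/vfstab entry:
# /dev/dsk/c1t1d0s6  /dev/rdsk/c1t1d0s6  /dbdata  ufs  2  yes  forcedirectio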

_Problem_:
What we observed was a lot of read activity, about 4 MB/sec, on the file system holding $PGDATA and the database tables during an OLTP benchmark run. Initially we thought our buffer pools were not big enough, but thanks to 64-bit builds we could use bigger buffer pools. However, even with extraordinarily large buffer pools we still saw lots of reads going to the disks.

_DTrace to the Rescue_:

I modified iosnoop.d to just snoop on reads. The modified rsnoop.d is as 
follows:
$ cat rsnoop.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
       printf("pid %d reading  %s\n", pid, fds[arg0].fi_pathname);
}

Based on it I found that most postgres processes were doing lots of reads from the pg_clog directory.
CLOG, or the commit log, keeps track of the state of transactions. CLOG writes come from recording transaction commits (or aborts) or when an XLOG record is generated. I am not yet clear on the reads, but it seems every process constantly reads CLOG to look up transaction status. CLOG data is hardly cached in PostgreSQL shared memory, so it becomes the bottleneck: backends constantly have to go to the file system to re-read the data.
# ./rsnoop.d
dtrace: script './rsnoop.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  49222                       read:entry pid 8739 reading  
/export/home0/igen/pgdata/pg_clog/000C

  0  49222                       read:entry pid 9607 reading  
/export/home0/igen/pgdata/pg_clog/000C

  0  49222                       read:entry pid 9423 reading  
/export/home0/igen/pgdata/pg_clog/000C

  0  49222                       read:entry pid 8731 reading  
/export/home0/igen/pgdata/pg_clog/000C

  0  49222                       read:entry pid 8719 reading  
/export/home0/igen/pgdata/pg_clog/000C

  0  49222                       read:entry pid 9019 reading  
/export/home0/igen/pgdata/pg_clog/000C

  1  49222                       read:entry pid 9255 reading  
/export/home0/igen/pgdata/pg_clog/000C

  1  49222                       read:entry pid 8867 reading  
/export/home0/igen/pgdata/pg_clog/000C


Later on, during another run, I added ustack() after the printf in the above script to also get the calling functions:
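
The modified script looked roughly like this (the only change is the added ustack() action):

$ cat rsnoop.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
       printf("pid %d reading  %s\n", pid, fds[arg0].fi_pathname);
       /* also print the user-land stack that issued the read */
       ustack();
}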

# ./rsnoop.d
dtrace: script './rsnoop.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  49222                       read:entry pid 10956 reading  
/export/home0/igen/pgdata/pg_clog/0011
              libc.so.1`_read+0xa
              postgres`SimpleLruReadPage+0x3e6
              postgres`SimpleLruReadPage_ReadOnly+0x9b
              postgres`TransactionIdGetStatus+0x1f
              postgres`TransactionIdDidCommit+0x42
              postgres`HeapTupleSatisfiesVacuum+0x21a
              postgres`heap_prune_chain+0x14b
              postgres`heap_page_prune_opt+0x1e6
              postgres`index_getnext+0x144
              postgres`IndexNext+0xe1
              postgres`ExecScan+0x189
              postgres`ExecIndexScan+0x43
              postgres`ExecProcNode+0x183
              postgres`ExecutePlan+0x9e
              postgres`ExecutorRun+0xab
              postgres`PortalRunSelect+0x47a
              postgres`PortalRun+0x262
              postgres`exec_execute_message+0x565
              postgres`PostgresMain+0xf45
              postgres`BackendRun+0x3f9

  0  49222                       read:entry pid 10414 reading  
/export/home0/igen/pgdata/pg_clog/0011
              libc.so.1`_read+0xa
              postgres`SimpleLruReadPage+0x3e6
              postgres`SimpleLruReadPage_ReadOnly+0x9b
              postgres`TransactionIdGetStatus+0x1f
              postgres`TransactionIdDidCommit+0x42
              postgres`HeapTupleSatisfiesVacuum+0x21a
              postgres`heap_prune_chain+0x14b
              postgres`heap_page_prune_opt+0x1e6
              postgres`index_getnext+0x144
              postgres`IndexNext+0xe1
              postgres`ExecScan+0x189
^C              libc.so.1`_read+0xa
              postgres`SimpleLruReadPage+0x3e6
              postgres`SimpleLruReadPage_ReadOnly+0x9b
              postgres`TransactionIdGetStatus+0x1f
              postgres`TransactionIdDidCommit+0x42
              postgres`HeapTupleSatisfiesMVCC+0x34f
              postgres`index_getnext+0x29e
              postgres`IndexNext+0xe1
              postgres`ExecScan+0x189
              postgres`ExecIndexScan+0x43
              postgres`ExecProcNode+0x183
              postgres`ExecutePlan+0x9e
              postgres`ExecutorRun+0xab
              postgres`PortalRunSelect+0x47a
              postgres`PortalRun+0x262
              postgres`exec_execute_message+0x565
              postgres`PostgresMain+0xf45
              postgres`BackendRun+0x3f9
              postgres`BackendStartup+0x271
              postgres`ServerLoop+0x259

  0  49222                       read:entry pid 10186 reading  
/export/home0/igen/pgdata/pg_clog/0011
              libc.so.1`_read+0xa
              postgres`SimpleLruReadPage+0x3e6
              postgres`SimpleLruReadPage_ReadOnly+0x9b
              postgres`TransactionIdGetStatus+0x1f
              postgres`TransactionIdDidCommit+0x42
              postgres`HeapTupleSatisfiesVacuum+0x21a
              postgres`heap_prune_chain+0x14b
              postgres`heap_page_prune_opt+0x1e6
              postgres`index_getnext+0x144
              postgres`IndexNext+0xe1
              postgres`ExecScan+0x189
              postgres`ExecIndexScan+0x43
              postgres`ExecProcNode+0x183
              postgres`ExecutePlan+0x9e
              postgres`ExecutorRun+0xab
              postgres`PortalRunSelect+0x47a
              postgres`PortalRun+0x262
              postgres`exec_execute_message+0x565
              postgres`PostgresMain+0xf45
              postgres`BackendRun+0x3f9


So multiple processes are reading the same file. And since the file system has been told not to cache files, every one of these read I/Os is sent to the disk to read the file again.



_Workaround for the high reads on CLOG on Solaris_:
Start with the cluster $PGDATA on regular UFS (which is buffered and has logging enabled), and always create a new tablespace for your database on a forcedirectio-mounted file system, which bypasses the file system cache. This allows all the PostgreSQL CLOG files to be cached by UFS, greatly reducing stress on the underlying storage. For writes, to the best of my knowledge, PostgreSQL will still fsync to force the CLOG writes onto the disks, so it stays consistent. But the reads are spared from going to the disks and are returned from the cache.
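
A sketch of that layout (device names, mount points, and the tablespace/database names below are placeholders; the PGDATA path is the one from the traces above):

# the file system holding $PGDATA stays on regular, buffered UFS with logging
mount -F ufs -o logging /dev/dsk/c1t0d0s6 /export/home0/igen

# the data tablespace lives on a forcedirectio mount
mount -F ufs -o forcedirectio /dev/dsk/c1t1d0s6 /dbdata

# create the tablespace directory and put the benchmark database on it
mkdir -p /dbdata/ts && chown postgres /dbdata/ts
psql -d postgres -c "CREATE TABLESPACE dbspace LOCATION '/dbdata/ts'"
psql -d postgres -c "CREATE DATABASE igen TABLESPACE = dbspace"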

_Result_:
With a rightly sized buffer pool, all the database data can now be in the PostgreSQL cache, so reads are spared from hitting the tablespaces. As for the PGDATA data, UFS does the caching of the CLOG files, etc., sparing those reads from going to the disks again. In the end what we achieve is a rightly sized buffer pool where no reads are required during a heavy OLTP run and the disks are just busy doing the writes from updates and inserts.


_Asynchronous Commit_:

Also, as requested by Josh, I tried out asynchronous commit in 8.3beta1.
I compared five scenarios on internal disks (the prime target); the corresponding postgresql.conf settings are sketched after the list:
1. Default options (commit_delay off, synchronous_commit = on)
2. With commit_delay on
3. With asynchronous commit and commit_delay on
4. With asynchronous commit but commit_delay off
5. With fsync off
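
A rough sketch of how those scenarios map onto postgresql.conf (the commit_delay value is only a placeholder, not the value used in these runs):

# scenario 1: defaults
synchronous_commit = on
commit_delay = 0
fsync = on

# scenario 2: commit_delay on
commit_delay = 10          # microseconds; placeholder value

# scenarios 3 and 4: asynchronous commit, with or without commit_delay
synchronous_commit = off

# scenario 5: fsync off (for comparison only, not crash-safe)
fsync = off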


In 8.2 I had found that, compared to (1), (2) gave me a huge boost (2X), but fsync off would eventually be even 2.8X faster than (2).

Hence in 8.3 I did not even test the default option; I took (2) as my baseline run and found that (3), (4), and (5) all gave me a similar boost, about 2.55X over the baseline run (2), since eventually I was CPU-bound on my box and the I/O ended up being handled well.

(Though I found (5) was better in 8.2 than (5) in 8.3beta1, since it got CPU-saturated slightly earlier.)


_Additional CPU Consumption Findings_:
In the lightweight OLTP testing with about 1000 users, on 8.3 with the above CLOG workaround in place, I reached a point where the system was out of CPU resources. A quick profile using the "hotuser" program from the DTraceToolkit shows the top function to be postgres`GetSnapshotData:

# ./hotuser -p 10559

....
postgres`hash_seq_term                                   1   2.1%
postgres`SearchCatCache                                  2   4.2%
postgres`hash_search_with_hash_value                     4   8.3%
postgres`GetSnapshotData                                 6  12.5%



Also, the lock waits during the 1000-user run were as follows:

# ./83_lwlock_wait.d 9231

             Lock Id            Mode           Count
       WALInsertLock       Exclusive               1
       ProcArrayLock       Exclusive              19

             Lock Id   Combined Time (ns)
       WALInsertLock               428507
       ProcArrayLock           1009652776

# ./83_lwlock_wait.d 9153

             Lock Id            Mode           Count
     CLogControlLock       Exclusive               1
       WALInsertLock       Exclusive               1
       ProcArrayLock       Exclusive              15

             Lock Id   Combined Time (ns)
     CLogControlLock                25536
       WALInsertLock               397265
       ProcArrayLock            696894211

#
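
(83_lwlock_wait.d is a custom script; for anyone without it, a rough and not strictly equivalent pid-provider sketch along these lines can show where time goes in LWLockAcquire per lock id. It times the whole acquire call rather than just the wait, and prints numeric lock ids from the LWLockId enum rather than names:)

#!/usr/sbin/dtrace -s
/* total time spent inside LWLockAcquire, summed per lock id (arg0) */
pid$target::LWLockAcquire:entry
{
        self->lockid = arg0;
        self->ts = timestamp;
}

pid$target::LWLockAcquire:return
/self->ts/
{
        @time[self->lockid] = sum(timestamp - self->ts);
        @counts[self->lockid] = count();
        self->ts = 0;
        self->lockid = 0;
}

Run it against a backend with something like ./lwlock_time.d -p <backend pid> (the script name here is just an example).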


My guess is that the ProcArrayLock waits are coming from the GetSnapshotData function, or maybe caused by it, but I will let the experts comment.

I am of the opinion that if we tune GetSnapshotData, then we should be able to handle more users.


So overall I am excited about the 8.3beta1 performance, especially asynchronous_commit. However, bringing the CPU consumption of GetSnapshotData() in line, and fixing the CLOG reading problem occurring via SimpleLruReadPage(), would greatly enhance the performance of 8.3 for OLTP benchmarks.



