Re: [HACKERS] 8.3beta1 testing on Solaris

From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] 8.3beta1 testing on Solaris
Date: 2007-10-26 15:45:26
Message-ID: 47220B96.4080009@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Tom,

Here is what I did:

I started aggregating all read information:

First I also had added group by pid (arg0,arg1, pid) and the counts
were all coming as 1

Then I just grouped by filename and location (arg0,arg1 of reads) and
the counts came back as

# cat read.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
@read[fds[arg0].fi_pathname, arg1] = count();
}

# ./read.d
dtrace: script './read.d' matched 1 probe
^C

/export/home0/igen/pgdata/pg_clog/0014
-2753028293472 1
/export/home0/igen/pgdata/pg_clog/0014
-2753028277088 1
/export/home0/igen/pgdata/pg_clog/0015
-2753028244320 2
/export/home0/igen/pgdata/pg_clog/0015
-2753028268896 14
/export/home0/igen/pgdata/pg_clog/0015
-2753028260704 25
/export/home0/igen/pgdata/pg_clog/0015
-2753028252512 27
/export/home0/igen/pgdata/pg_clog/0015
-2753028277088 28
/export/home0/igen/pgdata/pg_clog/0015
-2753028293472 37

FYI I pressed ctrl-c within like less than a second

So to me this seems that multiple processes are reading the same page
from different pids. (This was with about 600 suers active.

Aparently we do have a problem that we are reading the same buffer
address again. (Same as not being cached anywhere or not finding it in
cache anywhere).

I reran lock wait script on couple of processes and did not see
CLogControlFileLock as a problem..

# ./83_lwlock_wait.d 14341

Lock Id Mode Count
WALInsertLock Exclusive 1
ProcArrayLock Exclusive 16

Lock Id Combined Time (ns)
WALInsertLock 383109
ProcArrayLock 198866236

# ./83_lwlock_wait.d 14607

Lock Id Mode Count
WALInsertLock Exclusive 2
ProcArrayLock Exclusive 15

Lock Id Combined Time (ns)
WALInsertLock 55243
ProcArrayLock 69700140

#

What will help you find out why it is reading the same page again?

-Jignesh

Jignesh K. Shah wrote:
> I agree with Tom.. somehow I think increasing NUM_CLOG_BUFFERS is
> just avoiding the symptom to a later value.. I promise to look more
> into it before making any recommendations to increase NUM_CLOG_BUFFERs.
>
>
> Because though "iGen" showed improvements in that area by increasing
> num_clog_buffers , EAStress had shown no improvements.. Plus the
> reason I think this is not the problem in 8.3beta1 since the Lock
> Output clearly does not show CLOGControlFile as to be the issue which
> I had seen in earlier case. So I dont think that increasing
> NUM_CLOG_BUFFERS will change thing here.
>
> Now I dont understand the code pretty well yet I see three hotspots
> and not sure if they are related to each other
> * ProcArrayLock waits - causing Waits as reported by
> 83_lockwait.d script
> * SimpleLRUReadPage - causing read IOs as reported by
> iostat/rsnoop.d
> * GetSnapshotData - causing CPU utilization as reported by hotuser
>
> But I will shut up and do more testing.
>
> Regards,
> Jignesh
>
>
>
> Tom Lane wrote:
>> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>
>>> Actually, 32 made a significant difference as I recall ... do you
>>> still have the figures for that, Jignesh?
>>>
>>
>> I'd want to see a new set of test runs backing up any call for a change
>> in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that
>> benchmarks using code from a few months back shouldn't carry a lot of
>> weight.
>>
>> regards, tom lane
>>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Page 2007-10-26 16:24:40 Re: 8.2.3: Server crashes on Windows using Eclipse/Junit
Previous Message Sebastien FLAESCH 2007-10-26 14:01:35 Re: PostgreSQL 8.3, libpq and WHERE CURRENT OF

Browse pgsql-performance by date

  From Date Subject
Next Message Jignesh K. Shah 2007-10-26 16:46:14 Re: [HACKERS] 8.3beta1 testing on Solaris
Previous Message Greg Smith 2007-10-26 15:08:07 Re: Bunching "transactions"