Re: [HACKERS] 8.3beta1 testing on Solaris

From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] 8.3beta1 testing on Solaris
Date: 2007-10-26 15:45:26
Message-ID: 47220B96.4080009@sun.com
Lists: pgsql-hackers, pgsql-performance

Tom,

Here is what I did:

I started by aggregating all the read information.

First I also added pid to the aggregation key (arg0, arg1, pid), and
the counts all came back as 1.
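That per-pid variant isn't reproduced here; a minimal sketch of how it
would be keyed (my reconstruction, using the same skeleton as the
read.d script below):

#!/usr/sbin/dtrace -s

/* Same probe as read.d below, but keyed on pid as well,  */
/* so each backend's reads aggregate separately.          */
syscall::read:entry
/execname=="postgres"/
{
    @read[fds[arg0].fi_pathname, arg1, pid] = count();
}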

Then I grouped by just filename and location (arg0 and arg1 of the
reads), and the counts came back as follows:

# cat read.d
#!/usr/sbin/dtrace -s

/* Count postgres reads, keyed by file pathname and the      */
/* userland buffer address: arg0 is the file descriptor      */
/* passed to read(2), arg1 the destination buffer address.   */
syscall::read:entry
/execname=="postgres"/
{
    @read[fds[arg0].fi_pathname, arg1] = count();
}


# ./read.d
dtrace: script './read.d' matched 1 probe
^C

  /export/home0/igen/pgdata/pg_clog/0014      -2753028293472      1
  /export/home0/igen/pgdata/pg_clog/0014      -2753028277088      1
  /export/home0/igen/pgdata/pg_clog/0015      -2753028244320      2
  /export/home0/igen/pgdata/pg_clog/0015      -2753028268896     14
  /export/home0/igen/pgdata/pg_clog/0015      -2753028260704     25
  /export/home0/igen/pgdata/pg_clog/0015      -2753028252512     27
  /export/home0/igen/pgdata/pg_clog/0015      -2753028277088     28
  /export/home0/igen/pgdata/pg_clog/0015      -2753028293472     37


FYI, I pressed Ctrl-C less than a second after starting the script.

So to me it looks like multiple processes (different pids) are reading
the same page. (This was with about 600 users active.)

Apparently we do have a problem in that we read the same buffer address
again, as if it were not cached anywhere, or not found in any cache.
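One way to see where these repeated reads originate would be to
aggregate user stacks per file. This is only a sketch I haven't run
here (the readstack.d name is hypothetical), using the standard fds[]
and ustack() facilities:

# cat readstack.d
#!/usr/sbin/dtrace -s

/* For each file read by postgres, aggregate the user stacks  */
/* issuing the reads, to see which code path (e.g. the SLRU   */
/* page-read path) is hitting the pg_clog files.              */
syscall::read:entry
/execname=="postgres"/
{
    @stacks[fds[arg0].fi_pathname, ustack()] = count();
}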

I reran the lock wait script on a couple of processes and did not see
CLogControlFileLock as a problem (a sketch of such a script follows the
output below):

# ./83_lwlock_wait.d 14341

             Lock Id            Mode           Count
       WALInsertLock       Exclusive               1
       ProcArrayLock       Exclusive              16

             Lock Id   Combined Time (ns)
       WALInsertLock               383109
       ProcArrayLock            198866236

# ./83_lwlock_wait.d 14607

             Lock Id            Mode           Count
       WALInsertLock       Exclusive               2
       ProcArrayLock       Exclusive              15

             Lock Id   Combined Time (ns)
       WALInsertLock                55243
       ProcArrayLock             69700140

#
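The 83_lwlock_wait.d script itself isn't included in this message. A
minimal sketch of an equivalent, assuming the postgresql provider's
lwlock-startwait/lwlock-endwait probes that 8.3 ships (the real script
evidently also maps the numeric lock ids to names like WALInsertLock):

#!/usr/sbin/dtrace -qs

/* Per-backend LWLock wait profile; $1 is the pid to trace.  */
/* arg0 is the lock id, arg1 the lock mode.                  */
postgresql$1:::lwlock-startwait
{
    self->ts = timestamp;
}

postgresql$1:::lwlock-endwait
/self->ts/
{
    @counts[arg0, arg1] = count();
    @times[arg0] = sum(timestamp - self->ts);
    self->ts = 0;
}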

What would help you find out why it is reading the same page again?


-Jignesh



Jignesh K. Shah wrote:
> I agree with Tom.. somehow I think increasing NUM_CLOG_BUFFERS just
> pushes the symptom out to a later point. I promise to look more into
> it before making any recommendation to increase NUM_CLOG_BUFFERS.
>
>
> Though "iGen" showed improvements in that area from increasing
> num_clog_buffers, EAStress showed no improvement. Plus, I don't think
> this is the problem in 8.3beta1, since the lock output clearly does
> not show CLogControlFileLock to be the issue, as I had seen in the
> earlier case. So I don't think increasing NUM_CLOG_BUFFERS will
> change things here.
>
> Now, I don't understand the code very well yet, but I see three
> hotspots, and I'm not sure whether they are related to each other:
> * ProcArrayLock waits - causing waits, as reported by the
>   83_lockwait.d script
> * SimpleLruReadPage - causing read I/Os, as reported by
>   iostat/rsnoop.d
> * GetSnapshotData - causing CPU utilization, as reported by hotuser
>
> But I will shut up and do more testing.
>
> Regards,
> Jignesh
>
>
>
> Tom Lane wrote:
>> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>  
>>> Actually, 32 made a significant difference as I recall ... do you 
>>> still have the figures for that, Jignesh?
>>>     
>>
>> I'd want to see a new set of test runs backing up any call for a change
>> in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that
>> benchmarks using code from a few months back shouldn't carry a lot of
>> weight.
>>
>>             regards, tom lane
>>   
>
