| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | Bruce Guenter <bruceg(at)em(dot)ca> | 
| Cc: | pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: Using Threads? | 
| Date: | 2000-12-05 05:06:41 | 
| Message-ID: | 13424.975992801@sss.pgh.pa.us | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Bruce Guenter <bruceg(at)em(dot)ca> writes:
> [ some very interesting datapoints ]
>
> So, forking a process with lots of data is expensive.  However, most of
> the PostgreSQL data is in a SysV IPC shared memory segment, which
> shouldn't affect the fork numbers.
I believe (but don't have numbers to prove it) that most of the present
backend startup time has *nothing* to do with thread vs process
overhead.  Rather, the primary startup cost has to do with initializing
datastructures, particularly the system-catalog caches.  A backend isn't
going to get much real work done until it's slurped in a useful amount
of catalog cache --- for example, until it's got the cache entries for
pg_class and the indexes thereon, it's not going to accomplish anything
at all.
Switching to a thread model wouldn't help this cost a bit, unless
we also switch to a shared cache model.  That's not necessarily a win
when you consider the increased costs associated with cross-backend
or cross-thread synchronization needed to access or update the cache.
And if it *is* a win, we could get most of the same benefit in the
multiple-process model by keeping the cache in shared memory.
The reason that a new backend has to do all this setup work for itself,
rather than inheriting preloaded cache entries via fork/copy-on-write
from the postmaster, is that the postmaster isn't part of the ring of
processes that can access the database files directly.  That was done
originally for robustness reasons: since the PM doesn't have to deal
with database access, cache invalidation messages, etc etc yadda yadda,
it is far simpler and less likely to crash than a real backend.  If we
conclude that shared syscache is not a reasonable idea, it might be
interesting to look into making the PM into a full-fledged backend
that maintains a basic set of cache entries, so that these entries are
immediately available to new backends.  But we'd have to take a real
hard look at the implications for system robustness/crash recovery.
In any case I think we're a long way away from the point where switching
to threads would make a big difference in connection startup time.
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Thomas Lockhart | 2000-12-05 05:29:36 | Re: beta testing version | 
| Previous Message | Tom Samplonius | 2000-12-05 04:43:24 | Re: Using Threads? |