Serial vs. Parallel Queries

From: "Paul D(dot) Boyle" <boyle(at)laue(dot)chem(dot)ncsu(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Cc: boyle(at)laue(dot)chem(dot)ncsu(dot)edu (Paul D(dot) Boyle)
Subject: Serial vs. Parallel Queries
Date: 1998-06-11 20:03:50
Message-ID: 199806112003.QAA19591@laue.chem.ncsu.edu
Lists: pgsql-general

Hello,

I wrote a PostgreSQL client (using libpq) which queries 7 different
tables in a database I have constructed using PostgreSQL 6.2.1. The
queries are more or less independent of one another, so I thought it
would be more efficient to run them in parallel from multiple processes.
The program forks off a child for each table searched, and if necessary
the child and parent use SysV message queues to communicate. To see how
much of an improvement this gives over a serialized search, I added some
conditional compilation directives which build a single-process version
that searches the seven tables sequentially. I then did some timings
using the 'time' command.
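
A minimal sketch of the fork-per-table design described above, not the
actual client. The names search_table and run_searches and the SERIAL
macro are hypothetical, the libpq query is replaced by a short sleep,
and the child's exit status stands in for the SysV message queue:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define NTABLES 7

/* Hypothetical stand-in for one query against table `idx`.
 * A real client would open its libpq connection and run its SELECT here.
 * Returns 0 on success. */
static int search_table(int idx)
{
    struct timespec pause = {0, 20 * 1000 * 1000}; /* pretend 20 ms of backend work */
    (void)idx;
    nanosleep(&pause, NULL);
    return 0;
}

/* Search all tables and return how many searches succeeded.
 * Compile with -DSERIAL to get the single-process version. */
int run_searches(void)
{
    int ok = 0;
#ifdef SERIAL
    for (int i = 0; i < NTABLES; i++)
        if (search_table(i) == 0)
            ok++;
#else
    pid_t pids[NTABLES];

    /* Start one child per table; each child reports through its exit code. */
    for (int i = 0; i < NTABLES; i++) {
        pids[i] = fork();
        if (pids[i] == 0)
            _exit(search_table(i) == 0 ? 0 : 1);
    }

    /* The parent collects every child that was actually started. */
    for (int i = 0; i < NTABLES; i++) {
        int status;
        if (pids[i] > 0 && waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
#endif
    return ok;
}
```

A main() would simply call run_searches() and check that it returns 7.
Building the same file with and without -DSERIAL gives the two binaries
to compare under 'time'.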

The difference was not as big as I had hoped, and there were several
curiosities on which I'd like people's opinions. I ran the searches in
parallel/serial pairs. The client and the backend were on different
Linux boxes connected by our departmental subnet Ethernet. Here are
the timings (10 runs each):

/* Parallel Search version: */
time sinfo x97101
0.010u 0.000s 0:04.55 0.2% 0+0k 0+0io 128pf+0w
0.010u 0.010s 0:03.03 0.6% 0+0k 0+0io 128pf+0w
0.010u 0.000s 0:05.44 0.1% 0+0k 0+0io 128pf+0w
0.010u 0.000s 0:03.55 0.2% 0+0k 0+0io 128pf+0w
0.010u 0.000s 0:05.56 0.1% 0+0k 0+0io 128pf+0w
0.000u 0.010s 0:04.08 0.2% 0+0k 0+0io 128pf+0w
0.010u 0.010s 0:04.77 0.4% 0+0k 0+0io 128pf+0w <- *.txt deleted before run
0.020u 0.010s 0:04.66 0.6% 0+0k 0+0io 128pf+0w <- *.txt deleted before run
0.010u 0.010s 0:05.93 0.3% 0+0k 0+0io 128pf+0w <- *.txt deleted before run
0.010u 0.010s 0:03.36 0.5% 0+0k 0+0io 128pf+0w <- *.txt deleted before run
Averages:
0.01 0.006 0:04.49 0.32%

/* Serialized Search Version: */
time sinfo_serial x97101
0.070u 0.020s 0:06.02 1.4% 0+0k 0+0io 180pf+0w
0.040u 0.030s 0:06.04 1.1% 0+0k 0+0io 180pf+0w
0.050u 0.030s 0:06.04 1.3% 0+0k 0+0io 180pf+0w
0.040u 0.020s 0:06.13 0.9% 0+0k 0+0io 180pf+0w
0.070u 0.000s 0:06.07 1.1% 0+0k 0+0io 180pf+0w
0.060u 0.020s 0:06.05 1.3% 0+0k 0+0io 180pf+0w
0.090u 0.040s 0:06.05 2.1% 0+0k 0+0io 180pf+0w <- *.txt deleted before run
0.060u 0.020s 0:06.04 1.3% 0+0k 0+0io 180pf+0w <- *.txt deleted before run
0.060u 0.030s 0:06.10 1.4% 0+0k 0+0io 180pf+0w <- *.txt deleted before run
0.050u 0.020s 0:06.11 1.1% 0+0k 0+0io 180pf+0w <- *.txt deleted before run
Averages:
0.059 0.023 0:06.07 1.3%

The *.txt files mentioned are the text files produced by the program.
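
To put a number on the difference, the wall clock column of the two
listings can be reduced to a mean, a spread (slowest minus fastest run)
and a speedup ratio. This is only a sketch; the function names are made
up for illustration and the arrays are copied from the runs above:

```c
#include <stddef.h>

/* Wall clock seconds copied from the two timing listings. */
static const double parallel_wall[10] = {
    4.55, 3.03, 5.44, 3.55, 5.56, 4.08, 4.77, 4.66, 5.93, 3.36
};
static const double serial_wall[10] = {
    6.02, 6.04, 6.04, 6.13, 6.07, 6.05, 6.05, 6.04, 6.10, 6.11
};

/* Arithmetic mean of n samples. */
double mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Difference between the slowest and the fastest run. */
double spread(const double *v, size_t n)
{
    double lo = v[0], hi = v[0];
    for (size_t i = 1; i < n; i++) {
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }
    return hi - lo;
}

/* Mean serial time divided by mean parallel time. */
double speedup(void)
{
    return mean(serial_wall, 10) / mean(parallel_wall, 10);
}
```

On these figures the speedup is roughly 1.35, far from the factor of 7
that seven fully independent queries could give in the ideal case. The
serial runs vary by only about 0.11 s, while the parallel runs vary by
about 2.9 s.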

I was expecting the wall clock time (col 3) to be much shorter for the
parallel search than for the serialized search. I was also expecting
the wall clock time to be less than it is in either case. The user
CPU and system CPU times are both much less than the wall clock time.
My hypothesis is that the wall clock time is determined mostly by the
network latency (probably not much) and/or the granularity of the record
locking done by PostgreSQL. My guess is that PostgreSQL uses a fairly
coarse locking mechanism. I would like to know if my "explanation" is
correct. In any case, I would appreciate it if someone could supply a
discussion of how PostgreSQL locks records during a query.
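
One way to narrow down where the wall clock time goes would be to time
each phase of the client (connect, query, fetch) separately and compare
elapsed time with CPU time. The sketch below assumes a hypothetical
helper, time_phase, built on gettimeofday() and getrusage(); the
blocking_wait function is only a stand-in for a call that waits on the
backend:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Elapsed and CPU seconds spent in one phase of the client. */
struct phase_cost {
    double wall; /* wall clock seconds */
    double cpu;  /* user + system CPU seconds of this process */
};

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Run work(arg) once and report how long it took and how much CPU it used. */
struct phase_cost time_phase(void (*work)(void *), void *arg)
{
    struct timeval t0, t1;
    struct rusage r0, r1;
    struct phase_cost cost;

    gettimeofday(&t0, NULL);
    getrusage(RUSAGE_SELF, &r0);
    work(arg);
    getrusage(RUSAGE_SELF, &r1);
    gettimeofday(&t1, NULL);

    cost.wall = tv_seconds(t1) - tv_seconds(t0);
    cost.cpu = (tv_seconds(r1.ru_utime) - tv_seconds(r0.ru_utime)) +
               (tv_seconds(r1.ru_stime) - tv_seconds(r0.ru_stime));
    return cost;
}

/* Hypothetical stand-in for a call that blocks on the backend for 100 ms. */
void blocking_wait(void *arg)
{
    struct timespec pause = {0, 100 * 1000 * 1000};
    (void)arg;
    nanosleep(&pause, NULL);
}
```

A phase whose wall time is large while its CPU time stays near zero is
spent waiting on something outside the client, which is the pattern the
'time' output shows for the program as a whole. Timing the phases one
by one would show which of them accounts for the wait.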

The queries I am doing with this client are "read-only" (i.e. SELECTs).
Is there any way to improve performance, say with the -F switch in
effect during the client's queries?

Thanks,

Paul

--
Paul D. Boyle | boyle(at)laue(dot)chem(dot)ncsu(dot)edu
Director, X-ray Structural Facility | phone: (919) 515-7362
Department of Chemistry - Box 8204 | FAX: (919) 515-5079
North Carolina State University |
Raleigh, NC, 27695-8204
http://laue.chem.ncsu.edu/web/xray.welcome.html