Re: Index Scans become Seq Scans after VACUUM ANALYSE

From: Mark Pritchard <mark(dot)pritchard(at)tangent(dot)net(dot)au>
To: Mark Pritchard <mark(at)tangent(dot)net(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Luis Alberto Amigo Navarro <lamigo(at)atc(dot)unican(dot)es>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index Scans become Seq Scans after VACUUM ANALYSE
Date: 2002-04-18 01:51:38
Message-ID: 1019094698.25540.116.camel@set.tangent.net.au
Lists: pgsql-hackers

Apologies for the naff double post, but I meant to add that the figures
for the solaris box are obviously bogus after the first run: imagine a
file system cache holding an entire 2gb file. I tried creating a 4gb
file on that box, but it bombed with a "file too large" error.
Unfortunately, I can't rip memory out of this box as I don't have
exclusive access.

On Thu, 2002-04-18 at 11:49, Mark Pritchard wrote:
> I threw together the attached program (compiles fine with gcc 2.95.2 on
> Solaris 2.6 and egcs-2.91.66 on RedHat Linux 6.2) and ran it a few
> times. Data is below. Usual disclaimers about hastily written code etc
> :)
>
> Machine = ghoul (generic intel, 384mb ram, dual p3-800, ide disk running
> dma)
>
> Sequential
> Bytes Read Time Bytes / Sec
> 536870912 27.14 19783933.74
> 536870912 27.14 19783990.60
> 536870912 27.11 19801872.14
> 536870912 26.92 19942928.41
> 536870912 27.31 19657408.43
> 19794026.66 (avg)
>
> Random
> Bytes Read Time Bytes / Sec
> 1073741824 519.57 2066589.21
> 1073741824 517.78 2073751.44
> 1073741824 516.92 2077193.23
> 1073741824 513.18 2092333.29
> 1073741824 510.68 2102579.88
> 2082489.41 (avg)
>
> Machine = jedi (Sun E420, 3gb ram, dual 400s, test on single scsi disk)
>
> Sequential
> Bytes Read Time Bytes / Sec
> 2097152000 65.19 32167675.28
> 2097152000 65.22 32154114.65
> 2097152000 65.16 32182561.99
> 2097152000 65.12 32206105.12
> 2097152000 64.67 32429463.26
> 32227984.06 (avg)
>
> Random
> Bytes Read Time Bytes / Sec
> 4194304000 1522.22 2755394.79
> 4194304000 278.18 15077622.05
> 4194304000 91.43 45874730.07
> 4194304000 61.43 68273795.19
> 4194304000 54.55 76890231.51
> 41774354.72 (avg)
>
> If I interpret Tom's "divide" instruction correctly, that's a factor
> of roughly 10 on the linux box?
>
> On Thu, 2002-04-18 at 01:16, Tom Lane wrote:
> > "Luis Alberto Amigo Navarro" <lamigo(at)atc(dot)unican(dot)es> writes:
> > > On my own few experience I think this could be solved decreasing
> > > random_page_cost, if you would prefer to use indexes than seq scans, then
> > > you can lower random_page_cost to a point in which postgres works as you
> > > want. So the planner would prefer indexes when in standard conditions it
> > > would prefer seq scans.
> >
> > It's entirely possible that the default value of random_page_cost is too
> > high, at least for many modern machines. The experiments I did to get
> > the 4.0 figure were done a couple years ago, on hardware that wasn't
> > exactly new at the time. I have not heard of anyone else trying to
> > measure it though.
> >
> > I don't think I have the source code I used anymore, but the principle
> > is simple enough:
> >
> > 1. Make a large file (several times the size of your machine's RAM, to
> > ensure you swamp out kernel disk buffering effects). Fill with random
> > data. (NB: do not fill with zeroes, some filesystems optimize this away.)
> >
> > 2. Time reading the file sequentially, 8K per read request.
> > Repeat enough to get a statistically trustworthy number.
> >
> > 3. Time reading randomly-chosen 8K pages from the file. Repeat
> > enough to get a trustworthy number (the total volume of pages read
> > should be several times the size of your RAM).
> >
> > 4. Divide.
> >
> > The only tricky thing about this is making sure you are measuring disk
> > access times and not being fooled by re-accessing pages the kernel still
> > has cached from a previous access. (The PG planner does try to account
> > for caching effects, but that's a separate estimate; the value of
> > random_page_cost isn't supposed to include caching effects.) AFAIK the
> > only good way to do that is to use a large test, which means it takes
> > awhile to run; and you need enough spare disk space for a big test file.
> >
> > It'd be interesting to get some numbers for this across a range of
> > hardware, filesystems, etc ...
> >
> > regards, tom lane
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
> >
>
>
> ----
>

> #include <errno.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <time.h>
> #include <sys/stat.h>
> #include <sys/time.h>
>
> /**
> * Constants
> */
>
> #define BLOCK_SIZE (8192)
>
> /**
> * Prototypes
> */
>
> // Creates the test file filled with random data
> void createTestFile(char *testFileName, long long fileSize);
>
> // Handles runtime errors by displaying the function, activity and error number
> void handleError(char *functionName, char *activity);
>
> // Standard entry point
> int main(int argc, char *argv[]);
>
> // Prints correct usage and quits
> void printUsageAndQuit();
>
> // Tests performance of random reads of the given file
> void testRandom(char *testFileName, long long amountToRead);
>
> // Tests performance of sequential reads of the given file
> void testSeq(char *testFileName);
>
> /**
> * Definitions
> */
>
> /**
> * createTestFile()
> */
> void createTestFile(char *testFileName, long long fileSize)
> {
> FILE *testFile;
> long long reps, i, j, bufferReps;
> time_t timetmp;
> long long *buffer;
> size_t written;
>
> // Indicate op
> printf("Creating test file %s of %lld mb\n",testFileName,fileSize);
>
> // Adjust file size to bytes
> fileSize *= (1024*1024);
>
> // Allocate a buffer for writing out random long longs
> if (!(buffer = malloc(BLOCK_SIZE)))
> handleError("createTestFile()","malloc");
>
> // Open the file for writing
> if (!(testFile = fopen(testFileName, "wb")))
> handleError("createTestFile()","fopen");
>
> // Initialise the random number generator
> srandom(time(NULL));
>
> // Write data
> reps = fileSize / BLOCK_SIZE;
> bufferReps = BLOCK_SIZE / sizeof(long long);
> for (i = 0; i < reps; i++)
> {
> // Fill buffer with random data
> for (j = 0; j < bufferReps; j++)
> buffer[j] = random();
>
> // Write
> written = fwrite(buffer, sizeof(long long), bufferReps, testFile);
> if (written != bufferReps)
> handleError("createTestFile()","fwrite");
> }
>
> // Flush and close
> if (fflush(testFile))
> handleError("createTestFile()","fflush");
> if (fclose(testFile))
> handleError("createTestFile()","fclose");
>
> // Free buffer
> free(buffer);
> }
>
> /**
> * handleError()
> */
> void handleError(char *functionName, char *activity)
> {
> fprintf(stderr, "Error in %s while attempting %s. Error %d (%s)\n", functionName, activity, errno, strerror(errno));
> exit(1);
> }
>
> /**
> * main()
> */
> int main(int argc, char *argv[])
> {
> // Print usage and quit if argument count is definitely incorrect
> if (argc < 3)
> {
> // Definitely wrong
> printUsageAndQuit();
> }
> else
> {
> // Dispatch
> if (!strcmp(argv[1], "create"))
> {
> if (argc != 4)
> printUsageAndQuit();
>
> // Create the test file of the specified size
> createTestFile(argv[2], atol(argv[3]));
> }
> else if (!strcmp(argv[1], "seqtest"))
> {
> if (argc != 3)
> printUsageAndQuit();
>
> // Test performance of sequential reads
> testSeq(argv[2]);
> }
> else if (!strcmp(argv[1], "rndtest"))
> {
> if (argc != 4)
> printUsageAndQuit();
>
> // Test performance of random reads
> testRandom(argv[2], atol(argv[3]));
> }
> else
> {
> // Unknown command
> printUsageAndQuit();
> }
> }
>
> return 0;
> }
>
> /**
> * printUsageAndQuit()
> */
> void printUsageAndQuit()
> {
> puts("USAGE: rndpgcst [create <file> <size_in_mb>] | [seqtest <file>] | [rndtest <file> <read_in_mb>]");
>
> exit(1);
> }
>
> /**
> * testSeq()
> */
> void testSeq(char *testFileName)
> {
> FILE *testFile;
> char *buffer;
> long long totalRead, thisRead, timeTaken;
> struct timeval startTime, endTime;
> struct timezone timezoneDiscard;
>
> // Indicate op
> printf("Sequential read test of %s\n",testFileName);
>
> // Grab a buffer
> if (!(buffer = malloc(BLOCK_SIZE)))
> handleError("testSeq()","malloc");
>
> // Open the file for reading
> if (!(testFile = fopen(testFileName, "rb")))
> handleError("testSeq()","fopen");
>
> // Start timer
> if (gettimeofday(&startTime, &timezoneDiscard) == -1)
> handleError("testSeq()", "gettimeofday start");
>
> // Read all data from file
> totalRead = 0;
> while ((thisRead = fread(buffer, 1, BLOCK_SIZE, testFile)) != 0)
> totalRead += thisRead;
>
> // End timer
> if (gettimeofday(&endTime, &timezoneDiscard) == -1)
> handleError("testSeq()", "gettimeofday end");
>
> // Close
> if (fclose(testFile))
> handleError("testSeq()","fclose");
>
> // Free the buffer
> free(buffer);
>
> // Display time taken
> timeTaken = (endTime.tv_sec - startTime.tv_sec) * 1000000;
> timeTaken += (endTime.tv_usec - startTime.tv_usec);
> printf("%lld bytes read in %f seconds\n", totalRead, (double) timeTaken / (double) 1000000);
> }
>
> /**
> * testRandom()
> */
> void testRandom(char *testFileName, long long amountToRead)
> {
> FILE *testFile;
> long long reps, i, fileSize, timeTaken, readPos, thisRead, offsetMax;
> struct stat fileStats;
> char *buffer;
> struct timeval startTime, endTime;
> struct timezone timezoneDiscard;
>
> // Indicate op
> printf("Random read test of %s for %lld mb\n", testFileName, amountToRead);
>
> // Initialise the random number generator
> srandom(time(NULL));
>
> // Adjust amount to read
> amountToRead *= (1024*1024);
>
> // Determine file size
> if (stat(testFileName, &fileStats) == -1)
> handleError("testRandom()", "stat");
> fileSize = fileStats.st_size;
>
> // Grab a buffer
> if (!(buffer = malloc(BLOCK_SIZE)))
> handleError("testRandom()","malloc");
>
> // Open the file for reading
> if (!(testFile = fopen(testFileName, "rb")))
> handleError("testRandom()","fopen");
>
> // Start timer
> if (gettimeofday(&startTime, &timezoneDiscard) == -1)
> handleError("testRandom()", "gettimeofday start");
>
> // Read data from file
> reps = amountToRead / BLOCK_SIZE;
> offsetMax = fileSize / BLOCK_SIZE;
> for (i = 0; i < reps; i++)
> {
> // Determine read position
> readPos = (random() % offsetMax) * BLOCK_SIZE;
>
> // Seek and read (NB: fseek takes a long offset, so offsets past
> // 2gb need fseeko; fseek returns non-zero, not necessarily -1,
> // on failure)
> if (fseek(testFile, readPos, SEEK_SET) != 0)
> handleError("testRandom()","fseek");
> if ((thisRead = fread(buffer, 1, BLOCK_SIZE, testFile)) != BLOCK_SIZE)
> handleError("testRandom()","fread");
> }
>
> // End timer
> if (gettimeofday(&endTime, &timezoneDiscard) == -1)
> handleError("testRandom()", "gettimeofday end");
>
> // Close
> if (fclose(testFile))
> handleError("testRandom()","fclose");
>
> // Free the buffer
> free(buffer);
>
> // Display time taken
> timeTaken = (endTime.tv_sec - startTime.tv_sec) * 1000000;
> timeTaken += (endTime.tv_usec - startTime.tv_usec);
> printf("%lld bytes read in %f seconds\n", amountToRead, (double) timeTaken / (double) 1000000);
> }
