
Re: Simple query, 10 million records...MySQL ten times faster

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
Cc: zardozrocks <zardozrocks(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Simple query, 10 million records...MySQL ten times faster
Date: 2007-04-27 04:46:21
Message-ID: Pine.LNX.4.64.0704270843020.12152@sn.sai.msu.ru
Lists: pgsql-performance

Folks,

we in astronomy routinely work with billions of objects with spherical
attributes and have several sky-indexing (sky-pixelization) schemes. See my
page for links: http://www.sai.msu.su/~megera/wiki/SkyPixelization

We have the q3c package for PostgreSQL, available from q3c.sf.net, which
we use in production with terabyte-sized databases.
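For reference, a minimal sketch of the usual q3c workflow (the table and
column names here are hypothetical, and you should check the q3c
documentation for the exact function signatures):

```sql
-- Hypothetical catalogue of objects with spherical coordinates in degrees
CREATE TABLE objects (
    id  bigint PRIMARY KEY,
    ra  double precision NOT NULL,  -- longitude-like coordinate
    dec double precision NOT NULL   -- latitude-like coordinate
);

-- Index on the Q3C pixel number, then cluster the table along it so that
-- spatially close objects end up close together on disk
CREATE INDEX objects_q3c_idx ON objects (q3c_ang2ipix(ra, dec));
CLUSTER objects_q3c_idx ON objects;
ANALYZE objects;

-- Cone search: all objects within 0.1 degrees of (ra, dec) = (11.0, 12.0)
SELECT id
  FROM objects
 WHERE q3c_radial_query(ra, dec, 11.0, 12.0, 0.1);
```

The index on q3c_ang2ipix() is what lets the radial query run as an index
scan instead of a full table scan.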

Oleg
On Thu, 26 Apr 2007, Scott Marlowe wrote:

> On Tue, 2007-04-24 at 16:26, zardozrocks wrote:
>> I have this table:
>>
>> CREATE TABLE test_zip_assoc (
>>     id serial NOT NULL,
>>     f_id integer DEFAULT 0 NOT NULL,
>>     lat_radians numeric(6,5) DEFAULT 0.00000 NOT NULL,
>>     long_radians numeric(6,5) DEFAULT 0.00000 NOT NULL
>> );
>
> Like someone else mentioned, numeric types are SLOW.  See if you can use
> integers, or at least floats.
>
> I also wonder if you might be better served with geometric types and
> GiST indexes on them than using your lat / long grid.  With a geometric
> type, you could define the lat / long as a point and use geometric
> operations with it.
>
> See the pgsql manual:
>
> http://www.postgresql.org/docs/8.1/static/datatype-geometric.html
> http://www.postgresql.org/docs/8.1/static/functions-geometry.html
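A sketch of that geometric approach against the poster's table (GiST
operator-class support for bare point columns has varied across PostgreSQL
versions, so treat this as illustrative rather than guaranteed to work
unmodified on 8.1):

```sql
-- Store each location as a built-in point: point(x, y) = (long, lat)
ALTER TABLE test_zip_assoc ADD COLUMN loc point;
UPDATE test_zip_assoc
   SET loc = point(long_radians, lat_radians);

-- GiST index on the point column (on older releases this may need a
-- contrib module or a box-based workaround)
CREATE INDEX test_zip_assoc_loc_idx ON test_zip_assoc USING gist (loc);

-- Find everything inside a box around the search centre, using the
-- "contained in" operator so the index can be used
SELECT id
  FROM test_zip_assoc
 WHERE loc <@ box(point(-1.35446, 0.69015), point(-1.32923, 0.71538));
```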
>
>> It's basically a table that associates some foreign_key (for an event,
>> for instance) with a particular location using longitude and
>> latitude.  I'm basically doing a simple proximity search.  I have
>> populated the database with *10 million* records.  I then test
>> performance by picking 50 zip codes at random and finding the records
>> within 50 miles with a query like this:
>
> I assume that there aren't 10 million zip codes, right?
>
> Are you storing the lat / long of the individual venues?  Or the zip
> codes?  If you're storing the lat / long of the zips, then I can't
> imagine there are 10 million zip codes.  If you could use the lat / long
> numbers to find the zip codes that are in your range, then join that to
> a venue table that fks off of the zip code table, I would think it would
> be much faster, as you'd have a smaller data set to trundle through.
>
>> SELECT id
>> 	FROM test_zip_assoc
>> 	WHERE
>> 		lat_radians > 0.69014816041
>> 		AND lat_radians < 0.71538026567
>> 		AND long_radians > -1.35446228028
>> 		AND long_radians < -1.32923017502
>>
>>
>> On my development server (dual proc/dual core Opteron 2.8 GHz with 4GB
>> RAM) this query averages 1.5 seconds each time it runs after a brief
>> warmup period.  In PostgreSQL it averages about 15 seconds.
>
> I wonder how well it would run if you had 10, 20, 30, 40 etc... users
> running it at the same time.  My guess is that you'll be very lucky to
> get anything close to linear scaling in any database.  That's because
> this is CPU / memory bandwidth intensive, so it's gonna kill your DB.
> OTOH, if it were I/O bound you could throw more hardware at it
> (battery-backed cache RAID controller, etc.)
>
>> Both of those times are too slow.  I need the query to run in under a
>> second with as many as a billion records.  I don't know if this is
>> possible but I'm really hoping someone can help me restructure my
>> indexes (multicolumn?, multiple indexes with a 'where' clause?) so
>> that I can get this running as fast as possible.
>
> You're trying to do a whole lot of processing in a little time.  You're
> either gonna have to accept a less exact answer (i.e. base it on zip
> codes) or come up with some way of mining the data for the answers ahead
> of time, kind of like a full text search for lat and long.
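One sketch of that "mine the data ahead of time" idea: precompute a coarse
grid cell per row (the cell width here, roughly the 50-mile box size, and the
search-centre coordinates are assumptions for illustration) and put a plain
btree index on it, so a search only scans the handful of cells around the
target before re-checking exact bounds:

```sql
-- Assign each row to a coarse grid cell (~0.0252 rad, about a 50-mile box)
ALTER TABLE test_zip_assoc
    ADD COLUMN cell_lat integer,
    ADD COLUMN cell_long integer;
UPDATE test_zip_assoc
   SET cell_lat  = floor(lat_radians  / 0.0252),
       cell_long = floor(long_radians / 0.0252);
CREATE INDEX test_zip_assoc_cell_idx ON test_zip_assoc (cell_lat, cell_long);

-- A 50-mile search then touches at most the 3x3 block of cells around the
-- target cell and re-checks the exact bounds on the small result set:
SELECT id
  FROM test_zip_assoc
 WHERE cell_lat  BETWEEN floor(0.70276 / 0.0252) - 1
                     AND floor(0.70276 / 0.0252) + 1
   AND cell_long BETWEEN floor(-1.34185 / 0.0252) - 1
                     AND floor(-1.34185 / 0.0252) + 1
   AND lat_radians  BETWEEN 0.69015 AND 0.71538
   AND long_radians BETWEEN -1.35446 AND -1.32923;
```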
>
> So, have you tried what I suggested about increasing shared_buffers and
> work_mem yet?
>
>

 	Regards,
 		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
