Re: Avoiding tuple construction/deconstruction during joining

From: Miroslav Šulc <miroslav(dot)sulc(at)startnet(dot)cz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org, pgsql-performance(at)postgreSQL(dot)org
Subject: Re: Avoiding tuple construction/deconstruction during joining
Date: 2005-03-14 19:11:43
Message-ID: 4235E1EF.2000402@startnet.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Tom Lane wrote:

>>So I have some results. I have tested the query on both PostgreSQL 8.0.1
>>and MySQL 4.1.8 with LIMIT set to 30 and OFFSET set to 6000. PostgreSQL
>>result is 11,667.916 ms, MySQL result is 448.4 ms.
>>
>>
>
>That's a fairly impressive discrepancy :-(, and even the slot_getattr()
>patch that Atsushi Ogawa provided isn't going to close the gap.
>(I got about a 4x speedup on Miroslav's example in my testing, which
>leaves us still maybe 6x slower than MySQL.)
>
>
As I wrote, the comparison is not "fair". Here are the conditions:

"Both databases are running on the same machine (my laptop) and contain
the same data. However there are some differences in the data table
definitions:
1) in PostgreSQL I use 'varchar(1)' for a lot of fields and in MySQL I
use 'enum'
2) in PostgreSQL in some cases I use connection fields that are not of
the same type (smallint <-> integer (SERIAL)), in MySQL I use the same
types"

For those not used to MySQL, enum is an integer "mapped" to a text
string (that's how I see it). That means that when you have field such
as enum('yes','no','DK'), in the table there are stored numbers 1, 2 and
3 which are mapped to the text values 'yes', 'no' and 'DK'. The
description is not accurate (I'm not MySQL programmer, I didn't check it
recently and I didn't inspect the code - I wouldn't understand it
either) but I think it's not that important. What is important is the
fact that MySQL has to work with some dozen fields that are numbers and
PostgreSQL has to work with the same fields as varchar(). Some of the
other fields are also varchars. This might (or might not) cause the
speed difference. However I think that if devs figure out how to speed
this up, other cases will benefit from the improvement too.

As I understood from the contributions of other, the 2) shouldn't have a
great impact on the speed.

>Looking at the post-patch profile for the test case, there is still
>quite a lot of cycles going into tuple assembly and disassembly:
>
>Each sample counts as 0.01 seconds.
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 24.47 4.49 4.49 _mcount
> 8.01 5.96 1.47 9143692 0.00 0.00 ExecEvalVar
> 6.92 7.23 1.27 6614373 0.00 0.00 slot_deformtuple
> 6.54 8.43 1.20 9143692 0.00 0.00 slot_getattr
> 6.21 9.57 1.14 103737 0.01 0.03 ExecTargetList
> 5.56 10.59 1.02 103775 0.01 0.01 DataFill
> 3.22 11.18 0.59 103775 0.01 0.01 ComputeDataSize
> 2.83 11.70 0.52 ExecEvalVar
> 2.72 12.20 0.50 9094122 0.00 0.00 memcpy
> 2.51 12.66 0.46 encore
> 2.40 13.10 0.44 427448 0.00 0.00 nocachegetattr
> 2.13 13.49 0.39 103775 0.00 0.02 heap_formtuple
> 2.07 13.87 0.38 noshlibs
> 1.20 14.09 0.22 225329 0.00 0.00 _doprnt
> 1.20 14.31 0.22 msquadloop
> 1.14 14.52 0.21 chunks
> 0.98 14.70 0.18 871885 0.00 0.00 AllocSetAlloc
> 0.98 14.88 0.18 $$dyncall
> 0.76 15.02 0.14 594242 0.00 0.00 FunctionCall3
> 0.71 15.15 0.13 213312 0.00 0.00 comparetup_heap
> 0.65 15.27 0.12 6364 0.02 0.13 printtup
> 0.60 15.38 0.11 790702 0.00 0.00 pfree
>
>(_mcount is profiling overhead, ignore it.) It looks to me like just
>about everything in the top dozen functions is there as a result of the
>fact that join steps form new tuples that are the merge of their input
>tuples. Even our favorite villains, palloc and pfree, are down in the
>sub-percent range.
>
>I am guessing that the reason MySQL wins on this is that they avoid
>doing any data copying during a join step. I wonder whether we could
>accomplish the same by taking Ogawa's patch to the next level: allow
>a TupleTableSlot to contain either a "materialized" tuple as now,
>or a "virtual" tuple that is simply an array of Datums and null flags.
>(It's virtual in the sense that any pass-by-reference Datums would have
>to be pointing to data at the next level down.) This would essentially
>turn the formtuple and deformtuple operations into no-ops, and get rid
>of a lot of the associated overhead such as ComputeDataSize and
>DataFill. The only operations that would have to forcibly materialize
>a tuple would be ones that need to keep the tuple till after they fetch
>their next input tuple --- hashing and sorting are examples, but very
>many plan node types don't ever need to do that.
>
>I haven't worked out the details, but it seems likely that this could be
>a relatively nonintrusive patch. The main thing that would be an issue
>would be that direct reference to slot->val would become verboten (since
>you could no longer be sure there was a materialized tuple there).
>I think this would possibly affect some contrib stuff, which is a strong
>hint that it'd break some existing user-written code out there.
>
>Thoughts?
>
>
Mine thought is that I don't know what you are talking about :-) Now
seriously, I am far below this level of knowledge. But I can contribute
a test that (maybe) can help. I have rewritten the query so it JOINs the
varchar() fields (in fact all fields except the IDPK) at the last INNER
JOIN. Though there is one more JOIN, the query is more than 5 times
faster (1975.312 ms) :-)

So my silly opinion is that if the planner could decide that there are
too much time expensive fields that are not needed during performing
JOINs and these could be JOINed at the last step, it would do it this
way :-)

Below is the adjusted query and the EXPLAIN ANALYZE output. (Tom, you
can run it on the data I have sent you and it should run without changes.)

SELECT AdDevicesSites.IDPK, AdDevicesSites2.AdDevicesSiteSizeIDFK,
AdDevicesSites2.AdDevicesSiteRegionIDFK,
AdDevicesSites2.AdDevicesSiteCountyIDFK,
AdDevicesSites2.AdDevicesSiteCityIDFK,
AdDevicesSites2.AdDevicesSiteDistrictIDFK,
AdDevicesSites2.AdDevicesSiteStreetIDFK,
AdDevicesSites2.AdDevicesSiteStreetDescriptionIDFK,
AdDevicesSites2.AdDevicesSitePositionIDFK,
AdDevicesSites2.AdDevicesSiteVisibilityIDFK,
AdDevicesSites2.AdDevicesSiteStatusTypeIDFK,
AdDevicesSites2.AdDevicesSitePartnerIdentificationOperatorIDFK,
AdDevicesSites2.AdDevicesSitePartnerElectricitySupplierIDFK,
AdDevicesSites2.AdDevicesSitePartnerMaintainerIDFK,
AdDevicesSites2.AdDevicesSitePartnerStickerIDFK,
AdDevicesSites2.CadastralUnitIDFK, AdDevicesSites2.MediaType,
AdDevicesSites2.Mark, AdDevicesSites2.Amount, AdDevicesSites2.Distance,
AdDevicesSites2.OwnLightening, AdDevicesSites2.LocationDownTown,
AdDevicesSites2.LocationSuburb,
AdDevicesSites2.LocationBusinessDistrict,
AdDevicesSites2.LocationResidentialDistrict,
AdDevicesSites2.LocationIndustrialDistrict,
AdDevicesSites2.LocationNoBuildings, AdDevicesSites2.ParkWayHighWay,
AdDevicesSites2.ParkWayFirstClassRoad, AdDevicesSites2.ParkWayOtherRoad,
AdDevicesSites2.ParkWayStreet, AdDevicesSites2.ParkWayAccess,
AdDevicesSites2.ParkWayExit, AdDevicesSites2.ParkWayParkingPlace,
AdDevicesSites2.ParkWayPassangersOnly, AdDevicesSites2.ParkWayCrossRoad,
AdDevicesSites2.PositionStandAlone,
AdDevicesSites2.NeighbourhoodPublicTransportation,
AdDevicesSites2.NeighbourhoodInterCityTransportation,
AdDevicesSites2.NeighbourhoodPostOffice,
AdDevicesSites2.NeighbourhoodNewsStand,
AdDevicesSites2.NeighbourhoodAmenities,
AdDevicesSites2.NeighbourhoodSportsSpot,
AdDevicesSites2.NeighbourhoodHealthServiceSpot,
AdDevicesSites2.NeighbourhoodShops,
AdDevicesSites2.NeighbourhoodShoppingCenter,
AdDevicesSites2.NeighbourhoodSuperMarket,
AdDevicesSites2.NeighbourhoodPetrolStation,
AdDevicesSites2.NeighbourhoodSchool, AdDevicesSites2.NeighbourhoodBank,
AdDevicesSites2.NeighbourhoodRestaurant,
AdDevicesSites2.NeighbourhoodHotel,
AdDevicesSites2.RestrictionCigarettes,
AdDevicesSites2.RestrictionPolitics, AdDevicesSites2.RestrictionSpirits,
AdDevicesSites2.RestrictionSex, AdDevicesSites2.RestrictionOther,
AdDevicesSites2.RestrictionNote, AdDevicesSites2.SpotMapFile,
AdDevicesSites2.SpotPhotoFile, AdDevicesSites2.SourcePhotoTimeStamp,
AdDevicesSites2.SourceMapTimeStamp, AdDevicesSites2.Price,
AdDevicesSites2.WebPrice, AdDevicesSites2.CadastralUnitCode,
AdDevicesSites2.BuildingNumber, AdDevicesSites2.ParcelNumber,
AdDevicesSites2.GPSLatitude, AdDevicesSites2.GPSLongitude,
AdDevicesSites2.GPSHeight, AdDevicesSites2.MechanicalOpticalCoordinates,
AdDevicesSites2.Deleted, AdDevicesSites2.Protected,
AdDevicesSites2.DateCreated, AdDevicesSites2.DateLastModified,
AdDevicesSites2.DateDeleted, AdDevicesSites2.CreatedByUserIDFK,
AdDevicesSites2.LastModifiedByUserIDFK,
AdDevicesSites2.DeletedByUserIDFK,
AdDevicesSites2.PhotoLastModificationDate,
AdDevicesSites2.MapLastModificationDate,
AdDevicesSites2.DateLastImported, AdDevicesSiteRegions.Name AS
AdDevicesSiteRegionName, AdDevicesSiteCounties.Name AS
AdDevicesSiteCountyName, AdDevicesSiteCities.Name AS
AdDevicesSiteCityName, AdDevicesSiteStreets.Name AS
AdDevicesSiteStreetName, AdDevicesSiteDistricts.Name AS
AdDevicesSiteDistrictName, AdDevicesSiteStreetDescriptions.Name_cs AS
AdDevicesSiteStreetDescriptionName_cs,
AdDevicesSiteStreetDescriptions.Name_en AS
AdDevicesSiteStreetDescriptionName_en, AdDevicesSiteSizes.Name AS
AdDevicesSiteSizeName, SUBSTRING(AdDevicesSiteVisibilities.Name_cs, 3)
AS AdDevicesSiteVisibilityName_cs,
SUBSTRING(AdDevicesSiteVisibilities.Name_en, 3) AS
AdDevicesSiteVisibilityName_en, AdDevicesSitePositions.Name_cs AS
AdDevicesSitePositionName_cs, AdDevicesSitePositions.Name_en AS
AdDevicesSitePositionName_en, AdDevicesSiteStatusTypes.Name_cs AS
AdDevicesSiteStatusTypeName_cs, AdDevicesSiteStatusTypes.Name_en AS
AdDevicesSiteStatusTypeName_en, PartnerIdentificationsOperator.Name AS
PartnerIdentificationOperatorName, PartnersElectricitySupplier.Name AS
PartnerElectricitySupplierName, PartnersMaintainer.Name AS
PartnerMaintainerName, PartnersSticker.Name AS PartnerStickerName,
CadastralUnits.Code AS CadastralUnitCodeNative, CadastralUnits.Name AS
CadastralUnitName
FROM AdDevicesSites
LEFT JOIN AdDevicesSiteRegions ON AdDevicesSites.AdDevicesSiteRegionIDFK
= AdDevicesSiteRegions.IDPK
LEFT JOIN AdDevicesSiteCounties ON
AdDevicesSites.AdDevicesSiteCountyIDFK = AdDevicesSiteCounties.IDPK
LEFT JOIN AdDevicesSiteCities ON AdDevicesSites.AdDevicesSiteCityIDFK =
AdDevicesSiteCities.IDPK
LEFT JOIN AdDevicesSiteStreets ON AdDevicesSites.AdDevicesSiteStreetIDFK
= AdDevicesSiteStreets.IDPK
LEFT JOIN AdDevicesSiteStreetDescriptions ON
AdDevicesSites.AdDevicesSiteStreetDescriptionIDFK =
AdDevicesSiteStreetDescriptions.IDPK
LEFT JOIN AdDevicesSiteDistricts ON
AdDevicesSites.AdDevicesSiteDistrictIDFK = AdDevicesSiteDistricts.IDPK
LEFT JOIN AdDevicesSiteSizes ON AdDevicesSites.AdDevicesSiteSizeIDFK =
AdDevicesSiteSizes.IDPK
LEFT JOIN AdDevicesSiteVisibilities ON
AdDevicesSites.AdDevicesSiteVisibilityIDFK = AdDevicesSiteVisibilities.IDPK
LEFT JOIN AdDevicesSitePositions ON
AdDevicesSites.AdDevicesSitePositionIDFK = AdDevicesSitePositions.IDPK
LEFT JOIN AdDevicesSiteStatusTypes ON
AdDevicesSites.AdDevicesSiteStatusTypeIDFK = AdDevicesSiteStatusTypes.IDPK
LEFT JOIN PartnerIdentifications AS PartnerIdentificationsOperator ON
AdDevicesSites.AdDevicesSitePartnerIdentificationOperatorIDFK =
PartnerIdentificationsOperator.IDPK
LEFT JOIN Partners AS PartnersElectricitySupplier ON
AdDevicesSites.AdDevicesSitePartnerElectricitySupplierIDFK =
PartnersElectricitySupplier.IDPK
LEFT JOIN Partners AS PartnersMaintainer ON
AdDevicesSites.AdDevicesSitePartnerMaintainerIDFK = PartnersMaintainer.IDPK
LEFT JOIN Partners AS PartnersSticker ON
AdDevicesSites.AdDevicesSitePartnerStickerIDFK = PartnersSticker.IDPK
LEFT JOIN CadastralUnits ON AdDevicesSites.CadastralUnitIDFK =
CadastralUnits.IDPK
INNER JOIN AdDevicesSites AS AdDevicesSites2 ON AdDevicesSites.IDPK =
AdDevicesSites2.IDPK

QUERY PLAN

Hash Join (cost=193867.68..235417.92 rows=142556 width=815) (actual time=1577.080..2200.677 rows=6364 loops=1)

Hash Cond: ("outer".idpk = "inner".idpk)

-> Seq Scan on addevicessites addevicessites2 (cost=0.00..13118.56 rows=142556 width=473) (actual time=40.650..49.195 rows=6364 loops=1)

-> Hash (cost=186898.29..186898.29 rows=142556 width=350) (actual time=1534.080..1534.080 rows=0 loops=1)

-> Merge Right Join (cost=184758.38..186898.29 rows=142556 width=350) (actual time=1187.653..1244.955 rows=6364 loops=1)

Merge Cond: ("outer".idpk = "inner".cadastralunitidfk)

-> Index Scan using cadastralunits_pkey on cadastralunits (cost=0.00..314.49 rows=13027 width=31) (actual time=0.034..0.128 rows=63 loops=1)

-> Sort (cost=184758.38..185114.77 rows=142556 width=325) (actual time=1187.582..1190.111 rows=6364 loops=1)

Sort Key: addevicessites.cadastralunitidfk

-> Hash Left Join (cost=93887.04..143074.76 rows=142556 width=325) (actual time=502.584..1129.145 rows=6364 loops=1)

Hash Cond: ("outer".addevicessitepartnerstickeridfk = "inner".idpk)

-> Hash Left Join (cost=93884.28..140933.65 rows=142556 width=307) (actual time=502.388..1075.572 rows=6364 loops=1)

Hash Cond: ("outer".addevicessitepartnermaintaineridfk = "inner".idpk)

-> Hash Left Join (cost=93881.51..138792.55 rows=142556 width=289) (actual time=502.208..1023.802 rows=6364 loops=1)

Hash Cond: ("outer".addevicessitepartnerelectricitysupplieridfk = "inner".idpk)

-> Hash Left Join (cost=93878.75..136651.45 rows=142556 width=271) (actual time=502.041..969.959 rows=6364 loops=1)

Hash Cond: ("outer".addevicessitepartneridentificationoperatoridfk = "inner".idpk)

-> Nested Loop Left Join (cost=93875.99..134510.35 rows=142556 width=253) (actual time=501.849..915.256 rows=6364 loops=1)

Join Filter: ("outer".addevicessitestatustypeidfk = "inner".idpk)

-> Nested Loop Left Join (cost=93874.93..118471.74 rows=142556 width=228) (actual time=501.826..818.436 rows=6364 loops=1)

Join Filter: ("outer".addevicessitepositionidfk = "inner".idpk)

-> Nested Loop Left Join (cost=93873.90..108848.18 rows=142556 width=207) (actual time=501.802..737.137 rows=6364 loops=1)

Join Filter: ("outer".addevicessitevisibilityidfk = "inner".idpk)

-> Merge Right Join (cost=93872.86..96017.09 rows=142556 width=177) (actual time=501.741..554.834 rows=6364 loops=1)

Merge Cond: ("outer".idpk = "inner".addevicessitesizeidfk)

-> Index Scan using addevicessitesizes_pkey on addevicessitesizes (cost=0.00..5.62 rows=110 width=14) (actual time=0.009..0.264 rows=110 loops=1)

-> Sort (cost=93872.86..94229.25 rows=142556 width=169) (actual time=501.697..504.267 rows=6364 loops=1)

Sort Key: addevicessites.addevicessitesizeidfk

-> Hash Left Join (cost=57752.91..65764.23 rows=142556 width=169) (actual time=234.321..466.130 rows=6364 loops=1)

Hash Cond: ("outer".addevicessitedistrictidfk = "inner".idpk)

-> Hash Left Join (cost=57743.17..63616.15 rows=142556 width=164) (actual time=233.456..421.267 rows=6364 loops=1)

Hash Cond: ("outer".addevicessitestreetdescriptionidfk = "inner".idpk)

-> Hash Left Join (cost=57634.86..62707.43 rows=142556 width=137) (actual time=223.396..362.945 rows=6364 loops=1)

Hash Cond: ("outer".addevicessitestreetidfk = "inner".idpk)

-> Hash Left Join (cost=57566.95..61844.20 rows=142556 width=119) (actual time=217.101..312.605 rows=6364 loops=1)

Hash Cond: ("outer".addevicessitecityidfk = "inner".idpk)

-> Merge Right Join (cost=57561.85..59700.76 rows=142556 width=110) (actual time=216.635..266.672 rows=6364 loops=1)

Merge Cond: ("outer".idpk = "inner".addevicessitecountyidfk)

-> Sort (cost=6.19..6.48 rows=117 width=17) (actual time=0.350..0.389 rows=116 loops=1)

Sort Key: addevicessitecounties.idpk

-> Seq Scan on addevicessitecounties (cost=0.00..2.17 rows=117 width=17) (actual time=0.008..0.146 rows=117 loops=1)

-> Sort (cost=57555.66..57912.05 rows=142556 width=99) (actual time=216.250..218.611 rows=6364 loops=1)

Sort Key: addevicessites.addevicessitecountyidfk

-> Merge Right Join (cost=33573.63..35712.03 rows=142556 width=99) (actual time=173.646..199.036 rows=6364 loops=1)

Merge Cond: ("outer".idpk = "inner".addevicessiteregionidfk)

-> Sort (cost=1.44..1.48 rows=15 width=23) (actual time=0.055..0.059 rows=13 loops=1)

Sort Key: addevicessiteregions.idpk

-> Seq Scan on addevicessiteregions (cost=0.00..1.15 rows=15 width=23) (actual time=0.016..0.032 rows=15 loops=1)

-> Sort (cost=33572.19..33928.58 rows=142556 width=82) (actual time=173.559..176.398 rows=6364 loops=1)

Sort Key: addevicessites.addevicessiteregionidfk

-> Seq Scan on addevicessites (cost=0.00..13118.56 rows=142556 width=82) (actual time=62.345..164.783 rows=6364 loops=1)

-> Hash (cost=4.48..4.48 rows=248 width=17) (actual time=0.283..0.283 rows=0 loops=1)

-> Seq Scan on addevicessitecities (cost=0.00..4.48 rows=248 width=17) (actual time=0.011..0.162 rows=138 loops=1)

-> Hash (cost=60.53..60.53 rows=2953 width=34) (actual time=6.229..6.229 rows=0 loops=1)

-> Seq Scan on addevicessitestreets (cost=0.00..60.53 rows=2953 width=34) (actual time=0.010..3.816 rows=2984 loops=1)

-> Hash (cost=96.85..96.85 rows=4585 width=43) (actual time=10.017..10.017 rows=0 loops=1)

-> Seq Scan on addevicessitestreetdescriptions (cost=0.00..96.85 rows=4585 width=43) (actual time=0.008..6.371 rows=4585 loops=1)

-> Hash (cost=8.59..8.59 rows=459 width=21) (actual time=0.815..0.815 rows=0 loops=1)

-> Seq Scan on addevicessitedistricts (cost=0.00..8.59 rows=459 width=21) (actual time=0.007..0.541 rows=382 loops=1)

-> Materialize (cost=1.04..1.08 rows=4 width=36) (actual time=0.000..0.002 rows=4 loops=6364)

-> Seq Scan on addevicessitevisibilities (cost=0.00..1.04 rows=4 width=36) (actual time=0.026..0.035 rows=4 loops=1)

-> Materialize (cost=1.03..1.06 rows=3 width=27) (actual time=0.000..0.001 rows=3 loops=6364)

-> Seq Scan on addevicessitepositions (cost=0.00..1.03 rows=3 width=27) (actual time=0.007..0.010 rows=3 loops=1)

-> Materialize (cost=1.05..1.10 rows=5 width=31) (actual time=0.000..0.002 rows=5 loops=6364)

-> Seq Scan on addevicessitestatustypes (cost=0.00..1.05 rows=5 width=31) (actual time=0.006..0.016 rows=5 loops=1)

-> Hash (cost=2.61..2.61 rows=61 width=34) (actual time=0.144..0.144 rows=0 loops=1)

-> Seq Scan on partneridentifications partneridentificationsoperator (cost=0.00..2.61 rows=61 width=34) (actual time=0.007..0.103 rows=61 loops=1)

-> Hash (cost=2.61..2.61 rows=61 width=34) (actual time=0.122..0.122 rows=0 loops=1)

-> Seq Scan on partners partnerselectricitysupplier (cost=0.00..2.61 rows=61 width=34) (actual time=0.004..0.072 rows=61 loops=1)

-> Hash (cost=2.61..2.61 rows=61 width=34) (actual time=0.136..0.136 rows=0 loops=1)

-> Seq Scan on partners partnersmaintainer (cost=0.00..2.61 rows=61 width=34) (actual time=0.005..0.086 rows=61 loops=1)

-> Hash (cost=2.61..2.61 rows=61 width=34) (actual time=0.143..0.143 rows=0 loops=1)

-> Seq Scan on partners partnerssticker (cost=0.00..2.61 rows=61 width=34) (actual time=0.009..0.098 rows=61 loops=1)

Total runtime: 2210.937 ms

> regards, tom lane
>
>
Miroslav

Attachment Content-Type Size
miroslav.sulc.vcf text/x-vcard 387 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-03-14 19:19:14 Re: [pgsql-hackers-win32] snprintf causes regression tests to fail
Previous Message Bruce Momjian 2005-03-14 18:55:16 Re: [pgsql-hackers-win32] snprintf causes regression tests

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2005-03-14 19:22:35 Re: Avoiding tuple construction/deconstruction during joining
Previous Message Jan Wieck 2005-03-14 18:55:50 Re: column name is "LIMIT"