Instructions ============ CSV File -------- The land registry file from http://data.gov.uk is 3.2GB. A CSV file that can be loaded into PostgreSQL that has organic data. No registration required. See https://theodi.org/blog/the-status-of-csvs-on-datagovuk for details on downloaded the file pp-complete.csv. SQL --- begin; create table land_registry_price_paid_uk( transaction uuid, price numeric, transfer_date date, postcode text, property_type char(1), newly_built boolean, duration char(1), paon text, saon text, street text, locality text, city text, district text, county text, ppd_category_type char(1)); copy land_registry_price_paid_uk FROM '/home/pg/Downloads/pp-complete.csv' with (format csv, freeze true, encoding 'win1252', header false, null '', quote '"', force_null (postcode, saon, paon, street, locality, city, district)); commit; Resulting table --------------- postgres=# \dt+ List of relations Schema │ Name │ Type │ Owner │ Size │ Description ────────┼─────────────────────────────┼───────┼───────┼─────────┼───────────── public │ land_registry_price_paid_uk │ table │ pg │ 2779 MB │ (1 row) Interesting Indexes =================== Many attribute index (Low cardinality leading attributes): postgres=# create index on land_registry_price_paid_uk_suffix(county, district, city, locality, street); UUID pk index (UUID type, high cardinality): postgres=# create index on land_registry_price_paid_uk (transaction); Price index (numeric, moderate cardinality): postgres=# create index on land_registry_price_paid_uk (price); Preview ======= pg@hamster:~$ head ~/Downloads/pp-complete.csv "{0C7ADEF5-878D-4066-B785-0000003ED74A}","163000","2003-02-21 00:00","UB5 4PJ","T","N","F","106","","READING ROAD","NORTHOLT","NORTHOLT","EALING","GREATER LONDON","A" "{35F67271-ABD4-40DA-AB09-00000085B9D3}","247500","2005-07-15 00:00","TA19 9DD","D","N","F","58","","ADAMS MEADOW","ILMINSTER","ILMINSTER","SOUTH SOMERSET","SOMERSET","A" "{B20B1C74-E8E1-4137-AB3E-0000011DF342}","320000","2010-09-10 00:00","W4 1DZ","F","N","L","58","","WHELLOCK ROAD","","LONDON","EALING","GREATER LONDON","A" "{7D6B0915-C56B-4275-AF9B-00000156BCE7}","104000","1997-08-27 00:00","NE61 2BH","D","N","F","17","","WESTGATE","MORPETH","MORPETH","CASTLE MORPETH","NORTHUMBERLAND","A" "{47B60101-B64C-413D-8F60-000002F1692D}","147995","2003-05-02 00:00","PE33 0RU","D","N","F","4","","MASON GARDENS","WEST WINCH","KING'S LYNN","KING'S LYNN AND WEST NORFOLK","NORFOLK","A" "{51F797CA-7BEB-4958-821F-000003E464AE}","110000","2013-03-22 00:00","NR35 2SF","T","N","F","5","","WILD FLOWER WAY","DITCHINGHAM","BUNGAY","SOUTH NORFOLK","NORFOLK","A" "{AB8CE5AE-DFED-46CE-A682-000004533355}","180000","2003-12-23 00:00","SA8 4DG","D","Y","F","17","","OAKWOOD DRIVE","CLYDACH","SWANSEA","NEATH PORT TALBOT","NEATH PORT TALBOT","A" "{1FB4E294-102B-475C-876A-000005E78B1E}","177000","2012-05-29 00:00","BS4 5AW","S","N","F","11","","WATER LANE","BRISLINGTON","BRISTOL","CITY OF BRISTOL","CITY OF BRISTOL","A" "{3B0DA29C-C89A-4FAA-918A-0000074FA0E0}","190000","2013-10-04 00:00","SN14 8LU","T","N","F","148","","HIGH STREET","MARSHFIELD","CHIPPENHAM","SOUTH GLOUCESTERSHIRE","SOUTH GLOUCESTERSHIRE","A" "{9F669E50-AF1E-4E8A-9A66-0000075006C7}","68000","1999-05-21 00:00","CR0 2JJ","S","N","F","83","","WHITEHORSE ROAD","","CROYDON","CROYDON","GREATER LONDON","A" Notes from UK Land Registry authority: ====================================== Transaction unique identifier A reference number which is generated automatically recording each published sale. The number is unique and will change each time a sale is recorded. Price Sale price stated on the transfer deed. Date of Transfer Date when the sale was completed, as stated on the transfer deed. Postcode This is the postcode used at the time of the original transaction. Note that postcodes can be reallocated and these changes are not reflected in the Price Paid Dataset. Property Type D = Detached, S = Semi-Detached, T = Terraced, F = Flats/Maisonettes, O = Other Note that: - we only record the above categories to describe property type, we do not separately identify bungalows. - end-of-terrace properties are included in the Terraced category above. - ‘Other’ is only valid where the transaction relates to a property type that is not covered by existing values. Old/New Indicates the age of the property and applies to all price paid transactions, residential and non-residential. Y = a newly built property, N = an established residential building Duration Relates to the tenure: F = Freehold, L= Leasehold etc. Note that Land Registry does not record leases of 7 years or less in the Price Paid Dataset. PAON Primary Addressable Object Name. If there is a sub-building for example the building is divided into flats, see Secondary Addressable Object Name (SAON). SAON Secondary Addressable Object Name. If there is a sub-building, for example the building is divided into flats, there will be a SAON. Street Locality Town/City District County PPD Category Type Indicates the type of Price Paid transaction. A = Standard Price Paid entry, includes single residential property sold for full market value. B = Additional Price Paid entry including transfers under a power of sale/repossessions, buy-to-lets (where they can be identified by a Mortgage) and transfers to non-private individuals. Note that category B does not separately identify the transaction types stated. Land Registry has been collecting information on Category A transactions from January 1995. Category B transactions were identified from October 2013. Record Status - monthly file only Indicates additions, changes and deletions to the records.(see guide below). A = Addition C = Change D = Delete. Note that where a transaction changes category type due to misallocation (as above) it will be deleted from the original category type and added to the correct category with a new transaction unique identifier.