High-Performance SQL Cluster Engine PGSpider 3.0.0 released

Posted on 2023-01-20 by Toshiba
Related Open Source

We have just released PGSpider v3.0.0.

PGSpider is a high-performance SQL cluster engine for distributed big data.
PGSpider can access a number of data sources using Foreign Data Wrappers (FDW) and retrieve the distributed data vertically.
PGSpider is used in the same way as PostgreSQL. You can use any PostgreSQL client, such as psql or a libpq-based application.

This release includes the following improvements :

  • Based on PostgreSQL 15.0
  • Supports data modification on Multi-Tenant tables
  • Supports pushdown of combinations of SQL clauses: JOIN, WHERE, aggregates, ORDER BY and LIMIT/OFFSET
  • Added the MULTI TENANT command
  • Added PGSpider performance measurement
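
To illustrate why pushing down a combination of clauses matters, here is a minimal Python sketch of a WHERE + ORDER BY + LIMIT query evaluated on each child first and then merged by the parent. It is a conceptual model only: the names CHILD_ROWS, child_query and parent_query are hypothetical, and it does not claim to reflect how PGSpider plans or executes such queries.

```python
import heapq
from itertools import islice
from typing import Dict, List

# Hypothetical in-memory stand-ins for the rows held by two child nodes.
CHILD_ROWS: Dict[str, List[int]] = {
    "node_a": [7, 1, 9, 4],
    "node_b": [3, 8, 2],
}


def child_query(rows: List[int], min_value: int, limit: int) -> List[int]:
    # Models "WHERE value >= min_value ORDER BY value LIMIT limit"
    # running on the child, so at most `limit` rows leave the node.
    return sorted(v for v in rows if v >= min_value)[:limit]


def parent_query(min_value: int, limit: int) -> List[int]:
    # Each child returns an already filtered, sorted, and truncated list.
    partials = [
        child_query(rows, min_value, limit) for rows in CHILD_ROWS.values()
    ]
    # The parent only merges the sorted partial results and applies LIMIT again.
    return list(islice(heapq.merge(*partials), limit))


top = parent_query(min_value=2, limit=3)
print(top)
```

Because every child applies the filter and the limit itself, the parent receives at most `limit` rows per child instead of whole tables.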

PGSpider supports the following features :

Multi-Tenant :
Users can easily retrieve records from multiple tables with a single SQL statement. If tables with similar schemas exist in each data source, PGSpider can present them as a single virtual table, which we call a Multi-Tenant table.
When inserting into a Multi-Tenant table, data is inserted into the first data source found (this behavior will be improved).
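
The idea can be pictured with a small Python model. This is a toy sketch under stated assumptions, not PGSpider code: the class MultiTenantTable, the node names, and the extra "source" field are all hypothetical, and "first data source found" is modeled simply as the first entry in a dict.

```python
from typing import Dict, List

Row = Dict[str, object]


class MultiTenantTable:
    """Toy model of one virtual table spanning several data sources."""

    def __init__(self, sources: Dict[str, List[Row]]) -> None:
        # Maps a hypothetical data source name to the rows of its child table.
        self.sources = sources

    def select(self) -> List[Row]:
        # Reading returns the rows of every child table as one result set.
        # Each row is tagged with the source it came from (illustrative only).
        return [
            {**row, "source": name}
            for name, rows in self.sources.items()
            for row in rows
        ]

    def insert(self, row: Row) -> str:
        # Per the announcement, an insert goes to the first data source found.
        # Here that is assumed to be the first key of the dict.
        name = next(iter(self.sources))
        self.sources[name].append(row)
        return name


table = MultiTenantTable({
    "node_a": [{"id": 1, "temp": 20.5}],
    "node_b": [{"id": 2, "temp": 22.0}],
})
print(table.select())
```

Calling `table.insert({"id": 3, "temp": 19.0})` in this model appends the row to `node_a` only, which mirrors the insertion behavior described above.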

Parallel processing :
PGSpider executes queries and fetches results from child nodes in parallel.
PGSpider expands a Multi-Tenant table into its child tables and creates a new thread for each child table to access the corresponding data source.
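
The thread-per-child pattern can be sketched in Python as follows. This is only an illustration of the general fan-out technique: CHILD_TABLES, fetch_child and scan_multi_tenant are hypothetical names, the child tables are in-memory lists rather than remote data sources, and PGSpider's own threading is implemented inside the server, not like this.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical in-memory stand-ins for remote child tables.
CHILD_TABLES: Dict[str, List[int]] = {
    "node_a": [1, 2, 3],
    "node_b": [4, 5],
    "node_c": [6],
}


def fetch_child(name: str) -> List[int]:
    # A real engine would run the query on the remote data source here.
    return CHILD_TABLES[name]


def scan_multi_tenant(children: List[str]) -> List[int]:
    # One worker thread per child table, so the fetches can overlap in time.
    with ThreadPoolExecutor(max_workers=len(children)) as pool:
        per_child = pool.map(fetch_child, children)
        # map() yields results in the order of `children`.
        return [value for rows in per_child for value in rows]


result = scan_multi_tenant(list(CHILD_TABLES))
print(result)
```

With slow remote sources, the total wait in this pattern is closer to the slowest child than to the sum of all children.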

Pushdown :
WHERE clauses, aggregate functions, and SQL functions in the target list are pushed down to child nodes.
JOIN is also pushed down if all tables in a query are located in a single data source.
Previously, pushdown to Multi-Tenant tables raised an error when AVG, STDDEV, or VARIANCE was used. PGSpider has fixed this and can now execute them.
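
Aggregates such as AVG need care because the average of per-node averages is generally not the overall average. A common technique, shown in the Python sketch below, is to have each child return a partial state (sum and count) that the parent combines. The announcement does not say that PGSpider uses this approach, so treat it as an assumption; the function names and sample values are hypothetical.

```python
from typing import List, Tuple


def partial_avg(values: List[float]) -> Tuple[float, int]:
    # What a child node could return instead of a finished AVG:
    # the partial state (sum, count).
    return (sum(values), len(values))


def combine_avg(partials: List[Tuple[float, int]]) -> float:
    # The parent adds up sums and counts, then divides once.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


node_a = [10.0, 20.0, 30.0]
node_b = [40.0]

# Naive approach: average the per-node averages (wrong for uneven sizes).
naive = (sum(node_a) / len(node_a) + sum(node_b) / len(node_b)) / 2

# Partial-state approach: combine (sum, count) pairs from each node.
correct = combine_avg([partial_avg(node_a), partial_avg(node_b)])

print(naive, correct)
```

Here the naive result is 30.0 while the true average of the four values is 25.0. STDDEV and VARIANCE can be handled in a similar way if each child also returns the sum of squares.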

PGSpider is developed by Toshiba Software Engineering & Technology Center.
Please see the repository for details, and send us your feedback.

Source repository : https://github.com/pgspider/pgspider

Best Regards,
TAIGA Katayama