Hello hackers,
 
I’d like to propose an optimization for index creation on large tables.
 
Currently, when creating multiple indexes on the same table, each index is built independently, and each build performs its own full table scan. This results in significant redundant I/O, especially for very large tables.
 
For example, suppose we have a 1 TB table and need to create 10 indexes on it. If each index takes 1 hour to build (mostly time spent scanning the table), the total comes to around 10 hours, most of which goes into re-reading the same data.
 
I believe this process could be optimized by introducing a mechanism that builds multiple indexes from **a single shared full table scan**: the table is read once, and each tuple is routed to the build pipelines of all pending indexes concurrently.
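To make the shape of the change a bit more concrete, here is a very rough pseudocode sketch of what I have in mind. None of the function or type names below (shared_scan_next_tuple, IndexBuildPipeline, pipeline_add_tuple, pipeline_finish) are existing PostgreSQL APIs; they are placeholders for whatever the real implementation would use:

```c
/*
 * Pseudocode only -- all names here are hypothetical placeholders,
 * not existing PostgreSQL functions or types.
 */
typedef struct IndexBuildPipeline IndexBuildPipeline;

static void
build_indexes_with_shared_scan(Relation heap,
                               IndexBuildPipeline **pipelines,
                               int npipelines)
{
    HeapTuple   tuple;

    /* One pass over the heap ... */
    while ((tuple = shared_scan_next_tuple(heap)) != NULL)
    {
        /* ... feeding every pending index build from the same tuple. */
        for (int i = 0; i < npipelines; i++)
            pipeline_add_tuple(pipelines[i], tuple);
    }

    /* Per-index work (sorting, loading the index, etc.) still runs once
     * per index, but without re-reading the table. */
    for (int i = 0; i < npipelines; i++)
        pipeline_finish(pipelines[i]);
}
```

The per-index pipelines could of course also be handed off to parallel workers so that the sort/insert work overlaps with the scan itself, but the main point is that the heap is only read once.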
 
If implemented, this could potentially cut the total index build time in half or better, depending on system resources and the number of indexes.
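As a back-of-envelope illustration of where "half or better" could come from (the 0.6 h / 0.4 h split between scan time and per-index sort/insert time is just an assumed figure for the example above, not a measurement):

```
today:         10 x (0.6 h scan + 0.4 h sort/insert) = 10.0 h
shared scan:    1 x 0.6 h scan + 10 x 0.4 h          =  4.6 h
```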
 
I’m curious:
- Has this been discussed before?
- Are there any technical reasons why this wouldn’t be feasible?
- Would such a patch be of interest to the community?
 
Thanks,
Ildar Garaev