Parallel Sort

From: Noah Misch
Subject: Parallel Sort
Date:
Msg-id: 20130513142859.GC171500@tornado.leadboat.com
Responses: Re: Parallel Sort
List: pgsql-hackers
It would be great if one client session could take advantage of multiple CPU cores. EnterpriseDB wishes to start the trek into this problem space for 9.4 by implementing parallel internal (i.e. not spilling to disk) sort. This touches on a notable subset of the infrastructure components we'll need for parallel general query. My intent is to map out the key design topics, hear about critical topics I hadn't considered, and solicit feedback on the quality of the high-level plan. Full designs for key pieces will come later.

* Worker Backends

A multi-process model, rather than a multi-threaded one, is best for introducing parallelism to a PostgreSQL session. With threading, all in-memory state would become shared by default; with processes, in-memory data structures remain unshared until we explicitly select them for sharing. Some data structures will require new sharing, but I expect unshared ones to remain the typical case. I refer to these additional processes as worker backends. They will have much in common with ordinary client backends, including a database connection and a PGPROC. They will have no client socket; instead, they will take direction from the client-connected backend, termed the master, via shared memory.

Each worker needs to make SnapshotNow visibility decisions coherent with the master. For sorting, this allows us to look up comparison functions, even when the current transaction created or modified those functions. This will also be an essential building block for any parallelism project that consults user tables. Implementing this means copying the subtransaction stack and the combocid hash to each worker. For the sake of completeness, we should also copy the global MVCC snapshot data (sorting probably won't care).
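To make the copying step concrete, here is a minimal sketch of flattening master state into a buffer that could travel through shared memory. Every structure and function name here is invented for illustration; the real subtransaction stack and combocid hash are far richer than these stand-ins.

```c
#include <stddef.h>
#include <string.h>

/* Invented, simplified stand-ins for the backend-local state the master
 * would copy to each worker: the subtransaction stack and the combocid
 * mapping.  The real PostgreSQL structures are more involved. */
typedef struct MasterXactState
{
    unsigned int subxids[8];        /* subtransaction stack, innermost last */
    int          nsubxids;
    unsigned int combocids[8][2];   /* illustrative (cmin, cmax) pairs */
    int          ncombocids;
} MasterXactState;

/* Flatten the state into a byte buffer, as it might be placed in a
 * shared memory region for workers to read.  Returns bytes written,
 * or 0 if the buffer is too small. */
size_t
xact_state_serialize(const MasterXactState *s, char *buf, size_t buflen)
{
    if (buflen < sizeof(MasterXactState))
        return 0;
    memcpy(buf, s, sizeof(MasterXactState));
    return sizeof(MasterXactState);
}

/* Worker side: reconstruct a local copy from the shared buffer. */
int
xact_state_deserialize(MasterXactState *out, const char *buf, size_t len)
{
    if (len < sizeof(MasterXactState))
        return -1;
    memcpy(out, buf, sizeof(MasterXactState));
    return 0;
}
```

A worker that deserializes this copy can then make the same visibility decisions the master would, for as long as the master's state is frozen for the duration of the parallel task.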
It also means forbidding, while a parallel task is in flight, operations that affect the transaction state:

- CommandCounterIncrement()
- GetCurrentCommandId(true)
- AssignTransactionId()
- subtransaction-management functions

I don't intend to commit us to a design fundamentally antagonistic to, for example, starting a subtransaction in a worker backend. That being said, I think we could achieve a lot in the parallel query space before we would deem it important to relax those restrictions.

Workers will copy all GUCs from the master. bttextcmp() needs lc_collate, and we'll inevitably reach the point of needing broad GUC coverage as we make more things run in parallel. Further GUC changes will be forbidden while a parallel task is in flight. (More generally, anything that mutates backend-local state without reference to shared state will either need to be forbidden or converted to route through shared state.)

The heavyweight locking mechanism will need to be aware of the association between the master and its workers. For sorting, workers will acquire heavyweight locks on system catalogs only. It is almost enough for the workers to behave as though they are independent of the master for locking purposes. But an undetected deadlock would arise if the master holds an AccessExclusiveLock on a system catalog a worker needs.

We can allocate a small amount of permanent shared memory for coordination among a group of processes, but sorting will benefit from a region as large as maintenance_work_mem. Expect on-demand memory sharing.

* Identifying Parallel-Compatible Functions

Not all functions can reasonably run on a worker backend. We should not presume that a VOLATILE function can tolerate the unstable execution order imposed by parallelism, though a function like clock_timestamp() is perfectly reasonable to run that way. STABLE does not have that problem, but neither does it constitute a promise that the function implementation is compatible with parallel execution.
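As one illustration of the transaction-state restrictions listed earlier, a guard of roughly this shape could make CommandCounterIncrement() and its kin fail while workers are running. The flag, the return-code convention, and all names are invented; core code would raise an error via ereport() rather than return a status.

```c
#include <stdbool.h>

/* Invented backend-local state for illustration only. */
static bool         parallel_task_in_flight = false;
static unsigned int current_command_id = 0;

/* Sketch of the proposed restriction: refuse to advance the command
 * counter while a parallel task is in flight.  Returns 0 on success,
 * -1 on refusal; the real backend would raise an error instead. */
int
command_counter_increment(void)
{
    if (parallel_task_in_flight)
        return -1;
    current_command_id++;
    return 0;
}

unsigned int
get_current_command_id(void)
{
    return current_command_id;
}

/* The master would set and clear the flag around launching workers,
 * so that all forbidden operations share one check. */
void begin_parallel_task(void) { parallel_task_in_flight = true; }
void end_parallel_task(void)   { parallel_task_in_flight = false; }
```

The same flag would back the checks on AssignTransactionId(), subtransaction management, and GUC changes, since all of them mutate backend-local state that the workers' copies could no longer mirror.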
Consider xid_age(), which would need code changes to operate correctly in parallel. IMMUTABLE almost guarantees enough; there may come a day when all IMMUTABLE functions can be presumed parallel-safe. For now, an IMMUTABLE function could cause trouble by starting a (read-only) subtransaction. The bottom line is that parallel-compatibility needs to be separate from volatility classes for the time being. I'm not sure what the specific answer here should look like. Simply having a CREATE FUNCTION ... PARALLEL_IS_FINE flag is not entirely satisfying, because the rules are liable to loosen over time.

* Planner & Similar Issues

We should decide whether to actually sort in parallel based on the comparator cost and the data size. The system currently has no information on comparator cost: bt*cmp (and indeed almost all built-in functions) all have procost=1, but bttextcmp is at least 1000x slower than btint4cmp. Let's improve the procost estimates of all core B-tree and hash operators. This will have other benefits, but we will need to be cognizant of the risk of upsetting setups that have tuned cpu_operator_cost based on the present situation.

The choice of whether to parallelize can probably be made in a manner similar to the choice to do an external sort: the planner guesses the outcome for costing purposes, but the actual decision is made at execution time. The planner would determine a tuple count cutoff at which parallelism becomes favorable, and tuplesort would check that to establish its actual decision.

Use of parallelism within one process has a distributed effect on the system, similar to the use of work_mem. Add a PGC_USERSET GUC representing the number of worker processes the current session is willing to use. It would default to a small number, say zero or two. On a throughput-oriented system with high concurrent query counts, the DBA would tend to set/leave it at zero in postgresql.conf; parallelism can only help if the system has free resources.
Separately, add a GUC limiting the total number of workers across the system at any one time.

Thanks,
nm

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com