Обсуждение: Question about use_physical_tlist() which is applied on Scan path
Hi hackers,
I have a question about `use_physical_tlist()` which is applied in `create_scan_plan()`:
```
if (flags == CP_IGNORE_TLIST)
{
tlist = NULL;
}
else if (use_physical_tlist(root, best_path, flags))
{
if (best_path->pathtype == T_IndexOnlyScan)
{
/* For index-only scan, the preferred tlist is the index's */
tlist = copyObject(((IndexPath *) best_path)->indexinfo->indextlist);
/*
* Transfer sortgroupref data to the replacement tlist, if
* requested (use_physical_tlist checked that this will work).
*/
if (flags & CP_LABEL_TLIST)
apply_pathtarget_labeling_to_tlist(tlist, best_path->pathtarget);
}
else
{
tlist = build_physical_tlist(root, rel);
……
```
And the comment above the code block says:
```
/*
* For table scans, rather than using the relation targetlist (which is
* only those Vars actually needed by the query), we prefer to generate a
* tlist containing all Vars in order. This will allow the executor to
* optimize away projection of the table tuples, if possible.
*
* But if the caller is going to ignore our tlist anyway, then don't
* bother generating one at all. We use an exact equality test here, so
* that this only applies when CP_IGNORE_TLIST is the only flag set.
*/
```
But for some column-oriented database based on Postgres, it may help a lot in case of projection of the table tuples in execution? And is there any other optimization considerations behind this design?
e.g. If we have such table definition and a query:
```
CREATE TABLE partsupp
(PS_PARTKEY INT,
PS_SUPPKEY INT,
PS_AVAILQTY INTEGER,
PS_SUPPLYCOST DECIMAL(15,2),
PS_COMMENT VARCHAR(199),
dummy text);
explain analyze verbose select sum(ps_supplycost * ps_availqty) from partsupp;
```
And the planner would give such plan:
```
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Aggregate (cost=12.80..12.81 rows=1 width=32) (actual time=0.013..0.015 rows=1 loops=1)
Output: sum((ps_supplycost * (ps_availqty)::numeric))
-> Seq Scan on public.partsupp (cost=0.00..11.60 rows=160 width=22) (actual time=0.005..0.005 rows=0 loops=1)
Output: ps_partkey, ps_suppkey, ps_availqty, ps_supplycost, ps_comment, dummy
Planning Time: 0.408 ms
Execution Time: 0.058 ms
(6 rows)
```
It looks the columns besides `ps_supplycost` and `ps_availqty` are not necessary, but fetched from tuples all at once. For the row-based storage such as heap, it looks fine, but for column-based storage, it would result into unnecessary overhead and impact performance. Is there any plan to optimize here?
Thanks.
On 2023-Jul-26, Jian Guo wrote: > It looks the columns besides `ps_supplycost` and `ps_availqty` are not > necessary, but fetched from tuples all at once. For the row-based > storage such as heap, it looks fine, but for column-based storage, it > would result into unnecessary overhead and impact performance. Is > there any plan to optimize here? I suppose that, at some point, it is going to have to be the table AM the one that makes the decision. That is, use_physical_tlist would have to involve some new flag in path->parent->amflags to determine whether to skip using a physical tlist. Right now, we don't have any columnar stores, so there's no way to verify an implementation. If you do have a columnar store implementation, you're welcome to share it. -- Álvaro Herrera PostgreSQL Developer "I am amazed at [the pgsql-sql] mailing list for the wonderful support, and lack of hesitasion in answering a lost soul's question, I just wished the rest of the mailing list could be like this." (Fotis) (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)
On 2023-Jul-26, Jian Guo wrote:
> It looks the columns besides `ps_supplycost` and `ps_availqty` are not
> necessary, but fetched from tuples all at once. For the row-based
> storage such as heap, it looks fine, but for column-based storage, it
> would result into unnecessary overhead and impact performance. Is
> there any plan to optimize here?
I suppose that, at some point, it is going to have to be the table AM
the one that makes the decision. That is, use_physical_tlist would have
to involve some new flag in path->parent->amflags to determine whether
to skip using a physical tlist. Right now, we don't have any columnar
stores, so there's no way to verify an implementation. If you do have a
columnar store implementation, you're welcome to share it.
--
Álvaro Herrera PostgreSQL Developer
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this." (Fotis)
(http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)