Re: Async execution of postgres_fdw.
От | Kyotaro HORIGUCHI |
---|---|
Тема | Re: Async execution of postgres_fdw. |
Дата | |
Msg-id | 20150119.152416.57998318.horiguchi.kyotaro@lab.ntt.co.jp обсуждение исходный текст |
Ответ на | Re: Async execution of postgres_fdw. (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Список | pgsql-hackers |
Hello, that's a silly mistake. fetch_seize = 10000 in the v4 patch. This v5 patch is fixed at the point. > But the v4 patch mysteriously accelerates this query, 6.5 seconds. > > > =# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c > > FROM ft1 AS x JOIN ft1 AS y on x.a = y.a; ... > > Execution time: 6512.043 ms fetch_size was 10000 at this run. I got about 13.0 seconds for fetch_size = 100, which is about 19% faster than the original. regards, -- Kyotaro Horiguchi NTT Open Source Software Center ======= 15 17:18:49 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20150116.171849.109146500.horiguchi.kyotaro@lab.ntt.co.jp> > I revised the patch so that async scan will be done more > aggressively, and took execution time for two very simple cases. > > As the result, simple seq scan gained 5% and hash join of two > foreign tables gained 150%. (2.4 times faster). > > While measuring the performance, I noticed that each scan in a > query runs at once rather than alternating with each other in > many cases such as hash join or sorted joins and so. So I > modified the patch so that async fetch is done more > aggressively. The new v4 patch is attached. The following numbers > are taken based on it. > > ======== > Simple seq scan for the first test. > > > CREATE TABLE lt1 (a int, b timestamp, c text); > > CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost'); > > CREATE USER MAPPING FOR PUBLIC SERVER sv1; > > CREATE FOREIGN TABLE ft1 () SERVER sv1 OPTIONS (table_name 'lt1'); > > INSERT INTO lt1 (SELECT a, now(), repeat('x', 128) FROM generate_series(0, 999999) a); > > On this case, I took the the 10 times average of exec time of the > following query for both master head and patched version. The > fetch size is 100. > > > postgres=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT * FROM ft1; > > QUERY PLAN > > ------------------------------------------------------------------ > > Foreign Scan on ft1 (actual time=0.79 5..4175.706 rows=1000000 loops=1) > > Planning time: 0.060 ms > > Execution time: 4276.043 ms > > master head : avg = 4256.621, std dev = 17.099 > patched pgfdw: avg = 4036.463, std dev = 2.608 > > The patched version is faster by about 5%. This should be pure > result of asynchronous fetching, not including the effect of > early starting of remote execution in ExecInit. > > Interestingly, as fetch_count gets larger, the gain raises in > spite of the decrease of the number of query sending. > > master head : avg = 2622.759, std dev = 38.379 > patched pgfdw: avg = 2277.622, std dev = 27.269 > > About 15% gain. And for 10000, > > master head : avg = 2000.980, std dev = 6.434 > patched pgfdw: avg = 1616.793, std dev = 13.192 > > 19%.. It is natural that exec time reduces along with increase of > fetch size, but I haven't found the reason why the patch's gain > also increases. > > ====================== > > The second case is a simple join of two foreign tables sharing > one connection. > > The master head runs this query in about 16 seconds with almost > no fluctuation among multiple tries. > > > =# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c > > FROM ft1 AS x JOIN ft1 AS y on x.a = y.a; > > QUERY PLAN > > ---------------------------------------------------------------------------- > > Hash Join (actual time=7541.831..15924.631 rows=1000000 loops=1) > > Hash Cond: (x.a = y.a) > > -> Foreign Scan on ft1 x (actual time=1.176..6553.480 rows=1000000 loops=1) > > -> Hash (actual time=7539.761..7539.761 rows=1000000 loops=1) > > Buckets: 32768 Batches: 64 Memory Usage: 2829kB > > -> Foreign Scan on ft1 y (actual time=1.067..6529.165 rows=1000000 loops=1) > > Planning time: 0.223 ms > > Execution time: 15973.916 ms > > But the v4 patch mysteriously accelerates this query, 6.5 seconds. > > > =# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c > > FROM ft1 AS x JOIN ft1 AS y on x.a = y.a; > > QUERY PLAN > > ---------------------------------------------------------------------------- > > Hash Join (actual time=2556.977..5812.937 rows=1000000 loops=1) > > Hash Cond: (x.a = y.a) > > -> Foreign Scan on ft1 x (actual time=32.689..1936.565 rows=1000000 loops=1) > > -> Hash (actual time=2523.810..2523.810 rows=1000000 loops=1) > > Buckets: 32768 Batches: 64 Memory Usage: 2829kB > > -> Foreign Scan on ft1 y (actual time=50.345..1928.411 rows=1000000 loops=1) > > Planning time: 0.220 ms > > Execution time: 6512.043 ms > > The result data seems not broken. I don't know the reason yet but > I'll investigate it.
В списке pgsql-hackers по дате отправления: