On Thu, Jan 28, 2016 at 10:50 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> If I were to make a proof-of-concept patch along with the interface
>> itself, file_fdw seems to me a good candidate for this enhancement.
>> postgres_fdw is not the right place for it.
>>
> The attached patch enhances the FDW/CSP interface and adds a PoC feature
> to file_fdw that scans the source file partially. The enhancement was
> smaller than I expected.
>
> It works as follows: this query reads 20M rows from a CSV file using
> 3 background worker processes.
>
> postgres=# set max_parallel_degree = 3;
> SET
> postgres=# explain analyze select * from test_csv where id % 20 = 6;
>                                    QUERY PLAN
> --------------------------------------------------------------------------------
>  Gather  (cost=1000.00..194108.60 rows=94056 width=52)
>          (actual time=0.570..19268.010 rows=2000000 loops=1)
>    Number of Workers: 3
>    ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 width=52)
>                                           (actual time=0.180..12744.655 rows=500000 loops=4)
>          Filter: ((id % 20) = 6)
>          Rows Removed by Filter: 9500000
>          Foreign File: /tmp/testdata.csv
>          Foreign File Size: 1504892535
>  Planning time: 0.147 ms
>  Execution time: 19330.201 ms
> (9 rows)
Could you try it not in parallel and then with 1, 2, 3, and 4 workers
and post the times for all?
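The quoted mail describes the partial scan only in prose. One common way to let several processes share a single CSV file is to cut it into equal byte ranges and let each participant own every line whose first byte falls inside its range. The sketch below illustrates that technique under stated assumptions; it is hypothetical and not the code in the attached patch. The function names (`nominal_range`, `lines_in_range`, `split_scan`) are made up for illustration. The leader is counted as a participant because the plan shows loops=4 with 3 workers.

```python
def nominal_range(size: int, part: int, nparts: int) -> tuple[int, int]:
    """Byte range [begin, end) given to participant `part`, split by size only."""
    return size * part // nparts, size * (part + 1) // nparts


def lines_in_range(buf: bytes, begin: int, end: int) -> list[bytes]:
    """Return every line whose first byte lies in [begin, end).

    A participant that lands mid-line skips ahead to the next line start,
    because the previous participant owns that line and reads it to the
    end, even past its own `end`. So no line is split or read twice.
    """
    pos = begin
    if pos > 0 and buf[pos - 1:pos] != b"\n":
        nl = buf.find(b"\n", pos)
        pos = len(buf) if nl == -1 else nl + 1
    lines = []
    while pos < end and pos < len(buf):
        nl = buf.find(b"\n", pos)
        stop = len(buf) if nl == -1 else nl + 1  # last line may lack "\n"
        lines.append(buf[pos:stop])
        pos = stop
    return lines


def split_scan(buf: bytes, nparts: int) -> list[list[bytes]]:
    """Lines read by each of `nparts` participants (leader plus workers)."""
    size = len(buf)
    return [lines_in_range(buf, *nominal_range(size, p, nparts))
            for p in range(nparts)]


if __name__ == "__main__":
    data = b"".join(b"%d,row%d\n" % (i, i) for i in range(1, 101))
    for nworkers in range(5):                     # 0..4 workers
        parts = split_scan(data, nworkers + 1)    # +1: the leader scans too
        print(nworkers, [len(p) for p in parts])
```

With the 1,504,892,535-byte file and 4 participants, each nominal range would be roughly 376,223,133 bytes. Only the byte boundaries need to be agreed on up front (for example via shared memory); each process then finds its own line boundaries.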
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company