Re: Loading 500m json files to database

From: Rob Sargent
Subject: Re: Loading 500m json files to database
Date:
Msg-id: acf05c41-3f9d-64fd-3823-bfd09bbeb009@gmail.com
In reply to: Loading 500m json files to database (pinker <pinker@onet.eu>)
List: pgsql-general

On 3/23/20 4:24 AM, pinker wrote:
> Hi, do you maybe have an idea how to make the loading process faster?
>
> I have 500 million json files (1 json per file) that I need to load into
> the db.
> My test set is "only" 1 million files.
>
> What I came up with now is:
>
> time for i in datafiles/*; do
>    psql -c "\copy json_parts(json_data) FROM $i"&
> done
>
> which is the fastest so far. But it's not what I expected. Loading 1m of data
> takes me ~3h, so loading 500 times more is just unacceptable.
>
> some facts:
> * the target db is in the cloud, so there is no option to do tricks like
> turning fsync off
> * version postgres 11
> * I can spin up a huge postgres instance if necessary in terms of cpu/ram
> * I already tried hash partitioning (writing to 10 different tables instead
> of 1)
>
>
> Any ideas?
>
Most advanced languages have a bulk copy implementation. I've found this to be blindingly fast when the receiving
table has no indices or constraints. It's not clear how large your files are, but you might take this time to
"normalize" them: extract any id, datatype, etc. into table attributes.

> --
> Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
>
>
