Discussion: 2 million queries against a table
Hi

I need to run 2 million queries against a three-column table t (s, p, o) whose size is 10 billion rows. The data type of each column is string. The server has 512 GB RAM, 32 cores, and 14 TB SSD (RAID 0).

Only two types of queries:

select s, p, o from t where s = param
select s, p, o from t where o = param

If I store the table in a PostgreSQL database, running all the queries takes 6 hours using a Java ThreadPoolExecutor. Do you think PostgreSQL itself can speed up the query processing even more? What would be the best strategy?

These are my ideas:

1. Use Spark to launch the queries against the table (extracted from PostgreSQL) loaded in a dataframe.
2. Use PG-Strom, an extension module of PostgreSQL with GPU support, and launch the queries against the table.

Any suggestion will be appreciated.

Best,
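[A minimal sketch of the schema and indexes this access pattern implies, assuming plain text columns and one btree per lookup column; the original post shows no DDL, so every name here is illustrative:]

  -- Assumed schema; the actual DDL is not shown in the post.
  CREATE TABLE t (
      s text NOT NULL,
      p text NOT NULL,
      o text NOT NULL
  );

  -- One btree per lookup column, so both query shapes can use an index scan.
  CREATE INDEX t_s_idx ON t (s);
  CREATE INDEX t_o_idx ON t (o);

  -- Alternatively, on PostgreSQL 11+, covering indexes can turn both
  -- lookups into index-only scans, avoiding heap fetches on a
  -- 10-billion-row table:
  CREATE INDEX t_s_cov_idx ON t (s) INCLUDE (p, o);
  CREATE INDEX t_o_cov_idx ON t (o) INCLUDE (s, p);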
On 7/15/20 10:10 AM, Adam Sanchez wrote:
> Hi
>
> I need to run 2 million queries against a three-column table t
> (s, p, o) whose size is 10 billion rows. The data type of each column
> is string. The server has 512 GB RAM, 32 cores, and 14 TB SSD (RAID 0).
>
> Only two types of queries:
>
> select s, p, o from t where s = param
> select s, p, o from t where o = param

What's the index cardinality of "s" and "o" (about how many records per key)? What kind of indexes do you have on them? Is the table clustered on one of the keys? What version of PostgreSQL?

> If I store the table in a PostgreSQL database, running all the queries
> takes 6 hours using a Java ThreadPoolExecutor.

How many threads? What are the values of:

shared_buffers
temp_buffers
work_mem

> Do you think PostgreSQL itself can speed up the query processing
> even more? What would be the best strategy?

2M queries in 6 hours is about 93 queries/second. Spread over 32 cores, that's only about three per second per core. Not very much.

> These are my ideas:
>
> 1. Use Spark to launch the queries against the table (extracted from
> PostgreSQL) loaded in a dataframe.
> 2. Use PG-Strom, an extension module of PostgreSQL with GPU support,
> and launch the queries against the table.
>
> Any suggestion will be appreciated.

IO is -- as usual -- the bottleneck, followed closely by cache efficiency. Are you issuing the queries in random order, or sequentially by key (which would enhance cache efficiency)?

--
Angular momentum makes the world go 'round.
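[A quick way to gather the answers to those questions from psql; the sampling percentage is arbitrary, and the cardinality query only gives a rough per-key average rather than scanning all 10 billion rows:]

  SELECT version();
  SHOW shared_buffers;
  SHOW temp_buffers;
  SHOW work_mem;

  -- Existing indexes on t:
  SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 't';

  -- Rough records-per-key estimate from a 0.1% block sample:
  SELECT count(*)::numeric / count(DISTINCT s) AS avg_rows_per_s,
         count(*)::numeric / count(DISTINCT o) AS avg_rows_per_o
  FROM t TABLESAMPLE SYSTEM (0.1);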
Adam Sanchez <a.sanchez75@gmail.com> writes:
> I need to run 2 million queries against a three-column table t
> (s, p, o) whose size is 10 billion rows. The data type of each column
> is string. The server has 512 GB RAM, 32 cores, and 14 TB SSD (RAID 0).

> Only two types of queries:

> select s, p, o from t where s = param
> select s, p, o from t where o = param

TBH, this is a pretty silly way to use a SQL database. Put the probe values into a temporary table (in batches, perhaps) and do a join. The per-row cost of that sort of approach will be multiple orders of magnitude smaller than a query per row.

			regards, tom lane
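[A minimal sketch of the batched temp-table join Tom Lane describes; the probes table, its loading method, and the batch size are all illustrative:]

  -- One batch of probe values, loaded from the client
  -- (e.g. COPY or a multi-row INSERT from the Java driver).
  CREATE TEMP TABLE probes (v text PRIMARY KEY);
  COPY probes (v) FROM STDIN;

  ANALYZE probes;

  -- One join replaces thousands of single-row "where s = param" queries:
  SELECT t.s, t.p, t.o
    FROM t
    JOIN probes ON t.s = probes.v;

  -- And likewise for the "where o = param" batch:
  SELECT t.s, t.p, t.o
    FROM t
    JOIN probes ON t.o = probes.v;

[With indexes on t(s) and t(o), the planner can pick a nested loop of index scans for small batches or a hash join for large ones, and the parse, plan, and client round-trip costs are amortized over the whole batch instead of paid per row.]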
On 7/15/20 9:38 AM, Tom Lane wrote:
> Adam Sanchez <a.sanchez75@gmail.com> writes:
>> I need to run 2 million queries against a three-column table t
>> (s, p, o) whose size is 10 billion rows. The data type of each column
>> is string. The server has 512 GB RAM, 32 cores, and 14 TB SSD (RAID 0).
>
>> Only two types of queries:
>
>> select s, p, o from t where s = param
>> select s, p, o from t where o = param
>
> TBH, this is a pretty silly way to use a SQL database. Put
> the probe values into a temporary table (in batches, perhaps)
> and do a join. The per-row cost of that sort of approach
> will be multiple orders of magnitude smaller than a query
> per row.
>
> regards, tom lane

I had assumed the OP was planning a performance test for, say, 2M separate user requests (perhaps non-unique) rather than asking for the specific 2M records.