RE: [EXTERNAL] Re: performance expectations for table(s) with 2B recs

From: Godfrin, Philippe E
Subject: RE: [EXTERNAL] Re: performance expectations for table(s) with 2B recs
Date:
Msg-id: SA0PR15MB39334214C78D456BD112A39682749@SA0PR15MB3933.namprd15.prod.outlook.com
In reply to: Re: [EXTERNAL] Re: performance expectations for table(s) with 2B recs  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
List: pgsql-general

>From: Peter J. Holzer <hjp-pgsql@hjp.at>
>Sent: Friday, December 10, 2021 3:43 PM
>To: pgsql-general@lists.postgresql.org
>Subject: Re: [EXTERNAL] Re: performance expectations for table(s) with 2B recs
>
>On 2021-12-10 18:04:07 +0000, Godfrin, Philippe E wrote:

>> >But in my experience the biggest problem with large tables is unstable
>> >execution plans - for most of the parameters the optimizer will choose
>> >to use an index, but for some it will erroneously think that a full
>> >table scan is faster. That can lead to a situation where a query
>> >normally takes less than a second, but sometimes (seemingly at random)
>> >it takes several minutes.
>[...]
>> For Peter I have a question. What exactly causes ‘unstable execution plans’?
>>
>> Besides not using bind variables and bad statistics, would you elaborate
>> on what would contribute to that instability?

>Not using bind variables and bad statistics are certainly big factors:
>On the one hand, not using bind variables gives a lot more information to
>the optimizer, so it can choose a better plan at run time. On the other
>hand, that makes it hard to predict which plan it will choose.
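
As an illustration of the bind-variable point (the table and column names here are made up): with a prepared statement the planner only sees a placeholder instead of a literal, and after a few executions it may switch to a cached generic plan. On PostgreSQL 12 and later, plan_cache_mode can force either behaviour while experimenting:

    -- Hypothetical table: the planner plans against $1, not a concrete value.
    PREPARE recent_orders (int) AS
        SELECT * FROM orders WHERE customer_id = $1;
    EXECUTE recent_orders(42);

    -- Replan for every parameter set, or always reuse the generic plan:
    SET plan_cache_mode = force_custom_plan;
    -- SET plan_cache_mode = force_generic_plan;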

>Bad statistics come in many flavours: They might just be wrong, that's
>usually easy to fix. More problematic are statistics which just don't
>describe reality very well - they may not show a correlation, causing
>the optimizer to assume that two distributions are independent when they
>really aren't (since PostgreSQL 10 you can create statistics on multiple
>columns which helps in many but not all cases) or not show some other
>peculiarity of the data. Or they may be just so close to a flipping
>point that a small change causes the optimizer to choose a wildly
>different plan.
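
For the multi-column statistics Peter mentions, a minimal sketch (the table and columns are hypothetical) looks like this; the dependencies statistics tell the planner that the two columns are correlated rather than independent:

    -- Extended statistics, available since PostgreSQL 10.
    CREATE STATISTICS addr_city_zip (dependencies) ON city, zip FROM addresses;
    ANALYZE addresses;  -- the new statistics are only populated by ANALYZE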

>Another source is dynamically generated SQL. Your application may just
>put together SQL from fragments or it might use something like
>SQLAlchemy or an ORM. In any of these cases what looks like one query
>from a user's perspective may really be a whole family of related
>queries - and PostgreSQL will try to find the optimal plan for each of
>them. Which is generally a good thing, but it adds opportunities to mess
>up.
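
To make the "family of queries" point concrete, here is a sketch with made-up tables: two statements an ORM might emit for the same search screen, each of which gets its own plan, and only one of which may end up using an index. The auto_explain contrib module can log the actual plan of any statement that runs longer than auto_explain.log_min_duration, which helps catch the slow members of such a family in production.

    -- Two variants of "the same" query as seen by the application:
    EXPLAIN SELECT * FROM orders WHERE status = 'open';
    EXPLAIN SELECT * FROM orders
             WHERE status = 'open' AND created_at > now() - interval '1 day';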

>hp
>
>--
>   _  | Peter J. Holzer    | Story must make more sense than reality.
>|_|_) |                    |
>| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
>__/   | http://www.hjp.at/ |       challenge!"

 

Good answer Peter, I agree wholeheartedly. I was curious if there was something specific to PostgreSQL.

phil
