Обсуждение: Use case stuck due to Partitioning

Поиск
Список
Период
Сортировка

Use case stuck due to Partitioning

От
Shubham Mittal
Дата:
Hi All,

Please find my use case below and suggest the appropriate solution:

We have done declarative partitioning over our table A based on some key and further range sub partitioned each partition on the basis of createddate range of 90 days.

In the search query to display last 90 day data , I give both column B and  createddate which may pick any of the required partitions. 

But how can I display last 90 days data in which a record which is modified last is also shown at the top but its created date might not lie in the last 90 days range.. 

Basically, I need to partition my huge table based on some key and also on date to split the data into smaller dataset for faster query . The UI needs to display the latest modified records first but the created date also needs to be given in the query to pick the right partition...

Can we do partition differently to achieve this?
Should i find the max modified date  and give it in the range of the created date.

Thanks & Regards

Re: Use case stuck due to Partitioning

От
Michael Lewis
Дата:
Do you have an index on the "updated_at" field and found that the query is too slow? Do you have an example query?

Re: Use case stuck due to Partitioning

От
Shubham Mittal
Дата:
Hi Michael,

Yes I have a btree index on the modified_date column currently.

SELECT  *  FROM partitioned_table  where
 A ='Value'
AND created_date >= '2021-03-01 08:16:13.589' and created_date <= '2021-04-02 08:16:13.589'  ORDER BY viewpriority desc OFFSET 0 ROWS FETCH NEXT 200 ROWS ONLY;

Here viewpriority is basically a long value containing created_date in milliseconds.

Issue here is : I want to  somehow include modified_date also in this query to get the records sorted by modified_date . But that sorting will happen only in the specified created_date range only.  I want those latest modified records also whose created_date might not lie in this range.

Thanks 



On Tue, Jun 29, 2021 at 1:22 AM Michael Lewis <mlewis@entrata.com> wrote:
Do you have an index on the "updated_at" field and found that the query is too slow? Do you have an example query?

Re: Use case stuck due to Partitioning

От
Michael Lewis
Дата:
I am unclear exactly what you want to do with modified_date. Can you write pseudo code perhaps?

It sounds like you are wanting to union all the results of the query you gave, with the results of a similar query where modified_date >= '2021-03-01 08:16:13.589' and created_at < '2021-03-01 08:16:13.589'.

However, if modified date is not null, and is set to the created_date by default, then there seems no reason to check the created_date except if partition pruning at plan time is very important.

Are you experiencing an actual performance problem, are you unsure how to write the query for the data you want, or is this premature optimization?

Re: Use case stuck due to Partitioning

От
"David G. Johnston"
Дата:
On Mon, Jun 28, 2021 at 2:51 PM Michael Lewis <mlewis@entrata.com> wrote:
I am unclear exactly what you want to do with modified_date. Can you write pseudo code perhaps?


I second this.  While I'm not all that familiar with partitioning I am readily getting the feeling that whether or not partitioning is used here is immaterial to the question of how best to retrieve most recent data.  My take is that when choosing to partition on time its usually best done because the older data becomes irrelevant over time and thus can be readily ignored.  If that data doesn't become stale then the benefit of partitioning is lost since you now have meaningful data on all partitions.

Partitioning, like indexes, are not some super feature whose presence solves all performance problems.  They need to be designed and used in a thoughtful manner.  Saying "Basically, I need to partition my huge table based on some key and also on date to split the data into smaller dataset for faster query." doesn't indicate that much designing has taken place - only that there is a belief that "smaller datasets make for faster queries".  Partitioning doesn't necessarily make the dataset smaller - it just moves parts around.  The queries need to only look at a subset of that main dataset and the query described here doesn't do that - and there isn't a feature of the current partitioning implementation, that I know of (limited experience admittedly), that will overcome this reality of the data.

David J.