Discussion: splitting up tables based on read/write frequency of columns
This is really a theoretical/anecdotal question, as I'm not at a scale yet where this would be measurable. I want to investigate while this is fresh in my mind...

I recall reading that unless a row has columns that are TOASTed, an `UPDATE` is essentially an `INSERT + DELETE`, with the previous row marked for vacuuming.

A few of my tables have the following characteristics:
- The Primary Key has many other tables/columns that FKEY onto it.
- Many columns (30+) of small data size
- Most columns (90%) are 1 WRITE (UPDATE) for 1000 READS
- Some columns (10%) do a bit of internal bookkeeping and are 1 WRITE (UPDATE) for 50 READS

Has anyone done testing/benchmarking on the potential efficiency/savings of consolidating the frequently updated columns into their own table?
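For concreteness, here is a minimal sketch of the split being asked about. All names (`account`, `account_counters`, the columns) are hypothetical stand-ins, not from the original post:

```sql
-- Hypothetical schema: "account" keeps the ~30 small, read-mostly columns;
-- "account_counters" isolates the frequently UPDATEd bookkeeping columns.
CREATE TABLE account (
    id    bigserial PRIMARY KEY,  -- the PK that many other tables FKEY onto
    name  text NOT NULL
    -- ... the other read-mostly columns ...
);

CREATE TABLE account_counters (
    account_id  bigint PRIMARY KEY REFERENCES account (id),
    view_count  bigint NOT NULL DEFAULT 0,
    last_seen   timestamptz
);
```

With this layout, an update to the bookkeeping columns only rewrites the narrow `account_counters` row (and its single index), rather than the full 30+ column row.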
Hi,

I'm pretty sure PostgreSQL can handle this. But since you asked with a theoretical background, it's probably worthwhile to look at column stores (like [1]).

-S.

[1] http://citusdata.github.io/cstore_fdw/

2015-01-19 22:47 GMT+01:00 Jonathan Vanasco <postgres@2xlp.com>:
> This is really a theoretical/anecdotal question, as I'm not at a scale yet where this would be measurable. [...]
On Jan 19, 2015, at 5:07 PM, Stefan Keller wrote:
> Hi
>
> I'm pretty sure PostgreSQL can handle this.
> But since you asked with a theoretic background,
> it's probably worthwhile to look at column stores (like [1]).

Wow. I didn't know there was a column-store extension for PG -- this would come in handy for some analytic stuff we run!

I know that PG can handle my current system "at scale". I'm really just wondering what the possible slowdowns/improvements will be. Doing a rewrite of the entire row + updating the various indexes seems to be a lot of unnecessary IO. At some point it will make sense to minimize that and isolate the heavy-write columns from impacting the rest of the table's performance.
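On the "updating the various indexes" point, it's worth measuring how much of that IO PostgreSQL already avoids: HOT (heap-only tuple) updates skip index maintenance entirely when no indexed column changes and the page has free space, and the statistics collector tracks them. A quick way to check, using the standard `pg_stat_user_tables` view:

```sql
-- Fraction of updates that were HOT, i.e. did not touch any index.
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC
LIMIT 10;
```

A low `hot_pct` on a heavily updated table suggests either that indexed columns are being modified or that pages are too full; a high one means the feared index-rewrite cost is largely not being paid.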
Jonathan Vanasco-7 wrote:
> This is really a theoretical/anecdotal question, as I'm not at a scale yet where this would be measurable. [...]
>
> Has anyone done testing/benchmarking on potential efficiency/savings by consolidating the frequent UPDATE columns into their own table?

Consider another in-database mechanism that mitigates the need to do this manually: HOT (heap-only tuples):
http://pgsql.tapoueh.org/site/html/misc/hot.html

I haven't done any benchmarking, but I do currently use this idea to segregate read-only fields from read-write fields. Likely there is also a modeling reason why these fields have different update frequencies, and so they can be partitioned out both logically and physically. The only harm I see is that it can make using the schema more difficult - though that can be somewhat mitigated by using (updatable) views in front of the partitioned tables.

If you can logically partition them, I would go for it; if it is a purely physical concern, then I'd probably ponder it for another couple of days and then go for it anyway. The main additional things to ponder are the cost of additional parsing and the additional join. Neither is super expensive, but if you are only doing this for physical reasons you need to evaluate your planned usage patterns. With a logical split you are more likely to find situations where you do not even care about one table or the other and so can avoid the join entirely.

David J.
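To illustrate the "(updatable) views" mitigation: assuming a hypothetical split into `account` (read-mostly columns) and `account_counters` (hot columns), a view over their join can present the original one-table shape. Note that a view over a join is not auto-updatable in PostgreSQL, so writes need an `INSTEAD OF` trigger (available since 9.1) routing each column to its underlying table. A sketch:

```sql
-- Hypothetical: re-present the split tables as a single relation.
CREATE VIEW account_full AS
SELECT a.id, a.name, c.view_count, c.last_seen
FROM account a
JOIN account_counters c ON c.account_id = a.id;

CREATE FUNCTION account_full_upd() RETURNS trigger AS $$
BEGIN
    -- Route each column back to the table that owns it.
    UPDATE account
       SET name = NEW.name
     WHERE id = NEW.id;
    UPDATE account_counters
       SET view_count = NEW.view_count,
           last_seen  = NEW.last_seen
     WHERE account_id = NEW.id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER account_full_upd
INSTEAD OF UPDATE ON account_full
FOR EACH ROW EXECUTE PROCEDURE account_full_upd();
```

Application code can then `UPDATE account_full ...` as before, while the physical split still keeps the hot-column churn off the wide table.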