Saturday, April 16, 2016

Database Scaling Jargons

If you are in IT then no matter what you do (develop, deploy or maintain applications), you would have seen database issues. And wherever I see database issues, there comes a jargon of database scaling terms which are thrown on them. Most of them mean the same thing others are nowhere near the problem they are trying to address.

Let’s understand few terms listed below:-
  1. Scale Up vs Scale Out
  2. Horizontal vs Vertical Scaling
  3. Horizontal vs Vertical Partitioning
  4. Sharding
Database Scaling is like a bouncer in a bar. Either you need one strong one or multiple to handle the rowdy crowd. Not sure where I got this analogy from but I like it so I am keeping it :) Below is the explanation.

So with this in mind, let's understand few terms which are not so different to each other like Scale Up vs Horizontal Scaling and Scale out vs Vertical scaling. 

Scale Up or Vertical Scaling means buying bigger boxes for your database with more memory and processing power. While this works, you might pay four to five times what you paid for your previous solution, and you’ll only get twice the performance. 

Pros:-
  • Less power consumption than running multiple servers
  • Cooling costs are less than scaling horizontally
  • Generally less challenging to implement
  • Less licensing costs
  • (sometimes) uses less network hardware than scaling horizontally (this is a whole different topic that we can discuss later)

Cons:-
  • PRICE, PRICE, PRICE
  • Greater risk of hardware failure causing bigger outages
  • generally severe vendor lock-in and limited upgradeability in the future 




On the other hand Scale Out or Horizontal Scaling means adding more servers with less processors and RAM. This is usually cheaper overall and can theoretically scale infinitely. 

Pros:-
  • Much cheaper than scaling vertically
  • Easier to run fault-tolerance
  • Easy to upgrade

Cons:-
  • More licensing fees
  • Bigger footprint in the Data Center
  • Higher utility cost (Electricity and cooling)
  • Possible need for more networking equipment (switches/routers)

Horizontal Vs Vertical Partitioning


There are two different methodologies to deal with same issues but requires change in database schema. Below is what we do in these:-

Horizontal Partitioning

Horizontal partitioning divides a table into multiple tables. Each table then contains the same number of columns, but fewer rows. For example, a table that contains 1 billion rows could be partitioned horizontally into 12 tables, with each smaller table representing one month of data for a specific year. Any queries requiring data for a specific month only reference the appropriate table.

This is also called Sharding.  There are some arguments that Horizontal partitioning is not exactly Sharding but underlying idea is same but just when you use a reference of keys to identify which partition will hold the data. That is Sharding. Below picture explains what I am trying to say.




Vertical Partitioning

Vertical partitioning divides a table into multiple tables that contain fewer columns. The two types of vertical partitioning are normalization and row splitting:
  • Normalization is the standard database process of removing redundant columns from a table and putting them in secondary tables that are linked to the primary table by primary key and foreign key relationships.
  • Row splitting divides the original table vertically into tables with fewer columns. Each logical row in a split table matches the same logical row in the other tables as identified by a UNIQUE KEY column that is identical in all of the partitioned tables. For example, joining the row with ID 712 from each split table re-creates the original row.

Like horizontal partitioning, vertical partitioning lets queries scan less data. This increases query performance. For example, a table that contains seven columns of which only the first four are generally referenced may benefit from splitting the last three columns into a separate table. Below picture explain vertical portioning.


Database scaling could be fun to solve but before you ever get into resolving the issue, you should first understand the underlying problem with your database. Lot of database performance issues can really be dealt with by fixing your schema and fine tuning your queries. 

 Once you get this right and you still have issues, then you further need to perform in depth analysis of why your performance is getting a hit. Do you need a more writes than read. Is your queries processor intensive or high in numbers? All these answers will help you to choose the right scaling option for your database.

Hope you would have found this interesting. I surely enjoyed writing this :)