Info

You are currently browsing the archives for the Column Store Db category.

Calendar
July 2010
M T W T F S S
« Oct    
 1234
567891011
12131415161718
19202122232425
262728293031  
Categories
Links

Archive for the Column Store Db Category

Searching & Compression

If like me you have worked with relational databases for many years, you probably wonder what’s so different about a column database. Well in the column store world that means that all like column information is stored together. Therefore if your table comprises of 6 columns and 100,000 rows then in the relational world you store 100,000 records but in the column store world there are only 6 groups of data stored, one for each column.

 Now imagine that you need to search one of those columns for certain values. Since they are all together, you don’t waste time looking at all the data, just the values for that column, so that must be quicker, but there is more you can do. What if one of the columns in the 100,000 records was a date. If all the records were for a limited range of dates, then instead of having to store each date, you can now keep a count of how many occurences you have for each date. Products like Vertica offer various compression techniques on the columns, and in this example, run-length encoding would be the obvious choice which would store 15/10/2008 and 16982 to represent the number of occurrences. In the relational world we would have to read every record.

Do you now start to understand why its so much faster in the column-store world. Not only is the data highly compressed so there is little of it to read, but you only read the data of interest.

But you cry, who reads all the data, we build objects like materialized views so we search less of the data, hence its not so much of a problem. In databases like Oracle materialized views do result in huge performance gains, but imagine what happens when you take that technology and apply it to a column-store. Vertica creates projections which are the equivalent of materialized views without the aggregation. So just when you speeded up your relational database, the column-store steams ahead once again using projections.

If you are now curious, take a look at some of the Vertica benchmarks and you will see what I mean

What’s so Special about a Column Store Database

I started working with databases over 25 years ago. In the beginning there was CODASYL and Hierarchial databases, but along came Relational and we were convinced that it could do everything and products like Oracle have been available for over 30 years, so they are obviously an extremely useful business tool. So why after all this time would anyone want a column-store database?

What’s interesting about column-store database products is that they don’t claim to be a one size fit all solution, instead they target specific business areas such as analytics. Whilst relational databases can also perform this task, the advantage of using a column-store product like Vertica is that data compression can be as high as 90%. Therefore you can either store much more data or use considerably less storage than a traditional system. Performance gains could be as much as 200x faster than your current system and you can achieve this on a low-cost hardware platform. 

Over the years we have learnt about how to improve the capabilities within database systems, but in traditional systems its not always easy to implement a new approach. Where as new products like Vertica can design these new techniques into their product thus you are using a product really designed for today’s technology.

The column-store database is still queried using SQL, there is still a schema to design and a database to be tuned. Its just fun looking at it from a column perspective rather than always thinking about the entire row.

 I am really enjoying taking all the skills and experience that I have gathered over the years and applying it to Vertica, but also there are new techniques and approaches to learn. So it just goes to prove that you can teach an old dog new tricks.

Next time we will look in more detail about what makes a column store different

|