What is ClickHouse®? Why Column-Oriented DBMSs Are Important

What makes column-oriented database management systems faster than traditional ones and how can ClickHouse help?

September 27, 2022

Way back in 2009, the first prototype of ClickHouse® was built: an analytical database management system designed to generate reports on non-aggregated logs in real time. This new type of DBMS proved well suited to any task that required rapid processing of continuously arriving data.

What Do Column-Oriented Database Management Systems Do?

When loads get heavier, general-purpose DBMSs slow down or start burning through server capacity. Take the classic case of a relational database that stores everything in rows: once the number of records climbs into the billions, complex analytical queries become very slow, because to build a report the database has to read a huge amount of related but unnecessary information. Even careful optimization, such as correctly configured keys and indexes, does little to speed up queries over volumes that large.

Column-oriented DBMSs were invented to build reports faster by storing data in… you guessed it… columns. Because all the values of a column are stored together, a report that needs only a few indicators reads only those columns, which makes it much faster to compile. Column-oriented DBMSs are best suited for online analytical processing (OLAP), which typically has the following characteristics:

  • The vast majority of queries are for reading.

  • Data is added and updated in fairly large batches (> 1000 rows) rather than one row at a time, or is not updated at all.

  • Data is added to the database but not modified.

  • Queries read a fairly large number of rows but only a small subset of columns.

Traditional DBMSs perform poorly for analytics compared to an OLAP DBMS. To demonstrate this visually:

[Figure: Row-Oriented vs. Column-Oriented]
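To make the difference concrete, here is a toy Python sketch (not ClickHouse code) that stores the same hypothetical four-row table in both layouts and sums a single column. The schema and data are made up for illustration, and "values read" simply counts the fields each layout touches; it does not model real disk I/O, compression, or caching.

```python
# Hypothetical table: (user_id, country, duration_sec, revenue).
COLUMNS = ("user_id", "country", "duration_sec", "revenue")
ROWS = [
    (1, "DE", 30, 2.5),
    (2, "US", 45, 0.0),
    (3, "US", 12, 1.0),
    (4, "FR", 60, 4.0),
]


def to_columnar(rows, columns):
    """Pivot row tuples into one list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def sum_row_oriented(rows, col_index):
    """Row store: whole rows are fetched, even columns the query ignores."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # every field of the row is touched
        total += row[col_index]
    return total, values_read


def sum_column_oriented(table, column):
    """Column store: only the requested column is touched."""
    values = table[column]
    return sum(values), len(values)


columnar = to_columnar(ROWS, COLUMNS)
print(sum_row_oriented(ROWS, COLUMNS.index("revenue")))  # (7.5, 16)
print(sum_column_oriented(columnar, "revenue"))          # (7.5, 4)
```

Both layouts produce the same total, but the row-oriented version touches 16 values to get it while the column-oriented version touches 4. With real analytical tables that have dozens or hundreds of columns, that ratio grows accordingly.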

ClickHouse®

ClickHouse® was developed as a column-oriented DBMS for interactively building reports on non-aggregated logs of user actions. But that’s not all the system can do. Over time, detailed documentation was written for ClickHouse®, and other products began actively using the database. It made analyses that had been impractical at this scale feasible, and made problem-solving far more efficient.

Here are just some of the features that set ClickHouse® apart:

  • Column-oriented storage: data is read only from the columns a query needs, and similar values stored together compress efficiently.

  • Support for approximate calculations on a data sample: this reduces the number of hard drive accesses, further accelerating data processing.

  • Physical data sorting by primary key: you can quickly retrieve specific values or ranges.

  • Vectorized computation on column segments: processing data in chunks rather than one value at a time reduces dispatching costs and uses the CPU efficiently.

  • Parallel processing, both across the processor cores of an individual server and across a cluster, where a sharding mechanism distributes the data.

  • Linear scalability: a cluster can grow by adding servers, with capacity increasing roughly in proportion.

  • Efficient use of hard drives: ClickHouse® performs well even when the data doesn’t fit entirely in the memory cache. That also reduces the cost of operating the system, since hard drives are cheaper than RAM.

  • Fault tolerance: the system is a cluster of shards, and each shard is a group of replicas that hold copies of its data.
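Sorting data by primary key is what lets ClickHouse® skip most of a table during range queries. Below is a simplified Python model of a sparse primary index, an illustrative sketch rather than ClickHouse’s implementation: keys are kept sorted, only the first key of each fixed-size granule is recorded as a "mark", and a range query binary-searches the marks to decide which granules to scan. The function names are hypothetical, and the granularity of 4 is chosen for readability; ClickHouse’s MergeTree engine uses 8192 rows per granule by default.

```python
import bisect

# Rows per granule. Kept tiny for the demo; ClickHouse's MergeTree default
# (index_granularity) is 8192.
GRANULARITY = 4


def build_sparse_index(sorted_keys, granularity=GRANULARITY):
    """Record the first key of every granule (one "mark" per granule)."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granularity)]


def range_query(sorted_keys, marks, lo, hi, granularity=GRANULARITY):
    """Return (keys in [lo, hi], number of granules scanned)."""
    # Start at the last granule whose mark is < lo: its tail may still
    # hold keys >= lo (this also handles duplicate keys across granules).
    first = max(bisect.bisect_left(marks, lo) - 1, 0)
    # Granules whose first key exceeds hi cannot contain matches.
    last = bisect.bisect_right(marks, hi)  # exclusive granule bound
    if last <= first:
        return [], 0
    start = first * granularity
    end = min(last * granularity, len(sorted_keys))
    # Only the candidate granules are scanned; all others are skipped.
    matches = [k for k in sorted_keys[start:end] if lo <= k <= hi]
    return matches, last - first


keys = list(range(0, 40, 2))        # 20 sorted keys: 0, 2, ..., 38
marks = build_sparse_index(keys)    # [0, 8, 16, 24, 32]
print(range_query(keys, marks, 10, 20))  # ([10, 12, 14, 16, 18, 20], 2)
```

Here the query scans 2 of the 5 granules. Because the index holds one entry per granule rather than one per row, it stays small enough to search cheaply even for very large tables.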

Clients can connect to ClickHouse® through a console client, an HTTP API, and a number of wrappers for Python, PHP, Node.js, Perl, Ruby, R, and much more. There are also JDBC and Go drivers.
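The HTTP API is the easiest of these to try without installing a driver. A minimal sketch using only Python’s standard library, assuming a server reachable at localhost on ClickHouse’s default HTTP port 8123; the helper names `build_query_url` and `run_query` are hypothetical. The SQL statement is passed URL-encoded in the `query` parameter of a GET request, which ClickHouse treats as read-only; statements that modify data are sent with POST instead.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: a local server on ClickHouse's default HTTP port.
CLICKHOUSE_URL = "http://localhost:8123/"


def build_query_url(sql, base_url=CLICKHOUSE_URL):
    """Encode a SQL statement as the `query` URL parameter (spaces as %20)."""
    params = urllib.parse.urlencode({"query": sql}, quote_via=urllib.parse.quote)
    return base_url + "?" + params


def run_query(sql, base_url=CLICKHOUSE_URL, timeout=10):
    """Send a read-only query via HTTP GET and return the raw response text."""
    with urllib.request.urlopen(build_query_url(sql, base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With a server running, `run_query("SELECT 1")` requests `http://localhost:8123/?query=SELECT%201` and returns the result in ClickHouse’s default tab-separated text format. Production code would more likely use one of the drivers or wrappers listed above, which also handle authentication and typed results.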

Where ClickHouse® Fits…

ClickHouse® took its first step beyond internal projects in 2013, when it was used to analyze metadata from the LHCb experiment at CERN. It could have been used more widely, but its closed-source status made that difficult. That changed in June 2016, when the ClickHouse® source code was released as open source under the Apache 2.0 license. Companies around the world could now have their IT departments adopt it, among them Cloudflare and Bloomberg.

ClickHouse® in 2022 is a full-fledged DBMS with a broad range of capabilities. It can create tables and databases at runtime, load data from a variety of sources, analyze it, and execute queries without reconfiguring or restarting the server. ClickHouse® enables quick access to corporate data warehouses and supports a declarative, SQL-based query language that matches the SQL standard in many cases. It can also be integrated with big data systems such as Apache Kafka® and HDFS, as well as MySQL® and other external data sources via ODBC or JDBC.

Column-oriented DBMSs in general, and ClickHouse® in particular, have streamlined development in a variety of areas, including:

  • Analytics for web projects and mobile apps

  • Advertising networks and real-time bidding

  • Telecoms

  • E-commerce and finance

  • Information security

  • Business analytics

  • Online games

  • Internet of things

But ClickHouse®’s biggest selling point is still its speed. Because it generates reports on big data in real time, it makes key business problems such as customer behavior analytics quick and easy to tackle. That’s a win for retail, telecom, and any other internet infrastructure, from quick access to data collected by Internet of Things devices to much more.

ClickHouse® lets you track business metrics and analyze how users behave on your online store or in your games. Companies can even collect data into data marts for their customers, giving them secure access limited to the information that can be shared.

What’s next?

With the oceans of unstructured data accumulating out in the world, real-time access is key. And that means database management systems like ClickHouse® have to get even better.

The next step for businesses should be transitioning to cloud solutions. That will let them quickly visualize data without lengthy hardware procurement and configuration processes, instantly configure dashboards, and analyze events and business processes in real time.

The DoubleCloud managed database service helps deploy and maintain ClickHouse-based database clusters in different cloud infrastructures. You get all the benefits of a column-oriented DBMS without buying or configuring hardware, handling maintenance, or worrying about updates. DoubleCloud Managed Service for ClickHouse also makes your work more secure, even when your cluster hosts are in different availability zones.

* ClickHouse® is a trademark of ClickHouse, Inc. https://clickhouse.com