In a world split between organizations that can make effective use of big data and those that can’t, Data Marts are increasingly seen as an essential part of the modern data stack for transforming raw data into actionable business intelligence.
We all know that Data Warehouses are great for dealing with large data sets, but we also know that large-scale data analysis needs data that is easy to find and readily available.
What you never want is to have to perform a complex query on an entire database because someone’s requested a simple report.
That’s where Data Marts come in…
Definition Of A Data Mart
A Data Mart can best be considered as a ‘subset’ of a Data Warehouse.
It’s a subject-oriented database (sometimes a partitioned segment of a larger Data Warehouse, sometimes not) that focuses on a particular aspect of an organization’s data, be it a department, a product or a particular subject area.
Data Marts are great for accelerating an organization’s processes as they make specific data available to predefined user groups, allowing them to access it quickly, without the need for complex queries that scour the entire Data Warehouse.
As Data Marts only contain data applicable to one aspect of an organization, they’re a cost-effective way of structuring data and deriving fast and actionable insights.
Data Marts Vs Data Warehouses Vs Data Lakes
Data Marts and Data Warehouses are both highly structured databases in which data is both stored and managed until it’s called upon. In comparison, a Data Lake provides for storage of unstructured or raw data, coming in from multiple, disparate sources.
However, the data in a Data Lake won’t have been prepared for analysis. The benefit is that storing data in its raw form is much cheaper than storing it in a Data Mart or Data Warehouse, as the data doesn’t need to be cleaned before ingestion… it can just be held until it’s needed.
Still, in its raw form it’s not very insightful, which is where Data Warehouses and Data Marts come in…
As we’ve already mentioned, a Data Warehouse is the central storage for an organization’s entire operations, whilst a Data Mart will focus down on a particular function.
That leaves Data Warehouses vulnerable. Everyone needs access to them, but that access also needs to be strictly controlled for security purposes.
Plus, querying the entire database can be resource heavy.
That’s the main role of a good Data Mart: partitioning off smaller sets of data from the whole as needed and providing much easier access for those calling on it.
Much like a Data Warehouse, a Data Mart is a relational database storing transactional data in columns and rows, and there are two ways to create one.
They can be spun up from the ‘top down’ (an existing Data Warehouse) or from other sources, such as external data or internal operating systems.
Separate Data Marts can also be merged into a single Data Warehouse.
Different Types Of Data Marts
There are three types of Data Mart: dependent, independent and hybrid.
The differences between them are defined by their relationship to a Data Warehouse and the disparate data sources that power it.
Dependent Data Marts
A dependent Data Mart is one which is created from an existing Data Warehouse.
All of an organization’s data is first stored in one location (the Data Warehouse) before being partitioned off to where it can be queried more efficiently.
Dependent Data Marts can be offered up as logical views (virtual tables that are logically, but not physically, separated from the Data Warehouse), or they can be physically separated on a different database entirely.
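To make the logical-versus-physical distinction a little more concrete, here’s a minimal sketch using Python’s built-in sqlite3 module. The table and column names (warehouse_orders, sales_mart_view, sales_mart_table) are made up for illustration, and a single in-memory database stands in for what would normally be a separate warehouse and mart.

```python
import sqlite3

# Hypothetical warehouse table; all names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse_orders (
        order_id   INTEGER PRIMARY KEY,
        department TEXT,
        product    TEXT,
        amount     REAL
    );
    INSERT INTO warehouse_orders VALUES
        (1, 'sales',   'widget', 120.0),
        (2, 'sales',   'gadget',  80.0),
        (3, 'support', 'widget',  15.0);

    -- Logical Data Mart: a view over the warehouse, no data is copied.
    CREATE VIEW sales_mart_view AS
        SELECT order_id, product, amount
        FROM warehouse_orders
        WHERE department = 'sales';

    -- Physical Data Mart: a separate table populated from the warehouse.
    CREATE TABLE sales_mart_table AS
        SELECT order_id, product, amount
        FROM warehouse_orders
        WHERE department = 'sales';
""")

view_rows = conn.execute(
    "SELECT * FROM sales_mart_view ORDER BY order_id").fetchall()
table_rows = conn.execute(
    "SELECT * FROM sales_mart_table ORDER BY order_id").fetchall()

# A new sale lands in the warehouse after the marts were created.
conn.execute("INSERT INTO warehouse_orders VALUES (4, 'sales', 'widget', 50.0)")
view_count = conn.execute("SELECT COUNT(*) FROM sales_mart_view").fetchone()[0]
table_count = conn.execute("SELECT COUNT(*) FROM sales_mart_table").fetchone()[0]
print(view_count, table_count)
```

In this sketch the view picks up the new warehouse row straight away, while the physical copy stays as it was until it’s refreshed. That’s roughly the trade-off to weigh between the two approaches.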
Independent Data Marts
As it sounds, an independent Data Mart is a stand-alone database, created without the need for a Data Warehouse.
Data is ingested from systems either internal or external to the organization then loaded directly into the Data Mart where it’s stored for later analysis.
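As a rough illustration of that ingest-then-load flow, the sketch below reads a small CSV export and loads it straight into a stand-alone mart. Everything here is an assumption for the sake of the example: the support-ticket data, the support_mart table, the load_support_mart function and the rule of skipping unresolved tickets are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical export from an internal operational system (illustrative data).
SOURCE_CSV = """ticket_id,opened_on,priority,minutes_to_resolve
101,2024-01-03,high,42
102,2024-01-03,low,
103,2024-01-04,high,18
"""


def load_support_mart(conn, raw_csv):
    """Extract rows from the CSV, clean them, and load them into the mart."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS support_mart (
            ticket_id          INTEGER PRIMARY KEY,
            opened_on          TEXT,
            priority           TEXT,
            minutes_to_resolve INTEGER
        )
    """)
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        if not row["minutes_to_resolve"]:
            continue  # transform step: skip tickets that are still unresolved
        cleaned.append((
            int(row["ticket_id"]),
            row["opened_on"],
            row["priority"],
            int(row["minutes_to_resolve"]),
        ))
    conn.executemany("INSERT INTO support_mart VALUES (?, ?, ?, ?)", cleaned)
    conn.commit()
    return len(cleaned)


mart = sqlite3.connect(":memory:")  # the mart has no warehouse behind it
loaded = load_support_mart(mart, SOURCE_CSV)
avg_high = mart.execute(
    "SELECT AVG(minutes_to_resolve) FROM support_mart WHERE priority = 'high'"
).fetchone()[0]
print(loaded, avg_high)
```

Note that the extract, transform and load logic lives entirely inside this one mart’s loader, so every additional independent mart would need its own version of it.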
The benefit of independent Data Marts is that they’re incredibly easy to spin up and great for achieving short-term goals or specific use cases (for instance, a dashboard where speed is much more important than ad hoc flexibility). However, they can become problematic to manage as their numbers increase, because each one needs its own ETL logic. Fortunately, DoubleCloud can do a lot of the heavy lifting for you when it comes to managing your Data Marts, so if you’re looking to save some time, feel free to claim a free trial.
Hybrid Data Marts
Hybrid Data Marts, as we’re sure you’ve already guessed, share elements of both dependent and independent Data Marts.
They’re used when it’s necessary to combine data from both existing Data Warehouses and other sources, uniting the benefits of both in one solution.
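Here’s a small, hedged sketch of what that combination might look like. The warehouse_sales table stands in for an existing Data Warehouse, the external_ad_spend dictionary stands in for some non-warehouse source, and marketing_mart is the hybrid mart. All of these names and numbers are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the existing Data Warehouse (hypothetical table).
    CREATE TABLE warehouse_sales (region TEXT, revenue REAL);
    INSERT INTO warehouse_sales VALUES
        ('north', 1000.0), ('south', 400.0), ('north', 500.0);

    -- The hybrid Data Mart itself.
    CREATE TABLE marketing_mart (
        region   TEXT PRIMARY KEY,
        revenue  REAL,
        ad_spend REAL
    );
""")

# Stand-in for a source outside the warehouse, e.g. an ad platform export.
external_ad_spend = {"north": 300.0, "south": 200.0}

# Warehouse side: aggregate revenue per region.
warehouse_rows = conn.execute(
    "SELECT region, SUM(revenue) FROM warehouse_sales GROUP BY region"
).fetchall()

# Merge in the external side while loading the mart.
for region, revenue in warehouse_rows:
    conn.execute(
        "INSERT INTO marketing_mart VALUES (?, ?, ?)",
        (region, revenue, external_ad_spend.get(region)),
    )

revenue_per_ad_unit = conn.execute(
    "SELECT region, revenue / ad_spend FROM marketing_mart ORDER BY region"
).fetchall()
print(revenue_per_ad_unit)
```

The resulting mart can answer a question (revenue per unit of ad spend) that neither source could answer on its own.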
The Different Structures Of Data Marts
Much like traditional Data Warehouses, Data Marts can be structured using multidimensional schemas as blueprints. Traditionally the ‘main three’ were the star, snowflake and vault schemas, but in a true modern data stack most Data Marts take a more denormalized approach, as speed is increasingly seen as more important than allowing for faster upgrades (especially when there are managed services doing that for organizations).
The star schema is a logical format of tables in a multidimensional database making up a Data Mart (or Data Warehouse) that, as you probably guessed… resembles a star shape.
Within this set-up, one fact table is at the ‘center’ of the star and is surrounded by associated dimension tables. As there’s no dependency between these associated dimension tables, the schema requires fewer joins when writing queries, making it much more efficient for anyone looking to analyze large data sets.
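A tiny star schema might look something like the sketch below, again using sqlite3. The fact_sales, dim_product and dim_store tables and their contents are hypothetical. The point is only that each dimension sits one join away from the central fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive, denormalized, independent of each other.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store (
        store_id INTEGER PRIMARY KEY, city TEXT);

    -- Fact table at the center of the star, pointing at each dimension.
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        amount     REAL
    );

    INSERT INTO dim_product VALUES (1, 'widget', 'hardware'), (2, 'manual', 'books');
    INSERT INTO dim_store VALUES (1, 'Leeds'), (2, 'York');
    INSERT INTO fact_sales VALUES (1, 1, 1, 10.0), (2, 1, 2, 20.0), (3, 2, 1, 5.0);
""")

# One join per dimension, and no joins between the dimensions themselves.
star_rows = conn.execute("""
    SELECT p.category, s.city, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store   s ON s.store_id   = f.store_id
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
""").fetchall()
print(star_rows)
```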
The snowflake schema follows on from the star schema in that it’s built out with additional dimension tables that are normalized to protect data integrity and minimize data redundancies.
Whilst the snowflake method of building Data Marts does use a lot less space for storing dimension tables, the more complex structure often makes these marts a lot more difficult to manage and maintain.
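To illustrate the trade-off, the sketch below normalizes a product dimension by moving the category name into its own table. The table names (dim_category, dim_product, fact_sales) and the data are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The category name is stored once here rather than on every product row.
    CREATE TABLE dim_category (
        category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_product (
        product_id  INTEGER PRIMARY KEY,
        name        TEXT,
        category_id INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );

    INSERT INTO dim_category VALUES (1, 'hardware'), (2, 'books');
    INSERT INTO dim_product VALUES (1, 'widget', 1), (2, 'manual', 2), (3, 'gadget', 1);
    INSERT INTO fact_sales VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 2, 5.0), (4, 3, 7.0);
""")

# Reaching the category now takes two joins: fact -> product -> category.
snowflake_rows = conn.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product  p ON p.product_id  = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(snowflake_rows)
```

The word ‘hardware’ is stored only once, which is the redundancy saving, but every query that wants it has to hop through an extra table to get there.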
The vault schema, also known as Data Vault, is a modeling technique used in modern data stacks that allows data scientists to architect and create Data Warehouses.
Vaults enforce a layered structure within Data Marts and were developed deliberately to deal with issues of agility and scalability when using the star or snowflake schema by doing away with the need for cleansing.
They also streamline the process of adding in new data sources without having to change and adapt existing schema.
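Data Vault models are usually described in terms of hubs (business keys), links (relationships between hubs) and satellites (descriptive attributes from a given source). The heavily simplified sketch below leaves out links and hash keys entirely, and all table names are hypothetical. It only aims to show a new source being added as a new satellite while the existing tables are left untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hub: one row per business key.
    CREATE TABLE hub_customer (
        customer_key  TEXT PRIMARY KEY,
        load_date     TEXT,
        record_source TEXT
    );
    -- Satellite: attributes from one source (a CRM), versioned by load date.
    CREATE TABLE sat_customer_crm (
        customer_key TEXT REFERENCES hub_customer(customer_key),
        load_date    TEXT,
        name         TEXT,
        email        TEXT,
        PRIMARY KEY (customer_key, load_date)
    );
    INSERT INTO hub_customer VALUES ('C-001', '2024-01-01', 'crm');
    INSERT INTO sat_customer_crm VALUES ('C-001', '2024-01-01', 'Ada', 'ada@example.com');
""")


def existing_schema(connection, tables):
    """Return the stored CREATE statements for the given tables."""
    placeholders = ",".join("?" for _ in tables)
    rows = connection.execute(
        "SELECT name, sql FROM sqlite_master "
        f"WHERE type = 'table' AND name IN ({placeholders})",
        tables,
    ).fetchall()
    return dict(rows)


original_tables = ["hub_customer", "sat_customer_crm"]
schema_before = existing_schema(conn, original_tables)

# A new source (billing) arrives: add a satellite, leave existing tables alone.
conn.executescript("""
    CREATE TABLE sat_customer_billing (
        customer_key TEXT REFERENCES hub_customer(customer_key),
        load_date    TEXT,
        billing_plan TEXT,
        PRIMARY KEY (customer_key, load_date)
    );
    INSERT INTO sat_customer_billing VALUES ('C-001', '2024-02-01', 'pro');
""")
schema_after = existing_schema(conn, original_tables)

combined = conn.execute("""
    SELECT h.customer_key, c.name, b.billing_plan
    FROM hub_customer h
    JOIN sat_customer_crm     c ON c.customer_key = h.customer_key
    JOIN sat_customer_billing b ON b.customer_key = h.customer_key
""").fetchall()
print(schema_before == schema_after, combined)
```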
Benefits Of Data Marts
Hopefully by now you can see the benefits Data Marts bring to an organization looking to derive actionable business intelligence from its data in an efficient and cost-effective fashion, but to summarize, Data Marts offer:
Time savings when accessing specific data sets
A way to separate the business logic from the application layer: the complicated logic can live in the Data Mart’s backend, while the logic for reading and displaying data stays simple
Cost savings as, due to their size, they’re often cheaper than creating an entire Data Warehouse architecture
Improved performance — dependent and hybrid Data Marts are capable of improving the performance of an entire Data Warehouse by lessening the need to constantly process and analyze the entire structure
More control, with disparate departments or functions controlling their own data
Ease of implementation, requiring much less technical skill to create than an entire Data Warehouse
Easier maintenance due to their size — less data means less clutter to manage
A minimum viable product, with future Data Warehouse requirements being created out of existing Data Marts.