EtLT: The tech that’s transforming data processing
DoubleCloud recently published a blog on the subject of ETL vs ELT, or Extract, Transform Load vs Extract, Load, Transform.
Many developers still mix the two up, with ETL being used to extract data from multiple, disparate sources and transforming it whilst it’s in-flight so it can be loaded into a system in the correct format, making said data ready for analysis and reporting.
However, ELT is a variation in which the raw data is loaded into the target system first, where it’s then transformed.
ELT was developed to leverage the power of the modern data stack to help perform those transformations as close to the data as possible, cutting down on resources (and cost) and vastly speeding up the entire process.
Never happy to rest on its laurels (or do with just two confusingly similar acronyms) the data community has developed a third, hybrid process known as… EtLT.
ETL vs ELT — Which is better?
Before we discuss EtLT, let’s just quickly summarize ETL and ELT and when you should use each and why.
Ask a dozen different Solution Architects this and we’re sure you’ll get a dozen different answers.
The truth is, you can’t say definitively which is better, as it very much depends on context and the result you’re trying to achieve.
The biggest difference is obviously when the data gets processed with the other being in their performance and scalability.
ELT is generally considered more a performant and scalable solution than ETL, whilst ETL is thought to be more resource-intensive, requiring the moving and transforming of data before it can be loaded into a target system… leads to greater latency and longer processing times.
It’s more suitable for situations in which the data needs to be transformed, cleaned, or otherwise sanitized before it can be loaded into the target system such as when dealing with data from multiple sources which may have different structures, schemas, and formats.
ETL, on the other hand, is much more suitable for situations in which the data is already in some sort of semi structured format and doesn’t need significant cleaning and transformation and where the focus is on loading the data as quickly as possible for real-time analysis and reporting or situations in which the data needs to be arranged into a useable format.
What is EtLT?
Now on to the main point of this blog…
Although still very new, many consider EtLT to be the best of both worlds.
Although ELT offers anyone looking for rapid ingestion of data a huge boost in performance, anyone who’s tried to rely solely on it within their solution will tell you it sometimes struggles to meet data security and compliance needs, especially anything requiring you to hide, remove or encrypt PHI and PII data.
As important as fast analytics are (and they are important in this world of Big Data), sometimes speed has to be sacrificed for security concerns… or does it?
That’s where EtLT comes in: Extract, transform, Load, Transform (although as it’s still so new we’ve also seen it called Extract, Tweak, Load, Transform).
It’s a hybrid or best of both worlds approach that is both fast and ensures data quality when it comes to sensitive information for compliance and regulation.
So how does it work?
-
Extract: The raw data is ingested from your various sources and databases and placed into your staging area.
-
Transform: The first round of transformation is the transformation ‘lite’ stage, in which PHI, PII or other sensitive data is removed or encrypted, usually one data source at a time. That makes these transformations fast (as they’re relatively simple) as the data is transformed independent of other sources. At this point the data from said sources isn’t integrated together… that won’t happen till the next stage.
-
Load: The newly prepared data is loaded into your waiting Data Warehouse.
-
Transform: The data is now transformed and fully integrated within your Data Warehouse, actually using the data warehouse to process those transactions, integrating all your multiple data sources into a cohesive whole.
The reason EtLT is gaining popularity with many data engineers running modern data stacks is that it lets them ingest data from a plurality of sources, sensitive or not, thanks to that first, ‘lite’ transformation stage.
Those transformations are simple to set up, incredibly fast to process and answer ELT’s biggest flaw by satisfying most, if not all regulatory and governance security compliance concerns.
To increase speeds even further, your complicated, multi-source transformations can still happen within your Data Warehouse.
Benefits of EtLT or when to use EtLT
As you’ve probably already recognized, EtLT is most useful when an organization is trying to achieve sub-second analytics but has security concerns about the data; needing to mask, remove or encrypt parts, if not all of it.
If your organization’s governance, whether that be set internally or due to a nation’s data laws, requires preload transformations then EtLT is probably the solution you’ve been looking for.
-
EtLT is secure — meeting data security and compliance requirements
-
It engenders a much higher level of customer trust
-
It’s fast — you don’t have to sacrifice security for speed as you would with the old ETL vs ELT debate
It also offers you a lot more flexibility in your end solution; giving you the ability to change the way you transform data post-load to better suit different analytics you might need to run.
In this article, we’ll talk about: