Data Warehouse Automation Thoughts
I’ve been looking into DWA for a few months now, and might give it a try sooner or later. Here’s some thoughts about it:
Not all DWA tools are created equal
Currently, I think there are two main approaches to Data Warehouse Automation
- Data Warehouse Generation: You provide sources, mappings, datatype mappings etc.. The tool generates code (or artifacts).
- Data Warehouse Automation (DWA): The tool not only generates code / artifacts, but also manages the existing Data Warehouse, by offering continuous insight in data flows, actual lineage, row numbers, etc..
The difference might seem small, but IMHO is visible most clearly whenever changes occur in the Data Warehouse – the second class of tools can handle those changes (while preserving history). With the first class of tools provide you with the new structures, but you need to handle the preservation of history yourself (as you would’ve without DWA).
Worth noticing here is Biml: historically a pure “generation” tool in which you could build your own generation frameworks (plain “free” BimlScript, early Mist). New developments in BimlStudio more and more support you when changing existing deployments. They’re really crossing this border as of now.
Another difference I see is the freedom of modeling: does the tool require you to use a certain way of modeling (e.g. Kimball-style DWH only)? Or are you just provided with the tools to set up layers, mappings and flows without having to adhere to a certain modeling style? Both can be beneficial – the first to assure adherance to a certain modeling technique, the second gives more freedom to adapt to your existing DWH infrastructure when implementing the tooling later on.
Amount of time / money saved
Some DWA vendors calculate the amount of time saved by acting as if you wouldn’t automate without their tooling. This is bullshit: years ago (even before BIML) we already automated our SSIS deployments using the SSIS Automation API, and later using EzAPI. Also, we generated SQL scripts and tables so we could extract new data from sources in an automated way is nothing new. Although the ease with which you can automate is greater using any of these tools, your primary focus shouldn’t be on “I saved X hours because I didn’t do everything by hand”.
Having said that, DWA will in fact save you a lot of money (and time). I just don’t think it is the primary selling point at the moment. Especially comparing the amount of time saved when comparing tool X with hand-crafting your entire Data Vault. I have yet to meet the first shack who considered hand-crafting a Data Vault. Maybe it’s because they were out of business before they finished ;-).
Where you will save lots of time, is in not having to build the automation framework by yourself. Of course your engineers are able to build anything you like – but as long as off-the-shelves solutions exist, they’ll hardly do it cheaper. So when making the “amount of money” saved argument, be sure to include the following aspects:
- The DWA tooling will continue to improve on the experience they have from all other customers
- It’s way easier to transfer work to new developers – no in-company workarounds any more
- It’s also easier to transfer the maintenance of delivered DWH parts to administrators, if needed.
- Documentation is provided out-of-the-box
- The GUI provides a structured oversight of what’s running
- Default reports are often included for lineage, ongoing performance over time, etc.
- Finally, depending on the solution the DWA tooling will monitor your process and sometimes choose other loading techniques. I know that TimeXtender (TX-DWA) does this, and other providers probably too. So the administrative workload will reduce!
Privacy and GDPR
This is where DWA tooling appliances can make a good business case: if they help you with good structures and audit trails to comply to all privacy laws. Not only in a negative way (“not sending your data to our servers” – see also Clint Huijbers’ conversation with Peter Avenant on LinkedIn[ref]Clint references several articles on PrivacyShield:
- Benefits of Participation
- Privacy Shield FAQ (European Commission)
- Why You Can’t Ignore Privacy Shield (Dark Reading)
[/ref]), but also in a positive way (“we can help you get insight[ref]For example, because the DWA framework helps you tag the privacy-sensitive fields, and ensures the right way of storage – encryption – and handling – explicit insight in who can access which data[/ref]”).
I understand that adhering to the law isn’t something any tool can “solve”, but at places where lots of data are gathered tooling will come in handy to ensure compliance.
the value of automation is certainly not limited to privacy and/or GDPR. The right integrated data automation tooling supports governance in general better than any other solution.
This is certainly true – privacy and the right handling of sensitive data is indeed part of data governance. Where possible DWA tooling should augment or support your data governance. I think it will make DWA offerings even more compelling if the ways your DWA tool supports this are highlighted more clearly.
Data Warehouse Automation tooling is gaining a lot of traction lately:
- The list of TimeXtender partners is extending ever more
- Varigence BimlFlex is finally here
- DataWarehouseAutomation.Guide has launched and provides good help selecting tools
For me, the great argument for buying DWA tooling is threefold:
- The way of working is standardized (easier to transfer technical work)
- Less technical work and thus cost savings (more business intelligence)
- Batteries included: instead of having to allocate my team’s resources to improve the technical stuff, lots of things are already provided out-of-the box
Currently, I’ve not yet decided when we’re going to jump the wagon, and which wagon we’re going to jump, but the Data Warehouse Automation Guide[ref]I’m not associated in any way to this site. I just like it :-).[/ref] is a great help in providing a good overview as well as comparing the features.