As you might know, I’m keeping an eye on how I can automate my Data Warehousing testing. In “Automated Testing for the Data Warehouse” my observation was
In essence, testing a Data Warehouse isn’t that complex
It turns out I’m not the only one thinking about this (really? 😉 ). Some time ago I listened to episode 72 from SQL Data Partners podcast, starring Lynn Winterboer. Really deep insight in the requirements for test-driven Data Warehouse work there, I definitely recommend listening the full podcast[ref]If 45 minutes seems too much you can start at 5 minutes and cut off at 36:15, which will save you 15 minutes and capture the gist of Lynn’s thoughts perfectly[/ref]
Three things stood out for me in this podcast:
- The need for communication between “traditional” testing and BI (and the cultural shift needed inside organizations to break down these walls)
- The need to see BI as integrated part of the value stream for business processes
- The need to get solid requirements, get them in a way we can test ’em, and really get the minimal viable product. More on that later.
“Traditional” testing and BI
Whereas the software development world is used to acceptance criteria to be specific (if I on this button A, I expect B to happen), this isn’t always the case in the BI world. In order to nail testing in BI, we therefor need to get test professionals into our BI teams. In the words of Lynn:
The testing mindset and the skills that testers bring into the team is something that we really need to embrace in the Data Warehousing world and get them to come into our team. We have to have some patience with them, because they don’t understand data versus UI. And they have to have patience with us, because we don’t understand testing discipline and strategies. But if you work together, you can come up with some powerful things.
When we talk about agile, we talk about even a single data warehousing team or a BI team having cross-functional skills within a single team of, say, seven people and what we mean by that is we really love to see these data teams evolve to where within a single group of say seven plus or minus two people you’ve got the abilities to do the analysis, work with the business, then do the data analysis, then do the modeling, do the ETL, do the BI metadata work or the cube work, do the BI frontend and do the testing and quality assurance.
And it doesn’t mean you need a human that does each of those things. It may mean that you grow your teams so that a single human has skills across multiple categories there. But then we could extend it even further, and say a really agile organization is going to have to look at things from the point of view of a value chain.
BI: part of the Business Value Stream
The point of using “traditional” testing knowledge in our BI teams shows the key cultural issue in many organizations: BI and Data Warehousing don’t always have their position in the value streams:
If you look at it from a business value chain – let’s say supply chain management – what we would do there is we would have the data professionals (…) be closely aligned with the people in the ERP system, who are also closely aligned with people in any other system that plays a part in the value chain.
So what I’d like to see is that Data Warehousing, Business Intelligence, Reporting and Analytics are part of the value stream of a business process. (…) so as you’re building out the software to meet a business need, you’re also looking at building out the reporting very soon thereafter – in small chunks.
For example: Instead of an entire ERP implementation, you might start with shipping data and say “okay we’re gonna make sure our shipping components work correctly in the ERP and meet the business need, and then we’re gonna do our shipping reporting before we move on to order management.”. The order in which you do these things doesn’t necessarily have to be the order in which the value chain flows. It might be the order in which the business rules are more clearly nailed down by the business – tackle that first while the business is trying to figure out the business rules for the more complicated pieces of the value stream.
That’s where I would really like to see it. And I would like to see that sliced very thinly to run from business needs through finance software through Data Warehousing through the end before they move on to the next little slice. That’s my dream
The minimal viable product
The points up to here (31:00 on the podcast) are mainly cultural. Finally, there was a particularly interesting piece on which data to use for testing. Carlos L Chacon summarizes this process:
Getting together with (..) whoever’s giving you the spreadsheets (..) they have to help you identify the rules and the process of how they got where that report looks like. We can then take that set of that and put all of that in our test environment, and that’s where I’m gonna run my development against. It doesn’t necessarily have to be a whole copy of my Data Warehouse, because I know the rules that are gonna be in place and I can test against that coarse set of data. Do I come up with the same numbers I’m expecting, yadda, yadda, yadda, why not?
Then, he continues with an interesting point many BI developers keep missing: in order to have short iterations, and build products that can be tested well, we need to limit the scope of the minimal viable product! So for example, we don’t say “ok, we build this report for sales figures”. Instead, we try to limit the scope as much as possible, so we can verify the math for the smallest chunk. We might for example deliver a report for sales figures of our bike sales, ordered in Q3 2016, to web customers.
Let’s just say we’re going to the next quarter. So the data I have is Q3 2016, and I start looking at the Q4 data. When I apply those rules, I show it to the business and then they say “well, this number looks a little bit odd”. Then I can say “well, I’ve applied these rules, my Q3 numbers do look good – so what is it? Do we have a change? Is there something like a new product? Is there some other component or rule that we didn’t take into account?”
Lynn Winterboer then points out that this not only provides for more agility and testability in the development process, but also provides better ways to troubleshoot, and:
What you’re doing is you’re narrowing the range of areas you would look for an issue. So it’s gonna help with troubleshooting to say “we’re confident the math works correctly. What else could be skewing this? Is it a change in the business process? Is it a change in our source system – how it records things?” Any of those things is gonna narrow down the scope of what you have to go troubleshoot. And that is very useful to a data team and to the business people, because we can have some confidence in certain things and then say “here are some unknowns that we should go investigate”.
Above is what stood out most for me – regard it as an appetizer, but more valuable insight is shared in the podcast as well:
- regression tests
- adding new features to existing products
So by all means go and listen to the podcast :-).