The Testing Show: Coverage

Coverage is a broad area, and asking ten people will probably get you ten different answers. We didn't ask ten people but we did ask three, and this podcast sums up that discussion. Join us as we delve into the topic of coverage with Perze Ababa, Elle Gee, Matthew Heusser, and Michael Larsen to see the good, the bad, the ugly, and the just plain necessary of coverage as it relates to software testing.

Transcript:

Michael Larsen: Hello and welcome to The Testing Show. I’m Michael Larsen, your show producer, and today I am going to be the moderator for this discussion on coverage. If that sounds a little bit vague, that’s on purpose; we promise it won’t be by the time we’re finished. To that end, we’d like to welcome a longtime contributor to The Testing Show who hasn’t been here for a while. Mr. Perze Ababa, welcome back.

Perze Ababa: Hey everyone, I am glad to be here.

Michael Larsen: And also on the Qualitest side, we would like to welcome a now regular contributor. Thank you for joining us, Miss Elle Gee.

Elle Gee: Hi, thanks for having me back again.

Michael Larsen: And of course we have Matt Heusser in the hot seat this time because we are talking about an area that he has some expertise in and he felt a little passionate about and so I said that I would be the one handling the questions and he can talk to his heart’s content on this. So welcome Matt, thanks for joining us in this role today.

Matthew Heusser: Thank you Michael.

Michael Larsen: And let’s go ahead and get started. I’d like to throw this out as our first question or block and I realize this alone could probably break up into five or six podcasts, but let’s try to at least do it briefly. When we say coverage, what exactly do we mean? If you talk to any given individual in an organization, you’re probably going to get a different answer and expectation. So Perze, how about I throw that to you first? When we say coverage, what are we really talking about?

Perze Ababa: Alright, let’s take a stab at this. At the end of the day, what we really care about from a testing perspective is the type of information that we need in order to test the application or your tests effectively. In my personal opinion, coverage from the get-go really is looking at an existing set of information that’s important to someone at a given point in time and providing tests for that particular set of information. Now, however we can effectively or efficiently or maybe consistently gather that information and provide tests for that specific information, that’s really up to whoever is looking, and it depends on the skill of the people on that team.

Elle Gee: I think, as you started with saying, coverage is different for different people and manifests in different ways on different projects. More often than not, for me, when I think about coverage, I think about the relationship between testing and requirements, user stories and test traceability. How can we identify what needs to be tested, and then how do we ensure that it is tested? Quite often it’s not straightforward. What do you do when there’s a project where there are no defined requirements? Where you can’t use the requirements as a guide? I think that coverage as a question can just go in a million different directions.

Michael Larsen: I definitely feel that, I know this from my own experiences. Whenever I look at an application or something that’s being tested, my general question when I try to develop coverage goes beyond just what the application is and what the features in it are. More times than not, I try to break it down into more granular steps if possible, so that when I have the coverage conversation, my initial question is: what changed? That’s always my first thing to ask, and if you’re dealing with a very granular change or something that’s been a very little niche-specific thing, then coverage isn’t such a daunting question. If, however, you’re doing a release that’s had a whole bunch of modifications or a new front end put in, coverage can be a nightmare. What do you think, Matt? How do you think we’re handling this definition?

Matthew Heusser: Well, I think that’s good. The ways I would define coverage are boring and based in mathematical concepts that are precisely correct, yet people might not be able to connect with them. So I’m going to try a totally different angle. I think we’d all agree complete testing is impossible. To measure the percentage of work you’ve done on an infinite set is meaningless, so we come up with a model, and then we measure our conformance to that model. So that might be the use cases, that might be the requirements, that might be the code itself. And I think Perze nailed it, that everybody has a different perspective because everyone has their own model of what the software is. If we can figure out a model that is better than “I’m just testing it”, and then we can track how well we’ve covered the things in that model, that will reduce some risks. So like I said, boring and academic, but correct. And another question is what do we do with that? Disagree?

Michael Larsen: No, I think that actually works very well. So the next thing that I would say is, so now I think we’ve gone around and we’ve hacked out a definition of coverage that I think we’re all happy with. Now let’s go on and ask an additional question. How do we ensure that we’re actually meeting it? And again, I know ensure is a loaded word, too, so Perze, back to you.

Perze Ababa: Yeah. So lists are definitely important here because when we are talking about coverage, and I’m going to go back to my definition, we start with the information that we care about and then we start digging deeper, expanding on what Elle has pretty much said: what are the things that would actually define the coverage items that we care about? That means having either a running list or more of a coverage map for any given application. And the beauty about defining it as a map is that the longer you haven’t visited a place, you only have an understanding of what it looked like the last time you were there. Right? And then that’s really the reflection of it. And then when you go back and see if things are still the same, are there any new things? That’s something that you can actually keep up to date. After that, it boils down to the quality dimensions that you really want to care about, and then making sure that you have an up-to-date list that can be communicated to the people who actually need this information.

Elle Gee: I love the idea of thinking of traceability beyond the list and thinking of it as a map. That idea of looking back and thinking how long ago was I there, when did we last visit and how has it changed is something I think often gets lost in the frantic nature of cycles of testing, pending code drops and everything else. I really do like the vision that puts in your mind and how easily you can share that with testers or your QA analysts to ensure that they are asking themselves that question and looking at coverage both from the point of view of making sure you achieve that much asked for 80 or 90%, but also the when: when did we cover it, how did we cover it, and was it covered enough? That really does speak to the mapping aspect of coverage. I think that’s a nice insight to take away.

Michael Larsen: Excellent, so I guess in some ways also this is all very theoretical and we’re just looking at this from a high level perspective. I know that Matt and I were discussing this as we were getting ready to start the recording this morning that he said that he had a pretty concrete example as to how he was able to do this. So if you don’t mind, Matt, can I put you on the spot?

Matthew Heusser: Sure. Yeah, and I should add, Perze brought up a really important point. Let’s assume you’ve done this massive, customer-facing coverage done by humans, and then you make one small change to the code. Does that invalidate everything and your coverage goes back down to zero? It’s like lines in the sand: the wind blows and coverage slowly decreases over time. That’s the way I talk about it when I deal with those kinds of problems, and that’s a universe where we can’t just press a button and get results in an hour that we have high confidence are accurate. Why might that happen? Well, for instance, say you’re working on a mobile application. There are a lot of moving parts, with a lot of back end API stuff changing all the time. Multiple business units are changing their back end APIs, and there’s an internet of things device that’s also going out with it, which gets a firmware update every six months or so. Your testing of the mobile device might be good, and the next day there might be a new internet of things device that goes in the marketplace that invalidates your testing. There are lots and lots of moving parts. Now imagine there are, I don’t know, 16 different teams on four continents working on this project, and that’s being conservative because it doesn’t really count some of the infrastructure work, and there are lots of other parts in this large global business that are doing things that impact you, and you have to coordinate testing and release. Realistically, at that scale, what you’re going to have is entire teams that get spooled up and funded by a business unit, contract teams, that then go away when they have delivered their features. So say you’re doing SAFe, and this is the outline of a real story, and you want to release every 12 weeks, every six sprints, and the last sprint is kind of a testing, fixing sprint. You know, we’re talking about a physical device that actually ships.
There’s a lot of coordination going on with an actual ship date. We’re not Tesla, we’re not gonna power down the internet of things device and download a new update. It might be at a place where there’s no internet connection at all. We want to get it right. Where do you even start? So the theory is every team is going to test their own stuff for two weeks, give us a thumbs up, and we ship it. But how do you coordinate that and how do you track that? We brought in Karen Johnson, who some folks here know: a longstanding expert in software testing, been around a long time, former member of the board of directors of AST. The first thing she did was build a site map. That is, for all the features of the application, who was responsible for each one. Then we went back and said, how do you test it? And then we came up with test plans, at a high level, and had them reviewed, and we compared that to the tooling that we had, the things that were automated, and what the risks were, to try to come up with this visualization of how testing is done. Then we said, what can we get done in a day? What can we get done in a week? What can we get done in an hour? The aim was to come up with a test plan for a large group that could hit the high notes for the appropriate amount of time invested, and frankly I think that was kind of a big deal. So that’s one way to conceptualize and visualize. What we ended up with was something a lot like the low tech testing dashboard that James Bach has talked about for years, with a whole lot of detail, and the individual elements, instead of just being the name of the feature, actually linked back to a Wiki page, a Confluence page, that would describe the feature in some depth. I think that was a real example of saying how much coverage we got rather than “I have no idea”, and being able to create a visualization that would stand up to scrutiny that we could give to management. That’s my story.

Michael Larsen: And he’s going to stick to it. Thanks Matt. I guess I could add a little bit to this as well, having worked with Socialtext for a number of years, and given that Socialtext is now incorporated with, depending on how you look at it, one or two larger entities. It’s been interesting because once upon a time we would look at our testing coverage as being just our application. Even when you say, well, we’re testing our application, as was alluded to earlier, there is more than just the stuff that makes up our product. We have individual pieces that are not even related to our product that we have to consider. I don’t think I’m breaking anything by saying this. Lots of people use it. Our general web component is done with nginx, and so if there’s a change made to nginx in the back end, that affects our coverage. If there’s something that changes a Postgres database, that also affects our coverage, and those are items that we may have very little control over. It’s one of the big nightmare points, especially if you deliver an application on a platform and that platform needs to be updated. That can really cause a tailspin for an organization. I say this simply because I’ve lived through it. Being able to go through and say, “Okay, yeah, we’re doing all of this stuff and everything looks good for the application that we’re working with. Oh yeah, our base platform is being end of lifed, so we need to move to something that’s more up to date”. And I will admit that those can be really scary conversations because you don’t know how many of your components are going to work after you do that update. And that’s a lot of stuff that’s not directly in your control. So I think it is important to not just look at what your application is and what coverage you have to worry about, but also to look at stuff that is completely outside of your application, and whether you have to play well with others.
So from there, I guess let’s move this over into another question, plain and simple: how do we recognize when we have gaps in our coverage? Perze, back to you?

Perze Ababa: Oh man. Now we’re going into the negative space. When we bring this back into the notion of coverage, we’re really just focusing on the things that we know about. A lot of this stuff, when we talk about gaps, is the process by which you discover things that you don’t know that you don’t know about the product, whether it’s a serendipitous event or activity or maybe a production problem that brings a particular topic to attention. At the end of the day, when we are looking at our systems, maybe it’s as big as what Matt was mentioning or maybe it’s just a pretty straightforward component, a REST API for example. If we don’t have some sort of ability to probe, or at least a way to recognize very specific anomalies within our code or within our infrastructure that show “I’ve never seen this behavior before”, we will miss them. I have a very specific example for this. This is a project that I had from three jobs ago. We released this feature that gives us the ability to display the top seven latest news from a given data set, but that data set is collected directly from the database. After we released it, it was functionally tested, but then our infrastructure folks came running in like, “Hey, something’s happening every seven minutes with our database because there’s so many locks that are happening”. What actually happened was that the application, instead of pulling that information from cache, pulled it directly from the database. It was also using a bad way to pull things from the database, which locked access for everyone else. So we wouldn’t have recognized the effects of that problem without infrastructure monitoring, for example. In this case, that gives us a completely different dimension. Active monitoring that gives us this information, on top of the things that we know about, is definitely key for us, especially when we’re only focused on one thing.
Maybe it’s just the focus on product functional quality, for example, where we see this. Having these processes that give us more information that we don’t know about, and a way to analyze this information, that’s definitely key.

Matthew Heusser: Yeah, I was about to say almost the exact same thing. You find out about a gap in coverage because someone says, “Hey, did you know our application crashes on the Apple Watch?” And you’re like, “No”. “Well, did you test it on the Apple Watch?” And you say, “Not this release.” You might just have a gap in coverage.

Perze Ababa: Definitely. I mean, at the end of the day, I think we all agree that when we have some sort of a coverage outline or coverage map, we realize that this is a living document. There are going to be some gaps that will come to our attention, and we’re going to have to figure out how we are going to dig deeper into those gaps. That could be an active way of looking at the system without the blinders of requirements, or maybe the requirements will help as well because we completely missed that part. But then this has to be a balance between the time that you have and the skill that you have as a tester, and making sure that the coverage map that you have really reflects the reality of where your application is at present.

Michael Larsen: So something that I’ve found helpful in this regard comes up with just about every release, when I finish up a given release and we get the feedback from when it goes out. I’m not going to say this happens all the time, but what I do like to do is go and talk to our customer support people and see if, after any given release, there is an uptick in complaints or something else that happens. Usually it’s not that, “Oh, the software was released and now we have an uptick in complaints, which means we’ve missed something”. More to the point, it’s just, “Oh, there’s been a software release, which means there’s been churn, which means people for whatever reason are now a little bit more alert to the fact that the code has changed, and maybe something that they never noticed before suddenly pops up”. It’s like, “Hey, this is behaving strangely”. Maybe it is, maybe it isn’t, but they’ve just now recognized it, because we invariably get an uptick in customer calls every time we do a release, even if the thing that we’re getting the call about has nothing to do with the release. But I think that is also a good point here, and Elle, I’d like to throw this out to you too: with Qualitest, in the testing that you work with, do you develop coverage ideas on the fly? Do you do that with customer interactions as you’re looking for gaps or you’re looking for areas, “Hmm, we can add that” or “maybe this is something we haven’t considered yet”? How do you come to that understanding?

Elle Gee: One of the advantages of working in a pure play testing company and having such a diverse range of clients is also one of the biggest disadvantages, because the answer to your question is that it’s very complicated. Every project is so different. Every process varies and is tweaked, and we don’t necessarily subscribe to “Here, you have to do it exactly this way”. We modify our approach to individual project needs, which brings with it the challenges of this conversation: coverage. Coverage on one project, how we track that, and how we look for the gaps will be so very different from the next project that the team right next to you is working on. So common things that come up, or techniques that we use and that I like to encourage, would be starting at the test planning stage. We touched on the fact that coverage is more than just the functionality. It’s the integration, it’s the security testing, it’s the accessibility testing. It goes into the metrics of how much of that we’ve covered. At the test planning stage, that means identifying who owns the coverage for the various aspects of the applications, the componentry. Sometimes, if you’ve got a chip that you’ve used in your device for five years and that manufacturer just went out of business in that release, you’ve suddenly got to throw in the coverage to make sure that you are testing that chip. What is the manufacturing coverage of that? Who’s taking care of that and the integration, too? Again, circling back to how we look for the gaps in there: for us, it’s a combination. Where we’re able to, we create a site map, a coverage map, a traceability matrix. That’s one point. That sometimes also means that if you’ve got the mapping back to traceability, but the approach of the project was to do happy path testing and you didn’t do any alternate path testing, you’ve got a huge gap there. So that might be a question that’s just asked of the team: is your coverage all happy path? Is there alternate path testing in there?
It’s also about making sure that every engineer on our teams is keeping their eyes open at every point. We encourage a helicopter view or exploratory testing as part of even standard scripted test runs. Don’t just run your test and look at just these steps, but always keep eyes and ears open, and anywhere something doesn’t look right, raise it. That’s going to help us to find and close some of the gaps. There’s no easy answer, in my opinion, for how to identify all the gaps. I think from my point of view it’s about encouraging people to look at lots of ways to identify gaps and then voicing them so that we can fix them.
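
As a rough illustration of the happy path question Elle raises, a traceability matrix can be checked mechanically for requirements that have no tests or only happy path tests. This is a minimal Python sketch, not a description of any tool used by the panelists; the requirement IDs, test IDs, and the "happy"/"alternate" labels are all hypothetical.

```python
# Hypothetical traceability matrix: requirement id -> list of (test id, path type).
traceability = {
    "REQ-1": [("T-101", "happy"), ("T-102", "alternate")],
    "REQ-2": [("T-201", "happy")],
    "REQ-3": [],
}


def find_gaps(matrix):
    """Split requirements into untested ones and ones covered only by happy path tests."""
    untested, happy_only = [], []
    for req, tests in sorted(matrix.items()):
        kinds = {kind for _, kind in tests}
        if not tests:
            untested.append(req)
        elif kinds == {"happy"}:
            happy_only.append(req)
    return untested, happy_only


untested, happy_only = find_gaps(traceability)
print("No tests at all:", untested)
print("Happy path only:", happy_only)
```

A check like this only finds gaps relative to the model it is given; it says nothing about requirements that were never written down.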

Matthew Heusser: Thank you, Elle. I’ve had an opportunity in my role to work with Qualitest on a few projects, and I will say that the commonality I see is that every customer is different and they really adjust to the customer’s needs. They seem to be very good at taking the requirements and trying to have traceability to test them, leaving room in there for style and lots of different things. But because it’s outsourced, if the requirements are unclear, wrong, incorrect, or out of date, it’s going to make things harder. The closer your model maps to reality, the better shape you’re going to be in, and sometimes those models don’t include possible combinations of things customers could do. You know, it’s interesting when you say, “What if the customer does this from requirement number one and that from requirement number two?” and the product owner says, “Huh, I never thought of that.”

Michael Larsen: Yeah, that comes up quite frequently and I can totally appreciate that, especially when you start to deal with interactions between products. Again, Socialtext has been the product I’ve worked on consistently for the past almost seven years. The interaction and playing well with others has become a much bigger part of our component, not just the way that Socialtext works and interacts, but other tools that we display within Socialtext, and being able to have them appear has become a much bigger issue. So it’s not just a matter of, “Hey, what does Socialtext do?” It’s what does Socialtext do in conjunction with this video service, or in conjunction with this video service as it interacts with the talent management system and vice versa. And the more of those interactions that you have to deal with, the more your coverage map explodes, because you’re no longer just dealing with this one piece of functionality. In our case it could be a piece of functionality as simple as a widget container. For those who don’t know what I’m talking about, I’ll just say this: in Socialtext we have these widgets that you can display on a dashboard or in a group or something. And the purpose of it, of course, is that you can encase it and use that functionality where you need it and incorporate it with other things, so you can get one view of something at the same time. If you’re actually using a microapp service to allow you to jump to another product that you’re displaying, and that product allows you to do another jump, and it’s all displayed in the same widget, now you’re actually interacting with all of that, and your coverage needs to not just look at the widget itself, but all the stuff that’s going on in the widget. And that stuff that’s going on in the widget can be referencing three different convergence points, and that can get really scary.
So if we’re going to be talking about coverage, there’s no way that we can escape the elephant in the room, and that’s metrics, and the fact that metrics are always going to be included because we are a measurement based society. We care about measurement. Measurement is how we tend to quantify whether or not we’re making progress, or we’re doing good things or not doing good things. Totally subjective, I understand. And the danger with metrics, of course, is that they’re subject to being gamed. Perze, back to you. How do you deal with this? How do you deal with coverage metrics? The good, the bad, the ugly.

Perze Ababa: Right. So this is kind of tricky. The most simplistic one that we see really is when we report what percent of the functionalities have corresponding tests. That is a very, I guess, one dimensional metric. You could even say it’s an invalid metric because you’re trying to divide a number by infinity. I know we just came off from talking about recognizing gaps, and that’s really the fundamental problem with assigning a completeness metric. When it comes to coverage, what I’ve really found out is that maybe I’m better off looking at some surrogate measures around what can help me really understand our coverage for this particular topic. I know Elle mentioned a little bit about this, and this is in relation to my initial definition, or at least the analogy that I used of a map. A good way to measure is to look at when was the last time somebody looked at one of our features, and that really gives us an age based metric. It’s like maybe something changed, or maybe there’s an automated trigger that says, “Hey, something did actually change. Maybe you should take a look and update your coverage list or your risk list to make sure that these things are done.” The other number that I’m actually looking at, too, and this can be very wrong if I don’t qualify it, is when was the last time that we actually ran all of the tests for our known features, and how well did we do with that? Maybe this is the time to trim the edges on our coverage list, depending on whether things are still valid or no longer valid, because we may or may not care about a particular feature anymore.
I do want to open it to the rest because I could go much longer than this, but yes, the fundamental challenge when it comes to metrics is really: are we actually measuring what we intend to measure, and does this actually reflect the quality of our application, instead of just describing a number that really has no bearing on the usefulness of our product?
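
The age based metric Perze describes can be sketched as a small coverage map that records when each area was last visited and when it last changed. This is only an illustrative Python sketch under stated assumptions: the feature names, the dates, and the 30-day threshold are hypothetical, and a real team would feed it from its own test and change records.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CoverageItem:
    feature: str        # hypothetical feature name
    last_tested: date   # last time anyone exercised this area
    last_changed: date  # last known change (code, platform, dependency)


def stale_items(items, today, max_age_days=30):
    """Return (feature, reasons) pairs worth revisiting, oldest visit first."""
    flagged = []
    for item in sorted(items, key=lambda i: i.last_tested):
        age = (today - item.last_tested).days
        reasons = []
        if age > max_age_days:
            reasons.append(f"not visited for {age} days")
        if item.last_changed > item.last_tested:
            reasons.append("changed since last visit")
        if reasons:
            flagged.append((item.feature, reasons))
    return flagged


# Hypothetical coverage map entries.
coverage_map = [
    CoverageItem("login", date(2024, 1, 10), date(2024, 1, 5)),
    CoverageItem("news widget", date(2023, 11, 1), date(2023, 12, 15)),
    CoverageItem("search", date(2024, 1, 20), date(2024, 1, 25)),
]

for feature, reasons in stale_items(coverage_map, today=date(2024, 2, 1)):
    print(f"{feature}: {', '.join(reasons)}")
```

The output is a prompt for a conversation about where to look next, not a completeness score.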

Michael Larsen: Good points. Elle, I have a feeling that because of Qualitest’s involvement as literally a test consulting company metrics is probably a discussion that comes up very frequently for good or ill. What’s your take on this?

Elle Gee: Metrics define my life. One of my biggest challenges is ensuring that our testers understand that testing equals numbers. Everybody needs to know the numbers. You need to know the numbers as you’re walking down a corridor and you encounter a business stakeholder who says, “Where are we up to in the testing phase?” It’s not good enough to be able to say, “We’re getting there.” People need, and they crave, to know “We’re 70% complete”, and then they need to understand what 70% complete means. Does that mean that we’ve got one day until we can release? Metrics and numbers need to define every activity, and never more so than coverage. I’m not interested in metrics on who ran how many test cases, because when we talk about gaming the numbers, that is the perfect place to do it. Somebody can jump in and run all the really quick test cases and have a huge number of tests run, but they’re not the meat of the application. When it comes to coverage, I find it’s a little bit harder to game the system, or maybe I’ve just been lucky in working with testers who appreciate the importance of coverage. But I don’t know how to address that thought of gaming the numbers on coverage, because I hope that my test teams, my QA partners, understand just how critical it is to know how much you’ve tested and where the gaps are, and that they stay true to tracking numbers that help all of us to identify that and quantify that to our stakeholders.

Michael Larsen: Very good points. What do you think, Matt? Metrics. Much ado about nothing? Really important and we’re not giving them the right amount of deference? Something else?

Matthew Heusser: I expect anyone in the audience with a developer background is going, “Well, why aren’t you talking about code coverage, man? Code coverage! That’s the 100% code coverage. That’s the wave of the present. And if you’re not measuring code coverage of unit tests, then you’re incompetent.” And that solves all the problems. Well, first, it certainly does not solve all the problems, but it does provide an amount of first order confidence, and it is a number you can publish, and it does help you identify the gaps. Well, code coverage can be helpful as a percentage. I think if you’ve got a test project, being able to express progress according to the model can be very helpful. One thing I’d like to mention in terms of all these measurements is that they are all backward looking. If our developers are really doing a good job of isolating their components, and we know exactly what components changed, and we have a high confidence that there aren’t unanticipated side effects, then I’m much more interested in having a high coverage of the pieces that changed than in touching everything. So “what is our goal for this release for coverage” is a relatively mature conversation. I don’t get to have it that often, because that part about isolated components and being confident that we didn’t make a change is not as common as you would expect from reading the literature. The state of the practice has not caught up with the state of the art. And I know there are going to be four people that listen to this podcast and say, “Well, my projects are”, and I say, “That’s great, and that’s the reason why you’re speaking at conferences”.
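
Matt’s preference for high coverage of the pieces that changed, rather than of everything, can be expressed as a ratio over changed lines only. The following is a minimal Python sketch under stated assumptions: the file names and line numbers are hypothetical, and in practice the two inputs would come from a diff and from whatever coverage report the project already produces.

```python
def changed_line_coverage(changed, covered):
    """Fraction of changed lines that were executed by tests.

    Both arguments map a file name to a set of line numbers.
    Returns None when nothing changed, since a percentage of
    zero lines is meaningless.
    """
    total = sum(len(lines) for lines in changed.values())
    if total == 0:
        return None
    hit = sum(len(lines & covered.get(path, set()))
              for path, lines in changed.items())
    return hit / total


# Hypothetical data: lines touched in this release, and lines the tests executed.
changed = {"cart.py": {10, 11, 12, 40}, "checkout.py": {5, 6}}
covered = {"cart.py": {1, 2, 10, 11, 12}, "search.py": {3, 4}}

ratio = changed_line_coverage(changed, covered)
print(f"Coverage of changed lines: {ratio:.0%}")
```

As Matt notes, a number like this is only trustworthy when components are well isolated and the list of changes is actually complete.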

Michael Larsen: Awesome. Okay. Well, I mean, we could go around on this all day long, honestly, but of course we have a set time that we give to these podcasts, so we have to respect that. So I’m going to say at this point, this is probably a good spot for closing thoughts. We’ll just keep going in order: Perze, then Elle, then Matt, and then I’ll throw something in. So, Perze?

Perze Ababa: Alright. Coverage is an important topic because it gives us an understanding of where we are in our progress. But at the end of the day, your ability to discover gaps is really important. So I would highly encourage testers or test leaders to focus on improving your skill and to develop those testing heuristics that give you the ability to discover and maybe even solve the problems that are in front of you. Everything else will follow.

Michael Larsen: Fantastic. Elle, what do you think? Closing thoughts.

Elle Gee: 100% agree with Perze in terms of the fact that it is about the individuals who are testing, and encouraging an attitude of looking broadly, testing broadly, testing beyond the test that you’re running right now, and just keeping your eyes wide open and challenging everything as you’re testing an application. I encourage anybody who is in testing to embrace the importance of understanding test coverage and how it plays a role in ensuring the quality of the application. It really is the foundation of effective testing.

Michael Larsen: Matt? Closing thoughts?

Matthew Heusser: I think the gaps are the interesting thing. I think we say, “What do we do about those gaps?” Following that, recognizing that it’s backward looking. Okay, the Apple Watch broke. Is it ever going to break again? Should it be tested every time? And when you really get good at finding those gaps, they’re fractal. You find one gap and find a smaller gap inside of it, too. Inside of that, three more, and you go get those three and find another one. So you have to stop at some point. You have to say this model is complex enough, and how do you do that? When do you do that? Those are good questions to ask, and when you’re done you’re going to feel like you’ve written it on the back of a napkin with a terrible marker. But if you don’t have anything, move to something. If your app is complex enough that you need to start thinking about coverage, just take whatever you have and make it one step better and be happy with that.

Michael Larsen: Excellent. And I guess I will throw in my thoughts, too, for final thoughts on this. Mine would be to say you have two maybe underutilized resources. As I alluded to earlier, your customer support team keeps an active database. It might be very well worth your time to go explore that database, especially to see things that haven’t been answered. Sure, we jump on the stuff that is most pressing and those we will certainly address, but there might be a few things floating around there that have not been addressed and could be ticking time bombs that would be worth your time to look at. And the second thing that I would suggest is, if you have a customer relationship engineer or somebody who actively does customized work for customers, or sales support, or any of that, they’re also a really good person to talk to, because they may be working with questions that are just percolating up in customers’ minds, or they may have done something special as a one-off that might actually become more pressing later on, but nobody else has happened to think about it yet. With that, I want to say thanks to everybody for joining us, and we look forward to having another conversation with you on The Testing Show soon. Have a great day, everyone.

Perze Ababa: Thank you.

Matthew Heusser: Thank you.

Elle Gee: Thank you.