For this set of two shows, we decided to do a forum of just us regulars, and we planned to look at a couple of news stories. Those stories turned out to be the bulk of an extended conversation. We realized that the theme running through all of them was the gap between the claims that companies make and what actually happens in workplaces and organizations.
Needless to say, this is an opinionated set of shows this go around as we discuss the promise of Machine Learning and what it actually delivers. We look at it in the light of other promises made over the past fifty years and how, often, it’s not the best idea that wins the day, but the first idea to gain traction that does.
Panelists:
- Michael Larsen
- Jessica Ingrassellino
- Perze Ababa
- Justin Rohrman
- Matthew Heusser
References:
- The Testing Show, Machine Learning with Peter Varhol, Part 1
- The Testing Show, Machine Learning with Peter Varhol, Part 2
- 4 Important Things About Hewlett Packard Enterprise’s Multibillion Dollar Spinoff
- Worse is Better, Richard P. Gabriel
- 2017 State of Functional Testing
Transcript:
MICHAEL LARSEN: Hello everyone, Michael here. What you are about to hear is a two-parter on “Claims and Practices” in the software testing world. Part One is appearing this week as Episode 46. Part Two will be posted in two weeks. With that… on with the show!
[Show Intro]
MICHAEL LARSEN: Hello, and welcome to The Testing Show. I’m Michael Larsen, your show producer, and today, we have an all-panel show. Let’s welcome our regulars. Jessica Ingrassellino?
JESSICA INGRASSELLINO: Hello.
MICHAEL LARSEN: Perze Ababa?
PERZE ABABA: Hello, everyone.
MICHAEL LARSEN: Justin Rohrman?
JUSTIN ROHRMAN: Good morning.
MICHAEL LARSEN: And our moderator, Mr. Matt Heusser. Take it away, Matt.
MATTHEW HEUSSER: Hey. Thanks, Michael. It’s great to be here. We’ve been talking about some of the claims that have been coming out of the test community. We’ve mentioned the potential of Machine Learning on the show before, and I think “potential” is probably the right term. I don’t think that we’ve figured this thing out. Actually, we did a recent show on it. We had Peter Varhol on. There are a lot of claims in testing that make it sound like the potential is reality, and that’s what really concerns me personally. A couple of news pieces. First of all, the HPE and Micro Focus deal has closed. HP Enterprise’s software business is now a division of Micro Focus. My company does a little bit of work for HP on the journalism side. QualiTest does significant support of companies that are using the HP Toolset, but I think the merger is interesting because Micro Focus doesn’t have a history of investing in tools. They have a Harvesting Model. They’ll continue to make sure the tool exists, you can get support, and they’ll get paid for it. I think that’s fair.
JUSTIN ROHRMAN: I’m just wondering what that’s going to do to the UI Automation. HPE used to have a pretty big claim there, and that was QTP, and I guess it’s called UFT now. Are they going to rebrand, continue developing, or set up some type of platform? I just wonder what’s going on there.
MATTHEW HEUSSER: I think their two biggest pieces are their Test Automation Platform, which is UFT, which honestly I just don’t hear about much lately.
JUSTIN ROHRMAN: Yeah. I don’t either. That used to be “the big thing.” Whenever you looked on jobsites, you would see, “Must have UFT or must have QTP.”
MICHAEL LARSEN: Haven’t seen any requests for any HP stuff. I haven’t seen anything for that since like 2010.
MATTHEW HEUSSER: See, I was just thinking that maybe we’ve just pivoted our customer base, and I think it’s fair to say those tools are being used by older, more established companies that aren’t growing very fast and have low turnover. I think it’s still out there. It’s pretty big.
MICHAEL LARSEN: I’m not saying it’s gone away. It certainly exists. I do see comments about it, and I see people talking about using it. Every once in a while on Twitter I’ll see somebody posting a question about utilizing it or making some feature request about something out there. So, I know it is being used, certainly. But, I think you’re probably right in this regard. Just because of the nature of the companies and the people that I talk to, who are smaller, newer teams, it doesn’t make sense to be using machinery like this. Most of us have used some form or variant of Selenium or local unit tests or some other framework to be able to put something together. Not that there’s anything bad about or wrong with the HP Stack. It’s just that we’re not in a space where it makes sense for our product to be using it.
MATTHEW HEUSSER: Well, the other thing is the Test Case Management piece. If you’re going to use one or the other, it’d probably make sense to use them both, because they integrate so well together. But, there are a lot of scrappy, new, upstart competitors in the Test Management Space. And again, we’ve worked with all of them. We’ve worked with Zephyr. We’ve worked with SmartBear. At one point, we were working with QASymphony. All of these companies are trying to make it kind of light, quick, and easy. We work with Gurock, and I’ve just got to declare that. They’re trying to make light, quick, easy, more human, more flexible test case management tools. There are a lot of options out there right now. Some of it’s web based. Some of it’s free for 30, 60, or 90 days. So if you want to play, I think now is a pretty good time to play around with the different stuff. The only problem is going to be if you’re locked in because you have six years of legacy data stuck in a Test Case Management System. Perze, do you use Test Case Management Systems over there at Johnson & Johnson?
PERZE ABABA: We do, actually.
MATTHEW HEUSSER: Did I mispronounce your name again?
PERZE ABABA: Yes, you did.
MATTHEW HEUSSER: [LAUGHTER].
PERZE ABABA: It’s okay, Matt. I love you anyway.
PARTICIPANTS: [LAUGHTER].
MATTHEW HEUSSER: Thanks, brother. I appreciate your grace.
PERZE ABABA: Yeah, we do. We have multiple groups within Johnson & Johnson that use Test Case Management Systems, and we’re looking at better ways to leverage the information that we can get out of each of those systems. The beauty of the more modern Test Case Management Systems is that there is a way for you to export your data into a more common, ingestible format.
MATTHEW HEUSSER: Do you mean results or the test cases themselves?
PERZE ABABA: It could be a combination of both. There is a way for you to export the actual script itself, and there is also a way for you to export the results so that a different system can ingest them, like Tableau or any other system that can take XML or, our format of choice, JSON. But, there’s definitely a ton of work that needs to be done to come up with a holistic dashboard. It also boils down to the type of data that you want to dig into: “What’s relevant to you, from your perspective?” There is experimentation done on a per-team level, and then, as you go up the chain, depending on what they need, that gets tweaked a little bit toward, “What’s useful for the team?” But, as long as you have data that you can easily ingest and parse and then connect to what it’s actually doing, I think that’s the one good key to a pretty decent Test Case Management System in my book.
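To make that export-and-ingest idea concrete, here is a minimal sketch in Python that rolls a JSON results export up into per-suite pass/fail counts, the kind of summary a dashboard could consume. The file name and the "suite"/"status" fields are assumptions for illustration only; real exports from test case management tools each have their own schema.

```python
# Minimal sketch: aggregate pass/fail counts from an exported JSON results file.
# The file name and field names ("suite", "status") are assumptions; real
# exports from test management tools use their own schemas.
import json
from collections import Counter, defaultdict


def summarize_results(path: str) -> dict:
    """Roll exported test results up into per-suite pass/fail counts."""
    with open(path, encoding="utf-8") as f:
        results = json.load(f)  # assumed: a list of {"suite": ..., "status": ...}

    summary = defaultdict(Counter)
    for record in results:
        summary[record["suite"]][record["status"].lower()] += 1
    return summary


if __name__ == "__main__":
    for suite, counts in summarize_results("exported_results.json").items():
        total = sum(counts.values())
        print(f"{suite}: {counts['passed']}/{total} passed, {counts['failed']} failed")
```

The point of a sketch like this is just that once the data is out of the tool in a common format, any downstream dashboard or reporting system can reshape it to what a given team finds useful.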
MATTHEW HEUSSER: Yeah. You know, for some time, I’ve been saying, “Ah, who needs a Test Case Management System? Ah, just test it.” Blah, blah, blah. “Get rid of regression testing and then test the stories, you’re good. Whatever. Continuous testing, come up with a risk census, who needs it?” That’s really not realistic for larger or legacy organizations, but I think the tools are also morphing. On the output side, they’re becoming better dashboards of quality. On the input side, they’re more than just a glorified Excel spreadsheet with some ability to create test runs and some ability to track and reserve, “I’m working on this a piece at a time.” Instead, we’re getting an aggregation of data from unit tests, integration tests, external sources, humans taking notes, usability testing. For all of those sources, you’ve got a hub that can process them and give us a unified picture of coverage and results. That’s the potential. On a scale of 1 to 10, where 10 is “we’ve realized all of the potential” and 1 is “that’s a cool idea, someone’s got to go do it,” I’m not sure where we are on that ladder. But, realistically, I think we’re getting there. I think we’re at a 5, 6, 7, somewhere in there. What I’m hearing is that it’s working for you, for a Fortune 500 company. Did I hear that right?
PERZE ABABA: Yeah. It’s a work in progress. We’re continuously tweaking. There are multiple tools; there is no one tool to rule them all. It’s a matter of the people who are using a given tool making sure they get what they need. One of the bigger challenges that I’ve seen with Test Case Management Systems (I have experience as an admin for Quality Center, when Quality Center was Quality Center, [LAUGHTER], way back in the day) was this huge dump of data, the “Export to Word” feature that gives you 500 pages’ worth of stuff that nobody wants to read. So the challenge then is, “How do you actually get insight out of that, and how do you relay it to what your team is doing as you go to the left side of the SDLC and to the right side of the SDLC as well?” We’re actually looking at, “How do the activities that we have in a given sprint, for example, relate to the actual support tickets that we get after we release a product?” The ability to weave all of this information into a single, readable dashboard where you can get some insight, I think, is pretty cool. But, we’re not quite there yet. We’re making some movement on how to normalize our data so we can gather very specific insights that cater to very specific groups of people in the company.
MICHAEL LARSEN: We’ve been going through something similar to this recently. Again, this is coming from a small company that had a very organic process that we grew ourselves over a period of many years. Now, of course, that process has to fit in with another organization that “owns our product,” so to speak: “Well, now we want you to use this different approach and use these different tools, and we’re going to unify everybody on the same thing.” I’ll be frank, we’re dealing with some frustration because of that. What we worked on and what we’ve utilized has leveraged our own product to, I would say, good effect. Here’s the challenge: when you utilize your own product to build all these customizations and capabilities around the skills within the group, it was great. We could radiate information within our own group just fine, and we felt that it was effective. We had our automated test suite that covered a lot of ground. We didn’t really need to do a lot in the way of Test Case Management. The cases that didn’t exactly fit into our automated suite, we would put in as part of, say, our Release Burndown, or an area where we’d say, “Hey, these are some areas where we definitely want to take a hands-on look as we go through. If there are any notes we want to take down about what we notice, put them in, and then we can go back and review them.” For a small team like ours, that was easy to do. Now we get the request: “Hey, we’re going to incorporate this with a much larger organization, using their tools and adapting to what they’re using, so put it into some kind of a Test Case Management System,” and part of me just feels like, “Didn’t we already solve this?” Okay, I get it. I understand that when you’re part of a larger organization, they want to be able to share the information. That’s cool. But when you’re dealing with products that don’t necessarily mesh up, it’s not like you can say, “Well, we’re going to combine this all together and make one unified flow of information.” No, we can’t do that. Or, at least, not at this point in time without reinventing the wheel. So, that’s our current frustration.
JESSICA INGRASSELLINO: What I’m hearing from you guys reminds me (A) of what I went through in doing my dissertation research and (B) of something that I’m encountering not just in the Software Space but also as I’m doing work in the Education Space, which is this idea that one system or one size fits all, something that we’ve largely known not to be true in certain instances. For example, we have standards for how we might treat a person who has heart disease, but how your heart is diseased defines how we treat it. Well, yes. But, if I’m Company X, “I don’t really care what the product is. I want them all tested this way.” Or, if I’m School X, teaching these students, “I want them all assessed this way.” I mean, our whole, entire Education System in the United States is kind of based like that, and I won’t go on. But, I could. What I’m not really understanding is why people haven’t adopted a more flexible mindset, and what is preventing that? When I say “people,” I mean the leaders in these fields, in education, in business, in other fields. I’m not really certain what is preventing some of these people from seeing and understanding that the world is not like a Mozart symphony that’s written down and played exactly a certain way. It’s more like a jazz improv, where you have a structure but you allow for flexibility.
MICHAEL LARSEN: I totally agree with that.
JUSTIN ROHRMAN: So, I think the music analogy is really interesting because even the Mozart Symphony is played fairly differently each time and everybody has their own interpretation.
JESSICA INGRASSELLINO: This is true.
JUSTIN ROHRMAN: That’s a perfect example, right? It’s super, super easy to take these test scripts and to think that they can be performed exactly the same way every time, or to think that it might be ideal to perform them the same way every time. When, in reality, what they are is this squishy thing (the word is “heuristic”) that you might perform somewhat like the steps if you want to or if you need to, but generally you’re going to have a lot of variation. And, that’s not the impression you get from looking at the tools. You don’t get the feeling that there’s going to be a lot of variation whenever you run it.
MICHAEL LARSEN: From my end, absolutely. There are certain tests that we run, and we know those tests are going to have a similar output each time. Because of that, they do serve a purpose. But, they’re not going to cover everything that we want to do. And, if we want to be wide open and expansive and look at a lot of different things, we’re going to get different results just because the ambiguity level goes up. Modern software is, by its very definition, ambiguous. It’s not something where we just plug in A and expect to get B, and that’s everything that we do. It doesn’t work like that. Or at least it doesn’t work like that for very long without us getting frustrated and saying, “No. I meant this instead.” Over time, of course, we’re going to struggle with trying to create automated test suites or some kind of testing scenario that covers everything that we want to do. The more ambiguous our stuff gets and the more abstract we have to get with it, the longer it’s going to take for us to either codify everything in a meaningful way or provide coverage for every conceivable option, or we just accept the fact that there’s a bunch of stuff that we have to eyeball. And in the process of eyeballing, we see things that make us question, “What’s that all about?” And, we dig in. And, we discover something. Even with a product that we’ve been using for a long time, we suddenly realize, “We asked a question nobody thought to ask.” Because of that, now we’re going down a whole different area that we don’t really have anything clearly defined for, and things get very interesting very fast.
JUSTIN ROHRMAN: So, if I could return to that music example, I thought of something. Whenever you go to see a Mozart symphony, you’re not going to see that symphony. You’re going to see the conductor’s interpretation of that symphony, whether the audience understands that or not. I think that’s exactly the same thing with these Test Case Tools. Whenever a person is running the test cases, you’re getting their personal interpretation of it. You’re not getting just the straight-up steps. That interpretation is a good thing, and something we should foster.
MICHAEL LARSEN: Right.
MATTHEW HEUSSER: We’ve gone pretty far afield here, but I think it’s really important. I think Jess raises a really important point that I’ve been trying to figure out how to put into words for a long time. The way I say it is, “Worse is better. The bad ideas tend to win.” Jerry Weinberg was doing automated unit testing, unit testing you could code, back in the 1950s, and that didn’t really stick. We had these sort of wonderful, chaotic ideas happening in the 1970s, and somehow we ended up with the Waterfall Model, which put us back 20 years. Extreme Programming was fantastic and powerful, “I’m going to change the world,” and then Scrum came along and Scrum won the branding wars. And then, what we actually got was this weird, bizarre Scrummerfall for most organizations. And then, SAFe came in to teach you how to scale, and I think the phrase we were using for that was “premium mediocre.” I mean, there’s some good stuff in SAFe. It’s okay, I guess. That’s the brand that is winning the discussion about how to do Agile for large organizations now. The “It’s okay, I guess” wins. Rick Gabriel wrote a paper for the Lisp community called Worse is Better, which you should read. It’s fantastic. He’s talking about how Lisp is better than UNIX and C. In the end he says, “The good news is, you’re going to get a better programming language in 15 years. The bad news is it’s going to be a better version of UNIX and C,” which is exactly what happened, because we got C++ and then Java and now C# or some other language, instead of the functional programming languages that he was trying to suggest we use 20 years ago. So, the worse ideas tend to win. I don’t know why that is. It’s very challenging as testers, but they don’t always win. There are little, scrappy organizations running around doing fantastic work. So, there’s this sort of dialectic, this dichotomy, between the loudest voices in the room and the best ideas, the difference between the state of the art and the state of the practice, and I think the people that are engaged, that are listening to this podcast and reading blogs, are helping to push both forward. We know the state of the practice is not great. The state of the art is. We’re trying to make them both a little better every day. That’s encouraging, right?
MICHAEL LARSEN: Definitely. It certainly is encouraging. But at the same time, the reason is sometimes not so much that bad ideas win out; it’s that inertia or momentum (whoever pushes first) often gets the benefit. “Why did so many people in the 1990s learn how to use UNIX?” Well, it’s because that was what was available in many universities, and it didn’t cost a whole heck of a lot of money to get started. When I went to work for Cisco, the reason we were using SPARCstations was that most of the people there came out of Stanford, and that was what they had. Many people don’t remember that Sun Microsystems actually took its name from the acronym for the Stanford University Network, and a lot of the machines developed there literally came out of the fact that they didn’t have a tremendous amount of money to put into a proprietary computer system. So UNIX was the next available system for them to use, and they were able to hack on it and improve upon it, and that created (literally) the 1990s wave of computing. Cisco was born out of that Stanford environment as well, and so a lot of the things that came together weren’t necessarily grand designs with amazing hardware that everybody just glommed onto and said, “Yeah. This was great.” It was, “Hey. What’s available to us? What can we use, and what can we put together without having to incur a tremendously large expense, and get it rolling?” For a time, yeah, C was the language. The people that market it, or that are able to get it happening, win simply because they get there first and they get enough people to adopt it, because changing to something else is going to be laborious. So, it’s not so much that the best idea wins. I think very often it’s the first idea that wins.
MATTHEW HEUSSER: I think it’s safe to say that good marketing tends to beat craftsmanship. I mean, we’ve done some pretty amazing things in the industry. There’s awesome stuff happening that I’m really proud of, but there aren’t a lot of large, successful commercial companies suggesting context-driven testing. QualiTest is one. That’s one of the reasons we work with them. So let’s talk about marketing tending to win, particularly this claim that I keep hearing. We talked about Test Case Management Systems and the claims of becoming the hub for integrations. I think they’re getting there. How about this claim that you’re going to use Machine Learning to generate your test cases? Who’s actually doing that?
MICHAEL LARSEN: I hear a lot of claims about it. I hear a lot of people describing it. We receive e-mails saying, “Hey. Here’s this tool that’s going to allow you to use Machine Learning to pinpoint and focus on these things.” And yet, “Who’s actually had success with it?” You hear crickets. That’s not to say that there aren’t people doing this. I would guess that if somebody is going to market this, they have somebody they can point to as an actual case study who has successfully done it. But, so far, I haven’t seen one.
MATTHEW HEUSSER: I haven’t seen one yet, and I have a pretty broad view of the community. I mean, this group has a pretty broad view of the community. And, I just got this. There are a lot of these State of Testing Reports or Testing Industry Survey Reports. I just got one from a company called Panaya. Their recommendation for testers (and this is a verbatim quote) is, “To remain viable in an increasingly complex and competitive marketplace, leverage Machine Learning to create and maintain automated test cases based on real live production data.” You could do model-driven testing with real live production data, and I’ve done that. Lots of people on the show have done that. But, in terms of genuine Machine Learning, which we have already done a show on, in terms of genuinely using a programming language like R or Python to analyze test data to create automated test cases, I have no idea what they’re talking about, and the 26-page survey does not return to that theme or tell you what it means. That’s what really bugs me, the lack of detail.
PERZE ABABA: You know, we tried to look deeper into, “How can we employ existing Machine Learning algorithms to help improve our testing?” The furthest that we were able to go is really what looks like model-based testing. So, we’ve all agreed to say that we can’t call this Autonomous Testing or Machine Learning; what we’re really trying to get to is supervised test execution versus unsupervised test execution, because you’re still building the things that your test is supposed to look for. You’re introducing capabilities to your system so that it can dig through a given context. For example, if I have a native app in iOS, I can have it go through a very particular locator within a given page and track how far I can get by clicking around or performing user gestures. But, at the same time, I can tell the system, “While you’re doing all of this, be mindful of everything else that’s happening within your system, and then gather that data for us so that we can actually detect how this particular change in code will affect the user’s system, looking at a spike in memory usage or a spike in CPU.” But, that’s not Machine Learning. You’re dealing with—
MATTHEW HEUSSER: Ooh.
PERZE ABABA: —gathering lagging indicators, pulling metrics, and reacting based on that.
MATTHEW HEUSSER: Ooh.
PERZE ABABA: You’re not—
MATTHEW HEUSSER: But now, this is a potential use. You could use Machine Learning to look at production logs and log data for trends between CPU, disk, memory, time, and dependencies. You could use Machine Learning to say, “Hmm. There’s a worrying trend here with the database when it makes these types of queries.”
PERZE ABABA: That’s monitoring, right? I mean.
MATTHEW HEUSSER: It’s not testing. I mean, it’s not what these guys are talking about, but that could be an application for Machine Learning in the IT Organization.
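As a rough illustration of what that could look like, here is a minimal sketch that fits a simple linear trend to memory usage pulled from production metrics and flags steady growth, a possible leak. The CSV layout ("timestamp" and "memory_mb" columns) and the alert threshold are assumptions for illustration, not anyone’s actual pipeline; a real setup would pull from a monitoring system and consider CPU, disk, and query mix as well.

```python
# Minimal sketch: look for a worrying trend in production metrics over time.
# Fits a simple linear trend to memory usage and flags steady growth.
# The CSV layout ("timestamp", "memory_mb") is an assumption for illustration.
import numpy as np
import pandas as pd


def memory_growth_per_hour(csv_path: str) -> float:
    """Return the fitted slope of memory usage in MB per hour."""
    df = pd.read_csv(csv_path, parse_dates=["timestamp"])
    hours = (df["timestamp"] - df["timestamp"].min()).dt.total_seconds() / 3600.0
    slope, _intercept = np.polyfit(hours, df["memory_mb"], deg=1)
    return slope


if __name__ == "__main__":
    growth = memory_growth_per_hour("production_metrics.csv")
    if growth > 50:  # threshold is arbitrary; tune it to your system
        print(f"Worrying trend: memory growing ~{growth:.0f} MB/hour")
    else:
        print(f"Memory trend looks flat: ~{growth:.0f} MB/hour")
```

Whether you call this Machine Learning or just monitoring with a trend line is exactly the definitional question the panel is raising.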
PERZE ABABA: I mean, there is one thing we think we can pull out of it. For example, you can use a typical linear regression as a way to look into, “What are our users doing that triggers all of these performance spikes?” One of the bigger challenges that we have, for example, is that we don’t really know, “What is a natural pause, and how do we define that properly?” You could just pick a random number, but if I don’t have data to support how long a user takes before performing an action, that’s a guess. That’s something we can gather now. That’s something linear regression can inform, saying, “Hey, I’m going to increase the number of users, I’m going to increase the number of CPUs and the memory that I have in the environment. What would be a good pause between requests?” And then it can suggest, “Okay. Put the pause somewhere between 100 milliseconds and 400 milliseconds, because that’s what I actually observe happening in the real world.” But, I guess, go back with me. Remember when we first had these conversations around TestOps, or testing around DevOps? That was really the question: “What is it? What does it actually mean? How do we deploy Machine Learning when it comes to the work that we do from a testing perspective?” I mean, we looked at this Panaya white paper, and “Machine Learning” is only referenced twice in the whole white paper. It doesn’t say anything about what it is or what we can use it for. It’s kind of misleading to me when you recommend something that you don’t even expound on, unless it’s, you know, secret sauce. That’s fine, but it doesn’t really help me as a consumer.
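Here is a minimal sketch of that linear-regression idea: learn from observed production data how long users actually pause between actions, then use the fitted model plus the observed spread to suggest a realistic think-time range for a load test. The data file and column names are assumptions for illustration only.

```python
# Minimal sketch: learn realistic "think time" between user actions from
# observed data and suggest a pause range for a load test.
# Assumed columns: concurrent_users, pause_ms (observed gap between actions).
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

observed = pd.read_csv("observed_user_pauses.csv")

# Fit pause length as a simple linear function of concurrent load.
model = LinearRegression()
model.fit(observed[["concurrent_users"]], observed["pause_ms"])

# Ask the model what a realistic pause looks like at a planned load level.
planned = pd.DataFrame({"concurrent_users": [500]})
predicted_pause = model.predict(planned)[0]

# Band the suggestion with the spread actually seen in production,
# e.g. "somewhere between 100 ms and 400 ms" rather than one magic number.
low, high = np.percentile(observed["pause_ms"], [10, 90])
print(f"Suggested think time at 500 users: ~{predicted_pause:.0f} ms "
      f"(observed range roughly {low:.0f}-{high:.0f} ms)")
```

The design point is the one Perze makes: the value is not the algorithm, it is replacing an arbitrary pause value with one grounded in what real users actually do.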
MATTHEW HEUSSER: That’s something that Justin and I often talk about. It’s the sort of “watch this space” claims. Like, “Come on. It’s going to be awesome. You’ll see. We’ve got a big announcement coming.” Occasionally at Excelon, if you go over to our blog, and at QualiTest too, we’ve said, “We’re going to have an announcement on Friday, the 22nd, at 3:00 p.m.” Right? [Laughter]. Like, it’s not vague. It’s, “We’re working on this thing that’s not quite fully baked, but it’s coming.” A lot of times people say, “Watch this space. It’s going to be awesome,” and throw out a buzzword. “You know, we’re looking at integrating. There are really some impressive things when you look at the combination of Internet of Things and Virtual Reality. It’s going to be great.” I don’t know what that means.
JUSTIN ROHRMAN: So, I really like buzzwords. I think usually there’s something behind them, and they get a lot of people riled up and talking about the subject for various reasons. But, on the Machine Learning topic, I think it’s usually a question of scope of definitions. What is testing? What is Machine Learning? How can you reduce the things we’re doing enough to fit into that category to make the claim that we are doing Machine Learning and testing? Also, it’s a question of, “Is our company big enough to afford to spend millions of dollars on this garbage?” Because most companies aren’t. I think this is being done, but it’s probably at maybe five or ten companies in the world. The fact that it shows up on this State of Testing Report as an up-and-coming thing is kind of laughable. But maybe Netflix, Facebook, Twitter, those kinds of companies, are investing the dollars in Machine Learning.
MATTHEW HEUSSER: Yeah.
JUSTIN ROHRMAN: So if we wanted to see what’s happening, we’d talk to those people instead of people working at like mygoofyapp.com.
MATTHEW HEUSSER: So Netflix, for instance, or Amazon, is looking at who clicks on what, when, to figure out how to optimize for more click-throughs, and they’re doing continuous A/B split testing and all kinds of crazy stuff, and they’ve got the scale to do that.
JUSTIN ROHRMAN: They have the money to invest in it.
MATTHEW HEUSSER: Totally.
JUSTIN ROHRMAN: Like, they have the money to invest in engineering problems.
MATTHEW HEUSSER: If they put 100 people on that problem and get a 1 percent increase in conversion rate, that generates how many million a year? Billion, sorry. Billion. For most everyone else, it’s kind of, “You don’t want to do that.” But, I think it’s worse than that. I think that there are a few companies who are using this to sell services. Like, “Oh, no. You wouldn’t do that yourself, because you need 100 people. You should just reuse our patented methodology process and we’ll teach you how to do it,” and that really concerns me. That’s an accusation. I don’t have the evidence to make a direct accusation yet, but I’m doing the research right now as a journalist to figure that out.
[END OF TRANSCRIPT]