This is a transcript of episode 112 of the Troubleshooting Agile podcast with Jeffrey Fredrick and Douglas Squirrel.
When a client tells Squirrel “of course there are no releases on Fridays”, it’s a red rag to a bull. After Squirrel rants for a bit, he calms down and we argue strongly for releasing often, even when it hurts—in fact particularly when it’s difficult!—to “bring the pain forward” (Jez Humble).
Squirrel: Welcome back to Troubleshooting Agile. Hi there, Jeffrey.
Jeffrey: Hi Squirrel. I’m excited to hear about a story you have to tell us, which really was the inspiration for this week’s topic: why should you release on Fridays? Please tell us about this story.
Squirrel: Oh, yeah, I’m in rant mode again because I had someone say something that really got my goat. I was doing what I often do: health checks, due diligence, and so on. So I’m checking out a company and writing a report on all kinds of fantastic things they could do better. This person was describing their release process to me, and they said, ‘Oh, yes, we use Kubernetes for this and we click that and there’s the code reviews and all the different components.’ And he said, and this is the part that really got me, ‘Of course, we don’t release on Friday.’ Because I wasn’t in advice mode, I wasn’t consulting, I was recording, I just carefully recorded ‘Don’t release on Fridays’. And then I carefully wrote, in capital letters, ‘DO release on Fridays’, and said we should talk about this on the podcast, because I just think it’s very important that if things are hard to do, if they cause you pain, that’s a signal that you should do more of them and not less of them.
Jeffrey: That’s interesting. So if it hurts, I should do it more often, not less.
Jeffrey: That’s unintuitive.
Squirrel: Precisely. But that’s the point. So, for example, what people will often say, and this is exactly what this person meant when he said ‘of course’, is that if we released on Friday, maybe even at 4:45, something might go wrong with the release. And then we’d get called on Saturday, or late at night on Friday, to come and fix it. And I have a hot date. I don’t want to be interrupted. But the point is, if you always avoid that, if you don’t do anything to cause that situation to change, it will probably get worse. At least it certainly won’t get any better. And therefore, if you ever really do need to release on a Friday, or if you have a hot date on Wednesday night, you’re going to be interrupted because you haven’t done enough testing. You haven’t found out enough about your system to find out what the heck it is that makes it so unreliable that you feel like you can’t release for a whole day. You’re giving up 20 percent of your potential release time. You’re lengthening your cycle time, you’re decreasing your flexibility, and you’re doing that because you’re afraid. And if you don’t confront that fear, you’re going to have a lot of trouble. Okay. Sorry, I’m finished ranting. Do you have any opinions about this, Jeffrey? Do you share my view or do you see it differently?
Arguments Against Friday Releases
Jeffrey: Well, I certainly do share your view on this. However, this is a longstanding issue, something that I’ve come across for I don’t know how long. For me, this kind of conversation probably goes back to the early days of extreme programming, so back to the ’98, ’99 kind of timeframe, and the idea of continuous integration, which was very similar. But I want to just try some of the objections I’ve heard over these years. So, really, Squirrel, how bad is this? I mean, fine, it’s Fridays. It’s one day out of the week. Why is that a big deal?
Squirrel: Well, it’s a big deal because Friday is just one example of a time when you might be sensitive. So you’re going to probably do other things like, oh, we have a big client rollout.
Squirrel: Oh, there’s a big demo for the sales team. They’re going to be out there showing the software to hundreds of people, so we’d better not release then. You’re going to get more and more of these, and you chop up more and more of your week with all these times when you’re sensitive and you’re saying, I’m afraid to release, I can’t take action. I don’t trust my system. I don’t trust myself. I don’t trust my on-call rota and my ability to recover. All of these things are going to cut away at the time that you have to release. It’s not just Fridays. It’s going to infect the rest of your release process, your attitude to risk. And instead of mitigating the risk by actually addressing it, by doing something about it so that your system’s more reliable, you’re just covering it up. You’re just saying, OK, we won’t do that.
Jeffrey: Well, but isn’t this just prudence? I mean, we’re just trying to be careful here. We don’t want to have any problems. Isn’t that just normal prudence? Isn’t this just good, you know, being properly professional about it?
Squirrel: Well, no, because if you look at people who run much more dangerous types of systems, nuclear power plants and spaceships and other things like that, they drill over and over again to understand their failure modes. Failure modes are very important to them. If you want to read a great book on that, have a look at Meltdown by Christopher Clearfield and, I forget his co-author’s name, but we’ll link it in the show notes. One of the problems is that people try so hard to avoid the complexity of their system. They say, I’m just going to try to work around it. I’m going to close my eyes. I’m just going to try not to have this happen. And often that actually interferes much more with the correct operation of their system. A good model for dealing with high-risk situations is to confront the risk and mitigate it, not to try to avoid it.
Jeffrey: So I’m having trouble continuing to be the straight man here because I do so agree with you.
Jeffrey: But these kinds of objections, I think, are what people say. The one I didn’t pull up yet is kind of the social proof version, which is: oh, come on. This is either a version of ‘this is what we’ve always done’ or ‘look, every place I’ve worked has been like this’. We’ve always had a quiet period. And the quiet period could be every Friday, or it could be, as you say, the week before a major event, or it could be something around holidays. But isn’t this a normal practice? What’s wrong with being normal?
Squirrel: What’s wrong is that normal is actually pretty bad. I do these due diligence and health checks all the time, and one of my key questions is always, what’s your cycle time? That is, how long is it between when you had an idea and when you’re testing it in the real world? It doesn’t count if it’s in staging, and it doesn’t count if you have it working on your machine. It counts when it’s live and real users are using it. I think the worst cycle time I’ve seen, well, I had one where it was two years, but that was truly pathological. The worst one I’ve seen that didn’t involve kind of the decimation of the development team was about six months. And this was an environment where there’s a medical component to it. So there’s health and safety and the protection of humans from getting wrong results from the software. And that was very important. I wasn’t suggesting that they not do that. But to wait six months between when you’ve done something and when you get feedback on it means that you are not able to be responsive to your users at all.
Squirrel: You’re just really frozen. If we hadn’t done something about that cycle time, that company was going to go under. They weren’t going to be able to sell anything. They wouldn’t be able to help any people with their medical conditions. So what we did there was to very carefully get the company used to the idea that perhaps we would be putting things live a bit more frequently, and we got it down to two weeks, which was a big improvement for them. I think in many of the cases our listeners will be in, most of you are probably not in human safety environments. You could have cycle times in hours. And the fact that most people don’t do that is an indictment of our industry. It’s not a commentary on how great it is. That’s not the right thing to do. What we should be doing is getting feedback from our users as quickly as possible. And the fear of Friday and the fear of failure of our system is something we should be overcoming and mitigating, not hiding from in the corner.
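Squirrel’s definition of cycle time, the gap between having an idea and real users using it live, can be made concrete with a small Python sketch. The function name, the labels, and the sample dates below are illustrative assumptions; only the rough magnitudes (six months, two weeks, hours) come from the conversation.

```python
from datetime import datetime, timedelta


def cycle_time(idea_at: datetime, live_at: datetime) -> timedelta:
    """Elapsed time from having the idea to real users using it in production.

    Time spent in staging or on a developer's machine does not stop the clock;
    only the moment the change is live counts.
    """
    if live_at < idea_at:
        raise ValueError("live_at must not be earlier than idea_at")
    return live_at - idea_at


# Hypothetical changes, chosen only to mirror the magnitudes discussed above.
idea = datetime(2020, 1, 6, 9, 0)
examples = {
    "roughly six months": datetime(2020, 7, 6, 9, 0),
    "two weeks": datetime(2020, 1, 20, 9, 0),
    "same day": datetime(2020, 1, 6, 15, 0),
}

for label, live in examples.items():
    print(f"{label}: {cycle_time(idea, live)}")
```

Tracking this one number per change, from idea to live, is enough to show whether release blackouts such as "no Fridays" are stretching the feedback loop.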
Jeffrey: I think that’s a really great point. And I have two thoughts about that. One is, we were just talking in our previous two weeks about the value of emotions as signals for what you might do. And it seems to me that here’s one example where we’re saying your anxiety is actually a message.
Jeffrey: It’s that you have the fear. You don’t want to ignore it. You want to use it to say, how can I be better? But there is this thing, and this is the second point, which is we do have an assumption here that people want to be better. And if that’s true, better also means being different. You really can’t be better by doing everything the same way you always have. You’re going to have to find a reason to change.
What Does it Mean to Strive for ‘Better’
Squirrel: And really we should define better as well. Whenever anybody says ‘better’, I have a concept, betterism: you just have this notion of better, and nobody ever agrees on what better is. So everyone talks about what better would be for them. If better for you is less interaction with customers, less meeting of customer needs, and total safety where nothing can possibly go wrong or change, then this isn’t better. So ignore us, turn off the podcast now, go do something else, because your better for you doesn’t match better for us. But in that case, I’m not sure why you’re writing software in the first place, because what would be better would be never to change it. This is kind of like one of the ideas that I think you might have taught me, I’m not sure. Long ago someone taught me that if your unit test will not allow you to change the object, the class, or whatever it is that you’re testing, then you might as well just do a hash of the source code and verify that the class has not changed. And then go play golf or something, because there’s no point in making something that is so tight that it prevents you changing it. And I’m not sure why you are changing your software if you want total safety. If you want responsive software, if you want software that improves as it progresses, as you’re making the changes, you’re going to have to test that. In order to test that, you’re going to have to release it. And you want to do that as often as you can and keep improving that response cycle and getting more feedback more quickly.
Jeffrey: I think that’s a really great point, that better is relative and context sensitive. And we are making some assumptions here. One thing that I recommend people do, if they’re sort of struggling with this kind of thing: what would better mean for us? Why would we want to increase our cycle time?
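The hash-of-the-source-code remark can be illustrated with a short Python sketch. Everything here (the two source snippets, `source_fingerprint`, `load_total`, `behaviour_test`) is a made-up example rather than anything from the episode. It contrasts a check that fails on any change at all with a test that only pins down behaviour.

```python
import hashlib

# Two hypothetical versions of the same function: V2 is a harmless refactor.
SOURCE_V1 = "def total(unit_price, quantity):\n    return unit_price * quantity\n"
SOURCE_V2 = (
    "def total(unit_price, quantity):\n"
    "    # same behaviour, operands swapped\n"
    "    return quantity * unit_price\n"
)


def source_fingerprint(source: str) -> str:
    """The 'go play golf' check: any edit at all changes this value."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def load_total(source: str):
    """Compile a source snippet and return its `total` function."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"]


def behaviour_test(total) -> bool:
    """A test of what the code does, which leaves room to change how."""
    return total(2.5, 4) == 10.0 and total(3, 0) == 0


# The fingerprint rejects the refactor; the behaviour test accepts it.
print(source_fingerprint(SOURCE_V1) == source_fingerprint(SOURCE_V2))
print(behaviour_test(load_total(SOURCE_V1)), behaviour_test(load_total(SOURCE_V2)))
```

A test suite that behaves like the fingerprint gives total safety only by forbidding change, which is the trade-off Squirrel is arguing against.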
Jeffrey: Why would it be good? Is to check out the book Accelerate.
Squirrel: Just to clarify: decrease cycle time. Make your cycle time faster.
Jeffrey: Yes. Decrease cycle time.
Jeffrey: Increase your takt time, I think. Anyway, decrease it. Yes, exactly. Get better, be faster. That’s the book Accelerate.
Jeffrey: One thing I like about it, and we’ll link it in the show notes, is that they talk about various practices and what the inter-relationships between them are: how improving in one area tends to cascade to others. So you can kind of get a sense of what the levers are. What happens if I increase this or decrease that? What are the subsequent types of fallout? And what we’re describing here is something that I’ve experienced in a company we both worked for, which is TIM. At the time that I joined, we did have the idea of pretty strict release windows. We wanted to make sure that we were only releasing the TIM product, the main product, on weekends. Saturday morning would be our normal release hours. And that was because we had this sort of fear about, well, what would be the impact on our clients? We might have problems. What we’ve done is say, well, what are those fears and how could we mitigate them? And over time, not only do we no longer do our releases only on weekends, but we’ve now moved them to be in hours. So it’s not even a late-night release, because we’ve said we want to be able to release and we want to have confidence in our release process.
Using Controlled Chaos to Prevent Pain
Jeffrey: And the only way to get confidence is by facing our anxiety. I think this is really interesting here, because we know from psychology that the only cure for anxiety is exposure. If you have a phobia, ultimately any sort of cure is going to involve exposure. And I think it’s similar with this sort of development environment. The only way to get over your fear of releasing at certain hours is to start releasing in those hours. Now, I’m not saying just rush into it. You have to build yourself up and then develop confidence. And that’s something we did over time. As I said, we built what we thought would be a good system that would allow us to be safe and to roll back in case of any problems. And then we had some redundancy in place, and then we didn’t just go trust it. We first tried to test it. We used a technique that I’ve heard referred to as Failure Friday, and again, we’ll link it in the show notes. Is it something you’ve come across before, Squirrel?
Squirrel: I haven’t. So you were just telling me about it. What is failure Friday?
Jeffrey: So this is coming from resilience engineering, really. This is kind of like the human version of the Chaos Monkey. And it basically says: take a look at your system, and where you believe you have redundancy, test it. Essentially, you’re going to have this, I was about to say a simulated outage, but actually, it’s a real outage. If you...
Squirrel: Yeah, you make something fail.
Jeffrey: Yeah, exactly. You make something fail. And the idea is that you don’t make something fail that you think would be catastrophic. But if you believe that you have redundancy, say of your network cables, then you schedule a time and you unplug one of them.
Jeffrey: This is very unintuitive.
Jeffrey: But the theory is very much like one you may have heard of: you don’t really have a backup until you’ve done a restore and worked from the restoration.
Squirrel: And you’ve not felt fear and pain until you’ve tried to do the restore and the backup file that has been carefully checked and has been verified for months to exist turns out to have zero bytes in it.
Jeffrey: Oh, yes.
Squirrel: Not that that has ever happened to me.
Jeffrey: Oh, yes. I think that’s a feeling that probably none of our listeners have had, in the same way that it’s never happened to you. It’s never happened to many of our listeners. Never. A common, common problem.
Jeffrey: And so if you’ve had that experience, or you know people who’ve had that experience, you can understand the value of actually doing the restores and working from the restores, so you have real confidence. Similarly, we could build confidence in our system by having induced failures. If we believe that our system should be resilient in the face of these various problems, let’s go try it. That way we could build confidence that the systems we built did, in fact, have the sort of safety and resilience that we wanted. And that was an enabling step to allow us to move to in-hours releases.
Squirrel: There you go. And by ‘in hours’, I just want to clarify for listeners who might not know the finance world very much, you mean in the hours of financial trading, the world that TIM exists in. Back in my day, before Jeffrey took over at TIM, we would be very frightened, as you’ve described, of doing anything when the markets were open, because that would potentially cause some problem for people who were trading. And they’d phone us up and yell at us and come over and throw things at us. But what you managed to get the team to do is to move to safety for releasing any time when the markets are open. And that, of course, then meant, I imagine, that people didn’t have to do what I used to do, which is to come in on a Saturday and go press the buttons for a release to happen and monitor it. That would be more interruptive. So it’s not like my client that I started with, where they said, ‘Oh, well, we never release on a Friday because we don’t want to get phoned on Saturday.’ This was the opposite. What you’re doing is making sure that we can release any time the markets are open. That actually makes it easier, because we’re around to fix the problems. So it’s better, but it also means we’ve massively reduced the fear we used to have. The fear comes from the same source, but it manifests very differently.
Jeffrey: That’s right. I don’t want to claim that we have everything figured out. We do have some releases that have involved a database migration, say, where the system needs to be offline. We do those on the weekend, during the quiet period. However, that’s the minority, a tiny minority of the releases. And it’s also an area that we want to tackle. This is for us one of the last remaining elements, and gosh, it would be nice to not have to worry about this sort of special case. And it’s a really good example of how your idea of what’s normal changes.
Jeffrey: We had a time where releasing every two weeks on a Saturday was normal and seemed even kind of good compared to a lot of other people. Then we said, well, how could we be better? Look out in the world: are there examples of people who are doing something better that we would value? We said, yeah, there’s work we can do, and then we did the work to do it. And now no one would want to go back to that sort of two-week cadence. The ability to do releases when we want to is something we really appreciate. And actually now people always want to say, well, how could we make this better? Can we make it faster? Could we make it more streamlined? Could we make it safer? And it’s a great sort of virtuous cycle: having taken a step down this path, we can see how to make it even better.
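The zero-byte backup story suggests the difference between checking that a backup exists and actually restoring from it. Here is a minimal Python sketch of that difference, under stated assumptions: the file names are invented, and `shutil.copyfile` merely stands in for whatever a real restore procedure would be.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Content hash used to compare restored data with the original."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def backup_exists(backup: Path) -> bool:
    """The weak check: the file is there. A zero-byte file passes."""
    return backup.is_file()


def restore_drill(original: Path, backup: Path, restore_dir: Path) -> bool:
    """The stronger check: restore the backup and compare it to the original."""
    if not backup.is_file() or backup.stat().st_size == 0:
        return False
    restored = restore_dir / original.name
    shutil.copyfile(backup, restored)  # stand-in for a real restore step
    return sha256_of(restored) == sha256_of(original)


def run_demo() -> dict:
    """Build a good backup and an empty one, then run both checks on each."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "data.db"  # hypothetical data file
        data.write_bytes(b"important records")
        good = root / "good.bak"
        shutil.copyfile(data, good)
        empty = root / "empty.bak"
        empty.touch()  # the backup that "exists" but holds nothing
        restore_dir = root / "restore"
        restore_dir.mkdir()
        return {
            "empty": (backup_exists(empty), restore_drill(data, empty, restore_dir)),
            "good": (backup_exists(good), restore_drill(data, good, restore_dir)),
        }


print(run_demo())
```

The empty backup passes the existence check and fails the drill, which is exactly the gap that only shows up when a restore is attempted.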
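The induced-failure idea Jeffrey describes, unplugging one of two supposedly redundant cables at a scheduled time, can be modelled as a toy Python sketch. The `Link` and `Network` classes and the hidden `actually_works` flag are assumptions made up for illustration; they are not a description of TIM’s setup or of any real tool.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    plugged_in: bool = True
    actually_works: bool = True  # False models a fault nobody has noticed yet


class Network:
    def __init__(self, links):
        self.links = {link.name: link for link in links}

    def can_send(self) -> bool:
        """Traffic flows if at least one plugged-in link really works."""
        return any(l.plugged_in and l.actually_works for l in self.links.values())


def failure_drill(network: Network, victim: str) -> bool:
    """Unplug one link on purpose and report whether traffic still flows.

    Refuses to run when no other link is plugged in, since the point is to
    test believed redundancy, not to cause a known catastrophic outage.
    """
    others = [l for l in network.links.values() if l.plugged_in and l.name != victim]
    if not others:
        raise ValueError("no believed redundancy to test")
    link = network.links[victim]
    link.plugged_in = False
    try:
        return network.can_send()
    finally:
        link.plugged_in = True  # always restore the link after the drill


healthy = Network([Link("cable-a"), Link("cable-b")])
hidden_fault = Network([Link("cable-a"), Link("cable-b", actually_works=False)])

print(failure_drill(healthy, "cable-a"))
print(failure_drill(hidden_fault, "cable-a"))
```

In the second network everything looks fine in normal operation, and only the drill reveals that the spare cable was never really a spare.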
Squirrel: There we go. OK. Well, if listeners disagree with us and think releasing on Friday is a bad idea. We always like to hear from you. And if you agree with us, but you’re having trouble implementing or you have some questions, we’d sure love to hear from you as well. You can always find us on conversationaltransformation.com. And you can also find out about our book coming in May. Preorder. Join the mailing list. Other good stuff like that. And of course, we like it when you click the subscribe button and therefore you hear from us every Wednesday because we like talking to you and hearing about your experiences. Don’t forget, you can always improve your attitude to fear. Try to expose yourself to things you’re afraid of. That’s, I think, our message today. Super.
Squirrel: Thanks, Jeffrey.
Jeffrey: Thanks, Squirrel.