Getting better at CI

I recently felt the urge to experiment with my TDD workflow and improve it. It had too many manual steps: running the tests, starting a commit, writing a commit message, pulling changes, and pushing. It felt boring and wasteful. I want to automate this stuff and eliminate all the waste.

We’re not aiming high enough with the continuous part in CI/CD.
“Integrate at least daily” … Come on!
“Hourly” … We can do better than this.
“Short-lived Feature Branches” … You’ve got to be kidding. It’s rather “short-lived lies”.
None of this is continuous. We need to get better and decrease the risk even further. I want to integrate actually continuously.

My inspiration comes mainly from the ideas of continuous integration, continuous testing, TCR, and limbo on the cheap.

Actually continuously

I came up with a way that drastically increased my commit frequency. I managed to create 63 commits in just 25 minutes practicing this way, peaking at 6 commits per minute. Yes, it was just a kata, but that’s not the important part. On many occasions, literally every keystroke went live, and it was all working - covered by tests.

What I did is based on the following requirements:

  • No manual saving. The code saves itself automatically.
  • No manual test running. The tests run continuously. They restart automatically as soon as the code changes. And they are fast.
  • No manual commits. The code is committed automatically whenever the tests pass.
  • No manual pulling. Changes are pulled automatically before the tests run.
  • No manual pushing. Every commit is automatically pushed right away.

Not that hard to achieve actually. Just need proper tooling and a little bit of scripting. The language I’m trying this with is Java.

The right tools for the job

Since I use IntelliJ, which is a great, maybe the best IDE (*cough* it has become a little buggy lately *cough*), it saves my code automatically. So that problem is already solved. For the continuous running of the tests, I know a few options.
IntelliJ offers a way to trigger the tests automatically, but it’s rather slow. Then there is the old-school plugin Infinitest, but I need something for the CLI so that I am able to script it. And it should be really fast. Incremental compilation would be key.
Gradle has it, and it is quite fast. With Gradle, I could also make the test report pretty using this test-logger plugin.

Nice test report with the gradle test logger

I like that!
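For completeness, wiring this up in a build.gradle might look like the sketch below. The plugin id is the one I remember from the test-logger README, and the version numbers are illustrative - treat both as assumptions and check the plugin docs:

```groovy
plugins {
    id 'java'
    // pretty CLI test reports; id/version are assumptions - check the README
    id 'com.adarshr.test-logger' version '3.2.0'
}

repositories {
    mavenCentral()
}

dependencies {
    testImplementation 'org.junit.jupiter:junit-jupiter:5.9.2'
}

test {
    // required so Gradle runs JUnit 5 tests
    useJUnitPlatform()
}
```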

Another option would be the Quarkus continuous test runner, which is fairly new, but its test report is rather ugly. Also, I have no idea how to customize or script it. So I am going with Gradle for now.

On top of the incremental compilation, Gradle ships with a built-in continuous mode:

> gradle -t test

But since I need more control I chose to use a little helper tool called watchexec instead. It watches for file changes and then executes a command. Like this:

> watchexec -e java ./gradlew test

I ran some tests, and it is just as fast as the Gradle continuous test runner. If you’d like even more control and a more complicated script, you could also use inotifywait. However, I like to keep it as simple as possible.

Next, I needed to commit as soon as the tests pass. A simple bash script would do, but I like to use a modern task runner for the job. I settled on just. In case you did not know, it is a modern take on Make. And it is written in Rust, for whom it may concern. The way this works is you create a justfile and specify your tasks in it. The commit command looks like this.

commit:
    @git add .
    -@git commit -am "wip"

Simple and concise. To execute it you run:

> just commit

And I do not even have to remember it. My shell courteously suggests the available commands to me.
The @ means that the command is not printed.
By default, just aborts on a non-zero exit code, whereas the - tells it to ignore that and continue. I need this for the case where the tests pass, but I have not changed anything. For example, when I add something and delete it right away. Then there would be nothing to commit, and git would throw an error, causing just to abort the task. I could also allow empty commits using the allow-empty flag. But why allow an empty commit for no reason!? That would be inventory - wasteful.

So here are the first commands in a justfile.

commit:
    @git add .
    -@git commit -am "wip"

test:
    @./gradlew test

test-commit:
    just test
    just commit

tdd-commit:
    watchexec -e java just test-commit

Nice! I like it.

First Try

Now we are able to give it a try. Small steps.

  • Run > just tdd-commit in a terminal that stays visible next to the IDE.
  • Write a very simple failing test; Tests fail. -> The test failure is shown in the terminal immediately.
  • Make the test pass. -> Nicely formatted test report is shown immediately. Changes commit automatically.
  • Rename the test method; Tests pass. -> Changes commit automatically.
  • Add another test; Tests fail. -> I am shown the failure report.
  • In a first attempt to make it pass, I notice it is not that easy. I need a preparatory refactoring first.
  • So I disable the failing test; Tests pass again. -> Changes commit automatically.
  • I do the preparatory refactoring; Tests pass. -> Changes commit automatically.
  • I enable the failing test again; Tests fail. -> I am shown the failure report.
  • Make it pass this time; Tests pass. -> Changes commit automatically.

Wow, this felt smooth. During all of this, I had not manually done a single save, test run, or commit. The terminal was open on the right-hand side of my screen, and I got that feedback immediately, continuously.

A meaningless commit history

Let’s take a look at the resulting git history.

  • wip
  • wip
  • wip
  • wip
  • wip

Whoops, that’s not very expressive. But the commits are so small and pleasant to review. It’s like a playback of every little step that was taken. It is the actual history. Honestly, I think the flow might have more value than the documentation. Still, I would like to improve on that by sneaking in descriptive empty commits every once in a while.

We are used to writing git messages that describe what we did after we did it. But for this, I would like to propose a different way. I want to use a commit message to describe what is next. In other words: What my current goal is. So let’s add a command to create such descriptive empty commits.

goal +MESSAGE:
    git commit --allow-empty -m "Goal: {{MESSAGE}}"

Every time I start working on a new goal, I want to write it to my git history first. Something like > just goal make rover turn left. The other commit messages would stay ‘wip’ commits, and that’s fine. One idea would be to use further tooling to document some of the refactoring commits. For example: the RefactorInsight plugin.

My commit history would then look something like this:

  • wip
  • wip
  • wip
  • Goal: make rover turn left
  • wip
  • wip
  • wip
  • wip
  • Goal: make rover turn right

But what about Integration?

Lots of small commits on my computer are nice. But if I work in a team I need to integrate my changes to the mainline, too. So I want to pull before I run my tests, and I want to push after each commit. It’s called Continuous Integration for a reason, right?

Let’s add that to the justfile.

integrate:
    git pull --rebase
    just test
    just commit
    git push

And I think we’re done.

This is the complete file. It also contains a TCR task.

goal +MESSAGE:
    git pull --rebase
    git commit --allow-empty -m "Goal: {{MESSAGE}}"
    git push

commit:
    @git add .
    -@git commit -am "wip"

test:
    @./gradlew test

integrate:
    git pull --rebase
    just test
    just commit
    git push

tdd:
    watchexec -e java just test

ci:
    watchexec -e java just integrate

revert:
    @git reset --hard &> /dev/null
    @git clean -df &> /dev/null
    @echo -e "\033[0;31m=== REVERTED ==="

tcr:
    @just test && just commit || just revert


Notice how I renamed the tdd-commit task to ci. Continuous Integration is not only about what the build server is doing, it is primarily about what we do.

Coding with this script feels super smooth. It is actually continuous.

Also, it was not that complicated to set up. You can probably do even better.

Imagine remote pair or mob programming with this. Handovers could not be easier: you just swap who shares their screen while the tests are passing. And that’s it.

Probably some people are already working this way? Let me know!


Running a Company Coding Dojo

Two years ago, in 2019, I ran the first Coding Dojo at my company EBCONT. Corona wasn’t a thing back then, so it happened to be an offline event that lasted 4 hours on a Thursday afternoon. All the brilliant people came together to practice programming and enjoyed it. How awesome is that?

Local Coding Dojo at EBCONT

A lot has changed since then. When the pandemic happened, I had to adapt and move it online. So I decided to make it a regular remote whole-day event. Luckily I had already collected plenty of experience from my friends at the Vienna Software Crafts Community where we also run Coderetreats.

The EBCONT Coding Dojo turned out to be a small success story. When I started it, approximately ten people joined, many of whom became regular attendees. But lately, a small hype has built up around the event. People liked it so much that they came up with the idea to create a Coding Dojo T-shirt and shoot a group photo with it. Of course, we did so. 25 employees joined the latest event, and it was a lot of fun.

Local Coding Dojo at EBCONT

Again, take a closer look at these sweet t-shirts! :-)

EBCONT Coding Dojo T-Shirts

Coding Dojo???

So what is a Coding Dojo, how do I do it, and what can you take away from my experience? The term Dojo originates in Japan and stands for a training facility where they perform katas, choreographed patterns of martial art movements designed for practice.

A Coding Dojo is a similar thing. We just perform coding katas. It is a great opportunity to practice technical skills like TDD, design, and refactoring. Invaluable fundamentals that are mostly not taught in school. They fall behind the things in higher demand: frameworks and tools.

Under pressure, we naturally fall back to old and maybe poor habits, even though they lead to worse results. So we take our time to practice and get comfortable with better programming techniques, and to be confident to apply them when it counts. While the traditional Coding Dojo is a short ~2-hour event, the one I run lasts almost a whole day.

The Coding Dojo creates space for developers to practice the fundamentals of programming, away from the pressure of getting things done.

But why not a full day?

An intense full-day practice event can become tiring in my experience. Closing just a little bit sooner leaves everybody more energy for the final retrospective and the evening after the event. There are similar whole-day events in the name of Coderetreat and Mobretreat. The classical Coderetreat has the notion of throwing away your code after a timeboxed session and starting from scratch with a different pair. So you don’t finish, but get to see many different perspectives within a short time - quite intense. That’s a little different from what I do. I like to provide the participants the opportunity to dive deeper into a kata. So we stay within the same teams and kata throughout the day. This reduces the relative amount of setup time, which allows us to get more coding done.

The Dojo plan

I decide on the topic and kata in advance, taking the attendees’ skill levels into account. Before the event, I send them information about what we’ll be working on. The Dojo starts with a short welcome where we have a few minutes of small talk before the intro session begins. In the intro session, we discuss the topic and kata. I like to walk over a minimum of the theory that I believe everybody should know. One effective way to do this is to keep asking questions so they provide the answers themselves. At the end of the intro session, we form the teams who will be working together in the following coding sessions.

Balancing teams

I want less experienced people to learn from more experienced people. But that doesn’t mean more experienced ones won’t learn. They deepen their understanding as they communicate their ideas. Curious participants may even challenge their thinking and help them refresh or even reset. So I like to balance the skill levels among the teams while also taking into account their desired programming language. The tool I use, Gather-Town, helps me with that.

Gather-Town is a remote video conference tool that works like a multiplayer version of Zelda. You can move around on a 2D map and talk to people in your vicinity. It creates that feeling of meeting somebody in the hallway at a conference, just online. And it allows us to split up and go to different rooms. Also, it gives me the freedom to customize the map. I use an altered version of a map that Christian Haas once made for Vienna’s Global Day of Coderetreat.

I ask the people to assess their abilities and to take a position in the room that matches their confidence to work on the given kata. Standing on the right end of the room means “very confident” and the left end means “I’m lost”. The rest of the spectrum is in between. This creates an overview that makes it fairly easy to form balanced teams. In my opinion, the optimal team size is 3, but 4 works too. For bigger teams, you might want to assign a designated facilitator.

Tip: Assign a facilitator for a team

It is not possible for me as a single person to facilitate the programming in every team. Especially when the people are not used to collaborative coding it makes sense to assign a designated facilitator. The responsibility of this person is to provide just enough guidance for the team to work well together. This can be anyone who knows a bit of mob programming. The role can be rotated so that everybody gets the chance to contribute. When the team is small and most people are already used to this, a facilitator might not be necessary.

Coding sessions & retrospectives

Typically we manage to have three coding sessions, each of which is followed by a short retrospective. In these retros, we discuss anything interesting. How we feel, what we discovered and learned, and how we approached the exercise. We might also share and review the code we had written so far.

The first session always includes setup, which is the time spent until the team starts coding together in some way. Somehow, setup always takes a fair amount of time - regardless of how well prepared you are. The goal is to minimize this time and maximize coding time. There are lots of ways to code together quickly, and you probably already have some in mind. However, I would like to share a few that I found work well.

#1 Single Driver Mode

A way to get to code quickly is to have just a single person who already has a setup prepared to share their screen. The downside of this is the risk of other people falling behind due to inactivity. To avoid that, the person sharing should behave as a passive driver while the other people make decisions and rotate the navigator role. I prefer when everybody gets to drive, but this usually takes more setup time.

#2 Cyber-Dojo

Cyber-Dojo works well for TDD katas, as it allows you to create and share a browser-based setup for any language in no time. However, it won’t provide you with all the conveniences your IDE does. Things like continuous compilation, autocompletion, automated refactoring, and so on are not available.

#3 Virtual dev environment

Another way to get to code together quickly is to join the same virtual development environment. It could be a virtual machine running in the cloud where everything is already set up. People would connect to it through some remote desktop software. I prepared something like this, where I can spawn an immutable Linux dev system on Azure: Remdev on Azure

A typical schedule

This is what my typical schedule would look like:

  • 08:50 - 09:10 - Welcome
  • 09:10 - 09:40 - Theory, Details
  • 09:40 - 10:50 - First Coding Session
  • 10:50 - 11:00 - Short Break
  • 11:00 - 12:30 - Second Coding Session
  • 12:30 - 13:30 - Lunch Break
  • 13:30 - 14:50 - Third Coding Session
  • 14:50 - 15:30 - Retro

Coding sessions already include 10-20 minutes of retro time. While it’s not a big deal to be a little late in the schedule, I want to nail the 1-hour lunch break. This allows people to plan and spend that time with their families.

My role as a facilitator

As a facilitator, I am not there to actively perform katas. Instead, my job is to make sure that every participant gets the chance to learn and practice. So I am merely the organizer and enabler. I make sure we keep to the schedule (which I’m terrible at) and mostly try to stay out of people’s way. Also, I’m there to help the participants when they get stuck or have questions. But this doesn’t mean I’m not learning. Quite the opposite is true. I learn a lot as I get to see amazing ideas, experience new tech, observe sociotechnical patterns, and discover and rediscover non-obvious details.

During the coding sessions, I switch from team to team and observe what they’re up to. This works well with Gather-Town as I can literally walk from room to room. Occasionally, I see things that concern me and bring them up. I try to do this by asking questions, sparking their creativity, and having them come up with their own solutions. Or I may see something interesting, for example, a pattern emerging that I find worthy of a discussion, so everybody understands. If you want to learn more about facilitation in this regard, I recommend Peter’s Coderetreat Facilitation Podcast. Many of the things I’m doing are things I learned from him.

Choosing a kata/topic

The Coding Dojo should be a place to practice the fundamentals. The perfect kata is not too hard for the attendees to tackle, is small enough to finish within the event, and is one that you as a facilitator already know well. But it doesn’t have to be perfect. As a refactoring exercise, I like the Expense Report Kata which is nice and small. Or the Order Dispatch Kata which is about Tell don’t ask. As for TDD katas I liked Snake, Game of Life, or Mars Rover. But I also did completely different things. For example, the Elephant Carpaccio Exercise which is about vertical story slicing and iteration.

Selling your Coding Dojo

If you want to start a Coding Dojo at your company, let me tell you that I think that’s awesome! I’d recommend getting your boss to agree that it will happen during the workday and that it will be considered work time. Missing know-how is a bottleneck in our industry, where the majority have less than five years of experience. Fresh developers have to learn so many things about their tech, tools, and frameworks these days that there is little room left for programming fundamentals like TDD and refactoring. Some of those are mostly not taught in school either. When people get to practice these, they become better programmers. They get better at writing code that works, is more maintainable, and more secure, in less time. What boss wouldn’t want that?


The Coding Dojo is a great and fun way to provide developers with the space they need to get better at their job. People enjoy learning from one another in a relaxed environment like this. I’m proud of the progress participants have made so far at my Dojo. Feel free to contact me if you have questions, or if you would like to start a similar event. And if you already have something like this at your company, I would love to hear about that, too.


TDD Crash Course from the BACK of the Room

I recently gave a 2-hour TDD crash course remotely to a group of five people, and I found it worked out wonderfully! So I would like to share with you how I did it.

If you were searching for TDD guidance, this is not it! It is rather a guide on how to run a TDD crash course.

I recently read the book Training from the BACK of the Room!, which resonated with me, and it inspired how I ran the course. The book is highly innovative and turns traditional training upside down. The emphasis is on learners being active and talking more during the training instead of the teacher.

The Training Plan


The goal of this training is for participants to understand TDD and be able to practice the red-green-refactor cycle themselves. It is not a goal of this training to produce TDD pros who can test-drive their whole projects. While TDD is easy to start with, it is also hard to master. To get better, learners will need much more hands-on practice after the training. The course should provide participants with a smooth start on their learning journey.


I did it remotely. However, I don’t see a reason why it would not work in the same way locally.

Group Size

5-6 Participants.


Knowing the principles I used for this training will help you understand the reasoning behind its design.

  • Just show up. Coding sessions often require technical preparations for participants in advance. When the training starts, you somehow lose the first 25 minutes to fixing those issues every time. Two hours does not leave room for this kind of technical troubleshooting. So for this training, no preparation by participants is needed. All participants have to do is show up.

  • Focus on the need to knows. TDD is a broad topic, but the essentials are few. Teaching everything from history to styles, test doubles, and so on would merely confuse the learners. So in this training, we focus only on the essentials.

  • Learning by doing. The training will have the participants experience TDD in practice, which is very important. We could explain what a baby step is and what the value of a fast test suite is. Still, learners won’t understand unless they experience it themselves.

  • Have learners talk the most. In traditional training, the trainer talks more than 70% of the time, which doesn’t help learners learn. Participants learn much more effectively when they are the ones talking. So this training aims at maximizing the amount of time that learners talk instead of the trainer.

  • Keep everybody engaged from start to finish. No participant should be listening passively for more than 10 minutes at a time. We want to keep them engaged to get the most out of their training.

  • The 10-minute rule. The 10-minute rule helps us to optimize for the approximate attention span of people. TV has conditioned us to receive information in small segments of ~10 minutes in length. After 10 to 20 minutes, learning begins to diminish. So we want to avoid dry instruction that lasts longer than that.

  • Psychological safety. Create an environment where participants feel comfortable expressing their opinions without the fear of being wrong. We don’t want them to be afraid of making mistakes, so we don’t punish those. Instead, reward every form of contribution from the very beginning. Whatever learners have to say: Unless it’s abusive, it’s not wrong - it’s interesting.


Two hours is not a lot of time. If the group already knows each other, we want to jump right into the topic. When that’s not the case, give them at least the opportunity to introduce themselves in a minute or two. I like to use one of the following Start-Up activities.

Start-Up Activity: Web Hunt (20-30 minutes)

Start with a “Web Hunt” activity where learners have 10 minutes to search the web, find three facts, and come up with one question they have about TDD. Prepare a virtual board where the learners can put and share their findings.

Then, take another ~10 minutes to review the facts and questions they had put on the board. Have the participants present them, and try to stay out of the discussion as much as possible. When you are not satisfied with one of the facts, ask the other participants what they think about it. Try to have the learners answer all the questions on the board. If you have some great answers that you can back up with quality content such as links to blog posts, articles, talks, or books - that’s awesome. Add those in the end.

Keep in mind that it’s not about us (the trainers). It’s about the learners.

Alternative Start-Up Activity: What do you already know? (20-30 minutes)

Give learners 10 minutes to think of three facts they already know about TDD and have them put those on a virtual board for everybody to see. Then, take another ~10 minutes to have participants present the facts they had put on the board. When you are not satisfied with one of the facts, ask the other participants what they think about it. This activity connects learners to the things they already know. Typically, developers have already heard at least something about the topic. When they connect to these things first, it will help them evaluate what they had learned in the training.

Theory (10 minutes)

After that comes the only part of the training that is dry instruction. Take ten quick minutes to explain the essentials of the TDD workflow. The three rules of TDD provide a good start, but you probably want to explain the whole workflow. This wiki page gives a nice overview of all the steps involved.

Short Break (10 minutes)

At this point, we are typically 30-40 minutes into the training, and it’s an opportunity to have a 10-minute break. After that, continue with the practical coding part.

Practical FizzBuzz (60 minutes)

The kata I choose for this exercise is FizzBuzz, as it is pretty simple and can be completed within the available time. It should help create the feeling of having accomplished something, which makes the learning stick longer. Also, we don’t want to confuse learners with a design challenge. That’s not the focus here. The focus is on the TDD workflow and the thought process and decision-making behind it. A bit of sugar on top is the opportunity to use a parameterized test, which learners often find interesting. The kata is worked on in a special mob where everybody is assigned a specific role.

Roles: Red / Green / Blue / Navigator

As we like to keep all participants engaged, we assign each a responsibility that requires them to stay focused. Choose three people and assign them one of these referee roles:

  • Red Referee: This role is responsible for making sure we watch each test fail and that the error presented is useful and expressive.
  • Green Referee: Watches out that we only write the simplest code to fulfill the test, but not the line of code we know we’d need to write.
  • Refactor Referee: Makes sure we always refactor in the green and only in the green.

The other participants are navigating collaboratively. Take a look at strong style pairing to understand the Driver/Navigator relationship.

Halfway through, ask your participants whether they would like to rotate their roles.

The Trainer is the Driver

The trainer is the Driver/Typist. Writing down test cases is another important exercise for learners, but it is not the focus of this training. The focus of this training is to have learners grok the workflow of TDD. To learn the decisions we make when we let tests drive our design in tiny steps. And to achieve that, we want to remove all other impediments. So the trainer plays the smart input device that makes it easy for the learners to write the tests they want. As the driver, the trainer is also able to step in and take control if needed.

The trainer shares their screen, with the test setup prepared: the FizzBuzz requirements as a comment above a dummy test, the font size increased, and the test result visible. Remember, the goal of the trainer is to stay in the background as much as possible. She might chime in to get things going but mostly asks the right questions and delegates control to the participants.

As a trainer, you might say: “I only type when you navigators tell me to.”, or: “What would be an even simpler test case to start with?” When you see something you are not satisfied with, play the ball to the responsible referee: “Green referee, what do you have to say about this?”, “Refactor referee, Is it okay that we do this refactoring now?”

Instruct participants to keep their mics on! Sometimes people turn their mics off in video calls, which would be harmful in this training. When everybody is staring at the code, we won’t notice when somebody starts talking with their mic off.
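To give a sense of where the hour typically ends up, here is a dependency-free sketch of FizzBuzz with a table-driven check. It stands in for the JUnit 5 @ParameterizedTest we would use in the session; all names here are illustrative.

```java
// Where the FizzBuzz kata typically lands. The table-driven loop in main
// is a dependency-free stand-in for a JUnit 5 @ParameterizedTest.
public class FizzBuzz {

    static String fizzBuzz(int n) {
        if (n % 15 == 0) return "FizzBuzz";
        if (n % 3 == 0) return "Fizz";
        if (n % 5 == 0) return "Buzz";
        return String.valueOf(n);
    }

    public static void main(String[] args) {
        // Each row is one parameterized case: input, expected output.
        String[][] cases = {
                {"1", "1"}, {"2", "2"}, {"3", "Fizz"}, {"5", "Buzz"},
                {"6", "Fizz"}, {"10", "Buzz"}, {"15", "FizzBuzz"}, {"30", "FizzBuzz"},
        };
        for (String[] c : cases) {
            String actual = fizzBuzz(Integer.parseInt(c[0]));
            if (!actual.equals(c[1])) {
                throw new AssertionError(c[0] + ": got " + actual + ", expected " + c[1]);
            }
        }
        System.out.println("all green");
    }
}
```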

Retrospective (10-20 minutes)

Find out what the participants learned that they hadn’t known before. How did they feel doing FizzBuzz using TDD? Was there anything they didn’t like? Ask the participants whether they would want to apply it in their real projects and how. They are more likely to do so when they commit to it publicly.


It’s astonishing how much you can still teach after getting out of the way. Of course, the crash course is just the start for the learners. It will provide them with the prerequisites to have more hands-on practice. After the training, they should feel more comfortable joining a dojo/code retreat.

Did you like the training design? Which parts did you not like? How are you teaching TDD? Leave me a comment.


Why Test?

It’s 2021, yet developers writing automated tests don’t seem to be the norm to this day. The belief that the writing of tests is just an additional effort that increases development cost is still going strong. Of course, it’s wrong.

Yes, the learning curve is steep, and yes, there are a lot of things to get wrong and to suffer from. Proper developer testing like TDD is a broad topic and demands deep knowledge of design and refactoring. Neither of which seems to be taught much in higher technical schools either. Humans have been developing software without writing tests for decades, so why bother?

Well, I write tests to my own advantage. I do it so that I know what I’m doing, early and often. It helps me find more joy at work and has an overall positive impact on how I feel. Maybe also because I’m a little lazy.

Hopefully, this little diagram will help explain how this works out for me.

How good developer tests are advantageous


Peeling an Onion

In one of the recent Coderetreats, we did an Outside-In TDD session. I paired with a guy who was new to this, and I noticed a challenge in expressing my ideas well. Honestly, I don’t think I did a good job, so I decided to write about this topic.

A Software System

Suppose we’re developing a thing, a program.

It will inevitably become a hierarchical system composed of collaborators that form the sub- and sub-subsystems. Each of those will solve yet another problem, and together they will form our thing. The outer layers will be more concerned with infrastructure and coordination, whereas the inner parts will be concerned with the business logic, the domain.

Outside vs Inside

When we design the thing we can start on either side, the outside or the inside.

The Outside

The outside is where the users of the thing are. Users might not just be humans, but also other software systems. They observe the thing from the outside, and they cannot see inside of it. But they don’t even care about its inside. All they’re interested in is what it does, and how to interact with it.

So to them, it is a black box.

The Inside

The inside contains the hidden details of the thing - its gear, its structure. It represents all the subproblems the thing was broken into. The inside answers the question of how the thing accomplishes what it was designed for.

Inside-Out Design

In Inside-Out Design we start at the inside and gradually ascend outwards. So we first break the problem down into smaller subproblems and define how they interact with each other. In doing so, we identify the innermost pieces, the domain of the system. Doing Inside-Out, they are exactly where we want to start. After all, they will be the foundation for the remainder of the system. They are the collaborators we use as soon as we ascend to build the next higher layer. As we ascend further and further outside, we will at some point arrive at the outermost layers. They form the interface of our system, the entry point users may interact with. As we build those last, they will be guided by the structure and behavior of the subsystems that have already been built. So they will be biased towards the early decisions we made when we first designed the domain. I think a good example of an API that is biased towards its domain is the CLI of git. You see that in the many sophisticated helper tools, scripts, and aliases that attempt to make it more accessible.

ⓘ Inside-Out is domain-centric. Can cause the Interface to be biased towards early domain decisions.


  • We cannot know what the domain will look like in advance; it is shaped by how users will want to use the system.

  • A bias towards the domain makes the interface more complicated.

  • To preserve a sound Interface, we might have to make ugly adjustments in the layers above the domain.

  • Thinking about usage last will cause us to build features nobody will ever use. (YAGNI1)

Outside-In Design

When we start from the outside, we don't care about the domain at first. Instead, we focus on the users and how they would want to use the thing. So we imagine the thing as a black box while defining an interface that is as simple and practical as possible. It will do what it should do; how it does that is something we will care about later.

Once the interface is defined we descend inwards, thinking about how we can bring this entry point to life. Now we have to decide what collaborators will be needed, and what their responsibilities will be. So from this perspective, the entry point we just defined is now the new user. Again, we’re treating its collaborators as black boxes. And again, at first, we only care about what they do, but not how.

We descend further until we arrive at the core, the domain.

As a result, the built system will be biased towards the anticipated usage of the thing.

ⓘ Outside-In is user-centric. The implemented solution might be biased towards the anticipated usage.


  • We’re bad at predicting how users will want to use the system.
  • A bias towards usage makes the domain unnecessarily complicated.
  • Thinking about usage first will help us avoid building stuff we don’t need. (YAGNI1)
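To make the interface-first idea concrete, here is a minimal Java sketch (all names are made up for illustration): we shape the public face of the thing purely from the user's point of view, treating it as a black box, before any domain code exists.

```java
// Hypothetical example: the public face of a report tool is designed
// from the user's perspective, before any domain code exists.
public class OutsideInSketch {

    // The interface is shaped by how a user wants to call it,
    // not by how the inside will eventually work.
    interface ReportGenerator {
        String generateFor(String customerId);
    }

    // A walking-skeleton implementation: the black box already "does something",
    // but the real inside will be designed later, descending inwards.
    static class WalkingSkeletonReportGenerator implements ReportGenerator {
        @Override
        public String generateFor(String customerId) {
            return "report for " + customerId; // to be replaced by real collaborators
        }
    }

    public static void main(String[] args) {
        ReportGenerator generator = new WalkingSkeletonReportGenerator();
        System.out.println(generator.generateFor("42"));
    }
}
```

The interface stays stable while we descend; only the skeleton behind it gets replaced.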

Descending in Outside-In TDD

When we test drive the thing Outside-In, we may start with an acceptance test as in double loop TDD. It describes the thing and its interface in a simple example: how it is used, and what it does. Of course, the test does not pass, as there is no thing yet. We can either keep the test red until it passes, or disable it for now. Either way, our goal is to make it pass. So we write another test, this time a unit test, to guide us towards writing the code that will make the acceptance test pass.
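A tiny sketch of the double loop in plain Java (no test framework, so it stays self-contained; the Greeter example is hypothetical): the outer acceptance test describes the whole thing, while an inner unit test drives out the first piece of implementation.

```java
// Sketch of double-loop TDD without a framework, so it runs standalone.
// In practice these would be JUnit tests; all names are hypothetical.
public class DoubleLoopSketch {

    static class Greeter {
        String greet(String name) {
            return "Hello, " + name + "!";
        }
    }

    // Outer loop: the acceptance test describing the whole feature.
    static void acceptanceTest() {
        String result = new Greeter().greet("World");
        if (!result.equals("Hello, World!")) throw new AssertionError(result);
    }

    // Inner loop: the unit test that guided the first bit of implementation.
    static void unitTestGreetsByName() {
        String result = new Greeter().greet("Alice");
        if (!result.equals("Hello, Alice!")) throw new AssertionError(result);
    }

    public static void main(String[] args) {
        unitTestGreetsByName();  // inner loop goes green first
        acceptanceTest();        // then the outer loop follows
        System.out.println("both loops green");
    }
}
```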

And this is already the first of three descending strategies which I call: “Skip and Descend”. The other two are “Fake it till you make it”, and “Replace with Test Double”. But when we build a full slice from the entry point all the way down to the domain, we mostly don’t use just one strategy, but a combination of these. Every time we descend we have to make another judgment call about which strategy fits best this time.

Skip and Descend

In Skip and Descend we use a test to drive the decision about which immediate collaborators will be needed to satisfy the test. But we acknowledge that implementing those collaborators on the basis of this test would be too big of a step. So we disable the test and descend to start test driving the collaborator we just defined. Sometimes we rinse and repeat until we arrive at a leaf whose unit is small enough to be implemented. After implementing the leaf, we ascend again to the previously disabled test, where we then use the collaborator we just built. Kind of like the Mikado Method.
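A compact sketch of the flow (hypothetical names; the disabled/enabled flag stands in for a test framework's disable annotation): the outer test defines the collaborator we need, gets skipped, and we descend to test-drive the collaborator first before ascending again.

```java
// Sketch of Skip and Descend. The outer, sociable Checkout test defined
// the TaxCalculator collaborator, was disabled while we descended, and is
// enabled again now that the leaf exists. All names are hypothetical.
public class SkipAndDescendSketch {

    // The collaborator we identified while writing the outer test.
    static class TaxCalculator {
        double taxFor(double net) {
            return net * 0.19; // driven out by the leaf-level test below
        }
    }

    static class Checkout {
        private final TaxCalculator taxes = new TaxCalculator();
        double total(double net) {
            return net + taxes.taxFor(net);
        }
    }

    // Leaf-level test: written first, after descending.
    static void taxCalculatorTest() {
        if (new TaxCalculator().taxFor(100.0) != 19.0) throw new AssertionError();
    }

    // Outer, sociable test: skipped while we descended, enabled again now.
    static boolean checkoutTestEnabled = true;
    static void checkoutTest() {
        if (!checkoutTestEnabled) return; // this was the "skip" part
        if (new Checkout().total(100.0) != 119.0) throw new AssertionError();
    }

    public static void main(String[] args) {
        taxCalculatorTest();
        checkoutTest();
        System.out.println("ascended: all tests pass");
    }
}
```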

This leads to sociable unit tests and test overlap2. Where test overlap happens, we aim to minimize it and use the sociable unit tests to cover the integrations only.

Use when

  • Confident in the need for the collaborator.
  • The sociable unit test will be fast:
    • The collaborator is doing in-memory instructions that finish within milliseconds.
    • The collaborator is going to be a fake that will be replaced by a real system later.
    • The collaborator is inside the application boundary.
    • The collaborator is not interacting with an expensive system such as a database or a web service.
  • The call to the collaborator is not a notification3.


Pros

  • Avoids test doubles4, and as such decouples the tests from their implementation to enable refactoring5.


Cons

  • Need to manage disabled tests.
  • Can lead to premature collaborators.

Fake it till you make it

In fake it till you make it we don't necessarily decide on a collaborator and descend. Instead, we write the simplest, stupidest expression that will make the current test pass. We then write more tests that force us to turn the stupid, specific expression into something more generic. We might have to apply preparatory refactorings in the process. With those, we plant seeds: we extract new collaborators that grow as we write and pass more tests. This may also lead to sociable unit tests and test overlap.
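The classic miniature version of this, sketched in Java (the `add` example is hypothetical): the first test was passed with a hard-coded constant, and a second test triangulated the fake into something generic.

```java
// Sketch of "fake it till you make it": the first test was satisfied by a
// hard-coded return; a further test forced the fake to become generic.
public class FakeItSketch {

    // Step 1 looked like: int add(int a, int b) { return 3; }  -- passed add(1, 2)
    // Step 2: a test for add(2, 5) failed, forcing the constant to generalize:
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        if (add(1, 2) != 3) throw new AssertionError();  // the originally faked test
        if (add(2, 5) != 7) throw new AssertionError();  // the triangulating test
        System.out.println("generalized by triangulation");
    }
}
```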

Use when

  • Unsure which collaborators to create at first.
  • The SUT (System Under Test) remains inside the application boundary.
  • The SUT is not interacting with an expensive system such as a database or a web service.
  • The call to the collaborator is not a notification.


Pros

  • Avoids test doubles4, and as such decouples the tests from their implementation to enable refactoring5.
  • Collaborators emerge out of triangulation and are therefore more mature.


Cons

  • Testing subcollaborators from a distance.

Replace with Test Double4

When we are confident in the need for a collaborator, we may decide to replace it with a test double. This allows us to finish the implementation of the current SUT before having to descend.
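A small Java sketch of hand-written test doubles (all names hypothetical): a stub stands in for an expensive collaborator at the boundary, and a spy verifies that a notification was sent.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of replacing collaborators with hand-written test doubles:
// a stub replaces a repository that would hit a database, and a spy
// records a fire-and-forget notification so the test can verify it.
public class TestDoubleSketch {

    interface CustomerRepository {          // would talk to a database
        String nameOf(String customerId);
    }

    interface Mailer {                      // notification: fire and forget
        void send(String to, String message);
    }

    static class WelcomeService {
        private final CustomerRepository repository;
        private final Mailer mailer;
        WelcomeService(CustomerRepository repository, Mailer mailer) {
            this.repository = repository;
            this.mailer = mailer;
        }
        void welcome(String customerId) {
            mailer.send(customerId, "Welcome, " + repository.nameOf(customerId) + "!");
        }
    }

    public static void main(String[] args) {
        // Stub: canned answer instead of a real database call.
        CustomerRepository stub = id -> "Alice";

        // Spy: records the notification so we can assert it happened.
        List<String> sent = new ArrayList<>();
        Mailer spy = (to, message) -> sent.add(to + ": " + message);

        new WelcomeService(stub, spy).welcome("42");

        if (!sent.get(0).equals("42: Welcome, Alice!")) throw new AssertionError(sent);
        System.out.println("notification verified via spy");
    }
}
```

A mocking framework such as Mockito could replace the hand-written doubles; the trade-off between the two is listed below.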

Use when

  • Confident in the need for the collaborator.
  • The collaborator is at the application boundary.
  • The collaborator is at the boundary of the module.
  • The collaborator interacts with an expensive subsystem such as a database or a web service.
  • The call to the collaborator is a notification; we like to use a mock in this case.


Pros

  • Avoids test overlap.
  • Can finish the SUT before having to descend.
  • Allows simulating expensive subsystems such as databases and web services.


Cons

  • Couples the structure of the test to the structure of the implementation.
  • Framework mocks are typically less performant than hand-written test doubles.
  • Hand-written test doubles are an additional effort to write.


Mocks are not the only way to descend in Outside-In TDD. There are several strategies, each with different trade-offs, so we have to make a judgment call every time. We need to keep an eye on refactorability when writing our tests. Sociable unit tests can improve refactorability, but we have to keep the test overlap low. So we avoid testing all the details from a distance.

  1. You Ain't Gonna Need It

  2. Test overlap is when more than one unit test covers the same thing and thus may fail for the same reason.

  3. A notification, a.k.a. 'fire and forget', is a type of relationship between objects where one object just notifies another. I first read the term in the GOOS book. To test notifications we prefer to use mocks or spies.

  4. A test double replaces a real collaborator in a unit test, just as a stunt double replaces the real actor in a scene.

  5. Tests should be structure-insensitive to enable refactoring.
