Excellence Is a Habit
“We are what we repeatedly do. Excellence, then, is not an act, but a habit.”
-- Historian Will Durant, simplifying part of Aristotle’s philosophy.
As I write these words, the crew of Artemis II has returned safely and successfully to Earth, after being the first humans in over 50 years to reach the vicinity of the Moon. It is also the 56th anniversary of the launch of Apollo 13, a mission known not only for the catastrophic events on the way to the Moon, but even more so for the Herculean efforts made to bring the astronauts safely home.
[Photo: Artemis II landing, April 11th 2026 (NASA)]
NASA’s foray to the Moon in the 1960s and early 1970s is the story of a long journey made step by step. The simple one-man Mercury spacecraft paved the way for the two-man Gemini craft, which in turn was a stepping stone to Apollo’s pairing of a space capsule and a Moon lander.
Between May 1961 and April 1970, NASA launched twenty-three manned missions, a cadence averaging around 4-5 months between missions and peaking in 1965-66 with only 1-2 months between missions.
NASA astronauts, engineers, and managers were part of a well-oiled machine that not only achieved President Kennedy’s goal of landing a man on the Moon (and returning him safely to Earth) on schedule, but did it successfully five more times. The machine had been tested under stress in practice and was therefore able to save the ill-fated Apollo 13, turning what could well have been a tragedy into a heroic story of triumph.
Many of today’s concepts of software development and resilience are built around the same idea: the more you practice a process, the better you become at it. That’s why DevOps continuous delivery pipelines automate repeatable deployment tasks, so that delivering a new feature to clients is business as usual on a normal day rather than a nail-biting experience fraught with surprises. It’s also why Infrastructure as Code is such a compelling concept: it enables both repetition and flexibility.
Even when things are going perfectly, we know that we need to practice dealing with problems through Chaos Engineering, Disaster Recovery (DR) exercises, Game Days for high-severity incidents, and more. A muscle that isn’t exercised atrophies and cannot cope with an emergency when the time comes. Clients I’ve worked with have almost always found that something breaks during a DR test. Yes, it’s most likely to be DNS, but it’s also been hard-coded connection strings, missing credentials, or shared storage that wasn’t quite as shared as everyone thought. The exception is when the test is done often enough for issues to be ironed out faster than new ones appear.
Each and every one of NASA’s flights, even the most successful, had many troublesome issues. Some remained “mere” anomalies (unexpected occurrences that needed to be investigated), while others were actual problems that had to be solved in real time.
It’s a testament to the institutional learning, simulations, and test infrastructure retained over the last half century that NASA was able to replicate so much of the first half of the Mercury-Gemini-Apollo programs with only two Artemis flights: the unmanned Artemis I in 2022 and the round-the-Moon Artemis II in 2026.
While the Artemis II mission was a resounding success, two small issues stood out for me as direct lessons for traditional software resilience and reliability.
In the final hour of the countdown, engineers investigated a sensor on the launch abort system’s attitude control motor controller battery that showed a higher temperature than expected. It was deemed to be an instrumentation issue (i.e. the battery’s temperature was fine and only the measurement was incorrect) and did not affect the launch. The lesson here is that we should instrument as much as possible, but still understand the context of every signal. We can then cross-reference it with other signals to tell the needle from the haystack in the mountains of information our observability systems produce.
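To make the cross-referencing idea concrete, here is a minimal Python sketch. It is not NASA’s method. A reading that breaches its limit is compared with the median of related signals, which are assumed to measure the same quantity in the same units. The `Reading` type, the sensor names, the limit, and the tolerance are all hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Reading:
    source: str   # hypothetical sensor name
    value: float  # assumed to share one unit with its peers (e.g. °C)


def classify_anomaly(
    primary: Reading,
    corroborating: List[Reading],
    limit: float,
    tolerance: float,
) -> str:
    """Cross-check an out-of-limit reading against related signals.

    Returns one of: "nominal", "unconfirmed", "confirmed_alert",
    "suspected_instrumentation_fault".
    """
    if primary.value <= limit:
        return "nominal"
    if not corroborating:
        # Nothing to cross-reference against: a human has to look at it.
        return "unconfirmed"
    # The median is robust to a single misbehaving sensor among the peers.
    reference = median(r.value for r in corroborating)
    if reference > limit or abs(primary.value - reference) <= tolerance:
        # Peers are out of limit too, or close to the primary: treat it as real.
        return "confirmed_alert"
    # Only the primary disagrees with its peers: suspect the measurement.
    return "suspected_instrumentation_fault"


if __name__ == "__main__":
    status = classify_anomaly(
        Reading("battery_temp_a", 48.0),
        [Reading("battery_temp_b", 21.5), Reading("bay_temp", 22.0)],
        limit=40.0,
        tolerance=5.0,
    )
    print(status)  # suspected_instrumentation_fault
```

The design choice is deliberate. An out-of-limit value alone does not trigger an alert. It is escalated only when independent signals agree, and a lone sensor with no peers is flagged for a human rather than silently trusted or ignored.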
The other issue was, of course, the failure of the space toilet (somehow, sanitation and hygiene have always grabbed the public’s attention in space flight stories). Here the lesson is to limit single points of failure in our systems and to have other solutions ready in case repairing the primary system fails. In the case of Artemis II, the astronauts, together with ground-based engineers, carried out the necessary troubleshooting, planning, and repair of the toilet. This meant that the astronauts could use the state-of-the-art solution and not the… less savory, bag-based backup.
In the same way, here on Earth, when we have a system failure we might activate a temporary backup solution. That gives our clients a gracefully degraded experience (perhaps a bit slower, perhaps with specific features unavailable), but in general they can continue using the systems we deliver to them. Degraded does not mean broken; it means we’ve snatched some measure of success out of a failure.
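Graceful degradation often comes down to a simple pattern: try the primary path, and on a known kind of failure serve a simpler fallback while recording that you did so. Below is a minimal Python sketch under stated assumptions. The `with_fallback` helper, the `fetch_personalized` service, and the `POPULAR_ITEMS` list are all hypothetical, and only connection and timeout errors are treated as reasons to degrade.

```python
import logging
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
log = logging.getLogger("degradation")


def with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    errors: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> Tuple[T, bool]:
    """Return (result, degraded); degraded is True when the fallback was used."""
    try:
        return primary(), False
    except errors as exc:
        # Log the failure so the degraded mode is visible, not silent.
        log.warning("primary failed (%s); serving degraded response", exc)
        return fallback(), True


# Hypothetical example: personalized recommendations with a static fallback.
POPULAR_ITEMS = ["item-1", "item-2", "item-3"]


def fetch_personalized(user_id: str) -> list:
    # Stand-in for a remote call that is currently failing.
    raise ConnectionError("recommendation service unavailable")


if __name__ == "__main__":
    items, degraded = with_fallback(
        lambda: fetch_personalized("user-42"),
        lambda: list(POPULAR_ITEMS),
    )
    print(items, degraded)
```

Restricting `errors` to expected failure types is a judgment call. A genuine bug, such as a `ZeroDivisionError`, still surfaces instead of being hidden behind the backup. The `degraded` flag lets callers tell clients (or dashboards) that they are getting the reduced experience.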
[Photo: Artemis II during pre-flight testing (NASA)]
And like NASA, the more we repeat and practice our problem solving, the faster we’ll solve real problems when they inevitably show up.
“We are what we repeatedly do. Excellence, then, is not an act, but a habit.”
In closing, be like NASA on the way back to the Moon.
The story of Apollo, Artemis, and every successful system in between is not one of perfection, but of continuous preparation. Progress happens not through flawless execution, but through repetition, simulation, and the humility to know that something will go wrong. Whether you’re flying humans around the Moon or running software for millions of users (or any number), the lesson is the same: resilience is built long before it’s needed.
Excellence is not what we do when everything works. It is what remains when it doesn’t, because we’ve practiced for exactly that moment.
If you’d like to learn more about how I help my clients achieve all this, please feel free to reach out.
The coming week is the anniversary week of Apollo 13, and the next few articles will go over that mission and discuss its lessons, which are all too relevant even today.

