The Unreasonable Effectiveness of Small Optimizations

Oct 14, 2016

The challenge of building large-scale complex systems often gets caught between purist visions that never get off the ground and seemingly pragmatic random-walk tinkering that slowly grinds to a halt via diminishing returns. If you're ambitious -- and I hope you are for the sake of my own future gig-flow -- sooner or later, whether you are an engineer, designer, project manager, CTO, CMO, VP of Sales, or CEO, you'll run into this challenge. How do you break out of this rock-and-a-hard-place impasse?

My friend Keith Adams, veteran of several such complex scaling challenges at VMware, Facebook, and now Slack, pointed out the key insight: what he calls the "unreasonable effectiveness of small optimizations." I am going to paraphrase the version he shared over lunch at the Facebook campus a few years ago and call it Keith's Law: **_In a complex system, the cumulative effect of a large number of small optimizations is externally indistinguishable from a radical leap._** If you want to do big things in a software-eaten world, it is absolutely crucial that you understand Keith's Law. So let me BIRG in the light of the wisdom of Keith and other great technologists I've had the pleasure of knowing, and unpack it for you. All stupidities and inanities in what follows are mine.


The Soul of Praxis: The Unreasonable power of a thousand small optimizations

1/ Let's unpack Keith's Law: In a complex system, the cumulative effect of a large number of small optimizations is externally indistinguishable from a radical leap.

2/ Let's start with the narrower technological implications before getting to complex systems with social and political dimensions.

3/ Many hobbyist tinkerers and even trained engineers can spend decades in technology and never truly understand how large-scale complex systems are different.

4/ By contrast, some non-technical people seem to grasp Keith’s Law even without any hands-on technical skills (though this is frankly rare in my experience).

5/ Dan Luu discussed scale-effects blindness in his excellent recent post, "Why's that company so big? I could do that in a weekend."

6/ The question reveals a basic inability to switch between weed-level and system-level perspectives of complexity, and how the two levels relate.

7/ Every great technologist I’ve met, however, with a track record of big achievements, seems to intuitively grok Keith’s Law. It is quite amazing how consistent this trait is: 100% of my sample set.

8/ To understand why, consider what happens as you expand a system's capability, adding capacity, features, and coverage of increasingly rare corner cases.

9/ As you do this, failure modes become more catastrophic, bugs become more obscure, and improbable things become probable.

10/ You start to fall behind on situation awareness and urgent starts to outrun important. Things slow to a crawl along a 1000-project frontier. Work begins to resemble attrition warfare against chaos.

11/ In companies with mediocre engineering leadership, this can lead to out-of-control reactive fire-fighting, where huge teams are doing a thousand unrelated things.

12/ A growing complex system is like a huge machine with a proliferating number of tuning/tweaking knobs, each staffed by a small, depressed and stressed-out team.

13/ Turning a given knob makes some things better, other things worse. Usually, the good is concentrated locally, the bad is diffused globally.

14/ The depressed losers give up, the sociopaths switch to zero-sum mode, moving the system towards a "privatize gains, socialize losses" regime of operations. The clueless stir in extra chaos.

15/ The knobs also interact to drive emergent behavior. If two knobs are turned a certain way at the same time, they could trigger higher-dimensional emergent behavior. The world gets weirder.

16/ Such systems can be thought of as having significant decision-making slop. Small decisions have both local, deterministic effects and global, probabilistic effects.

17/ The global probabilistic effects constitute decision-making slop, which can power either a grand systemic random walk to zemblanity, or accumulating serendipity: the unreasonable effectiveness.

18/ A random walk does not mean a system stays in the same place. It means it will slowly drift. Generally in a “bad” direction, since bad directions outnumber good ones.

19/ But, with the right people in the right places, the random walk can turn into slow steady progress in a chosen direction. What do they do that's so special? They harness Keith's Law.

20/ At smaller scales, one person with god-level visibility into, and comprehension of, the system can keep it all in their head and herd the various knob-turnings/tweakings in a chosen direction.

21/ But in a complex system, where hundreds of people can be doing little things, this does not work. And more communication is not the answer (at least not the whole answer).

22/ As Keith observed once, most humans can at best understand 1-2 levels of abstraction above/below their home zone. Beyond, you rely on things like metaphor and pop sociology.

23/ So deluging architects and leaders with a firehose of information on ongoing firefights is useless. At some point even the most formidable genius cannot keep up.

24/ OTOH, the naive version of what is sometimes called the “holographic” model, where everybody sort of has the “DNA” of the whole thing in their head or muscle memory, is worse.

25/ If the overwhelmed architect leads the troops on a death march, the supposedly smart "crowd" tears itself apart because shared "DNA" does not equal shared direction.

26/ The way out is people with strong “finger-tip feeling” (which we’ve discussed before), herding the system, which can turn the uncontrolled random walk into controlled, cumulative gains.

27/ They have an intuitive sense of which tactical challenges also have the power to herd the whole system in good directions, towards serendipity. It's not think-global/act local. It's feel-global/act local.

28/ They are able to pick out “herding” knobs whose probabilistic effects have a somewhat predictable direction, and are likely to "socialize gains and limit losses."

29/ This effect is most visible where there is an obvious core piece in the system. In a computer for example, smaller, faster and more memory are all generally good things.

30/ This is not the same as the 80-20 principle. 80% of the gains cannot be deterministically attributed to 20% of the improvements. That’s the “leverage” view of complex systems.

31/ The “leverage” view leads people to go on futile, quixotic quests for the “one weird trick” that will magically trigger a big leap in a system that is assumed to be complex in a Rube-Goldberg sense.

32/ Architect Indy Johar suggested what might be called the “resonance” view: getting the parts of the complex system to harmonize through loose mutual awareness. This is a probabilistic view.

33/ A system with such resonance is ripe for Keith’s Law to operate. It creates system-level harmonies in the environment that can shape positive spillover and surplus effects.

34/ In such a resonant system, catalysis, rather than leverage, is key. What things can you do that make other things increasingly easier, and cumulative in their effects?

35/ This is not a one-shot thing. You have to keep finding new herding knobs as old ones lose potency and new ones become available due to ongoing growth.

36/ The “herding” potential of a given knob is finite, and limited to a range of system behaviors. Leadership is knowing when you've entered the range of a given knob, and when you've exited it.

37/ This is the essence of Schwerpunkt: the ability to repeatedly find the “center of gravity” where effort drives systemic synergy. (I previously incorrectly translated Schwerpunkt as “point of the spear”.)

38/ As I observed earlier, strategy is a pattern of interpretation of all actions, not a specific set of actions. The specific "herding knobs" that matter can change.

39/ Today it might be adding a virtualization layer. Tomorrow it might mean bolting a type system onto an untyped language. Next year it might mean buying a key supplier.

40/ Whether an action is strategic depends on the role it plays in a story. People who fail to appreciate Keith’s Law generally fail at strategy in one of two ways.

41/ Ideologues tend to have a fixed idea of “strategic”. Such as "strong typing is more strategic" or "compiler people are better than kernel people" (one of Keith's examples). One Big Optimization to rule them all.

42/ Overly practical people with no use for abstractions fail by being blind to probabilistic global effects ("doerist" types are often purists in pragmatist guise). "System views are bullshit" people.

43/ But if you have even a few people with instinctive ability to find and work herding knobs within their locus of authority, small optimizations being worked elsewhere start to work better and better.

44/ This is why you don't need specific roles or titles to turn Keith's Law to your advantage. There are usually good herding knobs at every locus of authority.

45/ In strong engineering cultures, herding knobs are not secret. They are often global imperatives like “increase transistor density” or “all data must be exposed through service interfaces.”

46/ “Herding knobs” have roughly predictable, cumulative directional effects, but it isn’t magic. It’s using the “resonance” of loose mutual awareness to direct slop systemically.

47/ It takes serious insight to recognize (for instance) that among the million knobs, “transistor density” or “service interfaces” are the ones that all can tweak without tripping over each other.

48/ It is only through 20-20 hindsight that it seems like successful organizations that build large complex systems were twiddling the "obvious" right knobs at the right times.

49/ The unreasonable effectiveness of small optimizations applies to any complex system. If there are 1000 knobs, leadership is knowing which ones herd the others.

50/ Building a large, effective sales team, an efficient political campaign machine, or a grassroots activism campaign — all these endeavors behave this way if they work at all.

51/ One thing that changes as you get to more heterogeneous systems is that the knobs may be pure abstractions such as “safety” which mean different things in different contexts.

52/ If a CEO makes _safety first_ the drumbeat of a company, it will mean different things, and highlight different herding knobs, in engineering, sales, and operations.

53/ In sales, it might mean giving up head-to-head feature comparisons against competitors and going after deals by leading with the safety record of the product and the safety culture in the company.

54/ In operations, it might mean analyzing factory flows using critical path and moving-bottleneck tools to minimize accident rate rather than maximize output.

55/ In engineering, it might mean building your product with paranoid focus on safety for both users and (in the case of hardware) factory workers.

56/ In a really effective organization, that is growing as generatively as possible, the entire system can be described by a dynamic hierarchy of “herding knobs” arranged in a cascade of self-organized criticality.

57/ Each such knob directs the slop downstream for maximal effect. Such a system is maximizing open-ended learning via systemic fat thinking and experiencing autopoiesis.

58/ The funny thing is this: an organization can look dysfunctional from almost every perspective, but if it gets its fat thinking and herding-knob twiddling right, it will still be “unreasonably” successful.

59/ This drives process-focused consultants crazy: companies that are "bad" from a given textbook perspective doing well anyway breaks their entire raison d’être.

60/ Conversely, a company might have a perfect “scorecard” where on paper it is doing everything right by every textbook, earning straight-As, and yet be floundering.

61/ This is why, in my own consulting work, I don't even try to distill a formal, teachable “system” out of my gigs, even though I draw on many systems developed by others.

62/ It is also why I don't like the term "executive coach." It suggests textbook drills and "homework." That only works in functionally limited contexts, and anyway adulthood means you get to set your own homework.

63/ You can train skills to improve local effects of knobs. This is useful, but does nothing to improve the selection and use of herding knobs. Improved knob control can in fact worsen knob selection.

64/ What seems to help develop herding-knob selection instincts is what I call "sparring" with compatible people: No-rules live conversational processing of decisions with high herding potential.

65/ There are two necessary conditions for this to be helpful: a leader needs strong “finger-tip feeling”, and a natural ability to create a “loose mutual awareness” culture (like an improv theater actor).

66/ When both conditions are met, the organization can become an intelligent and recursively self-aware extension of the leader’s “fingers” so to speak. Hence "feel global, act local."

67/ The final battle scene in the Ender’s Game movie portrays this in a literally visual-tactile way. Ender's command team and the spaceship fleet are intelligent, semi-autonomous extensions of himself.

68/ Much of the work of a leader in a large, complex system, to the tune of 95%, is highly reactive. The remaining 5% is herding opportunities. They are rare, which means they can't be wasted.

69/ As a leader, you might have as few as 10 minutes in a year where you get to actually lead. The rest is what I call "leadering," which might as well be done by a bot in the future.

70/ But to make those 10 minutes count, you have to keep your finger-tip feeling, and your awareness of your extended organizational "body," in top condition, mostly by sparring with trusted friends and peers.

71/ A friend and fellow consultant asked me recently whether I've written up my sparring model. I've been trying lately, but I've concluded there is no general system to learn here.

72/ The job of a sparring partner is to increase the probability that the leader's 10 minutes of actual leadership in a year will end up tapping the unreasonable effectiveness of small optimizations.

73/ It doesn't scale because 80% of what you do is adapt to the strengths and weaknesses of a particular individual. And not everybody can adapt to everybody else.

74/ I have discovered the hard way, for instance, that I spar best with engineering leaders/tech CEOs. I am far less effective with sales leaders, and a complete disaster with marketing leaders. YMMV.

75/ The other 20% is developing a broad portfolio of mental models so you can deploy them in a sparring bout at a moment's notice. That's what it takes to explore "herding knob space" with another person.

76/ Becoming attached to specific mental models simply because you developed them or they were influential in your own development makes you a weak sparring partner.

77/ So if you want to be a good sparring partner for others and benefit from your own, don't get attached to mental models. And you too will harness the unreasonable effectiveness of small optimizations.
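A toy numerical sketch of the contrast drawn in 16/ through 19/, for readers who like to poke at such things. It is not anything Keith or the other technologists mentioned here proposed; it is a minimal illustration under made-up assumptions. Every knob-turn is modeled as one small step that goes the "good" way with probability `p_good` (a hypothetical parameter), and `compound` simply multiplies out many tiny relative gains. The function names and all numbers are invented for illustration.

```python
import random


def walk(steps, p_good, step_size=1.0, seed=0):
    """Net position after `steps` small moves.

    Each move is +step_size with probability `p_good`, else -step_size.
    `p_good` is a stand-in for how well knob-turning is being herded:
    below 0.5 is an unherded system where bad directions outnumber good
    ones, above 0.5 is a system nudged in a chosen direction.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    position = 0.0
    for _ in range(steps):
        if rng.random() < p_good:
            position += step_size
        else:
            position -= step_size
    return position


def compound(gain_per_step, steps):
    """Overall multiplier from `steps` independent relative gains of equal size."""
    return (1.0 + gain_per_step) ** steps


if __name__ == "__main__":
    # Assumed numbers: 10,000 small decisions, slightly bad vs. slightly good odds.
    print("unherded walk:", walk(10_000, p_good=0.49))
    print("herded walk:  ", walk(10_000, p_good=0.55))
    # 1,000 optimizations of 0.1% each, compounded.
    print("compounded gain:", round(compound(0.001, 1_000), 2))
```

In this toy model, a thousand improvements of 0.1% each compound to a multiplier of about 2.72, which from the outside looks more like a leap than a tweak. The walk makes the complementary point: shifting the odds of each small step by a few percentage points is enough to turn slow drift into steady movement in one direction. Real systems are of course not independent coin flips, so treat this as an intuition pump rather than a model.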


Check out the 20 Breaking Smart Season 1 essays for the deeper context behind this newsletter. If you're interested in bringing the Season 1 workshop to your organization, get in touch. You can follow me on Twitter @vgr

Ribbonfarm Studio
