Friday, November 27, 2015

The Manager as Debugger

I have observed that the best engineering managers I know are often also great debuggers. Why would this be? What is it about these two tasks that has such an overlapping skill set?

A great debugger is relentless in their pursuit of the “why” for a bug. This is simple when we are looking for errors in application logic, but we all know that bugs can go many layers deep, particularly in complex systems that involve many separate parts operating over time-delayed networks. A sign of a poor debugger is a person who, when they add a log statement to a piece of concurrent code to attempt to find an error and see that the error can’t be reproduced, assumes that they have therefore fixed the problem. It’s a lazy habit but a common one. Sometimes there are problems that seem impossible to determine, and many people don’t have the patience to dig through layers of code, theirs and others’, log files, system settings, and whatever else is needed to get to the bottom of something that only happened once. I can’t blame them. Obsessive debugging of one-off issues is not always a great use of your time, but it does show a mind that cannot be satisfied with the unknown, especially when that unknown might cause them to be paged at two o’clock in the morning.

What does this have to do with management? Managing teams means working with a series of complex black boxes interacting with other complex black boxes. These black boxes have inputs and outputs that can be observed, but when the outputs aren’t as expected, figuring out why requires trying to open up the black box and see what is going on inside. And, just as sometimes you don’t have the source code, or the source code is in a language you don’t understand, or the log files aren’t readable, the black boxes of teams can resist yielding their inner workings.

Teams also share the characteristic of another famous box, the one containing Schrödinger’s Cat. As that thought experiment is popularly told, the act of observing changes the outcome, or rather, causes an outcome to happen. You can’t go into a team and not change the behavior of that team by being around them, sitting in their meetings, watching their standups. Your presence changes the team behavior and may hide the problem you are trying to find, in the same way that a log statement can cause a concurrency issue to be magically erased, at least for some time.

To debug a system properly you need a reasonable hypothesis that explains how the system got into the failed state, preferably one that you can reproduce, so that you can fix the bug (or at least prevent it from happening in the future). To debug a team, you also want a hypothesis for why the team is having problems. You want to form it in as minimally invasive a way as possible, to prevent your meddling from obscuring the problems. You have the added challenge that team problems are generally not single failures but more like performance issues: the system is running but it seems to slow down from time to time, the machines are ok except occasionally they crash, people seem happy but attrition is too high.

Let’s work through an example. We have a team that feels slow. You have heard complaints from their business partners and product manager that they are slow, and you agree that the team just seems to lack the same energy as your other teams. How do we figure this out?

Debugging this deserves the same rigor you would apply to debugging a serious systems issue. When I debug a systems issue, the first thing I look at is generally log files and any other record of system state from the time of the incident. When talking about a team that is not producing work fast enough, look at the records. Look at the team chats and emails, look at the tickets, look at the repository code reviews and checkins. What do you see? Are there production incidents happening that are taking too much time? Are a bunch of people sick? Are they bickering over coding style constantly in their code review comments? Are the tickets that are being written vague, too big, too small? Does the team seem upbeat in their communication style, sharing fun things as well as important work in chat, or are they purely business? Look at their calendars. Is the team spending many hours a week in meetings? Is their manager doing 1-1s? None of these things is necessarily a smoking gun, but they may point to an area to address.

Perhaps everything seems ok in all of these indicators, but the team still just is not performing as well as you believe they should. You know the talent is there, the team is happy, they’re not overburdened by production support. What is happening? Now is the time to start doing some potentially destructive investigations. Sit in their meetings. Are they boring to you? Is the team bored? Who is speaking most of the time? Are there regular meetings with the whole team where the vast majority of the time is spent listening to the manager or product lead talk? Boring meetings are a sign. They may be a sign of inefficient planning on the part of the organizers. There may be too many meetings happening for the information covered. They may be a sign that the team members don’t feel that they can actually help set the direction of the team, choose the work that will happen. Boring meetings are a sure sign of time wasted, if not bigger leadership problems.

Ask the team what their goals are. Can they tell you? Do they understand why those are the goals? If they don’t understand the goals of their work, their leaders (manager, tech lead, product manager) are not doing a good job engaging the team in the purpose of the work. In almost every model of motivation, people need to feel an understanding and connection with the purpose of their work. Who are they building these systems for, what is the potential impact on the customer, the business, the team? Did they have any part in deciding these goals, and the projects that they’re doing to achieve them? If not, why not? When you see a team spending all of their time on engineering-sponsored projects and neglecting product/business projects, it’s likely that the team does not appreciate or understand the value of the product/business projects that they are supposed to be working on, and therefore they are lacking in motivation to tackle them.

Finally, you might take a look at the actual team dynamics. Do people like each other? Are they friendly? Do they collaborate on projects or is every person working on something independently? Is there banter in the chat room, in emails? Do they have a good working relationship with other adjacent departments, and with their product managers? These are little things but even very professional groups tend to have a degree of personal connection between the members. A bunch of people who never talk to each other and are always working on independent projects are not really working as a team. There would be nothing wrong with that, if the team were performing well, but given that they are not, this may be contributing to your problem.

Sometimes managers of managers choose to take such problems as something that the manager of the team just needs to fix. You measure the manager on the output of their team, after all, and it is their responsibility to fix it if things are not going well. This is true, but just as I sometimes jump in and help debug complex system outages even though I rarely write code, it is ok to jump in and help debug team issues as you see them, particularly when the manager in question is struggling. It can be an opportunity to teach the manager and help them grow. It can also reveal more foundational problems with the organization, such as a lack of senior business leadership that even the best managers can’t identify or resolve alone. 

The pursuit of why when it comes to organizational problems is the thing that gives you patterns to match on, and lessons to lead with. We get better at debugging by doing it often, and learn which areas tend to break first and which indicators provide the most value for understanding issues. We become better leaders by pushing ourselves and our management teams to really get to the bottom of organizational issues, searching for why so that we can more quickly resolve such issues in the future. Without the drive to understand why, we rely on charm and luck to see us through our management career, we hire and fire based on charm and luck, and we have a huge blindspot when it comes to truly learning from our mistakes.

Friday, November 6, 2015

Truth and Consequences of the Technical Track

About a year ago, Dan McKinley aka McFunley wrote a bit on the technical track. The piece has stuck in the back of my mind for quite a while, and continues to influence certain points I make when discussing career progressions of engineers. Let's get them out.

I think that Dan's take is cynical, but not entirely incorrect. We need managers. As companies grow, at some point, you need people whose job it is to organize other people. I think the idealistic goal of management is to help teams live up to their talent potential. The realistic experience of management is often an abstraction layer between upstream and downstream, to help flow information, each layer helping to make a finer-grained set of decisions and adjustments to keep the overall company moving in the right direction.

Let's talk first about why I personally went into management. I was successfully performing in one of those technical track positions, at a company where I felt pretty confident I would be able to continue to be promoted in the technical track positions up to the highest level if I put the work in and continued to perform well. I was working on interesting things, paid very well, and on paper, everything was perfect. And yet, I quit to move to a smaller company and go into a management position. Why?

I realized pretty clearly that the scope of my ambitions was greater than the scope of my ability to produce things independently. I could not write enough code alone to achieve the kind of impact that I wanted to achieve on the company I was working for. Even with a small team, I felt that my ability to make a difference would be limited. I would have to make extremely good choices as to what to guide the team to do, and my effectiveness would always be limited by what I now realize is actually "product market fit." If I made a great product that no one else in engineering wanted to use (I was working in infrastructure engineering at the time), I would not have much of an impact.

So, instead, I gave up some degree of hands-on technical time in order to have a broader ability to focus people on projects where I felt their impact would make the most difference on the company. Ultimately, that is a core part of any technical leadership job, not doing all of the work yourself, but helping to both see what needs to be done and focus people on getting it done in the right way.

Now, you can imagine a situation where you have an engineering manager who deals with all the "people" side of things, paired with a technical specialist who makes all the decisions about what needs to be done and how it should be done, as per their technical expertise. Why doesn't this situation exist? Well, imagine putting yourself into the shoes of the engineering manager. You are basically a glorified project manager with little decision-making power. Except... all of the people report to you. You have hiring and firing power. You write their reviews. Do you really have no decision-making power? If you disagree with the technical specialist about what needs to be done, why would the technical specialist have final say? It is unrealistic to believe that the person with management authority over the people would have no influence over what they do, even with best intentions assumed.

The technical specialist, with no management authority, is limited. They are limited in what they can produce by the scope of resources (people, time, support from other departments, computers, etc) they have available to them. A people manager may give them a team to focus on their ideas, but that will always be balanced by their ability to produce valuable output as defined by whatever that people manager is being measured on. Miss an alignment with whatever the manager (or THEIR manager more likely) sees as valuable, and you risk losing those resources or having them siphoned off for "more important/pressing" work.

If a company has mostly reluctant/unhappy managers who would truly rather be coding, they have created a culture where code is more valued than management. Think about that. I created a team where the managers, by and large, were happy to be managing, and in the few times when they were not happy to be doing it, they went back to technical focus and we adjusted the teams. I was accused at least once of undervaluing technical contribution, so you should feel free to take this with a grain of salt, but reluctant management is a sign of a broken system, not a necessary or expected outcome of a growing engineering organization.

Dan mentions in his piece that one solution is that we could be promoting people more aggressively on the technical track. I agree with this, if the problem we are solving is giving productive individual contributors more money/bigger titles. Here is the alternate cynical perspective: the technical track exists so that we won't lose people who do good work and have valuable institutional knowledge, but the impact that those people have is often not equivalent to the impact that managers have (for good and for evil). Most companies don't actually need nearly as many people at senior levels of the technical track. We don't want to lose all of them, but there are diminishing returns for people who are possibly just incrementally better at building software. That's why you start to see language about scope of impact in staff engineering job ladders. As leaders, we want to encourage people to actually show results at broader scopes before we promote them. This just takes longer if you have to do it alone or with a small team. At a functional company, people aren't usually promoted as managers until they have shown they can do the next job by managing larger teams and actually doing the work to show the results. Why should the technical track be any different?

What about superpowers that could be given to technical specialists to make up for the authority given to managers? I would personally observe that often, senior technical staff work less hard than senior managers. The superpower of a senior technical staffer is that they get to work on harder intellectual problems and/or more speculative work, they get the luxury of a calendar that is far freer of obligatory meetings and tasks, and they are pressured far less than management for results on tight timelines. In short, by taking the technical path, they have traded off getting to continue to focus on what many engineers consider to be the fun stuff in exchange for a lot less clarity on how to progress and make an outsized meaningful difference. They don't have to work as hard as managers, they often don't work as hard, but on the flip side when they want to work harder to accomplish more it's less clear where to put that effort to gain value.

We all make tradeoffs. Engineering managers are making the tradeoff of getting farther away from the immediate reward (and, I will note, general tech cultural cachet) of day-to-day coding, as well as giving up control of their time, for a clearer career path and possibly more authority. Those who choose to stay technical should expect a likely slower career progression, and they will have to put in the difficult work to learn how to influence without management authority, but they should have more freedom from other people's demands and more control over when they put in the hard work to grow. Make no mistake, growth is hard no matter which way you go. You may have a clearer workout plan as a manager, but you still have to hit the gym and get sore. If you're not a little uncomfortable, what makes you think you're really growing into a new role?

Thursday, October 22, 2015

Open Source Culture

One of the biggest achievements of my career to date, one of the biggest impacts that I have made so far, happened almost as an after-thought. On March 26th of 2015, I released publicly the engineering ladder that the team created for Rent the Runway. I want to reflect on that for a moment because I keep hearing more and more people refer to this ladder.

Why has the Rent the Runway ladder been so viral? I've been told that companies from Kickstarter to Slack to Lyft and beyond have used it to inspire discussion and serve as a starting point. Why now?

We are at a point of rapid evolution in startup culture. A few years ago, everyone was obsessed with flat organizations, zero-process anarchy. And then we all started to live through the consequences of that, and many of us realized that it wasn't the utopia we were promised. Suddenly we saw that having no title didn't actually mean that every voice in the organization was heard equally; it often just meant that the loudest voices were the only voices heard at all. People began to push back on this.

We have become aware of unconscious biases and the challenges presented there. There's a huge temptation to have a very lightweight, low-detail engineering ladder. Unfortunately, that gives people only a vague idea of how to move forward. As we become more aware of the impact of bias on human decision-making, we attempt to provide more clarity to combat this bias. I realize that no amount of ink spilled can ensure a perfectly quantifiable process that takes all human bias out of it, but I also think there's little harm in trying for more clarity. And it turns out that many people appreciate it, and not just the people most affected by unconscious bias.

The devil is in the details. But putting in those details is hard, so providing a thoughtful starting point is better than providing a basic outline. Just as you only use the features of open source software that you actually need, you only need to take the details from open source culture that you want. Companies can take out (and often have taken out) details from this ladder that they don't like, but it's much harder to take a basic outline and flesh out all of the details yourself.

People want to know what they are getting themselves into. A recruiter commented to me once that the public engineering ladder was a massively useful tool for recruiting. Candidates knew what the structure of the org looked like, what potential career growth paths were available to them, and what we were looking for at a given level. It gave them more clarity into the workings of the organization in a way that most companies at the time were not providing.

This has gone beyond hiring people. Leaders of other companies who have adopted the ladder tell me that the process of getting the ladder done and adopted internally has gone smoothly partly because they started with a known, widely-adopted ladder that their teams already felt comfortable with. People are getting used to the idea of an engineering ladder as a tool to help them progress in their career, not a tool to stifle them or create unnecessary hierarchy.

Engineers like to share and learn from each other. The veil of secrecy around HR-related issues has existed without questioning for a long time. Let's question it! When I proposed publishing the ladder, no one in HR batted an eye. We're all comfortable with sharing our source code, and we might blog or speak about the processes for running our teams, but few of us have ever shared the core documents for running our organizations. It is awesome to see Clef share their entire handbook on GitHub, and I hope to see more organizations share not only the processes that they use to run their companies, but the actual documents they rely on for core policies, procedures, and standards.

Open source culture requires you to openly care about culture. You can't open source what you haven't created or modified. I'm excited to see people truly caring about the craft of creating healthy organizational cultures, processes, and documents, and sharing them with the world. I'm excited that people are proud to show that they care about making the culture of the technical workplace better in tangible ways. This is the best of techno-optimism. We can share ideas, and make things better for the whole tech community and the workplace as a whole. I'm proud to be an early adopter of open source culture, and I hope many others will continue to join in the movement.

Tuesday, October 13, 2015

Autonomy, Mastery, Purpose... What's Missing?

I've been thinking a lot about motivation lately. Anyone who has ever managed teams has probably spent some time being alternately perplexed and frustrated with the difficulty of motivating people, especially engineers. We go through contortions to engage, inspire, and most importantly, hire and retain our valuable engineering teams, and yet we still often fail to provide workplaces where people feel truly motivated.

When Daniel Pink's Drive came out, it was the talk of the tech community. Finally a concept that makes sense, especially to engineers! It is still heavily quoted in engineering management conversations, and Pink's TED talk cites two tech companies (Atlassian and Google) among its examples of doing this right. A quick refresher:

Autonomy: Our desire to direct our own lives, to feel that we have choices, and what we are doing is of our own volition.

Mastery: Our urge to get better, to master our craft.

Purpose: The feeling and intention that we can make a difference in the world.

These concepts are said to be intrinsic motivators, meaning they are something that the person does because they want to do it, without expecting a reward from an external party. They stand in contrast to extrinsic motivators like money, status, power, or praise, which tend to be more of a baseline that is insufficient to motivate consistently. Note that without some degree of pay, praise, or reward for their work most people will also lack motivation because they need to have their basic needs met. You can also refer to a baseline of pay, safe working conditions, etc. as hygiene factors.

So, back to our big three intrinsic motivators. Engineers resonate with these. Even junior engineers want to have some control in what code they are writing; everyone is constantly driven to learn the next new thing and get better at the craft of software engineering; and any manager will tell you that good employees want to know WHY they are doing the tasks they are doing and who/what is benefitting by their work on these tasks.

I like these motivators, but they have always seemed lacking to me. The natural extreme translation of them in my mind is the job of self-directed researcher. You're choosing what to do, learning new things, and making the world better by exploring the unknown! Except that I tried to do that job, or at least went to grad school where we simulated it, and I absolutely hated it. I had no idea where to go, no idea what to do, and felt like my work had no point and moreover I had no one to help me.

So, I've been very happy to read more lately on the wider topic of motivation, specifically The Three Signs of a Miserable Job and Why Motivating People Doesn't Work, both of which talk about the elements that keep people engaged at work. Including Drive, these three books define, not three, but four distinct motivators:

Measurement/Competence/Mastery: Having a goal, reaching it, getting better, being objectively "good" at what you do

Autonomy: The ability to choose your measures, have some say in what you focus on

Purpose: Knowing who you are making a difference for, whether it is your customers, or your coworkers, or the world at large

Relatedness: Being known at work, feeling a relationship to your coworkers

Relatedness is the fourth element that is absent from Drive. It does not surprise me that engineers would cling to the first three motivators without thinking of the fourth. Relatedness is very touchy-feely. It's hard to quantify. It requires interpersonal engagement. And for many of us, it's probably a distant fourth motivator behind the other three.

And yet. As a leader, you will lead people who want Relatedness in their job. They want you to know about their family, their hobbies. They want to chat with you about their weekend, their trips away. They want to get lunch sometimes.

The nice thing about Relatedness is that it is the easiest thing to provide. You don't have to have management buy-in to ask people about their weekends. You don't have to go through contortions to find business cases for sharing an occasional meal or coffee. And you may find that once you start caring about people, you feel a bit happier yourself at work.

So my advice to new leaders and managers is to be mindful of the fourth element, and don't forget about it. Treat your peers as interesting fellow humans, and you may be surprised what it does for their motivation, dedication, and engagement.

Thursday, October 1, 2015

Notes on Startup Engineering Management for Young Bloods

(With apologies to my good friend Jeff Hodges, this is a takeoff on Distributed Systems for Young Bloods.)

I’ve been thinking about the lessons startup engineering managers learn on the job. A great deal of our instruction comes from making mistakes, and as leaders, those mistakes often cost us real opportunities: people who don't join the company, people who quit, projects that don't ship, sometimes even our jobs. These scars are useful reminders, sure, but it’d be better to have more managers with the full count of their fingers.

Below is a list of some lessons I’ve learned as a startup engineering manager that are worth being told to a new manager. Some are subtle, and some are surprising, and this being about human beings, some are inevitably controversial. This list is for the new head of engineering to guide their thinking about the job they are taking on. It’s not comprehensive, but it’s a good beginning.
The best characteristic of this list is that it focuses on social problems with little discussion of technical problems a manager may run into. The social stuff is usually the hardest part of any software developer’s job, and of course this goes triply for engineering managers.
I'm a distributed systems engineer by training, and I enjoy drawing lessons and making comparisons between the two worlds. I wrote this post with many indirect references to Jeff's original screed because I found it mapped quite well. It's long, and each entry is itself worthy of one or several posts, but I hope you will find it a valuable introduction.
OK, having copied Jeff's intro and lightly reworded it, let's dive in.
Managing people at startups is different because you have no safety net. You may think, having spent a few years at a big company in a management position, that you know how to manage already. You've given performance reviews, done interviews, dealt with project timelines, played politics. You know the basics. Right?
Here's what you don't see until you leave the safety of a big company. You don't see the millions of invisible systems all around you that have set you up for success. The army of recruiting personnel, the pipeline of aggregated talent, the power of a known brand. You don't have to figure out salary bands. You have standards, you have processes, you have rules, and you have people who make it relatively easy for you to live by those standards, processes, and rules. You have no idea what it is like not to have someone to check your work, to make sure that people get paid on time, that reviews happen, that holidays are tracked, that budgets get approved. You've never looked at an Excel spreadsheet of numbers for year-end compensation only to find it full of errors that needed correcting.
Or maybe you've never been a manager at a big company. Your whole career has been spent in startups, and not only do you have no management experience at all, but you've also never experienced the safety net. You too are in for problems, but they will be much harder for you to detect, because you don't know what a smooth-running operation even looks like. You are comfortable making your own rules, but you may not appreciate the value of having a machine behind you. Instead of chafing at the chaos, you not only thrive in it, but create more of it.
In both cases, creating the safety net yourself is part of the job, whether you realize it or not. Create sane processes, sane standards, and shared values, so that the team can scale without you having to make every decision yourself. If you find your team has doubled and you're still making roughly the same decisions you were before it doubled and you're just working twice as hard to get it all done, you are trying to paper over the missing safety net with sheer effort. It doesn't work.
Coordination is very hard. "I hate agile!" Yeah, I get it. But when you are expected to know the progress of 10 very different projects off the top of your head, when teams across the company cannot seem to get on the same page, and when you find yourself in meeting upon meeting upon meeting trying to align roadmaps and get some clarity as to what the hell is going on, you may feel that at least getting some shared vocabulary and process across the company is a valuable exercise. Many tactics and theories exist for how to solve this problem, but we must acknowledge that it is fundamentally difficult, and it is unlikely that you are going to solve it perfectly on your first try.
The amount of overhead that goes into managing coordination of people cannot be overstated. It's so great that the industry invented microservices because we'd rather invest engineering headcount and dollars into software orchestration than force disparate engineering teams to work together. And even that is an illusion when it comes to true cross-functional efforts. It turns out, there is a reason big companies end up with project managers who spend all day making sure people are talking to one another.
If the set of projects going on can fit into your head, your project list is probably trivial. I do not say this to malign your company, but there is a huge separation between the management style that you can afford when you know everything that is happening in detail, and the management style when you have to do some deep reading to even know what is going on in an area.
If the set of people is small enough that everyone knows everyone else, politics is trivial. You'll have it, sure, but it will almost always be very visible very quickly. Now is the time to get out in front of it. Politics at this scale is probably a sign of either organizational illness or a lack of cultural unity. If you can easily put a label on the type of people most often bearing the brunt of the politics (engineers, marketers, women, new grads, etc), it's probably a sign of organizational illness that you need to address unless you want it to fester and cause bigger problems as you go. Unless you can literally grow your business without that group represented, you don't want to create a culture that completely excludes an entire group. If, on the other hand, the people who bring politics share personality characteristics that are not strongly correlated with their gender/race/job/experience-level/etc, then you need to figure out what kind of culture your company actually is, and start screening for that in hiring. The sooner you realize what values your company truly holds, the easier avoiding these types of problems will be. If you do not figure out your company's values, as you grow, politics will become worse and worse.
“The team is moving too slow” is the hardest problem you’ll ever debug. When your CEO asks you why nothing is getting done, why we can't do everything on their laundry list, why the project they expected would launch next quarter is still two quarters out, accurately answering that question is incredibly difficult. Once you are a level or two removed from the actual people doing the work, your previous debugging process of "going to every meeting, watching the work being committed, understanding every detail of the project" does not scale. You have to figure this out from a distance. 
Implement trust and communication throughout your team. Your management team MUST trust you, and their developers MUST trust them. The basis of trust is doing what you say you will do. At the most basic level, this is done by scheduling regular 1-1s and then showing up for them. If you don't have time to do this, think about whether you can handle having so many direct reports. You have to lead by example. Whatever you do to your direct reports will probably flow down the chain. If you disrespect their time, they will disrespect the time of their direct reports. In addition to 1-1s, find ways to be available to your whole team. I am a fan of office hours held weekly that anyone can sign up for. Spend time in the channels where the team hangs out. Show up for drinks and demos. Participate in hackweek. Enjoy this rare opportunity to build a team yourself by spending time with the team you built!
Metrics are not the only way to get your job done, but they can be useful. How often do releases happen? How often do people check in code? How many tickets are outstanding? How much time do people spend in meetings? How many people do you really need to hire this month? Here's the challenge: you're dealing with small data sets (lies, damn lies, and overfitting the data). Even if you have a year of GitHub activity data, understanding WHY a person is coding infrequently requires an understanding of what is going on in that person's life. They may have just become a manager!
This also applies to goal-setting. If you miss your KPIs, what does that mean? Fundamentally, are you measuring the most important things? 
Learn to estimate your capacity, and your team's capacity. You should know how much you can realistically do in a week. If you constantly find yourself not finishing something you said you would, you are overestimating your capacity. When you aren't finishing important things, it is a sign that you need to either delegate or simply STOP DOING SOMETHING. That something you need to stop doing is often writing code. 
Lots of engineering leaders think that the way to maintain empathy for their team and to understand where problems lie is by continuing to write code. Here's the deal: if you have the time to do the full process of writing code (write code + tests, get review, release to QA, validate, release to prod, fix loose ends, document/hand-off/maintain), by all means, write code. It may be valuable to do this for one or two small things a year (protip: hackweek can be good for this). But as your team grows it is completely unrealistic that you are going to understand the pain of every engineer by walking a mile in their shoes. You may be able to do this for a service, but not the JavaScript, or the JavaScript, but not the app, or the app, but not the tooling. You must figure out how to get information about the bottlenecks and problems in the development process without actually having to do it all yourself. That is the job. It is not easy, and it is usually less fun than writing code.
Here's a useful trick for estimating team capacity:
If we got some number of features done this year with our current engineering staff, we will need ~20% more engineers next year to get the same number of features done.
Technical debt and production support imply a long-term cost for every new feature. You have to account for that. This is why big tech companies seem to grow massively every year.
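To make the arithmetic of that trick concrete, here is a minimal sketch in Python. The ~20% figure comes from the rule of thumb above; the function name `engineers_needed`, the 40-person starting team, and the idea that the overhead compounds year over year are my own assumptions for illustration, not a measured model.

```python
import math


def engineers_needed(current_engineers: int, years: int,
                     overhead_growth: float = 0.20) -> int:
    """Rough headcount needed after `years` to keep feature output flat.

    Assumes (hypothetically) that the ~20% yearly overhead from technical
    debt and production support compounds each year.
    """
    raw = current_engineers * (1 + overhead_growth) ** years
    # Round first so floating-point noise does not push ceil() up by one.
    return math.ceil(round(raw, 9))


if __name__ == "__main__":
    # Hypothetical 40-person team projected over three years.
    for year in range(4):
        print(f"year {year}: {engineers_needed(40, year)} engineers")
```

Under these assumptions a 40-person team needs 48 engineers next year just to ship the same number of features, before you plan for any additional output.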
Prepare for long feedback loops, and look for opportunities to shorten them. The feedback loop from "hired someone" to "figured out their total impact on the organization" is the duration of a person's time at a company. You will write strategies that may take years to implement. When you are coaching someone to improve in an area, that coaching will often take weeks or months to actually result in behavioral change. You are no longer in a red-green-refactor cycle. Anything you can do to build out quicker feedback loops into the company and your personal style will generally pay dividends. Beyond the common wisdom of not waiting for review cycles to give people both praise and areas for improvement, some things you can and should do:
  1. Get your teams in the habit of releasing code as frequently as is reasonably possible. That may not be "continuous", but it probably looks like more than once a week. If your team is unable to release code frequently (barring stupid shit like Apple store processes), it is a sign of potential bottlenecks.
  2. Postmortem time? Try to make sure it is held the day after the incident, when it is fresh in people's minds.
  3. Look for ways to prove out ideas early. This includes your architectural and strategic ideas. Work them organically into the product roadmap, to show value early.
Choose your structure (or lack thereof) wisely. You may not personally be creating technical debt much anymore, but that doesn't mean you aren't creating organizational debt. When you roll out a half-assed engineering ladder, this new structure may actually make your life harder, because now you're going to have to negotiate with engineers eager to debate the finer points of broad and vague language. Early on, structure doesn't matter much, but at some point you have to address it. Interested in going the Holacracy or other self-organizing route? I highly recommend reading Reinventing Organizations. Because guess what? Self-organizing structures also require some thought and process to work! Be prepared to be thoughtful about this and put some time into it.

The worst situation is having random titles, random pay, random equity. I have stopped counting the number of people who have told me that they discovered massive unfairness in the salaries and equity paid to members of their team. It happens easily. You pay early employees less, assuming you'll give them more valuable equity, but that does not always happen. As the company gets bigger and the market rates change, you hire new people in at higher salaries, but never adjust older people. Cleaning this up probably requires setting up a structure for levels, pay ranges for those levels, and actually increasing the salaries of many people to bring them up to level. 

You'll probably get this a little bit wrong the first time, but in an effort to make your life slightly easier, if you decide you want to do an engineering ladder, feel free to use the Rent the Runway Ladder that we shared as a starting point.

People can do more than they think they can. This goes for you, and for your entire team. I can't tell you about the failure patterns of people who overwork their teams because that's not my personal failure pattern (yet). But I do sometimes fail to push people hard enough. Engineers want to ship. This goes double for startup engineers. If they are not shipping much, they will start to complain of boredom, and go looking for ways to make trouble. This may result in you having overengineered systems in prod, engineers who quit to go to a new shiny company, or worst, engineers who first push overengineered systems halfway to prod then quit to go to a new shiny company.

I know my boundaries and rarely respond well to people trying to push me to do things. I push myself hard enough. This is true for some engineers, but not all. When you are not pushing them to get something done, you may be trying to give them space, and they may consciously appreciate that while unconsciously starting to think that their work doesn't matter that much. If it mattered, my boss would be pushing me!
So, if you have an engineer complaining of boredom, ask them to finish what they're doing faster. I once heard this on the topic of infrastructure engineering: "If you're bored, try doing it all 20% cheaper". Engineers will often identify interesting hard problems when they try to do things faster. Such as: it's hard for me to run tests because they're too slow, getting to prod takes a long time, the build is always broken, this downstream system is not responsive. You were saying you're bored?
People will quit. As sure as the sun rises, as sure as networks occasionally partition, people will quit. They will quit because you are a bad boss. They will quit despite you being their best boss ever. They will quit to move across the country. They will quit because they don't see the future with your company. They will quit because they got another offer they can't turn down. They will quit because it is time for them to move on.
You cannot control all the reasons that people quit, but it will feel like a punch in the gut every time it happens. Your job is to keep going. To put on a smile and go out and recruit. To rally the team. To celebrate that person even as you are seething inside, how dare they leave me right now! Try to identify common causes for people quitting your organization, and address them as best you can. Especially if those causes relate to the environment that you help create (harassing, aggressive, burnout, underpaying, favoritism, politics). If you can in good conscience look at your environment and believe that it is in pretty good shape, then be kind to yourself. 
One of my greatest accomplishments was this: I told my team over the years that if they were seriously looking to move on, to tell me and let me help them find a new job. This finally happened for the first time this year and it felt like a massive win. If you care about your team, helping them move on when they want to move on is a great honor.

Jeff didn't manage to find a concluding paragraph. It's hard to put a conclusion on what is ultimately a brain dump. But here is mine:

Your job is to survive. Put one foot in front of the next. Keep going. Be open-minded. Be curious. Read about what other people are doing. Make friends who are running tech at other companies. Be kind to yourself, even if you fail. You have the power to make the world better for all of those people on your team. Use it responsibly.

Wednesday, July 29, 2015

Have a Theory

There is a phrase I find myself employing pretty frequently at work, when discussing new features or products. While I am not a product manager, I am responsible for making sure that we implement features well, and thinking strategically about what we are spending our precious time implementing. So, when I am asked about my thoughts on a new product or feature, I usually have one and only one question:

"What is your theory?"

In this day and age we sometimes get lazy about thoughtfulness, and rely on data and experimentation to hill-climb our way through the world around us. Or at least we say that we rely on data and experimentation to drive our features. But the reality is that we're working in such complex multivariate environments that we cannot possibly test all permutations of even the simplest change. We do make choices about what features we build, and these choices are not entirely data-driven.

So, given that our choices cannot be entirely perfectly data-driven, how then do we decide what to build? The only way that we can make sane choices in a complex world is by actually being thoughtful about the choices we are making, creating a theory, and creating experiments that actually test that theory.

For example, in my current world of e-commerce, we often are faced with the mandate to implement a new feature that will make the customer feel better about the product in some nebulous way (it's cooler! it's more high fashion! millennials will love it! whatever). This feature, while it might not cause customers to immediately buy more up front, should cause them to be more loyal over time. Sometimes, this is the right instinct. But beware: if you're going to try to get second or third order effects from a feature, you'd better have a really solid theory of the chain of events that leads to those second or third order impacts. And you need to figure out what you can measure to validate the chain of events. Don't just look at the number of people buying the product and hope it goes up. What does making the look and feel "cooler" DO for your customer? Do they visit more often? Spend more time? Tell more friends? Have a theory!

Failing to have a theory, and a solid experimentation plan for proving that theory, leaves you open to all kinds of irrational outcomes. The worst of these is the "you just didn't implement it well enough" outcome. The original idea was good, but you implemented it poorly, and that's why it failed. And that could very well be true! But it's impossible to prove or disprove without anticipating the question ahead of time, thinking through the logical conclusions of the theory, and setting up a good test to understand its outcome.

So the next time you are building a feature, ask yourself: Do we have a theory? What is it? Are we measuring the immediate expected effects of the theory, or are we just measuring the same stuff we always measure and hoping that it changes?

Tuesday, July 21, 2015

Ask the CTO: Going Rogue

I often get asked one-off questions about engineering leadership and management, and thought it would be fun to share my answers here. Asker's question has been anonymized and generalized.

The challenge:
I have an employee that was supposed to be adding a needed feature to one of our core systems. A few days ago I noticed in GitHub that he has created a new repo and been working solely in that repo for the past two weeks. Instead of adding a feature to the existing system, he is completely rewriting it! Furthermore, the repo is in a new language that no one in the team uses and that we have never put into production. I feel bad, I should have noticed this before it got this far, but I never expected someone we consider to be a senior engineer to go off and do this without at least checking in. How should I address this?

My thoughts:

Oof. This has happened to most of us at least once, in some fashion. Engineers want to be able to use what they think is the right tool for the job, even when the right tool is brand-new to the company. And generally speaking, this should be OK! The last thing you want to do is stifle people's initiative to create solid solutions and learn new things.

That being said, there is a high overhead to adding new things to your stack, and at some point, it usually makes sense to have some policy around how to add new things. I whipped together such a policy for my team about a year ago, when we had reached around 40 on the team and there were some folks discussing creating a new service in a language that we had very limited experience with. Our policy looks something like this:

Before any new language/framework is chosen for a production system, the following needs to be in place and approved by Camille and an architecture review board (group of senior engineers who are knowledgeable in the area of change and would be impacted by it):
  1. the engineers advocating for the language/framework will present a case as to the benefits it will provide over existing choices
  2. there is a plan for what sorts of systems this language/framework should be used to implement, and what existing systems could be rewritten in it
  3. a style guide and templates for readability, testing, continuous integration, monitoring, logging, deployment and production standards will have been created
  4. at least four engineers on the team must sign up to learn how to write readable, production quality code and support the new systems in production
This must be done before the start of any project for engineering to commit to supporting the resultant code.
This is a bit of a heavyweight list, but it articulates some of the challenges with bringing new languages and frameworks into teams. If you are in a team that wants to be conservative with new languages, frameworks and tools, clearly articulating the process for adding new things is an important element to avoid unexpected surprises on both sides of the equation.

So, you can put such a policy in place and point to it in the future to try and prevent such things from happening, but it has happened now, and there is an argument to be made that part of being a senior engineer is knowing when to communicate scope changes such as the need to move to new languages or frameworks. Even if you don't agree with my conservative approach to adding new things, you probably appreciate it when you get a heads-up on important changes early in the process.

The conversation you have now should involve first understanding their perspective: Why the new language? Why didn't they grab you earlier to tell you about it? What you learn from their perspective might surprise you. Perhaps you skip all of their 1-1s and they think you are unapproachable. Perhaps they are frustrated with the way you make decisions. Perhaps they knew you would be mad and were simply afraid to show you new work early when you would shoot it down.

Once you have gotten their perspective (and perhaps some takeaways for you), now it is time to clarify your perspective and expectations of them. Because they are a senior engineer, you need them to push information to you. You expect them to communicate the scope of changes and approximate timelines, and let you know when these things change. They need to think about their peers, and have empathy for the needs of the team as well as their own interests; all too often teams will reject projects they weren't aware of and didn't have any say in. Socializing change not only to their manager but to their team is part of the role of the senior engineer.

So, to sum it up:

If you care about having a somewhat conservative process for new languages/frameworks, clarify what that process is and share it with your team

When someone ignores the process or otherwise surprises you, first ask why and try to understand their perspective

Finally, clarify your expectations to them, helping them understand the impact that their actions have on others and the importance of communication