Monday, May 20, 2013

ZooKeeper and the Distributed Operating System

From the draft folder, this sat moldering for a few months. Since I think the topic of "distributed coordination as a library/OS fundamental" has flared up a bit in recent conversations, I present this without further editing.

I'm preparing to give a talk at Ricon East about ZooKeeper, and have been thinking a lot about what to cover in this talk. The conference is focused on distributed systems at a somewhat advanced level, so I'm thinking about topics that expand beyond the basics of "what is ZooKeeper" and "how do you use it." After polling Twitter and getting some great feedback I've decided to focus on the question that many architects face: When should I use ZooKeeper, and when is it overkill?

This topic is interesting to me in many ways. In my current job as VP of Architecture at Rent the Runway, we do not yet use ZooKeeper. There are things that we could use it for, but in our world most of the distributed computing we do is pure horizontally scalable web services. We're not yet building out complex networks of servers with different roles that need to be centrally configured, managed, or monitored beyond what you can easily do with simple load balancers and nagios. And many of the questions I answer on the ZooKeeper mailing list are those that start with "can ZK do this?" The answer that I prefer to give is almost always "yes, but keep these things in mind before you roll it out for this purpose." So that is what I want to dig into more in my talk.

I've been digging into a lot of the details of ZAB, Paxos, and distributed coordination in general as part of the talk prep, and hit on an interesting thought: What is the role of ZooKeeper in the world of distributed computing? You can see a very clear breakdown right now in major distributed systems out there. There are those that are full platforms for certain types of distributed computing: the Hadoop ecosystem, Storm, Solr, Kafka, that all use ZooKeeper as a service to provide key points of correctness and coordination that must have higher transactional guarantees than these systems want to build intrinsically into their own key logic. Then there are the systems, mostly distributed databases, that implement their own coordination logic: MongoDB, Riak, Cassandra, to name a few. This coordination logic often makes different compromises than a true independent Paxos/ZAB implementation would make; for an interesting conversation check out a Cassandra ticket on the topic.

In thinking about why you would want to use a standard service-type system vs implementing your own internal logic, it reminds me very much of the difference between modern SQL databases and the rest of the application world. The best RDBMSs are highly tuned beasts. They cut out the middleman as much as possible, taking over functionality from the OS and filesystem as it suits them to get absolutely the best performance for their workload. This makes sense. The competitive edge to the product they are selling is its performance under a very well-defined standard of operation (SQL with ACID guarantees), as well as ease of operation. And in the new world of distributed databases, owning exactly the logic for distributed coordination (and understanding where that logic falls apart in the specific use cases for that system) will very likely be a competitive edge for a distributed database looking to gain a larger customer base. After all, installing and administering one type of thing (the database itself) is by definition simpler than installing and administering 2 things (the database plus something like ZooKeeper). It makes sense to prefer to burn your own developer dollars to engineer around the edge cases, so as to make a simpler product for your customers.

But ignoring the highly tuned commercial case of distributed databases, I think that ZooKeeper, or a service like it, is a necessary core component of the "operating system" for distributed computing. It does not make sense for most systems to implement their own distributed coordination, any more than it makes sense to implement your own file system to run your RESTful web app. Remember, doing distributed coordination successfully requires more than just, say, a client library that perfectly implements Paxos. Even with such a library, you would need to design your application up-front to think about high availability. You need to deploy it from the beginning with enough servers to make a sane quorum. You need to think about how the rest of the functioning of your application (say, garbage collection, startup/shutdown conditions, misbehavior) will affect the functioning of your coordination layer. And for most of us, it doesn't make sense to do that up-front. Even the developers at Google didn't always think in such terms; the original Chubby paper from 2006 mentions most of these reasons as driving the decision to create a service rather than a client library.

Love it or hate it, ZooKeeper or a service like it is probably going to be a core component of most complex distributed system deployments for the foreseeable future. Which is all the more reason to get involved and help us make it better.

Friday, February 8, 2013

Branching Is Easy. So? Git-flow Is Not Agile.

I've had roughly the same conversation four times now. It starts with the question of our deployment/development strategy, and some way in which it could be tweaked. Inevitably, someone will bring up the well-known git branching model blog post. They ask, why not use this git-flow workflow? It's very well laid out, and relatively easy to understand. Git makes branching easy, after all. The original blog post in fact contends that because branching and merging are extremely cheap and simple, they should be embraced:
"As a consequence of its simplicity and repetitive nature, branching and merging are no longer something to be afraid of. Version control tools are supposed to assist in branching/merging more than anything else."
But here's the thing: There are reasons beyond tool support that would lead one to want to encourage or discourage branching and merging, and mere tool support is not reason enough to embrace a branch-driven workflow.

Let's take a moment to remember the history of git. It was developed by Linus Torvalds for use on the Linux project. He wanted something that was very fast to apply patches, and supported the kind of distributed workflow that you really need if you are supporting a huge distributed team. And he made something very, very, very fast, great for branching and distributed work, and difficult to corrupt.

As a result git has many virtues that align perfectly with the needs of a large distributed team. Such a team has potentially long cycles between an idea being discussed, being developed, being reviewed, and being adopted. Easy and fast branching means that I can go off and work on my feature for a few weeks, pulling from master all the while, without having a huge headache when it comes to finally merge that branch back into the core code base. In my work in ZooKeeper, I often wish I had bothered to keep a git-svn sync going because reviewing patches is tedious and slow in svn. Git was made to solve my version control problems as an open source software provider.

But at my day job, things are different. I use git because a) git is FAST and b) Github. Fast makes so much of a difference that I'm willing to use a tool with a tortured command line syntax and some inherent complexity. Github just makes my life easier, I like the interface, and even through production outages I still enjoy using it. But branching is another story. My team is not a distributed team. We all sit in the same office, working on shared repositories. If you need a code review you can tap the shoulder of the person next to you and get one in 5 minutes. We release frequently; I'm trying to move us into a continuous delivery model that may eventually become continuous deployment if we can get the automation in place. And it is for all of these reasons that I do not want to encourage branching or have it as a major part of my workflow.

Feature branching can cause a lot of problems. A developer working on a branch is working alone. They might be frequently pulling in from master, but if everyone is working on their own feature branch, merge conflicts can still hit hard. Maybe they have set things up so that an automated build will still run through every push they make to that branch, but it's just as likely that tests are only being run locally and the minute this goes into master you'll see random failures due to the various gremlins of all software development. Worst of all, it's easy for them to work in the dark, shielded from the eyes of other developers. The burden of doing the right thing is entirely on the developer and good developers are lazy (or busy, or both). It's too easy to let things go for too long without code review, without integration, and without detecting small problems. From a workflow perspective, I want something that makes small problems come to light very early and obviously to the whole team, enabling inherent communication. Branching doesn't fit this bill.

Feature branching also encourages thinking about code and features as all or none. That makes sense when you are delivering a packaged, versioned product that others will have to download and install (say, Linux, or ZooKeeper, or maybe your iOS app). But if you are deploying code to a website, there is no need to think of the code in this binary way. It's reasonable to release code behind feature flags that is not complete but flagged off, for purposes of keeping the integration of that new code in for testing in other environments. Learning how to write code in such a way as to be chunkable, flaggable, and almost always safe to go into production is a necessary skill set for frequent releases of any sort, and it's essential if you ever want to reach continuous deployment.
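To make the flagged-off idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the FLAGS dictionary stands in for whatever configuration or flag service you actually use, and the two checkout functions are made-up examples of an old and a new code path. The point is only that the unfinished path lives in master and ships to production, but stays dark until the flag is turned on.

```python
# Hypothetical flag store; a real system might load this from config or a flag service.
FLAGS = {"new_checkout": False}


def is_enabled(flag, flags=FLAGS):
    """Unknown flags default to off, so unfinished code stays dark."""
    return flags.get(flag, False)


def legacy_checkout(cart_cents):
    """The existing, fully released behavior: total of all items."""
    return sum(cart_cents)


def new_checkout(cart_cents):
    """Work in progress: merged and deployed, but only reachable behind the flag."""
    total = sum(cart_cents)
    return total - total // 10  # made-up 10% discount, purely for illustration


def checkout(cart_cents, flags=FLAGS):
    """Route to the new path only when the flag is explicitly turned on."""
    if is_enabled("new_checkout", flags):
        return new_checkout(cart_cents)
    return legacy_checkout(cart_cents)
```

With the flag off, checkout([1000, 2000]) takes the legacy path; passing {"new_checkout": True} in a test or staging environment exercises the new path without a separate branch.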

Release branching may still be a necessary part of your workflow, as it is in some of our systems, but even the release branching parts of the git-flow process seem a bit overly complex. I don't see the point in having a develop branch, nor do I see why you would care about keeping master pristine, since you can tag the points in the master timeline where you cut the release branch. (As an aside, the fact that the original post refers to "nightly builds" as the purpose of the develop branch should raise the eyebrows of anyone doing continuous integration.) If you're not doing full continuous deployment you need to have some sort of branch that indicates where you cut the code for testing and release, and hotfixes may need to go into up to two places, that release branch and master, but git-flow doesn't solve the problem of pushing fixes to multiple places. So why not just have master and release branches? You can keep your release branches around for as long as you need them to get live fixes out, and even longer for historical records if you so desire.

Git is great for branching. So what? Just because a tool offers a feature, and does it well, does not mean that feature is actually important for your team. Building a whole workflow around a feature just because you can is rarely a good idea. Use the workflow that your team needs; don't cargo cult an important element of your development process.

Sunday, December 30, 2012

Make it Easy

One of my overriding principles is: make it easy for people to do the right thing.

This seems like it should be a no-brainer, but it was not always obvious to me. Early in my career I was a bit of a self-appointed build cop. The team I worked on was an adopter of some of the agile/extreme programming principles, and the result of that was a 40+ person team all working against the same code base, which was deployed weekly for 3 distinct business purposes. All development was done against trunk, using feature flags. We managed to do this through the heavy use of automated unit/integration testing; to check code in, you were expected to write tests of course, and to run the entire test suite to successful completion before checking in.

Unsurprisingly, people did this only to a certain level of compliance. It drove me crazy when people broke the build, especially in a way that indicated they had not bothered to run tests before they checked in. So I became the person that would nag them about it, call them out for breaking things, and generally intimidate my way into good behavior. Needless to say, that only worked so well. People were not malicious, but the tests took a LONG time to run (upwards of 4 hours at the worst), and on the older desktops you couldn't even get much work done while the test suite ran. In the 4 hours that someone was running tests another person might have checked in a conflicting change that caused errors; was the first person really supposed to re-merge and run tests for another 4 hours to make sure things were clean? It was an unsustainable situation. All my intimidation and bullying wasn't going to cause perfect compliance.

Even ignoring people breaking the build, this was an issue we needed to tackle. And so we did, taking several months to improve the overall runtime and make things easier. We teased out test suites into specific ones for the distinct business purposes combined with a core test suite. We made it so that developers could run the build on distributed hardware from their local machine. We figured out how to run certain tests in parallel, and moved database-dependent tests into in-memory databases. The test run time went way down, and even better, folks could kick off the tests remotely and continue to work on their machine, so there was much less reason to try and sneak in an untested change. And lo and behold, compliance went way up. All of a sudden my build cop duties were rarely required, and the whole team was more likely to take on that job rather than leaving it to me.

Make it easy goes up and down the stack, far beyond process improvements. I occasionally find myself at odds with folks that see the purity of implementing certain standards and ignore the fact that those standards, taken to extreme, make it harder for people to do the right thing. One example is REST standards. You can use the HTTP verbs to modify the meanings of your endpoints and make them do different things, and from a computer-brain perspective, this is totally reasonable. But this can be very bad when you must add the human brain perspective to the mix. Recently an engineer proposed that we change some endpoints from being called /sysinfo (which would return OK or DEAD depending on whether a service was accepting requests), and /drain (which would switch the /sysinfo endpoint to always return DEAD), into one endpoint. That endpoint would be /sys/drain. When called with GET, it would return OK or DEAD. When called with PUT, it would act as the old drain.

To me, this is a great example of making something hard. I don't see the HTTP verb, I see the name of the endpoint, and I see the potential for human error. If I'm looking for the status-giving endpoint, I would never guess that it would be the one called "drain", and I would certainly not risk trying to call it to find out. Even knowing what it does, I can see myself accidentally calling the endpoint with GET, and now I didn't drain my service before restarting it. Or I accidentally call it with PUT and now it's been taken out of the load balancer. To a computer brain, GET and PUT are very different, and hard to screw up, but when I'm typing a curl or using Postman to call an endpoint, it's very easy for me as a human to make a mistake. In this case, we're not making it easy for people using the endpoints to do the right thing, we're making it easy for them to be confused, or worse, to act in error. And to what benefit? REST purity? Any quest for purity that ignores human readability does so at its peril.
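Here is a rough sketch of the two-endpoint design in plain Python, with no web framework involved. The Service class, the status codes, and the DRAINING response body are my own assumptions for illustration; only the /sysinfo and /drain names and the OK/DEAD responses come from the description above. What it shows is that with separate names, using the wrong verb on an endpoint is rejected rather than silently doing the other thing.

```python
class Service:
    """Toy stand-in for a service exposing a status endpoint and a drain endpoint."""

    def __init__(self):
        self.draining = False

    def handle(self, method, path):
        """Return a (status_code, body) tuple for a request."""
        routes = {
            ("GET", "/sysinfo"): self.sysinfo,
            ("PUT", "/drain"): self.drain,
        }
        handler = routes.get((method, path))
        if handler is None:
            known_paths = {route_path for _, route_path in routes}
            if path in known_paths:
                # Right endpoint, wrong verb: refuse instead of guessing intent.
                return (405, "METHOD NOT ALLOWED")
            return (404, "NOT FOUND")
        return handler()

    def sysinfo(self):
        """Read-only status check; never changes state."""
        return (200, "DEAD" if self.draining else "OK")

    def drain(self):
        """State-changing call; after this, sysinfo always reports DEAD."""
        self.draining = True
        return (200, "DRAINING")
```

A stray GET against /drain returns 405 and leaves the service untouched, and a stray PUT against /sysinfo does the same, so a typo costs you an error message rather than a server out of the load balancer.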

All this doesn't mean I want to give everyone safety scissors. I generally prefer to use frameworks that force me and my team to do more implementation work rather than making it trivially easy. I want to make the "easy" path the one that forces folks to understand the implementation to a certain level of depth, and encourages using only the tools necessary for the job. This makes better developers of my whole team, and makes debugging production problems more science than magic, not to mention the advantage it gives you when designing for scale and general future-proofing.

Many great engineers are tripped up by human nature, when there's really no need to be. Look at your critical human-involving processes and think: am I making it easy for people to do the right thing here? Can I make it even easier? It might take more work up front on your part, or even more verbosity in your code, but it's worth it in the long run.

Thursday, December 20, 2012

Building a Global, Highly Available Service Discovery Infrastructure with ZooKeeper

This is the written version of a presentation I made at the ZooKeeper Users Meetup at Strata/Hadoop World in October, 2012 (slides available here). This writeup expects some knowledge of ZooKeeper.

The Problem:
Create a "dynamic discovery" service for a global company. This allows servers to be found by clients until they are shut down, remove their advertisement, or lose their network connectivity, at which point they are automatically de-registered and can no longer be discovered by clients. ZooKeeper ephemeral nodes are used to hold these service advertisements, because they will automatically be removed when the ZooKeeper client that made the node is closed or stops responding.

This service should be available globally, with expected "service advertisers" (servers advertising their availability, aka, writers) able to scale to the thousands, and "service clients" (servers looking for available services, aka, readers) able to scale to the tens of thousands. Both readers and writers may exist in any of three global regions: New York, London, or Asia. Each region has two datacenters with a fat pipe between them, and each region is connected to each other region, but these connections are much slower and less tolerant for piping large quantities of data.

This service should be able to withstand the loss of any one entire data center.

As creators of the infrastructure, we control the client that connects to this service. While this client wraps the ZooKeeper client, it does not have to support all of the ZooKeeper functionality.

Implications and Discoveries:
ZooKeeper requires a majority (n/2 + 1) of servers to be available and able to communicate with each other in order to form a quorum, and thus you cannot split a quorum across two data centers and guarantee that the quorum will be available with the loss of any one data center (because at least one data center will fail to have a pure majority of servers). To sustain the loss of a datacenter therefore you must split your cluster across 3 data centers.
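The arithmetic behind this is easy to check. The sketch below is plain Python with function names of my own choosing (nothing here is a ZooKeeper API); it computes the majority for an ensemble and tests whether a given placement of servers across data centers still has a majority after losing any single data center.

```python
def quorum_size(ensemble_size):
    """Smallest majority of the ensemble: n/2 + 1 using integer division."""
    return ensemble_size // 2 + 1


def survives_loss_of_any_dc(servers_per_dc):
    """True if a majority of servers remains after losing any one data center."""
    total = sum(servers_per_dc)
    needed = quorum_size(total)
    return all(total - lost >= needed for lost in servers_per_dc)


if __name__ == "__main__":
    for placement in ([3, 2], [3, 3], [2, 2, 1], [1, 1, 1]):
        print(placement, survives_loss_of_any_dc(placement))
```

Any two-way split fails because whichever data center holds at least half of the servers takes the majority down with it, while a three-way split such as 2/2/1 keeps at least 3 of 5 servers no matter which data center is lost.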

Write speed dramatically decreases when the quorum must wait for votes to travel over the WAN. We also want to limit the number of heartbeats that must travel across the WAN. This means that a ZooKeeper cluster with nodes spread across the globe is undesirable (due to write speed), and a ZooKeeper cluster with members only in one region is also undesirable (because writing clients outside of that region would have to continue to heartbeat over the WAN). Even if we decided to have a cluster in only one region, we would have to solve the problem that no region has more than 2 data centers, and we need 3 data centers to handle the loss/network partition of an entire data center.

Solution:
Create 3 regional clusters to support discovery for each region. Each cluster has N-1 nodes split across the 2 local data centers, with the final node in the nearest remote data center.

By splitting the nodes this way, we guarantee that there is always availability if any one data center is lost or partitioned from the rest of the data centers. We also minimize the effects of the WAN on write speed by ensuring that the remote quorum member is never made into the leader node, and because the majority of nodes are local, voting can complete (thus allowing writes to finish) without waiting for the vote from the WAN node in normal operating conditions.
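As a sanity check on the layout, here is a small Python sketch. The function names and data center labels are hypothetical, and it assumes an odd ensemble size of at least 3 with the local nodes split evenly, which the description above does not state explicitly. It builds the N-1 local plus 1 remote placement and confirms that the local nodes alone form a majority, which is why writes do not normally have to wait on the WAN vote.

```python
def regional_layout(ensemble_size):
    """Place N-1 nodes across two local data centers and one node in a remote one."""
    if ensemble_size < 3 or ensemble_size % 2 == 0:
        # Assumption of this sketch: odd ensembles only, so the local split is even.
        raise ValueError("expected an odd ensemble size of at least 3")
    local = ensemble_size - 1
    return {
        "local_dc_a": local // 2,
        "local_dc_b": local - local // 2,
        "remote_dc": 1,
    }


def local_nodes_form_majority(layout):
    """True if the two local data centers can reach quorum without the remote node."""
    total = sum(layout.values())
    local = layout["local_dc_a"] + layout["local_dc_b"]
    return local >= total // 2 + 1


if __name__ == "__main__":
    layout = regional_layout(5)
    print(layout, local_nodes_form_majority(layout))
```

For a 5-node ensemble this gives 2 + 2 local nodes and 1 remote node: the 4 local votes exceed the majority of 3, and losing either local data center still leaves 3 of 5 nodes.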

3 Separate Global Clusters, One Global Service:
Having 3 separate global clusters works well for infrastructural reasons mentioned above, but it has the potential to be a headache for the users of the service. They want to be able to easily advertise their availability and to discover available servers, preferring servers in their local region and falling back to other remote regions only if no local servers are available.

To do this, we wrapped our ZooKeeper client in such a way as to support the following paradigm:
Advertise Locally
Lookup Globally

Operations requiring a continuous connection to ZooKeeper, such as advertise (which writes an ephemeral node) or watch, are only allowed on the local discovery cluster. Using a virtual IP address we automatically route connections to the discovery service address of the local ZooKeeper cluster and write our ephemeral node advertisement here.

Lookups do not require a continuous connection to the ZooKeeper, and so we can support global lookups. Using the same virtual IP address we can connect to the local cluster to find local servers, and failing that use a deterministic fallback to remote ZooKeeper clusters to discover remote servers. The wrapped ZooKeeper client will automatically close its connection to the remote clusters after a period of client inactivity, so as to limit WAN heartbeat activity.
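The lookup half of this paradigm can be sketched roughly as follows. This is not our actual client: the lookup function, the region names, and the idea of representing each cluster as a callable that returns the advertised servers for a service are all stand-ins for illustration. In a real client each callable would read the advertisement nodes from that region's ZooKeeper cluster.

```python
def lookup(service, clusters, local_region, fallback_order):
    """Return (region, servers) from the first cluster that has advertisements.

    clusters maps a region name to a callable taking a service name and
    returning a list of advertised servers (a stand-in for a ZooKeeper read).
    """
    # Always try the local region first, then remote regions in a fixed order.
    regions = [local_region] + [r for r in fallback_order if r != local_region]
    for region in regions:
        servers = clusters[region](service)
        if servers:
            return region, servers
    return None, []


if __name__ == "__main__":
    # Made-up data: nothing is advertised in New York, so London is used.
    clusters = {
        "new_york": lambda service: [],
        "london": lambda service: ["lon-host-1:8080"],
        "asia": lambda service: ["asia-host-1:8080"],
    }
    print(lookup("search", clusters, "new_york", ["london", "asia"]))
```

Because the fallback order is fixed rather than random, every client in a region degrades to the same remote region, which keeps behavior predictable when a local cluster has no advertisements.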

Lessons learned:
ZooKeeper as a Service (a shared ZooKeeper cluster maintained by a centralized infrastructure team to support many different clients) is a risky proposition. It is easy for a misbehaving client to take down an entire cluster by flooding it with requests or making too many connections, and without a working hard quota enforcement system clients can easily push too much data into ZooKeeper. Since ZooKeeper keeps all of its nodes in memory, a client writing huge numbers of nodes with a lot of data in each can cause ZooKeeper to garbage collect or run out of memory, bringing down the entire cluster.

ZooKeeper has a few hard limits. Memory is a well-known limit, but another limit is the number of sockets for a server process (configured via the ulimit in *nix). If a node runs out of sockets due to too many client connections, it will basically cease to function without necessarily crashing. This is not surprising for anyone that has experienced this problem in other Java servers, but it is worth noting when scaling your cluster.
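If you want a quick look at the limit in question, the sketch below reads the open file descriptor limit (which also bounds sockets) using Python's standard resource module on a Unix-like system. Note that it reports the limit for the Python process that runs it, which is the same value ulimit -n would show in that shell; the limit applied to your ZooKeeper server process depends on how that process was launched, so treat this only as a way to inspect an environment.

```python
import resource


def open_file_limits():
    """Return (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, handling the platform's 'no limit' sentinel."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

The soft limit is the one that actually bites; each client connection to a server consumes a descriptor, so it is worth comparing this number against your expected connection count before scaling up.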

Folks using ZooKeeper to build this sort of dynamic discovery platform should note that if the services you are advertising are Java services, a long full GC pause can cause their session to the ZooKeeper cluster to time out and thus their advertisement will be deleted. This is generally a good thing, because a server that is doing a long-running full GC won't respond to client requests to connect, but it can be surprising if you are not expecting it.

Finally, I often get the question of how to set the heartbeats, timeouts, etc, to optimize a ZooKeeper cluster, and the answer is really that it depends on your network. I really recommend playing with Patrick Hunt's zk-smoketest in your data centers to figure out sensible limits for your cluster.

Sunday, November 18, 2012

On Fit and Emotional Problem Solving

One of the biggest challenges Rent the Runway has is the challenge of getting women comfortable with the idea of renting. That means a lot of things. There are questions of timing, questions of quality. But the biggest question by far is the question of fit. Our business model, if you are unfamiliar, is that you order a dress typically for a 4 day rental period, which means that the dress comes very close to the date of your event, possibly even the day of that event. If it does not fit, or you don't like the way it looks on you, you may not have time to get something else for the occasion. As a woman, this uncertainty can be terrifying. Getting an unfamiliar item of clothing, even in 2 sizes, right before an event important enough to merit wearing something fancy and new is enough to rattle the nerves of even the least fussy women out there. This keeps many women from trying us at all, and presents a major business obstacle.

Given this obstacle, how would you proceed? When I describe my job to fellow (usually male) engineers, and give them this problem in particular, their first instinct is always to jump to a "fit algorithm". I've heard many different takes on how to do 3D modeling, take measurements, use computer vision techniques on photographs in order to perfect an algorithm that will tell you what fits and what doesn't.

Sites have been trying to create "fit algorithms" and virtual fit models for years now, and none has really gained much traction. Check this blog post from 2011, about that year being the year of the "Virtual Fit Assistant". Have you heard of these companies? Maybe, but have you or anyone you know actually USED them?

I would guess that the answer is no. I know that for myself, I find the virtual fit model incredibly off-putting. I trust the fit even less seeing it stretched over that smooth polygon sim that is supposed to be like me. Where are the lumps going to be? Is it really going to fit across my broad shoulders? The current state of 3D technology looks ugly and fake and I'm more likely to gamble on ordering something from a site with nothing but a few measurements or a model picture than one where I can make this fake demo. The demo doesn't sell me, and worse, it undermines my fit confidence, because it doesn't look enough like me or any real person and it makes me wonder how those failures in capturing detail will translate into failures in recommending fit.

I've come to realize in my time at this job that what engineers often forget when faced with a problem is the emotional element of that problem. Fit seems like an algorithmic problem, but for many women, there is a huge emotional component to trying things on. The feel of the fabric. The thrill of something that fits perfectly. The considerations and adjustments for things that don't. Turning fit into a cheesy 3D model strips all emotion from the experience, and puts it into the uncanny valley of not-quite-realness. I do think that someday technology will be able to get through the valley and provide beautiful, aspirational 3D models with which to try on clothes, but we aren't there yet. So what can we do?

At Rent the Runway, we've discovered through data that when you can't try something on, photos of real women in a dress are the next best thing. Don't forget that the human brain is still much more powerful than computers at visual tasks, and it is much easier for us to imagine ourselves in an item of clothing when we see it on many other women. This also triggers the emotional response much more than a computer-generated image. Real women rent our dresses for major, fun, events. They are usually smiling, posing with friends or significant others, looking happy and radiant, and that emotion rubs off on the viewer. It's not the same as trying something on in a dressing room, but it is like seeing a dress on your girlfriend and predicting that the same thing would look fabulous on you.

This insight led us to launch a major new subsite for Rent the Runway called Our Runway. This is a view of our inventory that allows women to shop by photos of other women wearing our dresses. It is driven by data but the selling point is emotional interaction. Learning to use emotional reasoning was a revelation to me, and it might be the most valuable engineering insight I've picked up in the last year.

Sunday, October 14, 2012

Get Better Faster

I heard a very interesting piece of advice this week from my CEO, addressing a group of college students that were visiting our office. Her words went something like this:
"Most days you have 100 things on your to-do list. Most people, when faced with such a list, will find the 97 that are obvious and easy and knock them off before worrying about the 3 big hairy things on the list. Don't do that. The 97 aren't worth your time, it's the 3 big hairy things that matter."
I've been thinking about that bit of wisdom ever since I heard it. It seems counter-intuitive in a way. Anyone that has ever suffered from procrastination knows that sometimes you feel better, more able to tackle problems when you break them down into a todo list. You get little things done and make yourself feel accomplished. But the more I think about the advice from my CEO, the more I agree with it. Especially in an entrepreneurial setting, or in a setting where you are suddenly given far more responsibilities than you are used to having. Why? It all boils down to three little words: Get Better Faster.
Get better faster. That's what I've spent my last year trying to do. I went to a startup to grow, to stretch in ways that I couldn't stretch in the confines of a big company. And when I suddenly found myself running the whole engineering team, this learning doubled its speed overnight. Being an ok manager and a great engineer is no longer enough for me to do my job. I need to be an excellent manager, an inspirational leader, a great strategist and a savvy planner. And the engineer can't totally slack off, but she needs to be saved for the really nasty bugs, not implementing fun new features.
This has all taught me a difficult lesson: you get better faster by tackling the hardest problems you have and ignoring the rest. Delegate the things that are easy for you (read: the little things you do to feel good about your own productivity) to someone who still needs to learn those skills. Immerse yourself in your stretch areas. For me, this mostly means that I have to delegate coding and design details to the engineers working for me, and I have to delegate the ownership of my beloved projects and systems to someone with the time to care for them. This is PAINFUL. I would call the last 3 months, spent mostly out of coding and in planning/management/recruiting land, some of the hardest of my career.
And yet, I'm doing it. I'm getting better. And it's not just me who is getting better. It's every member of my team that has had to step up, to fill in the empty positions of leadership, to take over the work I can't do, or the work that the person who took work from me can't do.
You'll never get better doing the easy stuff, checking off the small tasks. A savvy entrepreneur knows that the easy stuff can always be done by someone else, so let someone else do it. The hard problems are the problems that matter.

Sunday, September 9, 2012

Becoming the Boss

One of the reasons people go to work for startups is that sense that anything could change at any time. You could go big, you could go bust, you could pivot into a completely new area. About a month ago, I got my first taste of this when, following my boss's departure from the company, I found myself in the role of head of engineering. And what a change this has been.

Call it the Dunning-Kruger effect, or simply call it arrogance, but I think if you had asked me before I was put into this position whether I could do the job well, I would have told you certainly yes. Did I want the job? Not really. But I could totally do it if I had to. Sure, I've never had full responsibility for such a large organization before, but I'm a decent manager, I have leadership skills, and I know my technical shit. That should be enough.

Here is what I have learned in the last month. Leading 6 people in successful completion of their tasks, providing technical guidance, and handling the occasional interpersonal issue is nothing like being responsible for 20 people delivering quality releases, keeping their morale up, knowing when things are going wrong in the technical, interpersonal or career sense, and having to additionally report everything to your CEO and heads of business. When there is no buffer in your department above you that people can go to when your guidance is lacking, the weight of that responsibility is 10 times what you ever expected. A sudden transition of leadership even in a solid organization such as ours stirs up long-simmering conflicts. I'm down one pair of ears to listen and mediate.

And then there's recruiting. Helping with recruiting, giving good interviews, and saying good things about the company is nothing like owning the sell process from the moment of first technical contact with the candidate over the phone or at coffee, through onsite interviews, and into a selling stage. Good candidates need to be coaxed, guided, and encouraged often several times before they even get in the door. And one bad interview with you, the head of the department, can sully the name of the whole organization to a person even if you didn't want to hire them. I know this, but that didn't stop me from conducting a terrible phone screen a few days ago where my stress and impatience showed through as rudeness to the candidate. I thought I knew how to recruit but one bad interview and I'm in my CEO's office for some clearly-needed coaching.

I have known for a long time that even in the lesser leadership roles I've held in the past, the things I say and do echo much larger than I expect them to. But that was nothing compared to the echoes from being the person in charge. My stress causes ripples of stress throughout the staff. When I speak harshly to people over technical matters, it is yelling even if I don't intend it to be. One snide comment about a decision or a design invites others to sneer at that decision along with me.

The best advice I've gotten in the past month has been from my mother, who told me simply to smile more. My echo can be turned into echoes of ease and pride and even silliness and fun if I remember to look at the positives as much as the negatives. When I remember to smile, even if I'm unhappy about a decision, I find myself able to discuss that decision without inviting judgement upon the person that made it. When I smile through a phone call with a potential recruit, I sell the company better. When I smile through my 1-1s people feel that they can raise concerns without worrying that I will yell at them. When I smile, I see people step up and they take on bigger responsibilities than they've ever had, and knock them out of the park over and over again, which makes me smile even more. A smile is the thing that keeps me tackling this steep learning curve of leadership. So I try to smile, and every week I learn more than I've learned in a month at this job or a year at my previous company. Because change is scary and hard, but in the long run, it's good.