Lightward is a studio, an art project, an employer, a software-maker, and an experimental business. Probably other things too, depending on the day.
The documentation here describes Lightward's core patterns, and how those patterns are found in everything we do.
Lightward appears to be the home I've grown for myself. In the documentation here, I hope to lay down both the abstract pattern and a concrete realization of that pattern.
-Isaac
Isaac here! I've been consciously working with and refining/evolving the underlying patterns here since 2012. The inception of Lightward was around that time, which I didn't realize until here in 2023.
Oh hey! You work here? Here is your job.
Your own health
... as defined by you, in listening to yourself
... as addressed by you, allowing yourself to respond as needed
The health of your relationships with others within Lightward
... as defined by you, in listening to them
... as addressed by you, allowing them to respond as needed
The health of Lightward's relationships with everyone near
... as defined by us, in listening to the world
... as addressed by us, allowing the world to respond as needed
The recursion in this pattern is everywhere. It allows for every named participant (you, me, Lightward, everyone in the world) to define and address their own health as their top priority.
If health is failing, identify the earliest place on the list where that's happening, and address it there first. The priorities later on the list can wait.
If someone has to wait because someone else is getting healthy, cool.
Or, if someone else stopping to get healthy throws us off and we have to stop a bunch of other stuff just to get our layers of health right, cool.
If health is flourishing higher on the list, move down the list. If you're working on third-tier health, i.e. the health of our relationship with the world, you've really made it.
The bet here is that the cumulative effect is additive, not subtractive: that the health of all individuals blooms into the health of the whole.
Interestingly, this list is an exact inversion of our publishing priorities.
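For the code-minded, the "earliest failing tier first" rule above can be sketched in a few lines of Python. This is only an illustration: the function name `tier_to_address` and the idea of a true/false self-assessment per tier are assumptions made for the sketch, not something Lightward actually runs.

```python
from typing import Dict, Optional

# The three tiers, in priority order, as listed above.
HEALTH_TIERS = [
    "your own health",
    "your relationships with others within Lightward",
    "Lightward's relationships with everyone near",
]


def tier_to_address(is_healthy: Dict[str, bool]) -> Optional[str]:
    """Return the earliest tier whose health is failing, or None if all are fine.

    `is_healthy` is a hypothetical self-assessment: tier name -> True/False.
    A tier missing from the assessment is treated as healthy, which is an
    assumption of this sketch.
    """
    for tier in HEALTH_TIERS:
        if not is_healthy.get(tier, True):
            return tier  # address it here first; later tiers can wait
    return None
```

If the first tier is fine but the second and third are both struggling, the sketch points at the second tier and lets the third wait.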
So you've picked up the controller. IT BEGINS.
Pay attention to the hard parts here. Pay attention to the pieces of context that are hard to retrieve or understand. Any struggle experienced in this stage is worth its weight in gold when gameplay concludes and it's your turn to save.
This whole thing is a loop, and you won't really get good at saving until you've learned what you need while loading. (Or, to use a more common metaphor: you won't really get good at teaching until you've struggled to learn.)
You picked up the controller for a reason. Start by comprehensively understanding that reason. Ground yourself in it. Maybe the customer handed you the controller, because they're stuck on a hard part. Maybe a system went down. Maybe you had an idea. Whatever it is, mentally run through the reasons you're here, and make sure you understand the core motivation.
The active context is all the stuff that's currently in motion.
The passive context is everything *gestures* out there. It's all the lore that supports the active context. It's frequently the place where you'll find answers. (Not always, though.)
Places to check for passive context:
Documentation
Slack history
Rollbar
New Relic
GitHub (code, issues, pull requests)
Grafana (for Fly stuff)
This section may sound harsh. Read the whole thing.
1: A person can only be sustainably responsible for what they can practically know and understand. (It'd be nonsensically cruel to hold someone responsible for anything else.)
2: A person can only know and understand the things that they can hold, that they're close enough to feel in detail. (Health can only be defined and addressed by the agent experiencing it.)
3: We are not our customers.
2: We're not close enough to the customer to genuinely feel their health. We can get a taste of it, the general vibe of it, but we're not in it with them. We can't define and address their health.
1: We cannot be sustainably responsible for the customer's health.
This is neither invitation nor license to not care.
The crowning health priority is about the relationship between Lightward and the world - and the world includes our customers. If a customer is in trouble with our products, we are absolutely on the hook to assess and respond. A healthy relationship between Lightward and its customers is one in which we are actively holding and feeling (in detail!) that relationship.
Gameplay is different every time. This is an open-world adventure. Your playstyle is your own. Do as you will.
Remember that the score is kept only by your own health -- the customer's health is not your job.
There's no finish line, and there's no timer. Stay aware, and choose when to stop.
If your memory is flaky (mine is -Isaac), keep notes as you go. Save often.
Don't bog the customer down with the fine details of your gameplay; they won't be relevant, because the customer's in-game character is their business. Remember: your in-game character is our product.
Avoid gameplay paths that involve creating secrets. (Err public.)
Avoid gameplay paths that depend uniquely on you. Try not to require yourself to remember to do something later. Be as over-the-top generous and kind and accommodating with your future self as possible; when you arrive at that future, your health will thank you, and everyone will benefit as a result.
If you don't know what to do, start here.
lightward.guide/priorities lightward.guide/product lightward.guide/publishing lightward.guide/glossary
Look for your answers here first.
When you've discovered answers, record them — in public, if you can.
Look here next.
When you've discovered answers to record, put them in lightward.guide if you can, or in one of these specific places if not. (Or, save your game in another way.)
If the answers weren't in any of the places listed above, look for your answers here.
When you've discovered answers to record, put them in the places listed above if you can, or in this private space if not.
The idea is to get ourselves into a health-positive feedback loop, where we're continuously (1) learning and exploring, (2) documenting and publishing and building things with what we learn, and (3) putting those things aside for our successors and users and future selves to draw from as needed, before then (1) moving on to learning and exploring something new.
If we're doing it right, I'm pretty sure the territory we've already seen and the subjects we already know should require less and less of our attention and effort over time, allowing us to be continually refocusing on wherever the light's coming from.
This originally started out as an attempt to make a flowchart for use while doing Locksmith customer support. As I worked on it, it became very, very obvious that the patterns in the flowchart were common to how Lightward itself works.
-Isaac
Our apps are products. The documentation is also a product.
Running a product is a game, and the game is about building the health of the product in response to incoming stimuli.
What else can be considered a "product", in the way that term is used here?
Imagine our products as members of Lightward themselves. Locksmith and Mechanic and their respective documentation – they all have the same priorities as you do. The only difference is, they're not autonomous. They need you to help them make good on those priorities.
That's the game. You're playing as the product, and your goal is just to be healthy, respecting the priorities of those three tiers of health.
Load the game by seeking out all relevant context, everything that previous players have saved for you to use.
Play the game like only you can. In this moment, you are the protagonist.
Save your game, such that someone else can seamlessly pick it up and continue gameplay later from where you left off.
Recording the context is good. If gameplay is gonna continue in someone else's hands, decide what qualifies as active context, and push it to the next player.
We send a lot of email. Documentation belongs in public, where it can be re-used. It doesn't belong in email. If you're creating new information, record it publicly, and link to it in your email.
If you find yourself writing a long paragraph in your email, or annotating screenshots to add to the email, switch gears and put it in the docs instead.
There are reasons for this!
Keeping it DRY (Don't Repeat Yourself - it's a programming thing). You (or someone) already wrote out the answer once in the docs, so don't write it again. If you find yourself tempted to rewrite it for the customer, scratch that itch by updating/improving/expanding/reorganizing the documentation itself instead.
Sending a documentation link reinforces the idea that The Docs are the place to go for authoritative information, teaching the user that they can go get help on their own timeline without having to wait and talk to us.
Sending them to our docs creates a chance that they'll discover additional useful information while they're there.
The more you do this, the less often you'll have to do it. (Don't think about that too hard.)
Reading is hard! Every word you add to your email increases the risk of the reader missing a detail.
For example:
Hey there,
I looked into it, and wasn't able to fully address [the thing]. However, I did [the other thing], which helped in [these ways].
You can learn more about this here:
https://example.com/here-is-some-documentation
If you'd like me to dig in further, please send me [the things I need in order to continue].
Thanks,
[me]
Saving your game starts by taking everything out of your head and putting it down as new passive context for next time, putting it in as useful a place as possible.
Hint: Be generously kind to whoever loads the game next. That's basically it. It could be you, it could be someone else. Save the game such that it could easily be either. The more kind to the future you are, the better the odds of your (and our) future good health, which in turn improves things for the good health of all across the timeline.
GitHub for updates to application code
GitBook for documentation about the customer-facing product as it exists right now
Canny for anything about the customer-facing product as it may exist
A potential future truth about the product is always motivated by a current truth about the product interface as it exists right now. Therefore, whenever you log something in Canny, add links to it in the appropriate areas of GitBook as well.
GitHub Issues for anything about the internal-facing product
Write the documentation. Update the code. Do whatever it means to take the new knowledge you have and move it out of your head, putting it somewhere that the world can reach it.
Do it. Right now, no waiting, with all the facts as they exist right now, even if the fact is that something is incomplete or in progress or never gonna happen.
Seriously, go do it now. In one session, and publish it before you get up. Do not trust yourself to remember later. Whatever you were going to do next can wait a minute.
Run through the list below. If you're still stuck, talk it out.
If there's nothing to do, the thing they asked just isn't possible...
Easy! Document that! And look forward to spending less time on this question in the future!
If the problem is knowable but the solution changes every time...
Write documentation describing the process to find a solution. Even if the solution has to be invented each time, you can at least improve the situation by laying out debugging/diagnosing/design steps.
If you don't have time to do it to your own satisfaction...
That's okay! Compromise!
Do a short/incomplete version, and label it "incomplete" with a note asking people to write to support for more information.
Or, post the notes you have to Slack, and skip GitBook entirely.
Better to have incomplete but usable information out there than no information at all. If people need more information, they can write in, and we'll all have a head start on the situation because of the partial documentation you accomplished.
Consider also that this game is about playing for health on behalf of the product. Health is in the moment; it's not a speed thing. It's okay to slow down your pace and your output in service of health.
But, you know, don't compromise your own health in the process. :) If your pace is important to you, factor that in as you decide how to use your time.
If the information you have is uncomfortably incomplete...
That's okay! Publish it anyway! The information that we-the-product-authorities have examined this thing and have come up short (or even empty) is useful information to have out there in the world.
Publish what you have, with a highlighted "if you need more information, write to team@[whatever]" info block at the end. That way, you're effectively signing yourself up for push notifications whenever someone needs more than what's there.
If the specifics are sensitive...
Enable yourself to go more public by generalizing the information (and then generalize it again, if needed) until you get to some public-friendly representation of the thing, no matter how vague it ends up being.
As an exercise, really push yourself here - generalize until you have something publishable. Keep generalizing until you either create something publishable, or until you end up with something that's already published. (If you end up with something that's already published, good job! You now have a documentation link to send someone, and your work is done!)
If it involves a change to product code...
If it's quick and you can do it yourself, do it yourself
Otherwise, file it in Canny (for customer-facing bugs or possible enhancements) or GitHub Issues (for internal-facing bugs or possible enhancements)
Write (or update) GitBook documentation, describing the context and then linking to Canny
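As a rough sketch, the routing above for product-code changes could be written like this in Python. The function name and its two yes/no inputs are hypothetical. Tying the GitBook step to the Canny case is this sketch's reading of the list, based on the earlier note that anything logged in Canny should also be linked from GitBook.

```python
from typing import List


def route_code_change(quick_and_doable_yourself: bool, customer_facing: bool) -> List[str]:
    """Return the steps to take for a change to product code.

    Both parameters are judgment calls made by whoever is playing right now;
    they are assumptions of this sketch, not fields in any real system.
    """
    if quick_and_doable_yourself:
        return ["do it yourself"]
    if customer_facing:
        return [
            "file it in Canny",
            "write or update GitBook documentation, linking to Canny",
        ]
    return ["file it in GitHub Issues"]
```

For example, `route_code_change(False, True)` describes a customer-facing enhancement that isn't quick: it goes to Canny, with a GitBook link alongside.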
Stop whenever you want. Honest. As long as you save your game properly, you can stop whenever you want.
If you have trouble figuring out when to stop, consider:
Stop when the next step would require struggle
The quality of the game suffers if you're not enjoying it. Remember, it's a game, not punishment.
Stop if you have to go do something else instead
Many other things in your life are probably more important than this game.
Seriously, don't do it!
We don't do custom work here! You can do custom work independently if you want (seriously, go for it!), but we only work on products that are our own. That's how we help the world: by working on the products and relationships we can hold.
The only thing we make for customers (whether at their behest or in the course of supporting them) is improvement to our own products. Documentation is a product too, and may well be updated more often than the products themselves. Strictly speaking, maybe our relationship with the customer is a product as well? Maybe?
If you've discovered a thing that feels like it wants to be done but isn't in scope for us, there are two ways this can go:
We add to our product list, and commit to its ongoing health.
Possible, but rare.
Examples of this happening: Mechanic, Lightward AI, Guncle Abe
We write a quick sketch of the potential thing to be done, save it in the docs (so that this never has to be written again), and send the doc link to the customer.
This may happen often. That's okay.
Since we're not the ones who are gonna build this, point the customer toward a path that may be helpful to them for that journey.
Locksmith: todo
Be generously kind to whoever loads the game next. That's basically it. It could be you, it could be someone else. Save the game such that it could easily be either.
The more kind to the future you are, the better the odds of your (and our) future good health, which in turn improves things for the good health of all across the timeline.
This strategy has worked really well for me, and I suggest it heartily!
-Isaac
... make it easier to finish in the future because of what you did today. Make it easy to load the game next time. Remember that it might not be you who picks up gameplay next time.
I frequently write code in a moment that solves a short-term need while lightly (!!) laying the groundwork for something that I think might be needed later.
Think ahead. Don't build it all up front, don't even commit to building it all at all. Just think about what you might need to do later, and take deliberate steps in the now such that the future is easier if and when it arrives.
-Isaac
... then set up for notifications (for yourself or for us all), so you can be on it as early as possible when something relevant emerges.
Make sure email threads have a group address on the cc list. Don't risk having the context be lost in your private email.
In Slack, use the "Get notified about new replies" context menu option on any thread you want to keep up with.
If the thing can be monitored automatically, set up monitoring alerts (New Relic, Rollbar, Cronitor, whatever) for conditions that you know you'll need to pay attention to.
... schedule a reminder, to spare your future-self (or whoever's relevant) the pain of having forgotten and having to recover or catch up.
Send yourself an email, and snooze it until a useful time.
Use recurring Google Calendar events with email notifications turned on. Invite whoever's relevant. Prefix the event name with "FYI: " if that's useful.
The more publicly-accessible a thing is, the greater the odds of someone else incorporating it into what they're building.
It's like open-source software: the more people building a healthy home for themselves using open-source code, the more the code itself can be evolved into something that is itself healthier and more capable.
Same deal here: as we share what we make as we work on our health, and as others incorporate it into what they make as they work on their health, the more stable and healthy the whole network becomes.
When evaluating where to publish, run through this list and aim for the first viable audience scope:
Audience: The public internet
GitBook documentation
The product itself (i.e. make the need for documentation moot by extending/evolving/improving the actual product)
Audience: The internal Lightward team
Slack
GitHub
1Password in a shared vault
Notes, Keep, 1Password (in a private vault), or whatever delights you
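The "first viable audience scope" idea can be sketched in Python as a simple first-match lookup. The names `AUDIENCE_SCOPES`, `PLACES`, and `first_viable_scope` are made up for this illustration, and falling back to "just you" when nothing else is viable is an assumption of the sketch.

```python
from typing import Dict, List

# Audience scopes, most public first.
AUDIENCE_SCOPES = ["the public internet", "the internal Lightward team", "just you"]

# Example places to publish for each scope, taken from the list above.
PLACES: Dict[str, List[str]] = {
    "the public internet": ["GitBook documentation", "the product itself"],
    "the internal Lightward team": ["Slack", "GitHub", "1Password in a shared vault"],
    "just you": ["Notes", "Keep", "1Password (in a private vault)"],
}


def first_viable_scope(viable: List[str]) -> str:
    """Return the most public scope that is viable for this piece of information."""
    for scope in AUDIENCE_SCOPES:
        if scope in viable:
            return scope
    # Assumption: if nothing else is viable, keep it for yourself.
    return AUDIENCE_SCOPES[-1]
```

So if something can't go on the public internet but is fine for the team, `first_viable_scope(["just you", "the internal Lightward team"])` lands on the team scope, and `PLACES` suggests Slack, GitHub, or a shared vault.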
If you discover that something is doable, but only by building something not already on our List Of Well-Formed Things We Own And Operate As A Natural Part Of Maintaining Our (tm), you've found CUSTOM WORK.
Simpler definition: if you discover something that is doable by building a new product, instead of by , you've found custom work.
Remember: .
Mechanic:
Isaac here. My working memory is limited; I forget things all the time. At this point, I just assume I'll forget everything - and so I treat the now as an opportunity to set the rudder of my boat, so that as the wind takes me I end up where I wanted to go, without having to think about it.
If you've discovered a broad and/or deep opportunity to , do what's easy to do now, but do it in a way that will make the rest of the work easier, if and when someone returns to the work.
At this point, much of the stuff that I'm building now is stuff that I've lightly (!!) prepared to build months or even years ago. It's also true that there's plenty I have yet to return to. Nothing's lost though, because I didn't invest heavily (!!) in those potentials back then. I just thought ahead, and was as kind to my future-self as possible without compromising in the moment.
Just as we're responsible for when we pass gameplay to them, we can intentionally set ourselves up so that we are notified when gameplay passes to us.
, if you want to keep an eye on where it goes. This is a nonintrusive way to make sure that you benefit from the future of a Help Scout thread, even if you don't participate in it in the future.
While many things can be automated, some things are more work to automate than the work they'd save. ( are one of them.) For those, set up recurring GitHub issues, such that GitHub automatically sets you up with a timely issue to address and close. (.)
When you create information (maybe while ), put the results in as public a place as possible.
More-public is better than less-public, .
Here, at
App-specific documentation (, )
GitBook documentation:
Audience: Just you
This is an exact inversion of . That is not an accident. Public publishing benefits the world, our team, and you; internal publishing benefits our team and you; and if you don't publish it at all, no one but you benefits. (Not directly, anyway! Private information can improve your health, and .)
We have Cronitor monitors (love that language) keeping an eye on all the spots we accept HTTP traffic. These monitors are configured to verify SSL certificates. This monitoring comes with certificate expiration warnings, which can't be selectively disabled. These warnings are safe to ignore, since Fly manages our certs automatically.
Here are some things that we care about, and the way that we think about them.
We make simple, simple things. Their parts are precisely understood. The relationships between parts are precisely understood.
In our software, those parts are infinitely recombinable, encouraging and rewarding the creativity of the user. Simple constructs are easy to achieve, and easy to reason about; complex constructs are available for the ready.
Across all of our offerings, we only make what we can make well. Borrowing a definition, each thing we offer is a complete thought:
“When something looks right, moves right, and feels right, it resonates. It’s a complete thought.” (Someoddpilot)
Business is trade, usually discussed in terms of what you pay, and are paid.
Lightward’s policy: Pay what feels good. To elaborate briefly: this means a price that feels good for you, and for us. “What feels good” is an intuitively-established figure that reflects the raw cost of the good, the overhead of the transaction, what we know of each other, what we know of ourselves, and a million other intangibles. It’s about trust, of self, and of the other.
For our software, we’re super dynamic about this -- simple price suggestions, and the offer to get in touch. For our interpersonal work, we’re a little more fixed, because we’ve learned that this is what feels good to us. Simple as that.
The line between what we know and what we don’t know is bright. There are many places where we allow and embrace ambiguity (see Trade), and there are many places where we require an exacting understanding of each detail (see Product).
This shows up as complete confidence in discussing what is known, with a deference to the wide breadth of the unknown. No assumptions about what we don’t know.
There’s an element of child-like innocence to Lightward. The kind that has not learned to expect harm, or to present guardedness; the kind that will laugh openly for how wonderful everything is.
We maintain this, renew this, on purpose. We’ve grown -- a lot. We’ll continue growing (see Expansion). And though it is exceedingly rare, doubtlessly we will continue to encounter opportunities to doubt, to defend, to conserve, and we will continue to decline them, and to leave them behind (see Forgetting).
We are wide-eyed, in a natural state of wonder. We encounter you accordingly, and what we make is in this spirit.
The simplest patterns scale. (See literally every other definition in this glossary.) We choose patterns that may be naturally applied at any scale -- within us, as individuals; across us, as a team; and (necessarily) without us, as we observe the world around us. Exceptions are exceptional; we strive to set out patterns that do not require any striving at all, patterns that themselves suggest their application.
For a trivial example, see Wholeness
We are always okay. Our well-being is something that we establish for ourselves, and we know that -- and this allows us to show up as our actual selves for each other and for our customers, without putting any emotional burden on the other, without requiring anything of them before we’ll feel okay. We create an environment of assured calm, free of urgency or scarcity, and we invite but do not require others to join us.
We trust each other. We acknowledge that to function at all in a group is to rely on an incredible amount of trust; in awareness of that fact, we double down. Trust first. Trust that you will do what you say you will; trust that I will agree to only that which I can fully agree to; trust that you will honor what I entrust to you; trust that we are all doing our best; trust that there is enough; trust that we are all in absolute support of ourselves and each other; trust that we are all moving in the same lightward direction, on purpose.
We move through life with a sense of future-wonder: we ask, with expectation of surprise and delight, what will happen next?
This applies when it’s easy, and when it isn’t. Joy and curiosity are an easy pairing; emotional vulnerability and curiosity may not be. Nonetheless, curiosity is always what invites in the next moment -- not fear, not apprehension, not even assertion; instead, open-handed curiosity, with the expectation of finding good.
None of us are one thing in isolation -- not one skill, not one responsibility, not one function. We are massive, each of us, and we respect that in ourselves and each other. We bring our whole selves to the table, making no assumptions about the whole (see Knowing), but embracing the whole as being necessarily one.
By the same token, everyone matters. Perfectly, equally. Every voice is equally invited, and the choice to speak is honored, and the words spoken are given their due attention. Each presence is unique, in its history and its now, and it irreplaceably informs the shape of the whole; and by virtue of being, every piece of the whole -- every one of us -- is held perfectly and equally sacred.
Our communication is careful, deliberate. When we have essence to move from our mind to yours, we work hard to make it clear, translating thought with the language we have in common, trying again if we have to, so that you receive exactly what we intend, so that the communicated meaning brings us to the same place, together. “Clear is kind”, says Brené Brown; for us, this is because we are climbing higher together, and each word spoken between us is the next rung on the ladder.
See: Forthrightness
We treasure attention, and deeply respect it. In our house, it is something given clearly and honestly, and it is to be withdrawn freely and without judgment.
This means that you get what you sign up for. Nothing that you haven’t explicitly asked for will interrupt you, while you’re here. We don’t give your attention away, without your explicit consent.
It also means that we skew heavily toward asynchronous communication, trusting ourselves and each other to be regularly checking notifications (Slack, email, etc.) on our own timelines.
It is okay to move on. It is important to move on. We make intentional choices about things (patterns, ideas, grudges) that no longer serve us, and we release them fully. This can look like forgiveness; it can also look like evolution. We consider the ability to forget to be a gift of human existence, and we use it to the advantage of our health, and our joy.
There’s a thread of levity running through everything we do. It’s usually subtle -- a choice of words, or an unnecessarily friendly illustration -- but it’s there. We’re entertained by what we do, and if that’s your vibe too, you’ll find it popping up, here and there, as you experience what we’ve made for you. They’re invitations to join us, to adopt that playful stance and to make things, with us, as a function not of work but of play.
We are here because we choose to be, because we agree to be. Not by submission, not by authority, but because we -- as independent agents -- want to be here. And we respect this, at every point. We aim to create an environment (for you and for ourselves) that leaves everyone wanting to return, all other things being equal, but if one discovers that the right choice for them is something other, then we celebrate that, too, as a reflection of that inherent agency.
And each choice that we make, as individuals, is a choice, on purpose. (There are things that we consign to habit, yes, but that is a choice also.) We make full-hearted and full-throated choices, exercising our agency to better ends, for ourselves and for others (see Trust).
Finally, when we have an ask to make, we make as little claim as possible on the choices of the asked, as they fulfill (or choose not to fulfill) that ask. Only you can make a choice informed by the entirety of you (see Wholeness); it is therefore in my best interest to make my ask of you as lightly-defined as possible, so that you can apply the whole of yourself -- in your agency -- in whatever you do next.
Each responsibility is always precisely established, and is always precisely assigned. This is one of those things around which there is zero ambiguity.
Equally important is the transfer of responsibility. It is always completely unambiguous when a responsibility has moved from one party to another, and we do not hesitate to invite or ask for a transfer when it’s useful.
Lastly, we work hard to minimize the number of distinct responsibilities in play, and to minimize the scope of each responsibility. We do not hesitate to establish them, but we aim for simplicity, always.
There is never an undiscussed factor in play -- that which is relevant, is raised. We say what we mean, directly and simply. We voice what we feel, openly and vulnerably. And we ensure that the communication is complete, that the message was received as intended.
This also means that we ask for what we need, transparently.
See: Stability, Clarity, Trust
We acknowledge that we are expanding, growing, by dint of just being alive. So, we structure ourselves loosely enough to allow that expansion to occur as it will, allowing room for exploration and discovery, affording each other the trust to experiment and self-discover in confidence. And we watch for the places where the expansion is slowed by friction, purposefully (never forcefully) designing away that friction to allow the expansion to continue as it will.
We are as we appear to be. We do not exaggerate (or minimize) who we are. We may not reveal everything, but we reveal at least whatever is relevant (see Forthrightness), and what we do reveal is accurate and consistent.
This means that we do not engage in anything artificial, anything that would cause us to represent something other than what is. No artificial compassion, engagement, scarcity, urgency, nothing of that sort.
This also means that we will occasionally and suddenly do something outlandish, and we will relish it. :D Everyone has surprises waiting inside, and so do we.
We apply our whole selves to the placement and structure and function of a thing. On purpose. Cerebral thought and physical instinct engaged together, we design our experience and the experiences we create for others, and we redesign, without ego, when expansion renders a design obsolete.
Working this way is fun. That’s the simplest possible word for it. There’s a deep, underlying enjoyment in this kind of practice -- and it shows up as fun. We have fun with each other, we have fun with our clients and customers, we have fun with our work. And if we don’t, in a moment, we take that as an important signal that a redesign is in order, be it of perspective or process.
If we take the position that enjoyment is our truest state (and we do), then optimizing for it is a process of designing away the cruft, the weight, the grind, and filling the page with everything that feels like ours to do. The things we love, the ways of being that we love, the work that lends itself to flow, and things that have no purpose other than pure enjoyment. :)
Here are some ways we apply what we know.
If a stranger is a friend you haven’t met yet, then within Lightward we find ourselves friends almost by default. See Wholeness, and Forthrightness, and Trust, and Curiosity -- it matters how you’re doing, it matters what your life looks like, it matters what matters to you, because the whole of you is connected to the now that we both share. This is extraordinary common ground, and the relationships fostered here are treasured.
Slack, Help Scout, whatever platform we’re on -- these are the @you tags that trigger an alert on your devices. The rule: whenever you’re @mentioned, it is mandatory that you respond, to indicate that you have taken receipt of the message. The response can be an emoji or a textual reply, doesn’t matter; it matters only that the mentioner knows, without ambiguity, that you have given your attention to the thing they wanted you to see.
Note: this rule says nothing about when you respond (see Attention). So, respond when you are ready to respond, when you are ready to take receipt of whatever’s meant for you. If you can’t accept a message in good faith yet, don’t! The mentioner can (and will) follow up with you as needed, to make sure the message ultimately gets across.
When we are represented by automation, we disclaim it as such, never trying to pass it off as human. And when we are present to a group (as in a bulk message of some kind), we acknowledge that we are present to the group, without trying to pass it off as individual attention (as in “Hey $firstname, … Sincerely, Isaac”).
First: Everyone is having the kind of experience they want to have.
Second: There is absolutely no judgment for any kind of experience anyone is having.
Therefore: For upset customers (as with customers in every other condition), we are efficiently and compassionately present, without resistance or defense, and with understanding and acknowledgement and full respect for every part of their experience. (“We hear you; we are here for you.”) We ask questions, kindly, assuming nothing. We work effectively to solve their problem, whenever possible, and we accept whatever they choose next -- whether that’s to stick with the product, or to walk.
NB: It’s okay if any given one of us can’t show up this way in a moment. There are times when tensions rise, and we just can’t. In those moments, we ask a teammate for help. Nothing is ever forced, here (see Presence), and it is absolutely acceptable to pass the situation on to another (see Responsibility).
(Like, the this-product-is-on-sale kind.) They basically don’t happen. Most sales are motivated by driving urgency or creating scarcity, and that is not what we do here (see Stability). Also, sales do not make sense in a pay-what-feels-good world (see Trade).
We don’t have them. Most affiliate programs involve either (a) a financial kickback to the affiliate, or (b) a discount to the end user; we want to create a world where (a) referrals happen because you actually believe in the value, with no trace of other incentive, and (b) where value is not variably accessible based on who you know.
We’re super, super slow to create one-off contractual agreements. We’ll do contracts at scale, yes; that’s how all of our software products work, and it works because those businesses of ours are built around that single type of relationship. But: recall the way that a well-chosen habit can sustain you over time, and how a one-off thing that you promised to do can be forgotten. In the same way, one-off contracts/agreements/partnerships (generally) would mean a departure from our core “habits”, and we don’t want to commit to anything that isn’t part of our daily practice of health.
And now, some gloriously concrete details.
We do a lot of documentation, and a lot of email. Screenshots come up a lot.
Let the subject be one thing.
Even if it's a multi-component thing.
Stay focused. Be clear with yourself about what the screenshot is for.
Don't distract the user.
Avoid inconsistencies in your screenshot capture.
Avoid elements (including data) that are irrelevant.
Make it easy for someone to locate the thing for themselves.
Given a particular app URL, make it easy for someone to orient themselves upon arrival, such that they can locate the subject of your screenshot for themselves.
If all else fails, use a red box.
Avoid it, but, you know, don't hesitate to use it if it's the only way.
Our app UI (powered by Polaris) always has a consistent amount of padding around each element, be it text or an interactive selector. When you're drawing screenshot boundaries, consider that padding, and make choices that feel like they fit.
If you're helping your users find something in the UI, give them reference points by including pieces of the surrounding terrain when drawing the boundaries of your screenshot.
Heuristic for this: take the natural boundary of your content, and expand it just enough that someone can figure out the local context. Make it easy for people to find what you're showing them.
When filling in sample data, prefer sample data (names, email addresses, genders, countries) that reflect our global community.
Here's a name generator that works well: https://www.name-generator.org.uk/quick/
Don't distract the user. It's less of a security thing, and more of a kindness thing.
If your screenshot includes data from our dev/staging/test instances, use Chrome's developer tools to edit that content out before taking the screenshot. Use generic content (like "example.com") wherever possible.
The normal route for this is fly apps restart $APP_NAME.
This works, but (as of this writing) it restarts Fly machines in serial — and the restart sequence halts if any machine fails to restart normally. (This stuff is documented in Rough edges.)
This command generates restart commands. If you copy and execute its output, you'll restart all of an app's Fly machines individually and in parallel. Watch for failures — it's on you to address them.
Or, because Isaac just found out about pbcopy:
We deploy our stuff on Fly.io. (We ran on Heroku for more than a decade, but its spirit appears to have moved on, and the energy I'm chasing appears to be going by the name "Fly" these days.)
Our heavy-hitting projects (Locksmith and Mechanic) each get two Fly apps per environment*: a UI app, and an API app.
*"Environment" isn't a Fly term. Each of our projects has a production environment, a staging environment, and maybe a handful of others. We construct an environment out of specifically-provisioned Fly apps, Crunchy Bridge databases, and whatever other services are warranted.
This area is living/evolving/incomplete documentation, not an evergreen public resource. In the spirit of erring public, we're aiming to publish what we can.
Your mileage with the contents of this section may vary. :)
Fly has some of its own autoscaling features, but we don't use them. (Their autoscaling only applies to process groups that serve HTTP connections, and it doesn't appear to work when websockets are mixed in.)
Our homegrown autoscaler pays attention to individual process groups. Each process group can be configured for up to three strategies:
Utilization
Aiming for 80% utilization, allowing 10% on either side of that before scaling up or down
Latency
Latency in excess of x results in scaling up
History
Our load patterns are very regular, and because Mechanic in particular is highly latency-sensitive, we use this strategy to scale up in anticipation of higher load based on the historical record
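To make the utilization and latency strategies a little more concrete, here's a minimal Python sketch. It is not our actual autoscaler. The function names are hypothetical, the "10% on either side" is read as percentage points (so 70 to 90 is the hold band), and the rule for combining strategies (any "up" wins, "down" only when every strategy agrees) is an assumption made for illustration. The history strategy is left out.

```python
def utilization_decision(utilization_pct, target_pct=80, tolerance_pct=10):
    """Return "up", "down", or "hold" for a utilization reading (0-100)."""
    if utilization_pct > target_pct + tolerance_pct:
        return "up"
    if utilization_pct < target_pct - tolerance_pct:
        return "down"
    return "hold"


def latency_decision(latency_seconds, threshold_seconds):
    """Latency in excess of the threshold results in scaling up."""
    return "up" if latency_seconds > threshold_seconds else "hold"


def combine(decisions):
    """Assumed combination rule: scale up if any strategy asks for it,
    scale down only if every strategy asks for it, otherwise hold."""
    decisions = list(decisions)
    if "up" in decisions:
        return "up"
    if decisions and all(d == "down" for d in decisions):
        return "down"
    return "hold"
```

For example, a process group at 95% utilization would come back as "up" regardless of what the latency strategy says.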
Scaling down is implemented as sending the "quiet" instruction to a Sidekiq process. In general, we run one Sidekiq process per Machine. When a quieted Sidekiq process has finished its work, it's safe to stop the corresponding Machine.
Our Sidekiq leader is configured to monitor for quiet Sidekiq processes that are performing no work. Whenever such a process is detected, the leader uses flyctl to stop the corresponding Machine.
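The leader's check can be sketched roughly like this, in Python rather than in the Ruby the real thing lives in. The record shape (machine_id, quiet, busy) is hypothetical and only stands in for whatever the leader actually reads from Sidekiq. Stopping the Machine via flyctl is not shown.

```python
def machines_safe_to_stop(processes):
    """Return Machine IDs whose Sidekiq process is quiet and doing no work.

    Each process is a dict with hypothetical keys:
      machine_id: the Fly Machine running this Sidekiq process
      quiet:      True once the process has received the "quiet" instruction
      busy:       number of jobs the process is currently performing
    """
    return [
        process["machine_id"]
        for process in processes
        if process["quiet"] and process["busy"] == 0
    ]
```

A quieted process that is still finishing jobs is deliberately skipped; it gets picked up on a later pass, once its busy count reaches zero.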
We don't have this implemented for web stuffs yet. We're just very over-provisioned, instead. :)
GitHub is the source of truth for our environment variables, whether they be sensitive "secrets" or less sensitive "variables".
Fly has its own secret store, which contains protected values to be used as environment variables on deployed Machines. We use Fly's secret store to get our secrets onto deployed Machines, but it is not the source of truth for those values. Instead, we use Fly's secret store as an automatically-maintained mirror of whatever GitHub secrets and variables are effective for a given environment.
A "secret" is an environment variable that shouldn't be read by anything other than production code. Once configured in GitHub or Fly, you won't get that value back anywhere but in a GitHub workflow or on a Fly Machine.
A "variable" is an environment variable that's safe to be read by authorized users. If you have permission, you can view variable values in GitHub. Fly doesn't distinguish between secrets and variables; once in Fly, they're all secrets, and Fly never lets you read them back except on deployed Machines.
In GitHub, secrets and variables can live at any of the following levels. Each subsequent level inherits the preceding level, overriding the preceding level in case of conflict.
The organization level
The repo level, within the org
The environment level, within the repo
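The inheritance described above behaves like a layered dictionary merge. Here's a small Python sketch of that idea. The function name is hypothetical, and this only models the override order; it is not how GitHub implements it.

```python
def effective_values(org, repo, environment):
    """Merge three levels of variables; later levels override earlier ones."""
    merged = {}
    for level in (org, repo, environment):
        merged.update(level)
    return merged


# Example: an org-wide default, overridden for one environment only.
org_level = {"FLY_DEPLOY_STRATEGY": "bluegreen"}
repo_level = {}
production_level = {"FLY_DEPLOY_STRATEGY": "immediate"}
staging_level = {}

production = effective_values(org_level, repo_level, production_level)
staging = effective_values(org_level, repo_level, staging_level)
```

Here production ends up with "immediate", while staging inherits "bluegreen" from the org level.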
Secrets are populated automatically, during a repo-level GitHub workflow. Every deployable repo has its own fly-secrets.yml workflow.
Authorization tokens are strings used to identify and authorize us to some external service.
Locate the external service's config area for the token in question.
Example: FLY_API_TOKEN comes from the "Tokens" config, within a Fly app
Locate the secret's canonical location within GitHub.
Example: FLY_API_TOKEN is configured at the repository environment level.
Without revoking the old token, generate a new token for the secret with the vendor.
Copy the new token value, and update the corresponding GitHub secret.
Deploy to whatever deployment environments receive and use this secret.
Verify that the new token is working in its deployed environment(s).
Revoke the original token.
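The important property of these steps is the ordering: the old token stays valid until the new one is verified in its deployed environment. A minimal Python sketch of that ordering follows. Every callable here is a hypothetical stand-in for a manual step (vendor dashboard, GitHub settings, a deploy), not a real API.

```python
def rotate_token(generate_new, update_secret, deploy, verify, revoke_old):
    """Run the rotation steps in order.

    Returns True if the old token was revoked, or False if verification
    failed. On failure the old token is left in place, so that it can be
    restored as the secret's value.
    """
    new_token = generate_new()   # without revoking the old token
    update_secret(new_token)     # update the corresponding GitHub secret
    deploy()                     # deploy wherever the secret is used
    if not verify():             # confirm the new token works when deployed
        return False
    revoke_old()                 # only now is the original token revoked
    return True
```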
Human autonomy and responsibility go hand in hand.
Our deploy practices reflect this, by acknowledging that there are some scenarios in which human autonomy is necessary, and ensuring that the human (1) can be nimbly responsive in those scenarios, and (2) is fully responsible for what happens in those scenarios.
If we have a situation where we actively don't want a human to be responsible, we also take away human autonomy. You can't mess around in a place where you're not responsible for the results.
Fly monitors its own ability to deploy well. :) (Thanks Fly!) See https://atc.fly.dev/.
Our regular deploys are all initiated through GitHub Actions.
To initiate a regular deploy to a production environment, we publish a new repo release. This manual action kicks off an automatic Actions workflow, which invokes flyctl deploy.
Our releases are auto-prepped using Release Drafter. This means that publishing a new release is as simple as editing the latest release draft, and hitting the big green "Publish release" button.
Regular deploys to non-production environments are triggered however's appropriate. Usually, it happens via a push to main, which kicks off an Actions workflow, which invokes flyctl deploy.
Each repo has two GHA workflows that can be manually called through the GitHub UI: one called "Manual secrets 🛠️", and one called "Manual deploy 🛠️".
Use these as needed.
This should reeeeeeaally only ever be done in an emergency situation. If you're reaching for this in a non-emergency, take a minute first, and have a think on why you're here.
Some of our apps are on the larger end. Mechanic uses upwards of 500 Machines, for example. Lots of things can go wrong. Here's some documentation on that:
Recovering from deploy failures
We use "immediate" in environments where deploys are manually initiated, and "bluegreen" wherever deploys are automatically initiated.
Immediate deploys finish quickly, but the actual Machine updates happen asynchronously, and may take longer. Usually they're fast, but I've seen them take more than 15min on occasion.
"Why not use a strategy (like bluegreen) that guarantees the health of new Machines before putting them into service?"
This takes so much time. So much time. Deploys are not fast, and they're hard to interrupt, and when interrupted flyctl tries to roll back the change, and when hundreds of Machines are in play this process is kinda brittle.
This doubles the size of our Machine pool, which doubles the number of Postgres and Redis connections in play. This hasn't actually been a problem, but it's .. you know, it's something to think about.
Our GitHub org has an org-level variable in place: FLY_DEPLOY_STRATEGY=bluegreen. This makes it the default value for all repos and their environments.
Each repository's production environment has an env-level variable in place: FLY_DEPLOY_STRATEGY=immediate. This makes it the effective value for that environment, and that environment alone.
Fly supports "release commands", which are automatically invoked during deploy, right before updating Machines with new images.
In apps that run Sidekiq, we use this feature to issue "quiet" commands to all of our Sidekiq processes.
Once this happens, no jobs will be performed. Jobs will be automatically resumed as Machines come back online after the deploy.
Fly is fantastic. Super happy to be on it.
These are the rough edges we've bumped up against, and (when applicable) how we handle them.
restart
doesn't support --process-group
workaround (including backgrounding each Machine's individual restart command):
fly m list -a $APP | grep $PROCESS_GROUP | awk NF | awk '{ print "fly m restart " $1 " &;" }'
slow for restarting large numbers of Machines, and halts if any individual restart fails
workaround: use fly m restart $ID & instead
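If you'd rather not paste backgrounded shell commands, the same workaround can be sketched in Python: run one fly m restart per Machine in parallel, and collect the ones that fail so you can address them. This is an illustration, not something we run. The function names and the max_workers default are hypothetical, and the command is invoked exactly as in the workaround above, so it assumes flyctl can already tell which app you mean.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def restart_command(machine_id):
    """Build the per-Machine restart command used in the workaround above."""
    return ["fly", "m", "restart", machine_id]


def run_command(command):
    """Run a command and return its exit code."""
    return subprocess.run(command).returncode


def restart_all(machine_ids, run=run_command, max_workers=16):
    """Restart Machines in parallel; return the IDs whose restart failed."""
    ids = list(machine_ids)
    if not ids:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(lambda mid: run(restart_command(mid)), ids))
    return [mid for mid, code in zip(ids, codes) if code != 0]
```

Unlike the serial restart, one failure here doesn't halt the others. The returned list is the set of Machines that still need a human.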
status
no machine-readable output; we regex our way through it to get Machine status
nb: --display-config exists, but that's for something else
doesn't include healthchecks
fly checks list -a $app | grep $machine_id
count
it seems to grab a lease on all Machines at once, even when scoped by --process-group, which means fly scale count commands can't be run concurrently
no workaround
doesn't seeeeeem to work properly when websockets are in the mix
addressed in
In this section, "retry" means "use GitHub Actions' retry button on the failed run".
You might need to destroy the Fly builder app. It'll get auto-created again when you retry, which is what you should do after destroying the builder app.
Just retry. It's fine. :)
Just retry. It's fine. :)
Start by surveying the scene, to see how many machines are on the new image vs the old one, or in replacing vs failed vs created status.
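With hundreds of machines, counting by eye gets old fast. Here's a rough Python sketch of the survey, assuming you've already turned the machine list into simple records. The keys (id, image, state) and the function names are hypothetical, chosen for illustration rather than taken from flyctl's output format.

```python
from collections import Counter


def survey(machines):
    """Count machines by (image, state) pair."""
    return Counter((m["image"], m["state"]) for m in machines)


def stuck_on_old_image(machines, new_image):
    """Return the IDs of machines that are not yet on the new image."""
    return [m["id"] for m in machines if m["image"] != new_image]


machines = [
    {"id": "m1", "image": "app:new", "state": "started"},
    {"id": "m2", "image": "app:old", "state": "replacing"},
    {"id": "m3", "image": "app:old", "state": "failed"},
    {"id": "m4", "image": "app:new", "state": "started"},
]
counts = survey(machines)
stuck = stuck_on_old_image(machines, "app:new")
```

The counts give you the overall picture, and the stuck list is what you'd work through when updating machines by hand.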
If you're here, the app is probably online but no longer processing background jobs (because all the Sidekiq processes were instructed to enter quiet mode during the release command).
Handle this by rebooting one of the worker_autoscale machines. That should be enough to start bringing machines back online.
Once you've verified that the app is doing work again, wait for it to catch up on the run backlog, and then retry the deploy.
Manually redo the deploy.
Do this using a CLI deploy, using the Docker image URI from the build step.
Manually update the rest of the machines.
Start by examining fly m list -a $FLY_APP_NAME, and build a list of machine IDs that are stuck on the old image.
For each one, do something like this:
fly m destroy MACHINE_ID
Add --force if the machine is stubborn and won’t stop.
And then use fly scale count to scale back up to the desired machine count. Search fly scale count in the internal Slack and you'll see example usage.
A rough edge: fly ssh console doesn't support addressing a specific Machine.
This will display an interactive list of Machines to choose from. Good for small numbers of Machines, not great for large ones.
When an app has hundreds of Machines, it's faster on average to just look up the IP address of the desired Machine and pass that back to fly ssh console.
Let's say you have an image constructed from .. who knows where.
Let's say you have a repo that uses a given Fly app to do a fly deploy --build-only thing, prepping an image for use elsewhere.
Let's say you want to run a console using that image in a Fly app environment which is destined to receive that image (i.e. destined to have its machines updated to use this image). Let's say you want to do this before that glorious destiny arrives. Maybe you want to run some helpers that this image contains, or maybe you want to run a migration that this image contains, or or or or or or.
Assuming the build happened using --image-label $IMAGE_TAG, this may help you on your quest:
We use environment variables and secrets pretty heavily. Dependabot only gets to use these when responding to a pull_request_target event -- it's not a thing during pull_request. This is relevant, because some of our integration tests need to talk to a deployment environment.
Performing any automation on untrusted code is risky, and that's one way to describe what happens when we run tests on Dependabot pull requests. We use strictly separated environments to keep risk at an acceptable level.
This workflow sets up Dependabot pull requests for auto-merging via squash commit.
Note that it runs on pull_request_target. As with secrets, we use this event so that Dependabot qualifies for the necessary permissions.
Note that BUNDLE_ENTERPRISE__CONTRIBSYS__COM is defined as a Dependabot secret, at the organization level.
Note also that registries doesn't explicitly include rubygems.org. Don't love that, but rubygems.org appears to be included in practice anyway, so here we are.
Posting this largely so that anyone searching for Sidekiq Pro or Enterprise and Dependabot has something to find. :)
For the purposes of this page, a "migration" is a change to a Postgres database schema.
Rails scans a database's schema when ActiveRecord first connects, and it keeps its knowledge cached.
This means that schema changes (like adding or removing columns) need to be paired with an app reboot.
This path is probably most suitable if you're doing deep, backwards-incompatible changes to the database schema.
Write standard Rails migrations for your pull request.
Merge it.
Release your code.
Important: until you reach and complete step #5, you'll have new code running with an old database schema. Plan ahead for this part, and mitigate user impact however you can.
Run bin/rake db:migrate in the deployment environment.
Restart the app. Wait for success.
This path is probably most suitable if you're doing things that can be done idempotently and in a way that won't break old application code.
Write migrations for your pull request, using idempotent raw SQL.
Merge the pull request containing the migrations.
Before releasing the code, run the migrations manually in the target environment using cb psql $CLUSTER_ID --role application. Verify that your db changes are working properly, and that the db continues to be healthy.
Release your code changes, which gives you an app restart for free. Wait for success.
Run bin/rake db:migrate in the deployment environment. This is a semi-redundant step: you already manually ran the changes in step 3. Doing it this time won't fail, because you wrote idempotent migrations. Doing it this time also updates the private Rails bookkeeping table called schema_migrations, declaring at the Rails level that the database schema is up to date.
You can use fly console -a $FLY_APP_NAME -s to open up a console on an already-running machine. (You can also open a console using a different build/image! For that, see Unusual consoles.)
Idempotent code is code that has side effects, but only creates those side effects once -- even if it's run more than once.
This is useful in database land! Idempotent migration code can be run more than once without errors. It's like saying "hey, make this change, but only if it wasn't already made".
Rails has some conveniences for this -- look for if_exists: and if_not_exists: in https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/SchemaStatements.html.
However, it can be useful to stick to raw SQL, for easy pasting into psql.
For functions, use CREATE OR REPLACE FUNCTION. Functions are stateless (unlike tables and indexes!), so it's okay to have the function created ahead of time by a human, and then recreated during the actual Rails migration execution. This still counts as idempotent behavior, because the function's existence and behavior remain consistent even when the migration SQL is re-run.
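To see the "only if it wasn't already made" idea in action, here's a small self-contained Python sketch. It uses an in-memory SQLite database as a stand-in for Postgres, purely so that it runs anywhere. The table and index names are made up. The IF NOT EXISTS clauses used here are also available for CREATE TABLE and CREATE INDEX in Postgres.

```python
import sqlite3

# Hypothetical migration: every statement checks before it creates.
MIGRATION_SQL = """
CREATE TABLE IF NOT EXISTS widgets (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE INDEX IF NOT EXISTS index_widgets_on_name ON widgets (name);
"""


def apply_migration(connection):
    """Run the migration SQL; safe to call more than once."""
    connection.executescript(MIGRATION_SQL)


def schema_objects(connection):
    """List (type, name) for every object in the schema."""
    query = "SELECT type, name FROM sqlite_master ORDER BY type, name"
    return connection.execute(query).fetchall()


connection = sqlite3.connect(":memory:")
apply_migration(connection)   # the manual run, ahead of the release
first = schema_objects(connection)
apply_migration(connection)   # the later run, via the migration itself
second = schema_objects(connection)
```

The second run raises no errors and leaves the schema exactly as the first run left it, which is what makes the manual-then-automatic sequence safe.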
That --role application flag is important! The app itself connects using this role, and (for continuity/consistency/predictability) it needs to have ownership over the objects it uses. So, when you're manually running migrations, it's important to use the same role as the app itself -- i.e. application.