Friday, September 13, 2019

A Structured RFC Process by Phil Calçado

When I wrote "Technical design: whether, who, how, and what", it was partly because I haven't seen a lot of guides of this sort. I'm pleased to say that Phil Calçado has offered a similar how-to at A Structured RFC Process.

Some of the key similarities between A Structured RFC Process and my post are: (1) a discussion of who should be involved, what kinds of topics need this sort of process, what to include in such a design, and what it looks like to solicit and get feedback, (2) a focus on the higher-level or more important aspects, not on specifying every detail, (3) an emphasis on feedback and discussion, not on formal sign-offs, budgets, or other things which might be worth nailing down, but not in this sort of design, and (4) relatedly, the view that documents (and other artifacts like presentation slides or videos) produced during technical design have a relatively short half-life. They can sometimes be helpful well into the future, but that's not their main purpose. Their main purpose is to flesh out a change and to organize the writing of code and of whatever documents you have to describe "this is the current state of our system" (API documentation, wiki pages, or whatever you find useful). As Calçado says, "once an RFC moves away from Feedback requested, it is considered a historical artifact".

A few differences between my post and A Structured RFC Process are: (1) although I started thinking of a document with comments (as described in A Structured RFC Process), as I wrote I realized that most of what I was saying also applied to hallway conversations, presentations, or other modes of communication, (2) I include a list of technical issues you might want to address (or might not).

One interesting observation was "It is not uncommon for engineers to try and use the process as a way to sell an idea that hasn’t been approved by their stakeholders or managers" which I certainly have seen. Depending on how much of a power vacuum we are talking about (or, relatedly, lack of clear priorities), this could be a large or a small problem, but approaching design deliberately is not a substitute for making choices. It is at best a way to help clarify what choices the organization is facing.

And my favorite quote from the whole article is "The more polished a document looks, the softer and less impactful reviews tend to be". I love this. Not only does it match suggestions I've heard in other realms (for example, "to get good feedback on a user interface, show someone a napkin sketch, not a pixel-perfect mockup"), but it helps clarify one of the reasons why I've not always seen good results from highly formalized documents written in very structured and detailed ways. Not only is the content of such documents sometimes buried in a lot of boilerplate and irrelevance, but the very form discourages the kind of engagement which would make them seeds for raising issues which might be otherwise missed.

Tuesday, June 18, 2019

Technical design: whether, who, how, and what

There is a (lightly edited) 2023 revision to this article at https://www.blogger.com/blog/post/edit/37120426/3347101560781329801 which replaces this version.

This is all on one page; it was originally published 20 Aug 2018 as a four-part series.

Do I need a technical design?


In agile software development, there is architecture (decisions that are hard to change) and incremental design. Architecture, in this sense, is a pretty small number of things—programming language and probably application frameworks and data storage. Incremental design is the norm: we add classes, endpoints, and database tables as we identify a need for them, or remove them as they are unneeded or replaced.

But what about decisions in between these two extremes? For example, it used to be that users all signed up for the website as individuals, and now there is a need for some kind of organization which can manage the users under it. Or we used to have a bunch of separate products with their own logins, apps, and management and now there is a need to do some or all of those things in ways which apply to all products. Or our application used to assume that all users needed to be connected to the internet at all times, and now we want to build in offline operation.

I won’t completely rule out handling larger changes via the usual communication of incremental development—pair programming, discussion of individual stories, pull request review, and the like. But it can be hard to maintain a clear idea of the larger design that way, and I have usually been happier with a discussion which happens at a higher level and whose goal is to get a direction into which we can fit in the smaller decisions that we will make as we go.

I’ll write more later about who should drive this process, how to develop such a design, and what is worth writing down and communicating. But I’ll conclude this introductory post by asking when we should be doing this design.

It is tempting to say that the high level design of a system must happen before we can start breaking down the work or implementing pieces of it. That sounds good, and is nice when it works out, but I have yet to see a design of this sort which does not get changed during implementation. Implementation brings a lot of reality checks (interactions with existing functionality, feedback which we only get when we have an early version to show, complications which we didn’t notice at first). Therefore I wouldn’t try to finalize the design before we start acting on it. Nor would I go to the other extreme of trying to make major changes in a fully incremental way and doing all the communication after the fact. My preference is to start with rough ideas and conversations about the design, and as those get refined and conversations continue, there is a point where the general contours start falling into place. That’s about when I start implementation. I want at least some of the coding to be happening (even if we know we might be revising it later), because otherwise I don’t really trust the design. In parallel, I’m stepping up the communication (documents, meetings, etc). As things fall into place (which may include allocating people’s time, agreeing on technical or business decisions, and getting a clearer picture of implementation choices), you’ll fall into the rhythm of building the thing, because the general contours of what you are building have been established by this point.

Who drives a technical design?


So we have a problem which is meaty enough that we don’t think we want to approach it in a purely tactical way, and we’ll even assume we have defined at least the general outlines of what we want this design to accomplish. Who should turn this into a design detailed enough to implement?

Before I discuss who, let me say this is an intrinsically messy process. There are a bunch of things we want out of our design. Things to do now or save for another day. People (in various roles) with opinions (either because, well, people have opinions, or more nobly, because they have a specific organizational goal they are trying to achieve). See for example Gregor Hohpe’s The Architect Elevator. Issues like reliability, security, accessibility, and branding. A large design space (a distinguishing character of software being its malleability—or at least potential for malleability). Pros and cons for pretty much every aspect.

If that seems daunting, don’t despair. Just don’t be surprised if a decision which was discussed at length, carefully considered, agreed to by all, and signed off on subsequently starts to seem less settled. Or someone who you had thought was aware of what was going on suddenly “discovers” your design and has suggestions. Or your scope seems to keep expanding or contracting.

The most important person in this process is the one who is refining the design and who will be involved in implementing it. We can call them the “responsible” person (although don’t think of the roles too rigidly—I did say this process tends to be on the messy side, didn’t I?). To do all these things, and have time for this design, the responsible person needs to be able to focus on this (usually, this means they aren’t a manager).

But that person can’t produce a good design by sitting in a room and thinking hard (if for no other reason, because getting buy-in is a key part of what will make this design get implemented and achieve its goals). Therefore their main activity is going to be communication. I’ll have a separate post about how to communicate and what to communicate, but in the context of “who”, identify who should be “consulted”. That is, who needs to be aware of the design and would have good ideas about how to do it. Broadcasting what you are doing and inviting input works well, but I’d also directly seek out the people who will be most knowledgeable or important.

One rule of thumb for involving a lot of people is “accept input widely, accept direction narrowly”. You want to hear from as many perspectives as you can. Whether or not you take the advice, thank people and appreciate that they took the time to engage with you. These will be the people who help communicate the changes you are making.

Saying “accept direction narrowly” raises the question of who ultimately will be deciding. This role is generally called the “approver” and will often be the manager of the responsible person (the details will depend on your organization, though). Sign-offs are a good way of formalizing decisions already made and making sure that there is sufficient buy-in throughout the organization. They aren’t good at exploring different possible solutions or weighing pros and cons, so think of formal sign-off type processes (if you have them) as a way of ratifying what is already understood, not as a way of hashing out agreements.

Lastly we have people who aren’t necessarily providing input but who should be “informed” about the design. The basic goal here is to cast as wide a net as feasible (in accordance with “err on the side of overcommunicating” which tends to be good advice especially in larger organizations). Think of ways to reach a variety of audiences: different levels of detail, different ways of presenting the work (for example, it can work to have one document which is technical and one which is more about the business goals and rationales—as long as they are reasonably in sync on topics such as what is in or out of scope), or different places you can announce what you are doing and offer to answer questions or sync up with interested parties.

Describing the responsible, approver, consulted, and informed roles makes it clear that communication is central to the process of making technical decisions and being ready to put them into practice. The next two parts of this series will be about how to communicate, and what topics to include in that communication.

How do I develop and promote my technical design?

In the first two parts of this series we figured out we needed some kind of technical design, and we figured out who should be making that happen. How does the responsible party get this thing going? Do you call a meeting? Write something up?

Typing “useless meeting” into an internet search engine and reading the results should be enough to give us pause about calling a meeting to hash out our technical design. Yet in so many organizations the meeting is the mechanism by which attention is allocated, or is otherwise necessary. So first, what are the pitfalls? The usual risk of a meeting turning into (too much of) an open ended discussion is exacerbated by the large design space and many stakeholders. Another sign that meeting discussion is a bad idea is if the wrong people are there: don’t hesitate to say “can the three of us (fewer than the whole meeting) have a break-out on this topic after the meeting?” or “would you be willing to talk to X (who is not present) and bring the information back?” Set your goals, such as (1) make a brief announcement about what is underway and how people can get more details or engage further, (2) present your design to date and solicit clarifying questions, or (3) give people an opportunity to raise concerns to be addressed in the future. Or if you do want a longer discussion, set the topic, keep an eye on the clock, and don’t be afraid to steer the group back to the agenda. Also, aim for a level of detail appropriate for the people in the meeting. Software developers may be most interested in database schemas and code organization; infrastructure engineers may be most interested in reliability, security, or how your design is spread across various machines; product may be most interested in what functionality your design will or will not unlock; and so on.

I’ve often had good luck circulating the design in document form. People have something to react to and can leave comments on the document itself or in other ways. So is this a Big Design Up Front? Not exactly. I’m aiming for something closer to a High Level Design Written As We Need It. It is at a higher level than code. It is at a higher level than detailed descriptions of functionality (click on button X and see the following fields with the following error conditions). It might contain things like database schemas or protocol specifications, although sometimes even that can be a bit fine grained.

What is a design document for? First of all, as a communication tool. Secondly, to clarify the thinking of the person writing it. What about things like traceability between requirements and implementation, justifying the need for making a change, or documenting what has been changed? I would tend to think of those kinds of documents (how many you need will vary depending on your situation) as separate. The design doc is written and revised as you are thinking something through and figuring it out. More concrete documents (including breakout into tasks, specifying behaviors in detail, or explaining code details) have a greater need for detail and precision and are the output of the design process, although of course the design document can link to them as they are created. Seeing the design document as a communication tool helps focus the process of writing it. Imagine that it is a conference talk and you are trying to figure out who the audience is and what they would want to know about your design.

Expect to iterate on the design. Gather some ideas. Think about them and boil them down to a proposed design. Talk to people one on one. Circulate it in writing. Figure out how else to get it out there. That will generate ideas and reactions. Figure out what to revise based on that. Expect to repeat this process until there is a sufficient degree of convergence on a course of action. Don’t fall into either the extreme of spending all your time talking to people (and not getting around to taking in what they said, researching things as needed, and making some decisions), or the other extreme, of thinking through something and coming up with something which makes sense to you, but which may lack buy-in from other people or may miss important requirements.

So we are developing our design and communicating in diverse ways (presentations, written documents, informal discussions, and yes maybe even meetings). But what topics should we cover? That will be the subject of the last post of this series.

What goes into a technical design?

In the first three posts of this series we decided we need a technical design, figured out who would be doing it, and how we’ll be sending it out and getting input. But what is the content of that communication (for example, what sections would we put into a written design document)?

What to include will vary depending on your organization and the needs of a particular design. For an early stage startup, anything relating to scaling and operations may take a back seat to “am I building something people want and how can I most quickly validate my hypothesis?”. For a company in a highly regulated space, there may be a lot of requirements specific to your field.

The same applies to an individual design. Does my design concern a server with a high or low need to be available? Does my design concern data which is sensitive? Does this design change anything related to this topic? (If not there’s probably little to say on the subject). For that reason, I’d suggest treating templates (including this article) as guidelines, and omitting sections which don’t seem relevant. One of the fastest ways to lose an audience is to include a bunch of material that you aren’t very interested in (and probably didn’t do a very good job with). And of course to prioritize everything is to prioritize nothing, a good motto in a variety of contexts.

So, what might we include?

Goals and non-goals
These are perhaps the most important sections. If you can figure out what your design achieves and what you are leaving for another day or deciding is not worth doing, you are well down the path of figuring out how to do it.
Description of the proposed solution
What changes will we make to code, data, networks, and hardware? How does this design achieve the goals? Give enough detail that people can see some of the implications of various choices, but try to avoid the kinds of details which can easily be fleshed out during implementation.
Security
What data is stored and sent where? How is access controlled? If cryptography is involved, how are keys managed and have we chosen appropriate algorithms? Are some parts of the system isolated from others and if so how?
Reliability
Is there redundancy? What are the consequences of network outages? If data is stored in a master-slave setup, how do we elect a new master? If data is written to multiple places, how do we reconcile them? Are there rate limits or other ways of keeping a problem in one place from cascading elsewhere?
Capacity
What is the expected load on the various systems involved? Does load ramp up gradually or do we expect a sudden spike in traffic? What needs to be handled manually and is there sufficient staffing to do it?
Monitoring
Do we need to report new metrics? How will we know about errors?
Data analytics
How will we measure usage of the new functionality? What kind of analysis might we want to do?
History
Has the company considered this problem before? What previous decisions got us here? If there are documents describing previous designs, I tend to just link to them rather than going into a lot of detail about what has gone before.
Storage
What database(s) are involved (new or existing)? What changes in database schemas are required?
Interfaces between systems
Defining these can help clarify the design and is particularly helpful if one of the functions of your design is to coordinate between different teams or companies who are responsible for different pieces.
Alternatives
How else did we consider solving the problem? Why did we choose the solution we are proposing?
Open questions
This section is particularly helpful if you know certain topics are controversial or warrant further discussion. As questions are resolved, move items from here into the main design section or the alternatives section.
Rollout
In what order are we building this? Are we shipping it continuously? In a series of phases? Is it rolled out selectively to certain users?

These questions can be taken as a template for a design document, but they also can be used to figure out who to go talk to, what to put into a presentation, or what anticipated questions to prepare for.
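
As one way to make the "template as guideline" idea concrete, here is a minimal Python sketch that renders the section list above as a plain-text outline and lets you leave out sections that don't apply to a given design. The section names come from this article; the function name render_outline, the skip parameter, the shortened prompts, and the example title are all hypothetical, chosen only for illustration.

```python
from typing import Iterable

# Section names follow the list in this article; the one-line prompts are
# shortened paraphrases, and everything else here is illustrative only.
SECTIONS = [
    ("Goals and non-goals", "What does the design achieve, and what is left for another day?"),
    ("Description of the proposed solution", "What changes to code, data, networks, and hardware?"),
    ("Security", "What data is stored and sent where? How is access controlled?"),
    ("Reliability", "Is there redundancy? What happens during network outages?"),
    ("Capacity", "What is the expected load? Gradual ramp or sudden spike?"),
    ("Monitoring", "Do we need new metrics? How will we know about errors?"),
    ("Data analytics", "How will we measure usage of the new functionality?"),
    ("History", "What previous decisions got us here?"),
    ("Storage", "What databases and schema changes are involved?"),
    ("Interfaces between systems", "What do the pieces owned by different teams exchange?"),
    ("Alternatives", "How else did we consider solving the problem?"),
    ("Open questions", "What is controversial or needs further discussion?"),
    ("Rollout", "In what order are we building and shipping this?"),
]


def render_outline(title: str, skip: Iterable[str] = ()) -> str:
    """Return a plain-text outline, omitting any section named in `skip`."""
    skipped = set(skip)
    known = {name for name, _ in SECTIONS}
    unknown = skipped - known
    if unknown:
        # Fail loudly on typos rather than silently keeping a section.
        raise ValueError(f"unknown section(s): {sorted(unknown)}")
    lines = [title, ""]
    for name, prompt in SECTIONS:
        if name in skipped:
            continue
        lines.append(name)
        lines.append(f"    {prompt}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical early-stage design where scaling topics take a back seat.
    print(render_outline("Offline operation", skip=["Capacity", "Monitoring"]))
```

The skip list is the part that matters: an outline that forces every section tends to collect filler, so dropping what is not relevant is treated here as the normal case rather than the exception.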

There’s a lot in this series of blog posts about things to do: Did you talk to X? Did you consider Y? What if we did Z? And those are all very helpful up to a point. But only do those things which seem necessary for your particular organizational culture and problem you are trying to solve. The purpose of all these suggestions is to help you build things and solve problems, so as you go, don’t be afraid to keep asking yourself and others: Are people on the same page now? Is this enough specificity to build this? Is my technical design sufficient for what I need?