Dissecting Code Readability

Gilbert

January 2nd, 2019

Artwork: "Rush" by [Sean Parnell](http://parnellart.com)

"This code isn't readable".

Have you heard this before? What does it mean? Is "readability" merely a subjective tool for dismissal? Or are there useful, objective aspects around the concept?

To write readable code, you must first understand what the term actually means. Therefore, we will dissect what programmers intend to say when they talk about code readability.

Discuss this post on HN

Breaking it Down

When someone says "I like readable code," ultimately what they mean is they like code that they can understand. Naturally, that makes it subjective. But is it entirely subjective? Let's break it down and find out.

There are three primary factors that impact readability:

Legibility
Communication
Familiarity

Legibility

It goes without saying that "clear enough to read" is the first step towards readability. For example, if you can't make out the handwriting on a page, then the writing is not legible, and thus not readable.

Regarding code, legibility is the least subjective of the three aforementioned factors. It involves things like whitespace, decent variable names, and code organization. For example, it's safe to say this code...

function r (a, b) {
  var c = b.d;while(b.d<a.length&&a[b.d]!==';'){b.e()};return a.s(c,b.d).t()
}

...is objectively less readable than this code...

function readEnum (source, pos) {
  var start = pos.i
  while (pos.i < source.length && source[pos.i] !== ';') {
    pos.newcol()
  }
  return source.substring(start, pos.i).trim()
}

...even if you don't understand what the code is really doing.

Where it gets subjective, though, is where you draw certain lines. How long should a function be? When does a file become too large? Max length of a single line? These are important questions, but outside the scope of this post.

Communication

Code communication is the sum of all "hints" that lead you to understanding the code you're reading. It includes:

Variable names. Non-descriptive variable names hurt readability. A good variable or function name describes the "what", even if "how" is unclear.
File names. Like a well-written article, a file's name is the title to its contents.
Comments. Code can self-describe its operation, but not its purpose. Comments fill the "why" gap of understanding. You should also use comments to expose any implicit invariants the code relies on.
Common idioms. Using fewer or well-understood patterns helps the reader spend less effort on the "how" of the code at hand.
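
As a minimal sketch of the points above about comments, the function below uses comments to state the "why" and to expose an implicit invariant. The names `findUserIndex` and `sortedIds`, and the scenario itself, are hypothetical and made up for illustration.

```javascript
// Hypothetical example: the names and the scenario are made up for illustration.

// Why: ids are looked up very often, so a linear scan is assumed to be too slow.
// Invariant: `sortedIds` must be sorted in ascending order. Binary search
// silently returns wrong results on unsorted input.
function findUserIndex(sortedIds, id) {
  var low = 0;
  var high = sortedIds.length - 1;
  while (low <= high) {
    var mid = Math.floor((low + high) / 2);
    if (sortedIds[mid] === id) return mid;
    if (sortedIds[mid] < id) {
      low = mid + 1;
    } else {
      high = mid - 1;
    }
  }
  return -1; // id is not in the list
}
```

The code alone shows how the search works; only the comments tell the reader why it is there and what the caller must guarantee.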

Writing these things well is more art than science. After all, what makes a good writer? The answer is subjective. However, it's less subjective to ask "what makes a poor writer?": poor writing is easier to point out than good writing is to define.

With that in mind: neglecting all of the above will certainly make your code hard to read.

Take note, however: Each of the above fully relies on the reader's prior knowledge of the names and terms you use in the code. Which brings us to our final and most important aspect...

Familiarity

Familiarity is essential to readability. You cannot have readability with zero familiarity. For example, you can read the sentence "Zon bilm deska", but if you're not familiar with any of those words, then the sentence is not readable!

Readability relies on the reader's familiarity with what is written. This is universal. When you think code is readable, that's because the code fundamentally draws from your knowledge and past experience as a programmer. Without that knowledge and experience, the code cannot be read.

Of course, it's possible to have readability without 100% familiarity; this is the most common case. Consider storywriting. How many stories stop and explain what a castle is? Virtually none. However, the writing does draw from your familiarity with castles to teach you the story's setting.

It's the same with code. Take the following example:

fetchUser(40).then(function(user) {
  respond(user);
});

If you're a programmer, you can probably guess this code is fetching a user and using it as a response. The code is readable; it uses names and concepts you're familiar with to tell the "story" of what it's doing. To illustrate this further, consider:

fetchUser is "fetching" a user. What does that mean? You'd guess it's querying a database, file, or API, based on past experience with I/O.
fetchUser takes 40 as an argument. What does it represent? Probably an identifier, based on knowledge of data structures & schema design and/or past experience with databases and APIs.
function(user) is an anonymous function passed to .then. Why? Probably because the fetching is asynchronous, based on knowledge of concurrency and past experience with fetching things in certain languages.
What does respond do? Probably sends a message to the client of some kind of request, based on knowledge of the typical request/response cycle and past experience writing servers and APIs.
And so on.
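
To make those guesses concrete, here is one possible sketch of what could sit behind the snippet. The original does not show how `fetchUser` or `respond` are defined, so the in-memory `users` store, the `sent` list, and both function bodies are assumptions for illustration only.

```javascript
// Hypothetical stand-ins: the original snippet does not show these definitions.
var users = { 40: { id: 40, name: "Ada" } }; // pretend data store keyed by id

// Returns a Promise, mirroring the guess that fetching is asynchronous.
function fetchUser(id) {
  return new Promise(function (resolve, reject) {
    var user = users[id];
    if (user) {
      resolve(user);
    } else {
      reject(new Error("No user with id " + id));
    }
  });
}

// Stands in for sending a message back to the client of a request.
var sent = [];
function respond(payload) {
  sent.push(JSON.stringify(payload));
}

fetchUser(40).then(function (user) {
  respond(user);
});
```

A reader without experience of Promises or request/response cycles would need all of this spelled out; a familiar reader fills it in from the three-line original.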

You cannot have readability with zero familiarity. Yet, not all familiarity is equal. For example, you can expect virtually everyone to know what a piano is, but you can't expect everyone to know what a euphonium is (sorry, euphonium players!).

With code readability, familiarity is king. Code will inevitably be more readable to those with relevant experience, and less readable to those without.

Writing Readable Code

If readability depends so much on the reader, is there anything you can do about it?

Of course, the answer is yes. Whether you're writing code for a corporation or for the open source world, you should always keep in mind the different types of familiarity required to understand your code:

Fundamentals
Common Idioms
Domain Knowledge

Code Readability Pyramid

After understanding these types, you should then document them in your projects appropriately. This will provide a path of learning for those wanting to understand and contribute to your codebase.
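
One lightweight way to do this, sketched here as an assumption rather than an established convention, is a header comment that lists the background a file expects from its reader. The module, the `parseUserId` function, and the URL shape are all hypothetical.

```javascript
/**
 * Hypothetical module header, for illustration only.
 *
 * Background this file assumes:
 * - Fundamentals: strings, numbers, function calls.
 * - Common idioms: regular expressions with capture groups.
 * - Domain knowledge: URL paths of an HTTP API, e.g. "/users/40".
 */
function parseUserId(path) {
  // Extracts the numeric id from "/users/<id>"; returns null for any other path.
  var match = /^\/users\/(\d+)$/.exec(path);
  return match ? Number(match[1]) : null;
}
```

A newcomer who gets stuck can then tell which kind of knowledge they are missing and where to start learning.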

Fundamentals

Fundamentals are concepts that apply to most all of programming. Strings, hashes, arrays, variable assignments, function calls, etc. span across most all programming tasks you will ever encounter.

I say "most all" because no knowledge truly spans across every domain. For example, assembly has jump instructions and memory registers – fundamentals for sure, but also domain knowledge specific to low-level architecture that you don't need to deal with when working in higher-level languages like JavaScript.

Common Idioms

Programming idioms are patterns that are useful in many situations. For example, a for loop:

var sum = 0;
var nums = [10,20,30];
for (var i=0; i < nums.length; i++) {
  sum += nums[i];
}
console.log("Got sum:", sum);

The for loop iterates through a collection (normally an array) and does something with each item in that collection. In this case, we add up all the numbers in the nums array.

The for loop is a very common idiom. But keep in mind: "common" is relative. There is no universal idiom. Some languages don't have for loops, for example.
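
To illustrate how "common" is relative, the same sum can be written with `Array.prototype.reduce`, an idiom that readers with a functional programming background may find more familiar than the loop. This is a sketch for comparison, not a recommendation of one form over the other.

```javascript
var nums = [10, 20, 30];
// reduce folds the array into a single value, starting from 0.
var sum = nums.reduce(function (total, n) {
  return total + n;
}, 0);
console.log("Got sum:", sum);
```

Both versions compute the same result; which one reads more easily depends on which idiom the reader already knows.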

When writing readable code, you need to decide on which knowledge domains you want to use and how deep you want to go for each one. For example, imagine you're writing JavaScript. How much OOP do you want to use? How much FP? It's not bad to use advanced concepts from either of the two domains, but you probably wouldn't want to use both.

In general, try to restrict advanced idioms to as few domains as possible.

Domain Knowledge

The previous two sections covered domains in programming knowledge. However, in software engineering, domain knowledge refers to knowledge about the industry you're building software for. For example, if you're working on the web app for healthcare.gov, then healthcare law is the relevant domain knowledge for your code.

Code can be extremely hard to read without proper domain knowledge. For example, take this snippet from the Mithril.js source code:

function createFragment(parent, vnode, hooks, ns, nextSibling) {
  var fragment = $doc.createDocumentFragment()
  if (vnode.children != null) {
    var children = vnode.children
    createNodes(fragment, children, 0, children.length, hooks, null, ns)
  }
  vnode.dom = fragment.firstChild
  vnode.domSize = fragment.childNodes.length
  insertNode(parent, fragment, nextSibling)
}

Is this code readable? Yes, but it requires domain knowledge to read! If you're not familiar with virtual DOM, regular DOM, or tree structures, you're gonna have a bad time reading this code.

Code comments can definitely help out with this. But they can only help so much. For example, you shouldn't expect React.js's source code to explain what a <select> tag is; you should study HTML and the DOM before studying React.

Key point: If you don't know the relevant domain knowledge for a codebase, the code will always feel hard to read, no matter how clean it is. Think about that before you consider rewriting a legacy codebase.

Conclusion

There are several factors to readability, many of which depend on the reader. But just because readability is subjective doesn't mean you can't write more readable code.

To recap:

Code must be legible to be read. This is not hard to do, but always keep it in mind.
Write your code to communicate well. Make the story clear to follow.
All code requires some familiarity to be readable. Recognize this, embrace it, and document all required domains of knowledge. This will make it much easier for your future readers to understand the wonderful code you've written.

I hope this post helps you better understand code readability and leaves you better informed the next time a debate around the topic crops up. Thanks for reading!
