Lines of Code
Lines of code (LoC) is a lousy metric, at least on a small scale. I accept that it may be useful for comparing big projects or repositories, but most of the time, when I encounter it, it is just meaningless.
The Big Scale
It may be valuable on a big scale.
Google Stores Billions of Lines of Code in a Single Repository
A billion is a big number, a meaningful number. We can state things and ask questions about it.
- Google has a huge repository. Maybe even the biggest one in the world.
- Did they modify Git/SVN/Perforce to make it possible?
- My workflow is Git-centric. I’m curious how this repository affects their workflow. How is it different from mine?
The Linux kernel has around 27.8 million lines of code in its Git repository
- The Linux kernel is way smaller than the Google codebase.
- It’s still a tremendous effort.
- But hey. Isn’t it too big for a “kernel”? Oh! The drivers are there. Makes sense!
Developers seem to care about lines of code much more than I’d expect.
Why care about LoC?
I heard a story about a programmer paid by lines of code. I hope it was a joke. This is like paying a construction worker by the amount of material he uses! Lines of code have no intrinsic value. They are a cost we pay, ballast.
We can use it as an approximation of the effort we have already invested. Better yet, we can take the number of lines added and deleted over a period of time and compare it to the LoC of the entire repository to get an idea of how the project is changing. I’d argue that a changelog or a list of work items (Jira tickets, merged PRs?) gives us more information.
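A minimal sketch of that comparison, assuming the added and deleted counts come from the output of something like git log --since="1 month ago" --numstat --format= (one "added<TAB>deleted<TAB>path" line per changed file, with "-" in place of numbers for binary files), and that the total LoC of the repository is measured separately. The function name summarizeChurn and the churnRatio field are hypothetical, not a standard metric.

```javascript
// Hypothetical helper: summarize line churn from `git log --numstat --format=` output.
// Each relevant line looks like "<added>\t<deleted>\t<path>".
function summarizeChurn(numstatText, totalLoc) {
  let added = 0;
  let deleted = 0;
  for (const line of numstatText.split("\n")) {
    const [a, d] = line.split("\t");
    // Skip blank lines, binary files ("-\t-\tpath"), and anything else non-numeric.
    if (!/^\d+$/.test(a) || !/^\d+$/.test(d)) continue;
    added += Number(a);
    deleted += Number(d);
  }
  return {
    added,
    deleted,
    net: added - deleted,
    // Rough share of the current codebase touched in the window.
    churnRatio: totalLoc > 0 ? (added + deleted) / totalLoc : NaN,
  };
}
```

With 13 lines added and 7 deleted against a 100-line codebase, churnRatio comes out as 0.2: the window touched roughly a fifth of the code. It says nothing about what those lines did, which is why the changelog still wins.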
On its own, the LoC of a single file, for example, is just a number. Sad and lonely.
150 🍎 < 75 🍊
Can we say that a 500 line long file is long? Or short? Or is it just perfect?
We can’t, and trying would be pointless. Don’t let any lint rule tell you otherwise. We need more information, and knowing the language isn’t enough. Let’s take modern JavaScript.
Is it declarative, imperative, or functional? The answer may differ between files in the same repository, and once we’re close enough to read what’s inside a file, we can form better mental models of it than LoC.
Let me start with declarative code. Declarative code is easier to read than imperative code, right? We don’t have to think as much while reading, because declarative code just is, while imperative code does. For example, HTML is easier to read than JavaScript of the same length.
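A rough illustration of the difference, using plain JavaScript strings as a stand-in for real HTML (no DOM and no escaping, so not something to use with untrusted input). Both hypothetical functions produce the same markup; the template version mirrors the shape of its output, while the loop version has to be traced step by step.

```javascript
const fruits = ["apple", "orange", "pear"];

// Imperative: spell out *how* to build the markup, one mutation at a time.
function renderListImperative(items) {
  let html = "<ul>";
  for (let i = 0; i < items.length; i++) {
    html += "<li>";
    html += items[i];
    html += "</li>";
  }
  html += "</ul>";
  return html;
}

// Declarative(-ish): describe *what* the markup is; the template looks like its output.
function renderListDeclarative(items) {
  return `<ul>${items.map((item) => `<li>${item}</li>`).join("")}</ul>`;
}
```

Both return "<ul><li>apple</li><li>orange</li><li>pear</li></ul>" for the fruits array. The loop is longer, but the real cost is that the reader has to simulate it to see the result.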
In a React class component, 150 lines of JSX in a render function is “less code” than 75 lines of class logic.
I will go further. 150 lines of JSX in a functional component is “less code” than 75 lines of hooks: useState, useEffect, useLayoutEffect, useSelector, useMachine. A lot happens there. The difference may not be as big as when comparing declarative UI composition with imperative code in lifecycle hooks, but I’d argue that it still holds. We have fewer things to comprehend in JSX because much of it is self-explanatory. (Go away <Fetch /> component, you’re the outlier.)
This is all JavaScript, but aren’t we comparing apples to oranges?
There are different kinds of code.
- Declarative but non-functional code, like HTML, CSS, SQL, and GraphQL, may be verbose, but it’s trivial to read.
- Imperative code will certainly be harder to read, and way harder to maintain.
- Functional code doing the same thing may be more concise and easier to debug, but require detailed reading at first.
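To compare the imperative and functional styles on the same task, here is a small sketch that sums the prices of in-stock items in a hypothetical cart. Both versions return the same result; which one is easier to read depends on how familiar the reader is with filter, map, and reduce.

```javascript
const cart = [
  { name: "apple", price: 3, inStock: true },
  { name: "orange", price: 5, inStock: false },
  { name: "pear", price: 4, inStock: true },
];

// Imperative: a mutable accumulator, an explicit loop, and an explicit branch.
function totalImperative(items) {
  let total = 0;
  for (let i = 0; i < items.length; i++) {
    if (items[i].inStock) {
      total += items[i].price;
    }
  }
  return total;
}

// Functional: a pipeline of small pure steps. Shorter, but each step
// has to be understood before the whole expression reads well.
const totalFunctional = (items) =>
  items
    .filter((item) => item.inStock)
    .map((item) => item.price)
    .reduce((sum, price) => sum + price, 0);
```

For this cart both functions return 7. The functional version has fewer lines and no mutable state to track, which is what makes it easier to debug once it has been read carefully.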
We can divide code in more ways! In the same language and the same codebase, there will be some important code and some cheap code. The text alone doesn’t carry this information.
Better Metrics
We would like to measure things that matter, obviously.
Ease of comprehension is a good one. It is a very soft thing, though, and hard to measure directly. Can we measure something easier and assume it correlates with cognitive complexity?
Enter cyclomatic complexity, a measure of the number of linearly independent paths in a program’s control flow graph.
The more paths we have, the more we need to think about and, importantly, the more we have to test.
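McCabe’s formula is M = E − N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components of the control flow graph; for a single function it usually boils down to the number of decision points plus one. Below is a minimal sketch that applies the formula to a hand-built graph. The adjacency-list format and the node names are assumptions made for this example; as far as I know, linters such as ESLint’s complexity rule count branches in the syntax tree instead of building a graph like this.

```javascript
// Cyclomatic complexity M = E - N + 2P for a CFG given as { node: [successors] }.
function cyclomaticComplexity(cfg) {
  const nodes = new Set(Object.keys(cfg));
  const neighbours = new Map();
  const link = (a, b) => {
    if (!neighbours.has(a)) neighbours.set(a, []);
    neighbours.get(a).push(b);
  };
  let edges = 0;
  for (const [from, successors] of Object.entries(cfg)) {
    for (const to of successors) {
      nodes.add(to); // include nodes that only appear as successors
      edges += 1;
      // Store both directions: components are counted on the undirected graph.
      link(from, to);
      link(to, from);
    }
  }
  // Count connected components with an iterative depth-first search.
  const seen = new Set();
  let components = 0;
  for (const start of nodes) {
    if (seen.has(start)) continue;
    components += 1;
    seen.add(start);
    const stack = [start];
    while (stack.length > 0) {
      const node = stack.pop();
      for (const next of neighbours.get(node) ?? []) {
        if (!seen.has(next)) {
          seen.add(next);
          stack.push(next);
        }
      }
    }
  }
  return edges - nodes.size + 2 * components;
}

// Hand-built CFG for a hypothetical function:
//   function countdown(n) {
//     if (n < 0) n = 0;   // decision 1
//     while (n > 0) n--;  // decision 2
//     return n;
//   }
const countdownCfg = {
  ifCond: ["clamp", "loopCond"],
  clamp: ["loopCond"],
  loopCond: ["loopBody", "exit"],
  loopBody: ["loopCond"],
  exit: [],
};
```

For countdownCfg there are 6 edges, 5 nodes, and 1 component, so M = 3: two decisions plus one, and three independent paths (negative n, zero, positive n) worth a test each.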
Further reading
- SonarSource has a nice heuristic for cognitive complexity. I didn’t read their whitepaper thoroughly, but it makes a lot of sense. This is the rule I’m using in my ESLint config.
- “Danger of Simplicity” is a good read
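A minimal sketch of such an ESLint setup, assuming the eslint-plugin-sonarjs package is installed and the legacy .eslintrc.js config format is used. It is not my exact configuration, and the thresholds are illustrative, not recommendations.

```javascript
// .eslintrc.js (legacy config format); assumes eslint-plugin-sonarjs is installed.
module.exports = {
  plugins: ["sonarjs"],
  rules: {
    // SonarSource's cognitive complexity heuristic; 15 is an example threshold.
    "sonarjs/cognitive-complexity": ["error", 15],
    // ESLint's built-in cyclomatic complexity rule, for comparison.
    complexity: ["warn", { max: 10 }],
    // A raw line count says little on its own, so the file-length rule stays off.
    "max-lines": "off",
  },
};
```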