Information Curation for the Programmer

I ended the last post, “Generative AI and the Programmer,” on a note that’s a Rorschach test for your own outlook on life: the fate of the technology industry relies on the ability of our practitioners to curate and digest information. My own outlook is frankly pessimistic. Most of us have ignored our own history and, worse, the broad range of “liberal arts,” producing professionals uniquely and ironically unsuited to critically engaging with the informational ecosystems that we have been instrumental in building.

Turning to the liberal arts to cope with information overload may seem strange if you’re not familiar with the history of curation. For as long as there have been writers, there has been too much writing to read and digest. During the early Roman Empire, for example, Seneca wrote a letter to a younger correspondent:

You should be extending your stay among writers whose genius is unquestionable, deriving constant nourishment from them if you wish to gain anything from your reading that will find a lasting place in your mind.¹

On the other side of the world, a few centuries earlier, Confucius had articulated a warning about reading and study:

Learning without thought is labor lost. Thought without learning is perilous. The study of strange doctrines is injurious indeed!²

The problem is no better now. A modern critic, Neil Postman, wrote:

Information has become a form of garbage, not only incapable of answering the most fundamental human questions but barely useful in providing coherent direction to the solution of even mundane problems.³

The problem promises to metastasize in the era of generative AI, because no “content” can any longer be assumed to be animated by human inspiration, which remains the only genuine source of insight and value.

Fixing this at the scale of the internet is an intractable problem. We must accept that, by the iron law of the “Tragedy of the Commons,” there are people motivated to ruin the information ecosystem as a whole, and they will. You and I can’t prevent it. What we can do, though, is take a firm hold on that concept of curation and make it our lifeline. The specific approaches vary depending on whether you’re a new learner or an established professional.

If you’re new to programming, your task is surprisingly simple. A generally accepted canon exists, as in the liberal arts. You could do worse than picking a classic textbook like Structure and Interpretation of Computer Programs, available online, and working through it. You’ll have the advantage of history and supplemental resources, like the MIT OpenCourseWare course, to make it more approachable. With a couple of practical classics like The Pragmatic Programmer and Operating Systems: Three Easy Pieces, you can build a foundation that will stand you in good stead throughout your career.

Where curation becomes challenging for the beginner is in the transition from that foundation to practice. It may be hard to believe, but almost all books and resources about specific technologies are valueless. Part of your goal while building your foundation is to learn how to think about APIs and use documentation, because for practical use, documentation is always more accurate, helpful, and up-to-date than books and tutorials about specific technologies. For example, you’re better served by using your testing framework’s documentation for day-to-day reference, and reading Working Effectively With Legacy Code to learn about testing concepts, than by buying books like xUnit Test Patterns or The Cucumber Book.

Intermediate and advanced programmers have it worse. The blessing of the beginner is that important topics are paths that everyone has walked, and the availability of canonical works means you need only select from a list. Later topics will be more specific to your own career trajectory and interests, and are less likely to have indisputably “great” works emerge because the audience is smaller. This is where curation and reading techniques really come into their own, and where the “STEM-focused” programmer can benefit from academic techniques.

If you have trouble reading critically (being willing to say “this is not useful, I don’t need to read any further”), then that’s probably the most valuable thing you can learn when you find yourself moving past the basics. There are two excellent resources on that topic: “How to Read a Paper” and “How to Read a Book.” Also consider using a tool like Obsidian to build your own knowledge base. The steady decay of search engines, the worsening signal-to-noise ratio of web content, and attacks on projects like the Internet Archive mean that you can no longer safely assume that what you find today can be found again tomorrow.

Once you can read critically and efficiently, finding things to read becomes the challenge. Papers We Love does a great job of finding readable papers, and will likely provide enough reading material for the working software developer; certainly enough to help them stay well above average in their knowledge of the field. As you read more, you’ll also develop your own interests and learn about authors you can follow on your own. I read almost every paper that Philip Wadler publishes, for example, and the citations in his papers help me find new authors of interest. The same approaches apply to books, although they represent a larger investment of time and money than papers.

Ironically, internet resources are now the most challenging place to find value for both the beginner and the established professional. During the “content farm” phase of the internet in the 2010s, search engine optimization favored a specific authorial voice with repetitive, low-complexity content. The same style is now even more prominent because of generative AI, and you need to learn to recognize and reject this content to conserve your own attention.

This content tends to be overly personal, often expressing opinions or emotions that seem out of place within the material. It features a lot of bulleted and numbered lists, and overuses bold for emphasis. If you see unrendered Markdown symbols, **like this**, that’s a dead giveaway. View these as freebies; anything with these signals can immediately be discarded. Mastering older material will give you more tools as you learn to recognize more sophisticated rhetorical strategies. Read Seneca or Montaigne and you’ll never mistake AI for human writing again.
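The mechanical signals above (unrendered bold markers and a page dominated by list items) are simple enough to check automatically. The sketch below is a hypothetical heuristic, not a validated detector: it assumes the input is text as a reader sees it (for example, copied from a rendered page), and the function names and the `max_list_ratio` threshold are illustrative choices of ours.

```python
import re

# Hypothetical heuristic: the patterns and threshold below are illustrative
# assumptions, not a validated detector of machine-generated text.

# Literal **bold** markers that survived into displayed text.
UNRENDERED_BOLD = re.compile(r"\*\*[^*\n]+\*\*")

# Lines that begin with a bullet ("-", "*", "•") or a number like "1." / "1)".
LIST_LINE = re.compile(r"^[ \t]*(?:[-*•]|\d+[.)])[ \t]+", re.MULTILINE)


def content_farm_signals(text: str) -> dict[str, int]:
    """Count the surface signals described in the essay."""
    return {
        "unrendered_bold": len(UNRENDERED_BOLD.findall(text)),
        "list_lines": len(LIST_LINE.findall(text)),
    }


def should_discard(text: str, max_list_ratio: float = 0.5) -> bool:
    """Return True if the text shows a "freebie" signal worth discarding on."""
    signals = content_farm_signals(text)
    if signals["unrendered_bold"] > 0:
        return True  # unrendered Markdown is treated as a dead giveaway
    non_empty = [line for line in text.splitlines() if line.strip()]
    if not non_empty:
        return False
    # Flag text where more than max_list_ratio of the lines are list items.
    return signals["list_lines"] / len(non_empty) > max_list_ratio
```

Returning raw counts from `content_farm_signals` keeps the threshold a separate, adjustable judgment. The subtler signal, an overly personal tone, is left to the human reader.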

As we navigate this new informational landscape, some people, companies, and institutions will fall by the wayside because they can’t cope with the volume of “output” that we’re faced with. Others will fail because they underestimate the quality problems that come with generative AI. My hope is that, with tools new and old judiciously applied, we can make sure that newcomers to the profession understand the challenges that they’re facing so that we can develop and maintain a sustainable talent pipeline.

1. Seneca. Moral Letters to Lucilius, Letter 2.
2. Confucius. The Analects, “On Governance.”
3. Postman. Technopoly: The Surrender of Culture to Technology, p. 69.