Number 6
May 24, 2023

A character string is a character string is a character string

I’m a fan of strict typing in programming languages: I don’t want to mix numbers and text, lists and booleans, tuples and data structures. I like knowing that this variable belongs to that type, and I like it when the compiler tells me I’m mistaken. Therefore, I am not a fan of those languages in which data types hold more promiscuous relationships.

Let’s use JavaScript as an example. That language’s attitude is: “You need a number but you have text? No problem! Oh, you had an array, no worries; we’ll figure something out.” The language does more typecasts than a Hollywood agency, and some people had to build tables to determine how the language would convert a value when comparing it to a differently typed value.

Some people have no issue with this. Not only that: they even use character strings when they are inappropriate.

I once asked someone to write a function to decode UTF-8 encoded text. In this encoding, a text character may occupy several bytes. Hence, each byte’s bits need to be manipulated so you can join them up and decode them. What was the first thing this person did? He wrote a function to convert bytes into a text string containing their binary representation. I stopped him right there, so I don’t know how he was expecting to continue, but I suppose he was going to cut those text strings, join them to build a binary representation of a large number, and then turn it into the corresponding number.

If the problem is not apparent, let me tell you: building or manipulating binary representations in character strings is unnecessary. Computers are designed to work with bits directly, so a simple number is the best data type for a bit sequence.
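As a rough illustration of working on the bits directly, here is a minimal Python sketch of a UTF-8 decoder that only shifts and masks integers. The function name `decode_utf8` is made up for this example, and the sketch is deliberately simplified: it rejects malformed or truncated sequences, but it does not check for overlong encodings or surrogate code points, as a production decoder would.

```python
def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes using integer shifts and masks (simplified sketch)."""
    chars = []
    i = 0
    while i < len(data):
        first = data[i]
        # The leading byte says how many continuation bytes follow
        # and carries the highest bits of the code point.
        if first < 0x80:                 # 0xxxxxxx: one byte
            code_point, extra = first, 0
        elif first >> 5 == 0b110:        # 110xxxxx: two bytes
            code_point, extra = first & 0x1F, 1
        elif first >> 4 == 0b1110:       # 1110xxxx: three bytes
            code_point, extra = first & 0x0F, 2
        elif first >> 3 == 0b11110:      # 11110xxx: four bytes
            code_point, extra = first & 0x07, 3
        else:
            raise ValueError(f"invalid leading byte at offset {i}")

        if i + extra >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")

        # Each continuation byte (10xxxxxx) contributes six more bits.
        for j in range(1, extra + 1):
            byte = data[i + j]
            if byte >> 6 != 0b10:
                raise ValueError(f"invalid continuation byte at offset {i + j}")
            code_point = (code_point << 6) | (byte & 0x3F)

        chars.append(chr(code_point))
        i += extra + 1
    return "".join(chars)
```

No text string holding ones and zeros appears anywhere: the code point is built up in a single integer, six bits at a time.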

[Illustration: Don Quixote reading a book and imagining all sorts of adventures.]
Programmers are not the only ones who are troubled by texts.

To those with enough experience, it is evident that what that person did was incorrect. However, whether we have more or less experience, most of us occasionally make similar mistakes without noticing; for example, when we use character strings to represent URIs.

We are used to dealing with text in the “real world.” To us, a sentence is a text, but so is a number, a list, a table, a car’s license plate, a telephone number, or an email address. Everything is text to us, and that’s why we often use strings to store that kind of data instead of using or defining more appropriate types, such as integers, lists, arrays, a class named “License Plate,” a class called “Telephone Number,” or a class named “Email Address.”

Most of those classes would use a character string internally, so it is tempting to ask whether it wouldn’t be more efficient to use the string directly. Well, no: in some languages, both options compile to the same machine code, so there the class neither uses more memory nor runs more slowly.

More importantly, those classes help us keep the different data types separated in our heads and keep us from trying to perform invalid operations on them. For example, while you can concatenate two character strings to create a new one, you can’t do the same with two email addresses. Therefore, treating them as character strings doesn’t make sense.
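To make this concrete, here is a small Python sketch of such a class. The name `EmailAddress`, its `domain` property, and the validation rule are all assumptions made for this example; the check is deliberately naive and is not a real address validator. Note that Python only reports the invalid operation at run time, whereas a statically typed language would reject it at compile time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailAddress:
    """An email address as its own type (hypothetical, naive validation)."""

    value: str

    def __post_init__(self) -> None:
        # Naive check: exactly one "@" with text on both sides.
        local, at, domain = self.value.partition("@")
        if not (local and at and domain) or "@" in domain:
            raise ValueError(f"not an email address: {self.value!r}")

    @property
    def domain(self) -> str:
        # Everything after the "@".
        return self.value.partition("@")[2]

    def __str__(self) -> str:
        return self.value
```

With this class, `EmailAddress("ana@example.com").domain` gives `"example.com"`, while trying to add two `EmailAddress` values raises a `TypeError`, because the class defines no concatenation. The string is still there inside, but only the operations that make sense for an email address are exposed.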

So, whenever you find yourself using a text string in a program, stop and consider whether a more suitable data type is available. This will help you write better, less buggy programs.

The illustration for this Coding Sheet comes from an engraving by Gustave Doré.