In the time I have been programming, lately mostly for the web, I have learned a few things. Notably, I have learned that some things people think are simple to deal with aren’t. These “simple” things that people think they’re working with when programming don’t really exist. Here are three examples:
I don’t always agree with everything Joel Spolsky says, but he’s right in his rant about Unicode:
There Ain’t No Such Thing As Plain Text.
When dealing with input and output, you never have the luxury of just having “text”. What you really have is a byte stream using a specific character encoding. If you don’t know what encoding you’re dealing with, you’ve got nothing. Every input stream has to be decoded; every output stream has to be encoded.
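To make that concrete, here is a minimal Python sketch. The sample word and variable names are mine, chosen only for illustration. It shows the same text becoming different bytes under two encodings, and what happens when the bytes are decoded with the wrong one.

```python
# The same text becomes different bytes under different encodings.
text = "café"
utf8_bytes = text.encode("utf-8")      # 5 bytes: the é takes two
latin1_bytes = text.encode("latin-1")  # 4 bytes: the é takes one

# Decoding with the encoding that produced the bytes round-trips cleanly.
round_trip = utf8_bytes.decode("utf-8")

# Decoding with the wrong encoding can silently produce garbage...
mojibake = utf8_bytes.decode("latin-1")

# ...or fail outright, depending on the bytes involved.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

Neither failure mode is detectable from the bytes alone, which is why the encoding has to travel with the data.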
Even once you have encodings sorted out, there’s still a real question about what a “string” is in your program. Consider the distinction Django makes between strings and safestrings that allows the auto-escaping to work: some strings contain HTML code, and some contain text that the user should see as-is. You can’t “output a string” without knowing how (or if) it has to be processed/escaped/cleaned first.
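The idea can be sketched in a few lines of Python. To be clear, this is not Django's implementation: `SafeString` and `render` are hypothetical names I made up to illustrate the general technique of tagging strings that already contain markup.

```python
from html import escape


class SafeString(str):
    """Hypothetical marker type: text already known to be intended as HTML."""


def render(value: str) -> str:
    """Return value in a form that can be placed into an HTML page."""
    if isinstance(value, SafeString):
        return str(value)   # already markup: output unchanged
    return escape(value)    # ordinary text: escape &, <, > and quotes


user_comment = "<b>1 < 2</b>"               # text the user should see as-is
trusted_markup = SafeString("<b>bold</b>")  # markup the program produced

rendered_comment = render(user_comment)
rendered_markup = render(trusted_markup)
```

The two inputs are both "strings", but only the type tag tells the output code which treatment each one needs.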
It’s never just “plain text”.
It’s very easy in most languages to store date and time values. Unfortunately, there’s not really any such thing as a “time” either.
As I sit here, it is about midnight (0:00) PST. It’s 8:00 in London and 16:00 in Beijing. A time is no good to anybody without a time zone to tell you how it fits into the world. This comes into much sharper focus with web applications where users are probably going to be in different time zones.
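Here is a small Python sketch of that one instant seen from three places. The date is an assumption (I picked a November day when Vancouver is on PST), and I am using the standard library's `zoneinfo` module (Python 3.9+), which needs the IANA time zone database to be available on the system.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Assumed example date: midnight on a day when Vancouver observes PST.
instant = datetime(2009, 11, 4, 0, 0, tzinfo=ZoneInfo("America/Vancouver"))

# The same instant, expressed on clocks in two other places.
london = instant.astimezone(ZoneInfo("Europe/London"))
beijing = instant.astimezone(ZoneInfo("Asia/Shanghai"))

# Aware datetimes in different zones compare by the instant they represent.
same_instant = instant == london == beijing
```

Strip the zone from any of the three values and there is no way to tell that they describe the same moment.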
But it’s not even as easy as storing a time + time zone: one week (7 days × 24 hours/day) ago, it was 1:00 PDT, not 0:00 PST. You can’t just add n × 24 hours to a time and expect to land on the same clock time n days later. Time zones can change, even for a particular user, even if they don’t change their location. (And if not for knowing the time zone, I would have absolutely no way to notice these gotchas.)
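The gap between "one week of elapsed time" and "the same clock time a week earlier" can be shown in Python. This is a sketch under assumptions: the date is one I chose to fall just after the autumn time change, and it uses the standard library's `zoneinfo` (Python 3.9+) rather than any particular third-party library.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

vancouver = ZoneInfo("America/Vancouver")

# Assumed example: midnight PST, a few days after the clocks went back.
now = datetime(2009, 11, 4, 0, 0, tzinfo=vancouver)

# Exactly 7 x 24 elapsed hours earlier: do the arithmetic in UTC,
# then convert back to local time for display.
elapsed_week_ago = (
    now.astimezone(timezone.utc) - timedelta(hours=7 * 24)
).astimezone(vancouver)

# Arithmetic directly on a zone-aware datetime works on the wall-clock
# fields instead, so this gives "the same clock time, 7 days earlier".
calendar_week_ago = now - timedelta(days=7)
```

Here `elapsed_week_ago` lands on 1:00 PDT while `calendar_week_ago` lands on 0:00 PDT. Both are reasonable answers to "a week ago", which is exactly the problem: the program has to know which one it means.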
Suppose I am using a calendaring application and enter a meeting at “13:00” on a particular date.
How does the program represent that? The first instinct would probably be to store “<date> 13:00 PST” (using the entered date/time and my current time zone) but that’s not right if there’s a time change before that date. I have seen calendar error announcements along the lines of “all meetings after the time change will be off by an hour” because of this mistake. Should it really be stored as “<date> 13:00 PDT” depending on the date? What if the North American daylight saving rules change again before this meeting?
I don’t even want to think about two users in different time zones trying to schedule a meeting, but it should definitely be possible.
The only real thing to do is store “<date> 13:00 America/Vancouver” and hope some time zone library is smart enough to save us later. That means we need a date library with a lot of smarts, like pytz for Python.
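A minimal sketch of that storage scheme, with some loud assumptions: the `Meeting` class and its method names are hypothetical, the dates are made up, and I am using the standard library's `zoneinfo` (Python 3.9+) as a stand-in for pytz. The point is only that the stored value is the wall-clock time plus a zone name, and the UTC offset is looked up when it is needed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class Meeting:
    """Hypothetical record: what the user typed, plus where they meant it."""
    local_time: datetime  # naive wall-clock time, exactly as entered
    zone_name: str        # IANA zone name such as "America/Vancouver"

    def as_utc(self) -> datetime:
        # The offset is resolved here, using whatever rules the
        # time zone database holds at the moment of the call.
        aware = self.local_time.replace(tzinfo=ZoneInfo(self.zone_name))
        return aware.astimezone(timezone.utc)

    def in_zone(self, other_zone: str) -> datetime:
        # The same meeting as seen on a clock in another zone.
        return self.as_utc().astimezone(ZoneInfo(other_zone))


winter = Meeting(datetime(2009, 11, 4, 13, 0), "America/Vancouver")
summer = Meeting(datetime(2009, 7, 4, 13, 0), "America/Vancouver")

winter_utc = winter.as_utc()
summer_utc = summer.as_utc()
winter_in_beijing = winter.in_zone("Asia/Shanghai")
```

Both meetings are stored as 13:00, yet they resolve to different UTC hours because the offset in force differs between July and November. If the rules change before the meeting, an updated time zone database changes the result without touching the stored data.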
It also means that you have to at least be very careful with any built-in date/time library (and possibly data type) your language comes with. It might mean you have to bypass them entirely.
“Appearance of a web page”
[I know it’s not really “programming”, but just move on, okay?]
This one shouldn’t be a surprise to anybody who knows anything about the web, but web pages simply don’t have a single unique appearance. The way a page looks depends on the browser, window size, available fonts, font size settings, and who knows how many other factors.
If you’re making web pages, you simply have to understand and live with this limitation. As I have said many times in lectures: if you don’t like it, don’t make web pages.
Also, what the page looks like to you has relatively little relation to the way Google or other bots “see” it, but that’s another rant.