
Why Shouldn’t I Use TAB Characters In My Source Files?


Someone’s telling me not to use tabs in code files. Why? He mumbles something about “cross-platform issues.”

Is this even true? Is it even *possible* for (say) a Mac or a Unix machine to botch the reading of a Windows file, or (most likely, I’d think) for a Mac or a Unix machine to botch a file from the other?

I do know that between Mac and Unix, one uses CR (000D) and the other, LF (000A), whereas Windows uses both (though I know not in which order).

That being the case, what’s the deal with the HT code (0009)? Don’t all three use the same 0009 to indicate a tab?

(Shudder)

No, this has nothing to do with platforms. All use hex 0x09 to represent a tab character.

The problem is much, much deeper than that.

This is about programmers, religion, and the meaning of that lowly little character we call “tab”.

I shudder because I’ve witnessed religious flame wars between computer programmers on this issue. Seriously.

To understand why something this seemingly simple would inspire deep passion, we need to define a few things. Like “tab”. And in doing so, we’ll see that there is no one true definition. Only the one you choose to adhere to. (It’s sounding like religion already!)

First let’s separate out two concepts: the tab character and the tab key. They’re not the same. That’s part of the confusion.

TAB: The Character

The tab ASCII character, hexadecimal value 0x09, decimal value 9, probably dates back to the days of the teletype. On that hardware, a tab character moved the print head to the next tab stop, and the tab stops were fixed every 8 character positions. If a tab was encountered as the teletype printed, the print head would jump ahead to column 9, 17, 25 and so on (counting the first column as 1), whichever one was next.

Put another way, a tab character was hard-coded to be every 8 columns. And you can’t get much harder than hardware.
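The teletype rule is simple enough to write down as arithmetic. Here is a minimal C sketch; the function name next_tab_stop is our own invention for illustration, and it assumes columns are counted from 1 with stops every width columns (8 on a teletype):

```c
/* Hypothetical helper: return the 1-based column a tab moves to,
   assuming fixed tab stops every `width` columns, i.e. at columns
   1, 9, 17, 25, ... when width is 8. */
int next_tab_stop(int column, int width)
{
    /* Convert to 0-based, round up to the next multiple of width,
       then convert back to 1-based. */
    return ((column - 1) / width + 1) * width + 1;
}
```

With a width of 8, anything from column 1 through column 8 lands on column 9, and a tab at column 9 moves on to column 17.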

My guess is simply that the tab was a convenient form of compression. A long run of blank space could be described by a much smaller number of tabs followed by an appropriate number of individual single spaces.
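If the compression guess is right, the idea looks something like the following minimal C sketch, similar in spirit to the Unix unexpand utility: any run of two or more spaces that reaches a tab stop is replaced by a single tab. The name entab_line and the one-line-at-a-time design are our own assumptions, not a description of any historical implementation:

```c
#include <stddef.h>

/* Sketch of tab "compression": copy `in` to `out`, replacing each run of
   spaces that reaches a tab stop with a single tab character. Assumes
   tab stops every `width` columns and a single line of input (no '\n').
   `out` needs at most strlen(in) + 1 bytes, since output never grows. */
size_t entab_line(const char *in, int width, char *out)
{
    size_t n = 0;
    int col = 0;      /* 0-based column of the next character */
    int pending = 0;  /* spaces seen but not yet written */

    for (; *in; in++) {
        if (*in == ' ') {
            pending++;
            col++;
            if (col % width == 0) {          /* run reached a tab stop */
                out[n++] = (pending > 1) ? '\t' : ' ';
                pending = 0;
            }
            continue;
        }
        while (pending > 0) {                /* short runs stay as spaces */
            out[n++] = ' ';
            pending--;
        }
        out[n++] = *in;
        col = (*in == '\t') ? (col / width + 1) * width : col + 1;
    }
    while (pending > 0) {                    /* trailing spaces */
        out[n++] = ' ';
        pending--;
    }
    out[n] = '\0';
    return n;
}
```

Twenty leading spaces followed by an x shrink from 21 bytes to 7: two tabs, four spaces and the x.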

Indenting: The Root of the Controversy

Let’s step away from the tab character issue for a moment, and talk about indenting. Indenting is an approach to making programming code, be it HTML, C, Basic, or who-knows-what, easier to read by laying out the individual instructions in a manner that visually mimics the intended structure of the code. For example:

if (1 == a)
    printf ("'a' is one\n");
else
    printf ("'a' is not one\n");

This silly little snippet of code uses indentation as a visual aid to show the structure of the ‘if’ statement. This version is equivalent as far as the compiler is concerned:

if (1 == a)
printf ("'a' is one\n");
else
printf ("'a' is not one\n");

but as you can see it’s much harder to get a sense for what’s happening.

Now the controversy. How many columns should indented lines be indented? My first example above uses 4. Here’s that same example with an indent of 2:

if (1 == a)
  printf ("'a' is one\n");
else
  printf ("'a' is not one\n");

and again with an indent of 8:

if (1 == a)
        printf ("'a' is one\n");
else
        printf ("'a' is not one\n");

Which is “better” is a matter of personal taste and readability. I have seen, and at various times used, indents of 1, 2, 3, 4 and 8. And while it sounds silly to some, programmers do get passionate at times about how much to indent: this is code they have to look at every day, and they want it to be as readable and understandable as possible. Indenting is part of that.

In particular, when you have multiple programmers working on the same source code, it’s critical that they agree on how much indenting they’ll use. Why? Because if some indent at 2, and others indent at 4, for example, the code will over time become more and more difficult to read. And that, in turn, makes the code more fragile and error prone.

The Tab Key: A Solution, and Yet…

It’s easy to indent to any column just by pressing the spacebar the appropriate number of times. However, that quickly gets cumbersome. But what about the tab key?

If you can standardize on an indent of 8, then having the tab key insert a tab character works perfectly: press tab once per level of indent and you land right at the correct column. Very quick, easy to use, easy to do.

But what if your indent style isn’t 8? What if it’s 4? Or 3?

Two different approaches are commonly used:

Redefine the tab character, and still have the tab key insert a tab character: Many, if not most, text editors will allow you to redefine how many columns a tab character represents. So many programmers simply use this to say “a tab character means tab stops every 4 columns”, and then have the tab key insert a tab character. Very quick, easy to use, easy to do. But the file only displays properly if that redefinition of tab is used. If someone who hasn’t defined the tab character to be every 4 columns looks at the file, the indentation will be wrong, and things won’t line up properly.

Redefine the tab key, and leave the tab character definition alone: Many, if not most, text editors actually understand that “indent” is a different concept than “tab”. By assigning the tab key to some kind of smart indent feature in the editor, the computer can automatically insert the correct number of tab characters and spaces to move to the desired indent column. For example, if you want to indent to column 13, the computer would simply insert one tab (which reaches column 9) and 4 spaces, without your having to think about it. Once again it’s simple and efficient, it allows you to use tab characters according to their default definition, and it makes the computer do the work.
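Here is one way such a smart indent could work, as a minimal C sketch. The name indent_to is hypothetical, and the sketch assumes the indent starts at column 1 and that tab characters keep their default 8-column definition:

```c
#include <stddef.h>

/* Hypothetical "smart indent": fill `out` with the characters needed to
   move from column 1 to `target` (1-based), using as many tab characters
   as fit with default 8-column tab stops and padding the rest with
   spaces. `out` must hold at least `target` bytes. Returns the number of
   characters written, not counting the terminator. */
size_t indent_to(int target, char *out)
{
    const int tab_width = 8;
    int col = 1;
    size_t n = 0;

    /* Use a tab whenever the next tab stop does not overshoot. */
    for (;;) {
        int next = ((col - 1) / tab_width + 1) * tab_width + 1;
        if (next > target)
            break;
        out[n++] = '\t';
        col = next;
    }
    /* Finish with single spaces. */
    while (col < target) {
        out[n++] = ' ';
        col++;
    }
    out[n] = '\0';
    return n;
}
```

Asking for column 13 produces one tab followed by four spaces, matching the example above; column 17 needs just two tabs.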

The Third Option: Kill the Tab Character

The problem is that those two different approaches are, indeed, both commonly used. That means you may well find programmers who have set up their text editors, viewers and other tools to assume that a tab character means tab stops every 3 columns. Or every 4. Or every 8. And that means that any file that contains tabs, regardless of the definition its author used, may not display properly for everyone.
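To see why, here is a minimal C sketch (the name display_column is ours, for illustration) that works out which column a line's text starts in under a given tab width, by walking its leading tabs and spaces:

```c
/* Hypothetical helper: return the 1-based display column at which the
   first non-blank character of `line` appears, assuming tab stops every
   `tab_width` columns. A space advances one column; a tab jumps to the
   next tab stop. */
int display_column(const char *line, int tab_width)
{
    int col = 1;

    for (; *line == ' ' || *line == '\t'; line++) {
        if (*line == '\t')
            col = ((col - 1) / tab_width + 1) * tab_width + 1;
        else
            col++;
    }
    return col;
}
```

A line indented with one tab and a line indented with four spaces both start at column 5 when tabs are every 4 columns. Switch to every 8 and the tab-indented line jumps to column 9 while the space-indented one stays at column 5, which is exactly the misalignment people see.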

It’s important to have a public convention that says “tab characters mean this”, so that when someone views a file they can adjust their settings so that it will display properly. But still… that’s cumbersome and easily forgotten, especially for folks who move between projects that have different conventions.

One solution is simply to avoid the problem. Don’t use tabs. Indent to whatever degree your convention calls for, just use spaces to do it. Let your text editor simply insert the appropriate number of spaces to get to the next tab stop.

That way, all files that are tab-free are guaranteed to display properly no matter what your tab character is defined to be.

The cost? Your source files will be a little bigger. A trivially small cost in today’s environment of huge disks and storage.
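If you decide to adopt the spaces-only convention for existing files, the conversion is the reverse of the compression described earlier. A minimal C sketch follows; detab_line is a hypothetical name, and the result is only correct if width matches the tab definition the file was actually written with. The standard Unix expand utility performs the same job on whole files.

```c
#include <stddef.h>

/* Sketch of "detabbing" a single line: copy `in` to `out`, replacing each
   tab with enough spaces to reach the next tab stop, assuming stops every
   `width` columns. In the worst case every tab expands to `width` spaces,
   so strlen(in) * width + 1 bytes of `out` is always enough. */
size_t detab_line(const char *in, int width, char *out)
{
    size_t n = 0;
    int col = 0;  /* 0-based column of the next character */

    for (; *in; in++) {
        if (*in == '\t') {
            do {                      /* pad out to the next tab stop */
                out[n++] = ' ';
                col++;
            } while (col % width != 0);
        } else {
            out[n++] = *in;
            col++;
        }
    }
    out[n] = '\0';
    return n;
}
```

For example, "ab", a tab and "c" with 8-column tabs becomes ab, six spaces and c: nine bytes where there were four, which is the small size cost just mentioned.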

Who’s Right?

So what’s the answer? Who’s right?

Everyone. No one.

Regardless of how we got here, here we are. The tab character does, and doesn’t, get redefined. People do use mixes of tabs, tabs and spaces or only spaces to indent their code.

What’s most important is that everyone working on the same source code uses the same convention, and that the convention is documented and easy to find, so that others looking at, or perhaps about to work on, the code can adjust their own settings if needed.

And I say that as someone who’s been a referee on too many of these coding standards religious battles.
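A convention is easiest to follow when it’s checked automatically. As a minimal C sketch for a project that has chosen the no-tabs convention (the function report_tab_lines and the idea of running it from a build script are our assumptions, not a standard tool), this reports every line that contains a tab character:

```c
#include <stdio.h>

/* Sketch of a check for a "no tab characters" policy: reads `fp` and
   prints the 1-based number of every line that contains a tab. Returns
   the number of such lines, so a build script could treat a nonzero
   result as a failure. */
int report_tab_lines(FILE *fp)
{
    int c, line = 1, flagged = 0, seen_tab = 0;

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\t' && !seen_tab) {
            printf("line %d contains a tab\n", line);
            seen_tab = 1;     /* report each line only once */
            flagged++;
        } else if (c == '\n') {
            line++;
            seen_tab = 0;
        }
    }
    return flagged;
}
```

A project whose convention allows tabs would check instead for whatever mix its documented convention forbids.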

(And I won’t even touch the issue of where the curly braces go in programming languages like C or C++. :-)
