Sunday, September 14, 2008

Ten Lies About Microprocessors

Processor selection too often turns into a religious war. Debunking the dominant myths is the first step towards making a rational choice.

Talk about sports teams, politics, religion, or your favorite boy band and most bartenders won't raise an eyebrow. But get a group of engineers and programmers arguing over which microprocessor is best and you're liable to get eighty-sixed for trash-talking x86.

People get passionate about processors in a way they don't over DRAMs or decoders. Everyone has favorites, as well as horror stories about the one they'll never use again. Legend and lore surround microprocessors. Some is useful, but a lot is superstition ingrained by tradition.

Myth #1: Few processor choices
This is the most insidious misconception. If you're designing an embedded system, how many 32-bit processors can you choose from? 10? 20? In reality, there are more than 100 different 32-bit embedded processors for sale right now. (And that's not counting different packaging options or speed grades.) Dozens of companies make 32-bit processors, representing more than 15 different CPU architectures and instruction sets. Add in a few hundred more 16-bit processors and a few hundred 8-bit processors and you've got an embarrassment of riches.

#2: Intel rules the world
If you say "microprocessor," a lot of people think, "Pentium." The mainstream press is partly to blame. Newspapers proclaim that Intel has a 95% share of the microprocessor market. That's off by almost two orders of magnitude.

As we saw in the January issue, only about 2% of all microprocessors made drive PCs ("The Two Percent Solution," p. 29). Intel's Pentium has a dominant share of the PC business (the Federal Trade Commission stopped just short of declaring it a monopoly), but PCs are a tiny slice of the microprocessor pie. The other 98% are embedded CPUs; Intel's not even in the top five of that group.

Even if we weed out the enormous volume of 8-bit and 16-bit chips and focus on 32-bitters, Intel's name still appears well down the list. ARM vendors alone sell about three times more processors than Intel sells Pentiums.

#3: Instruction sets don't matter
Whether you program in C/C++, BASIC, Ada, or Java, your code ultimately boils down into the hardware instruction set of the processor it's running on. You may not need to know all the machine instructions your CPU provides, but the instruction set does affect your code. A few elegant lines of C may produce a hideous tangle of assembly instructions and vice versa.

Performance, predictability, and even power consumption all depend heavily on the underlying instruction set of the processor, and there's nothing a high-level language can do to change that.

Let's take a simple example of multiplying two numbers together. This is trivial in any language and hardly something programmers will worry over. Yet different chips handle multiplication in different ways. For a while, many RISC chips couldn't even do multiplication—it was considered "impure" and not part of the RISC canon. Many viewed multiplication as glorified adding and shifting, so early RISC compilers had to synthesize their own integer multiply functions. It worked, but it wasn't fast.
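
To see why it wasn't fast, here's a minimal sketch in C of the kind of shift-and-add routine those compilers had to generate. This is illustrative only, not any particular compiler's actual library code:

    #include <stdint.h>

    /* Shift-and-add multiply of the sort early RISC compilers had to
     * synthesize in software. Illustrative sketch only. */
    uint32_t soft_mul(uint32_t a, uint32_t b)
    {
        uint32_t product = 0;
        while (b != 0) {
            if (b & 1)            /* low multiplier bit set? */
                product += a;     /* then add the shifted multiplicand */
            a <<= 1;              /* shift multiplicand up one place */
            b >>= 1;              /* consume one multiplier bit */
        }
        return product;           /* wraps modulo 2^32, like hardware */
    }

That's up to 32 trips around the loop, each with a shift, a test, and possibly an add, for every single multiply in your source.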

Now most (but not all) processors have a built-in multiply instruction. But not all multipliers are the same. Some chips can multiply two numbers much faster than other chips, and it has nothing to do with clock frequencies. As the chart in Figure 1 shows, some chips (such as Hitachi's SH7604 and SH7708) can multiply any two 32-bit numbers in four cycles or less. Other chips (notably Motorola's 68020 and '030) take more than 40 cycles to do the same math.

Stranger still, on most chips the timing is data dependent. The minimum time for a multiplication might be less than half of the maximum time. What's the difference? Operand size. Multipliers that churn through an operand a bit or two per cycle can quit early once the remaining bits are all zero, so big numbers take more cycles than small ones.

Finally, the order of the numbers matters. In grade school we were taught that multiplication is commutative, that the order of the two numbers doesn't affect the answer. That's still true, but the order does affect the time required to do the math. On many chips, multiply time is determined by one of the two operands. Swap their order and you may cut your multiply time in half. Good luck guessing which way is better, though.
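
If you really need to know, measure. Below is a minimal sketch in C that times a pile of multiplies in each operand order. It assumes nothing exotic: clock() is standard C but coarse, so on real hardware you'd substitute your chip's cycle counter, and the operand values here are arbitrary:

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    /* Time repeated multiplies with the operands in a given order.
     * The volatile qualifiers stop the compiler from hoisting or
     * folding the multiply out of the loop. */
    static clock_t time_muls(volatile uint32_t x, volatile uint32_t y)
    {
        volatile uint32_t sink = 0;
        clock_t start = clock();
        for (uint32_t i = 0; i < 1000000u; i++)
            sink = x * y;
        (void)sink;
        return clock() - start;
    }

    int main(void)
    {
        uint32_t small = 3, big = 0x7FFFFFFFu;
        printf("small * big: %ld ticks\n", (long)time_muls(small, big));
        printf("big * small: %ld ticks\n", (long)time_muls(big, small));
        return 0;
    }

On a chip with an early-out multiplier the two printouts may differ noticeably; on a fixed-latency multiplier they should tie.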

None of this is visible to high-level source code. Few C compilers are even aware of these differences because most customers—developers of embedded systems—never ask. Many processor users just don't know what's going on under the hood.

#4: RISC is better than CISC
We covered this one in March, so let's just say that RISC is different from CISC ("RISCy Business," p. 37); neither is necessarily better all the time, and both have their strengths. CISC chips provide better code density (smaller memory footprint) and more mature software tools, but RISC chips have higher clock rates and more glamorous marketing. Take your pick, but make it an informed one.

#5: Java chips are coming
So's Christmas. Actually, Christmas is a lot closer because it's going to be here this year. Java chips have more in common with Santa Claus than Christmas: a nice fable for naïve young engineers who aren't yet old enough to know better.

Java is remarkable in a number of ways, most of them having to do with marketing. But it's also remarkably resistant to hardware implementation. A number of companies have tried to produce an all-Java microprocessor and every one has failed to some degree. This trend is likely to continue.

Apart from being hilariously ironic—wasn't the whole point of Java to be hardware independent?—Java processors run headfirst into the low doorway of logic. The Java language was never meant to be handled in hardware, and it shows. Garbage collection, threads, stack orientation, and object management take about a megabyte worth of Java virtual machine to translate into something that even today's fastest microprocessors struggle to execute. Decades of computer evolution and research at companies and universities around the world have failed to produce anything that looks like a Java machine. This is not a coincidence.

Today Java "accelerator" chips are available from Nazomi, Zucotto, inSilicon, Octera, and many others. Most execute 30% to 60% of Java's bytecodes in hardware. The rest they punt and handle in software because it's simply too awkward to do otherwise. Following the standard 80/20 rule, these chips accelerate the most used Java instructions to produce a noticeable speedup in overall Java performance. But they're a far cry from a 100% Java implementation.

After a few years of rapid improvement, Java chips seem to have plateaued at that 60% level. Sun itself canceled its Java chip development. We've reached the point of diminishing returns, where implementing the remaining Java instructions in hardware doesn't produce worthwhile benefits. If you're in the market for Java chips, this is about as good as it's going to get.

#6: Dhrystone MIPS is a useful benchmark
The term MIPS is bandied about more than any other in the microprocessor business. It's become utterly hollow, unless you interpret it as Meaningless Indicator of Performance for Salesmen.

As I explained earlier, instructions aren't the same from processor to processor, so counting and comparing them isn't useful. It's like saying the German word for windshield wipers (Windschutzscheibenwischerblätter) is longer than the English equivalent. Duh.

MIPS is commonly derived from something called the Dhrystone benchmark, which dates from 1984, was originally written in Ada (and later translated to C), and used Digital's VAX 11/780 as its reference machine. It's also only about 4KB of code, fits easily into cache, and doesn't do any useful work. Because of its diminutive size but exaggerated importance, Dhrystone is subject to some, shall we say, creative optimization. There are C compilers with a -dhrystone switch that drastically improves reported results. Today's MIPS ratings are achieved by dividing Dhrystone scores by 1,757 because that's what the VAX 11/780, the canonical 1-MIPS machine of the late 1970s, scored. We're measuring VAX-equivalents using a 4KB snippet of Ada code that's been translated to C and tweaked who-knows-how-many times to produce a score that Marketing "accidentally" misprints with an extra zero behind it. Now how useful is that?
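
For the record, the arithmetic behind that magic number is as simple as it sounds. A quick sketch, with a made-up benchmark score and clock rate:

    #include <stdio.h>

    /* "VAX MIPS": Dhrystones per second divided by 1,757, the
     * VAX 11/780's score. The input numbers here are hypothetical. */
    int main(void)
    {
        double dhrystones_per_sec = 878500.0;  /* made-up result */
        double clock_mhz = 200.0;              /* made-up clock  */

        double dmips = dhrystones_per_sec / 1757.0;
        double dmips_per_mhz = dmips / clock_mhz;

        printf("%.0f DMIPS (%.2f DMIPS/MHz)\n", dmips, dmips_per_mhz);
        return 0;  /* prints: 500 DMIPS (2.50 DMIPS/MHz) */
    }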

#7: Price is proportional to performance
Microprocessors are now sold like perfume: the price on the label has no connection to the cost of the ingredients. It's tempting to assume some meaningful relationship between cost and price. Save your time—there isn't one. Cost is what it takes to build a chip; price is whatever the marketing department wants it to be. Happily, we work in an industry where market pressures drive up value and drive down price all the time. As chip consumers, we benefit from the cutthroat cost cutting and market-share horse trading.

The cost to make a silicon chip has little to do with the amount of silicon in it. Cost is mostly determined by overhead amortization and the depreciation of the fab. Price, however, is determined by market forces—good ol' supply and demand. If your chip runs Windows XP, you can charge an arm and a leg for it. If it doesn't, the same amount of silicon will command a much lower price.

Even within the embedded world, there are $15 processors that outperform $150 processors. Price is negotiable, malleable, and wholly unpredictable. Shop around.

#8: ARM is lowest power
There aren't many strong brand reputations in the microprocessor business but ARM enjoys one of the best. According to their reputation, ARM's chips are endowed with an almost magical ability to run on bright sunlight or the energy released by rubbing a cat. An ARM processor, two lemons, and some copper wire are all that's needed to build the latest PDA, it seems.

Like many myths, this one is rooted in reality, but that reality has changed and the myth has expanded. In the early '90s, ARM was one of the first 32-bit processors to be embedded into ASICs, rather than soldered alongside as a separate chip. Compared to the big 68030, 29000, and 486DX chips of the day, the wee ARM6 consumed less total energy than the others gave off as heat. That's because the ARM had no floating-point unit, no cache, no outside bus, no drivers, and not much of an instruction set.

Today there are plenty of 32-bit processors available as ASIC cores. Many are smaller than the ARM7, to say nothing of the newer ARM10 or ARM11. Many use less power, both in standby mode and when they're active. If power consumption is your primary consideration, by all means give ARM a call. But ten years of progress and competition have moved ARM to the middle of the pack when it comes to power efficiency.

#9: Second sourcing micros
Second sourcing used to be the watchword of purchasing departments everywhere. Hardware engineers often aren't allowed to specify any component unless it's available from two or more sources. That's fine for resistors—it reduces risk and dependency on any one supplier—but it's now impossible for microprocessors.

Sure, you can get MIPS chips from a dozen different sources, such as NEC, PMC-Sierra, IDT, and Intrinsity, but they aren't interchangeable with one another. They all execute the same instruction set, but their buses, pin-outs, peripherals, speeds, and packages are all different. At best, the programmers can keep most of their code, but the hardware engineers will have to design an all-new system.

There was a time when Motorola and Hitachi provided identical 68k processors, DMA controllers, and other chips. Intel and AMD used to second-source each other's processors as well (remember when AMD and Intel were friends?). Many low-end parts in the 8051 or 6805 family also used to be double- or even triple-sourced. Alas, competition has brought an end to those days. Now every processor chip is unique, even if its instruction set isn't.

#10: The great processor shakeout
With more than 100 different embedded 32-bit processors for sale, there must be too many choices for the market to support, right? Who's going to win and who's going to lose? Come the revolution, who will be first against the wall?

Probably none of them. In fact, the number of embedded processors is likely to grow, not shrink. Those hundred-odd chips are all in volume production with dozens of happy customers who wouldn't use anything else. Those chips are around for a reason, and the number of reasons keeps growing. MP3 players, digital-video cameras, automotive electronics, and other new toys are popping up all the time, and they each need a new and different kind of processor. There's no such thing as a typical embedded system and there's no such thing as a typical embedded processor. As long as embedded developers invent new devices, new embedded processors will be there to make them tick.

[Via Embedded.com]
