Where Bulldozer shines, and where it mires

AMD has released Interlagos, the server version of the Bulldozer CPU class they launched over a month ago.

Bulldozer/Interlagos is AMD's attempt to grab more of the market from Intel. Currently, AMD is competing in the value sector but not on performance. The days when AMD CPUs were the virtualization kings have been gone for a couple of years now. AMD would like that crown back, thank you, and they're driving to get there.

That said, comparing performance between equivalently clocked AMD and Intel CPUs is hard. They're optimized for different tasks, which means that the smart Systems Engineer looking for the next CPU to base their environment on should pay attention. Workload matters! Those AMD CPUs may be damned cheap compared to Intel, but if you're doing the wrong things with them you'd be better off buying previous-gen Intel chips.

The most controversial thing AMD has done is to make two cores share a Floating Point Unit. They've also done quite a bit of optimization in their Arithmetic Logic Unit, where Integer math is handled. The reasoning behind this is that most server usage these days consists of integer-heavy, highly parallelizable workloads; most database and simple web-serving workloads are entirely Integer and parallel-friendly, and that's a large part of the webapp stack right there. The likes of Google Plus, StackExchange, and Reddit do far more Integer work than floating-point, so something like Interlagos should be a good fit.

And the early benchmarks show that AMD does indeed have an edge on integer-heavy workloads over equivalent generation Intel parts. Intel still has an edge on compute-performance-per-watt, but AMD holds the edge on compute-performance-per-GHz. Pick which is more important to you.

Specialist workloads like render farms are edge cases, if big consumers, so engineering to handle those workloads is not worth the time. By staking out the middle of the market, AMD can drive innovation in the marketplace by forcing Intel to get creative in the middle. It's good for everyone.

Yes, but what about me, you cry.

Let us posit a web-service that does the following:

Users submit a web-page to the service.
The service then crawls that web-page and renders it to an image format (PDF, SVG, FLV, whatever).
The service stores the rendered document.
Users can share such 'archived' pages with each other.

A valuable service! Digging into it, what kind of systems would this kind of service require:

A front-end service, that actually drives the user interface.
A processing service, that crawls submitted web-pages, renders them, and stores them.
A storage service that actually hosts the rendered documents (may be a CDN).
A database to rule them all.

There are a bunch of value-adds such a service may offer, like automatic translation, image-to-text translation, and suchlike, but I'm leaving that out for now. This is a workable service. Now what do each of these tiers do?

Front End
This is web-serving and application logic. The web-serving bits are almost entirely integer, and highly parallelizable at that. Application-logic is probably mostly integer, and may be highly parallelizable, though that depends on the framework being used.

Mostly integer, therefore Interlagos would be a good fit.

Failure modes:

If app-logic is highly single-threaded, scaling will be restricted by the clock-speed of the processor more than the integer performance of the processor. A 3.2GHz Intel would perform better than a 2.8GHz AMD.
If app-logic is actually significantly floating-point heavy for some reason, the shared Floating Point Unit becomes the bottleneck. Perhaps encryption is in heavy use and crypt/decrypt operations are very common.

Processing
Fetching remote web-pages would be significantly integer. However, converting them to an image format, whether bitmapped or vectored, is going to be floating-point heavy. In fact, the image conversion step is likely to be the bulk of the computation this service is going to drive.

Significantly floating-point, therefore Intel would be a better fit.

Failure modes:

If the render engine has been optimized for the newest CPU instructions, the more-cores AMD chip may outperform Intel.

Storage
This is where files are kept. It's a file-server based on HTTP, and may be massively distributed. This is almost entirely integer operation since very few filesystem operations depend on floating-point.

Interlagos should be fine.

Failure modes:

Weird edge-cases like experimental filesystems.

Database
It's a database of some kind. May be a NoSQL, or an RDBMS. Doesn't much matter. Inserting and retrieving records is pretty heavily Integer.

Interlagos should be fine.

Failure modes:

The database is single-threaded (some NoSQL systems are still single-threaded).
The database handles multiple cores poorly (again, some NoSQL falls into this category).
Inserts incur floating-point calculations (stored procedures updating internal values, SQL commands that cause server-side calculation, encrypted fields) to a significant degree.

Whew!

Of the four sub-services, Interlagos should be a good fit for three of 'em. The Processing sub-service is pretty clearly an Intel domain, but the others should be Integer enough for AMD to provide real value. Bifurcating the environment has its own costs, but if you have enough servers, managing two CPU architectures shouldn't be too hard.

But there are limits. Application logic may end up being significantly floating-point, especially if encryption is being used anywhere. The database may not parallelize well, or may have enough floating-point dependencies that it's better to go with either much higher clocked but low core-count Intel processors or equivalently clocked but high core-count Intel processors.
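The tier-by-tier reasoning above can be written down as a rough rule of thumb. This is only a sketch: the Tier fields, the thresholds in suggest_cpu, and every number in the tiers list are made-up illustrations, not measurements of any real service or CPU.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    fp_share: float        # assumed fraction of CPU time in floating-point work (0..1)
    parallel_share: float  # assumed fraction of work that spreads across cores (0..1)


def suggest_cpu(tier, fp_limit=0.3, parallel_floor=0.7):
    """Rule of thumb from the text; both thresholds are arbitrary assumptions."""
    if tier.fp_share > fp_limit:
        return "Intel (floating-point heavy)"
    if tier.parallel_share < parallel_floor:
        return "Intel, high clock (single-thread bound)"
    return "Interlagos (integer heavy, parallel)"


# Hypothetical profiles for the four sub-services in the example.
tiers = [
    Tier("front-end", 0.05, 0.90),
    Tier("processing", 0.60, 0.90),
    Tier("storage", 0.02, 0.95),
    Tier("database", 0.10, 0.80),
]

for t in tiers:
    print(f"{t.name:10s} -> {suggest_cpu(t)}")
```

Drop the database's parallel_share to 0.2 (a single-threaded NoSQL store, say) and the suggestion flips to a high-clock part, which is exactly the failure mode listed for that tier.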

This, by the way, is where the "engineer" part of "Systems Engineer" comes into play. When deciding what platforms to upgrade onto, you need to know the processing needs of each layer to make a fully informed decision. A system with Tomcat on the front and MySQL on the back has a lot of parallel potential, but one written with Ruby 1.8.7 on the front and Tokyo Cabinet on the back is single-threaded to a high degree. Knowing this, or knowing how to find out, is our job.
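One way of "finding out" is to benchmark the same layer with one worker and then with several, and see how far short of linear the scaling falls. The sketch below uses the Karp-Flatt metric to turn a measured speedup into an estimated serial fraction. The throughput figures are hypothetical, and serial_fraction is just a name picked for this example.

```python
def serial_fraction(speedup, cores):
    """Karp-Flatt metric: estimated serial share of the work, from a measured speedup."""
    if cores < 2:
        raise ValueError("need a measurement on at least 2 cores")
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


# Hypothetical benchmark: requests/second with 1 worker and with 8 workers.
one_worker = 120.0
eight_workers = 480.0

speedup = eight_workers / one_worker
print(f"speedup on 8 cores: {speedup:.1f}x")
print(f"estimated serial fraction: {serial_fraction(speedup, 8):.3f}")
```

A result near 0 means the layer scales almost linearly with cores; a result near 1 means extra cores buy next to nothing and clock speed is what matters.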

A lot of this analysis also supports finding the answer to this question:

Should we go with high core-count but low clocked CPUs, or low core-count but highly clocked CPUs?

This can be a complicated thing to figure out, but knowing you have significant single-thread dependencies in your environment is key to answering it. In the above example, each sub-system has its own needs and you may end up with a mix of low-count-high-GHz / high-count-low-GHz systems depending on needs.
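A back-of-the-envelope way to frame the cores-versus-clock question is Amdahl's law scaled by clock speed. The sketch below assumes equal work per clock on both chips, which is not true between real AMD and Intel parts, and the core counts and clock speeds are invented for illustration.

```python
def relative_throughput(cores, ghz, parallel_share):
    """Amdahl's law speedup times clock speed (assumes equal work per clock)."""
    speedup = 1.0 / ((1.0 - parallel_share) + parallel_share / cores)
    return ghz * speedup


# Hypothetical configurations, not real SKUs.
many_slow = dict(cores=16, ghz=2.2)
few_fast = dict(cores=6, ghz=3.3)

for p in (0.5, 0.9, 0.99):
    a = relative_throughput(parallel_share=p, **many_slow)
    b = relative_throughput(parallel_share=p, **few_fast)
    winner = "many-slow" if a > b else "few-fast"
    print(f"parallel={p:.2f}: many-slow={a:.1f} few-fast={b:.1f} -> {winner}")
```

With these made-up numbers the low-count-high-GHz box wins when only half the work is parallel, and the high-count-low-GHz box pulls ahead once around 90% of it is. The crossover point is what your own single-thread dependencies decide.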

Of course, if you're in the cloud this is all moot. You just live with what you're given and hope that whatever they're using at the base works well with how you've chosen to run your systems.
