32-bits Less is More

December 7, 2011

I just got through creating the release candidates for the next release of Haskell Platform for Mac OS X. This release will come in both 32-bit (i386) and 64-bit (x86_64) versions.

tl;dr: Unless you are writing programs that address over 2 GB of data at once, install the 32-bit version of the upcoming Haskell Platform.

Given that I had both, it seemed like a good idea to benchmark them and give people some guidance as to which to install. I chose Johan Tibell’s unordered-containers package, as it has a nice benchmark suite built with criterion that tests both it and the common Map and IntMap data types. (And also because he was sitting at the desk next to me at work today, so I could pester him with questions!)
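To give a feel for what these benchmarks exercise, here is a minimal sketch of a Map/lookup/String-style workload. This is my illustration only, not the actual unordered-containers suite (which uses criterion for careful timing); it uses just base and containers, and the key count of 10,000 is an arbitrary choice.

```haskell
-- A sketch of a Map/lookup/String-style workload (not the real suite).
import qualified Data.Map as M

-- 10,000 distinct String keys (an arbitrary size for illustration).
keys :: [String]
keys = [show i | i <- [1 :: Int .. 10000]]

table :: M.Map String Int
table = M.fromList (zip keys [1 ..])

-- Force every lookup; all keys are present, so this counts 10000 hits.
hits :: Int
hits = length [() | k <- keys, M.member k table]

main :: IO ()
main = print hits
```

criterion would wrap a workload like `hits` in `bench`/`nf` and run it many times to get statistically meaningful timings; the sketch above only shows the shape of the work being measured.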

The results were a little surprising: The 64-bit version ran between 0% and 139% slower on most benchmarks, averaging 27% slower. For a small few, it ran a little bit faster (0% to 15%, average 8%). Details and discussion after the break.

Details of what I tested:

  • Haskell Platform 2011.4.0.0 RC2 – final is out in a week or two
  • GHC 7.0.4
  • containers-
  • unordered-containers-
  • Hardware: MacBook Pro (2010) Intel Core i5 2.4 GHz, and MacBook Air (2011) Intel Core i7 1.8 GHz — both with 4 GB RAM and SSD disks.
  • Command line: ./benchmark -g -u output.csv +RTS -H

Johan and I discussed this outcome, and I also consulted my resident HW expert (my husband used to be a CPU architect for Intel). On the one hand, code compiled for the 64-bit instruction set should be able to use HW resources more efficiently, and be smaller in code size. However, on any given processor the underlying CPU resources are the same, so these effects should be moderate (tens of percentage points). On the other hand, compiling for the ILP64 model (where int, size_t, and pointers are all 64 bits) effectively doubles the size of most Haskell data in memory, increasing memory demand, doubling the scanning time of GC, and making the CPU’s data caches half as effective. Our suspicion is that GHC must not be taking as much advantage of the 64-bit instruction architecture as it could, and that in data-heavy (but not overly large) benchmarks like these, the memory and cache disadvantages dominate.
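The doubling is easy to observe from GHC itself. `Int` and pointers are word-sized, so every boxed, word-sized field goes from 4 to 8 bytes when you move to 64-bit. A quick check (assuming a standard GHC, where `Int` matches the native word size):

```haskell
import Foreign.Ptr (Ptr)
import Foreign.Storable (sizeOf)

main :: IO ()
main = do
  -- Each prints 4 on a 32-bit GHC and 8 on a 64-bit GHC.
  print (sizeOf (undefined :: Int))
  print (sizeOf (undefined :: Ptr ()))
```

Since a GHC list cons cell is three words (a header plus two fields), it grows from 12 bytes to 24; the same structure costs the GC twice as many bytes to copy and scan, and fills the data caches twice as fast.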

It would be interesting to see if these results hold on later versions of GHC, on other OSes, and with other benchmarks.

For those that want the numbers: This chart shows the delta running time on 64-bit over 32-bit as a percentage (positive is slower). MBP and MBA are the two machines I ran these benchmarks on.

Benchmark MBP MBA
Map/lookup/String 24% 12%
Map/lookup/ByteString 9% 14%
Map/lookup-miss/String 28% 26%
Map/lookup-miss/ByteString 35% 13%
Map/insert/String 33% 28%
Map/insert/ByteString 33% 30%
Map/insert-dup/String 45% 38%
Map/insert-dup/ByteString 32% 24%
Map/delete/String 40% 24%
Map/delete/ByteString 20% 22%
Map/delete-miss/String 17% 8%
Map/delete-miss/ByteString 18% 8%
Map/size/String 29% 16%
Map/size/ByteString 28% 15%
Map/fromList/String 19% 27%
Map/fromList/ByteString 23% 18%
IntMap/lookup 4% 0%
IntMap/lookup-miss 11% 1%
IntMap/insert 20% 9%
IntMap/insert-dup 55% 29%
IntMap/delete 22% 12%
IntMap/delete-miss 37% 12%
IntMap/size -1% -5%
IntMap/fromList 46% 7%
lookup/String 3% -0%
lookup/ByteString 44% 29%
lookup/Int 29% -2%
lookup-miss/String 34% 17%
lookup-miss/ByteString 20% 18%
lookup-miss/Int 19% -7%
insert/String 139% 20%
insert/ByteString 55% 25%
insert/Int 17% -2%
insert-dup/String 95% 26%
insert-dup/ByteString 50% 3%
insert-dup/Int 46% 7%
delete/String 84% 47%
delete/ByteString 26% 24%
delete/Int 19% 22%
delete-miss/String 58% 29%
delete-miss/ByteString 45% 16%
delete-miss/Int 47% 28%
union 14% 4%
map 11% 12%
difference -9% -9%
intersection -9% -13%
foldl’ -0% 1%
foldr 62% 39%
filter 13% 16%
filterWithKey 22% -3%
size/String -15% -14%
size/ByteString -13% -13%
size/Int -14% -13%
fromList/String 56% 19%
fromList/ByteString 29% 16%
fromList/Int 34% 29%

From → Haskell

  1. I’ve noticed similar issues, where programs on my 64-bit machine use roughly twice as much RAM as on my 32-bit machine. I end up doing most calculations (typically involving several years’ worth of wave measurements (spectra)) on my 32-bit machine (with only 2 GB).

    On the (admittedly a few years older) 64-bit machine with 4 GB of RAM, it starts swapping too quickly, and without more optimization I tend to get bored well before the calculations finish (and the machine stops responding usefully). Painful when it is your main desktop machine.

  2. Svein Ove Aas permalink

    Any chance (most?) of this is because of using ILP64 instead of LP64 in Haskell?

  3. I noted a difference for the binary library too regarding 32 vs 64, wrote about it in this blog post:

  4. I’d love to see if there is any difference at all with the LLVM backend (-fllvm). I doubt there is in this case, as we don’t do that much interesting work.

  5. Good work measuring it! Any guesses why the Air’s i7 had less 32/64 disparity than the Pro’s i5? (Maybe cache size? Which processor’s caches are bigger?)

    (A nitpick: This statistic is true but unfair: “The 64-bit version ran between 0% and 139% slower on most benchmarks”. For all benchmarks, it’s between -15% and 139%. Or use median-like practices: since you clipped the bottom five data points to say “0%”, you could reasonably clip the top five data points to “56%” e.g. “The 64-bit version ran between 0% and 56% slower on most benchmarks”… Is your average of 27% among your most benchmarks or all your benchmarks?)

  6. Paul Liu permalink

    I’d really love to see GHC behave like GCC, where switching from 32-bit to 64-bit is a simple matter of flipping the option -m32 to -m64.
