armadillo, lapack, and intel’s MKL

I normally use ruby/gsl for everyday numerics because it’s fast to code and extend, and semantically clear. You can also do cool things like previewing an energy spectrum from within irb by interfacing with the gnuplot module; normally you’d write to a tmp file, switch to the gnuplot console, and plot it there.

The problem with rb/gsl is speed. The other day, my boss walked in and compiled his fortran code on my machine. It beat my ruby code hands-down. One would think gsl, being a C library, should be on par with lapack, but apparently it’s not, and it’s not (entirely) ruby’s fault: I rewrote the same stuff in C, using gsl, and it’s still about 4 times slower than my boss’s fortran code using lapack. (I guess it’s partly b/c gsl is using its own blas implementation.)

So I want something faster but still with natural semantics (e.g., matrix multiplication should be A*B, not dgemm(...A...B...)), so C++ seems a reasonable place to look. So far I find armadillo suits my needs quite well. It is a c++ linear algebra library interfacing with lapack/blas.

On gentoo, this is what one needs:

1) If you have an intel cpu, emerge sci-libs/mkl. This is a blas/lapack drop-in replacement from Intel. You need to go to intel’s website and register (free) for a non-commercial usage license. AMD has a similar library. Otherwise, just set up your blas/lapack of choice.
2) Use eselect blas/cblas/lapack list/set to choose mkl as the blas/lapack implementation to use.
3) Now emerge sci-libs/armadillo. It is important to install it after 1) and 2) b/c it decides which library to link against at build time.

Now, #include <armadillo> in your cpp code, and use
g++ -O3 your_code.cpp -o your_exe -larmadillo
to compile. You don’t have to pass -lblas -llapack, as these are already taken care of by -larmadillo (which was settled in step 3 above; that’s why the emerge order is important).

On Mac OS X (my office workstation), the appropriate way to compile is instead:
g++ -O3 -framework Accelerate your_code.cpp -o your_exe
as per armadillo’s readme file.

I transcribed my ruby/gsl code into c++/armadillo, and on the Mac it is actually 2 times faster than my boss’s fortran code. Since both are using the same underlying blas/lapack implementation, I suppose there’s some unnecessary computation in the fortran code. At any rate, it seems safe to say the c++ overhead is not significant, so I’ll happily settle for armadillo for the moment.
