Apple Accelerate libSparse performance

I've created a Julia interface for Apple Accelerate's libSparse, via calling the library functions as if they were C (@ccall). I'm interested in using this in the context of power systems, where the sparse matrix is the Jacobian or the ABA matrix from a sparse grid network. However, I'm puzzled by the performance.

I ran a sampling profiler on repeated in-place solves of Ax = b for a large sparse matrix A and random dense vectors b. (A is size 30k, positive definite so Cholesky factorization.) The 2 functions with the largest impact are _SparseConvertFromCoordinate_Double from libSparse.dylib, and BLASStateRelease from libBLAS.dylib. That strikes me as bizarre. This is an in-place solve: there should be minimal overheard from allocating/deallocating memory. Also, it seems strange that the library would repeatedly convert from coordinate form. Is this expected behavior?

Thinking it might be an artifact of the Julia-C interface, I wrote up a similar program in C/Objective-C. I didn't profile it, but timing the same operation (repeated in-place solves of Ax = b for random vectors b, with the same matrix A as in the Julia) gave the same duration. I've attached the C/Objective-C below.

If you're familiar with Julia, the following will give you the matrix I was working with:

using PowerSystems, PowerNetworkMatrices
sys = System("pglib_opf_case30000_goc.m")
A = PowerNetworkMatrices.ABA_Matrix(sys).data

where you can find the .m file here. (As a crude way to transfer A from Julia to C, I wrote the 3 arrays A.nzval, A.colptr, and A.rowval to .txt files as space-separated lists of numbers: the above C/objective-C reads in those files.) To duplicate my Julia profiling, do pkg> add AppleAccelerate#libSparse Profile--note the #libSparse part, these features aren't on the main branch--then run

using AppleAccelerate, Profile
# run previous code snippet to define A
M, N = 10000, size(A)[1]
bs = [rand(N) for _ in 1:M]
aa_fact = AAFactorization(A)
factor!(aa_fact)
solve!(aa_fact, bs[1]) # pre-compile before we profile.
Profile.init(n = 10^6, delay = 0.0003)
@profile (for i in 1:M; solve!(aa_fact, bs[i]); end;)
Profile.print(C = true, format = :flat, sortedby = :count)

Hi,

Please could you post your code that uses the Sparse API directly? SparseConvertFromCoordinate_Double is an explicit user call and that shouldn't appear unless the Julia interface is calling it.

Many thanks.

Here's the file with the API calls. I have to provide the correct symbol, not the function name, so it's a bit hard to read: e.g. you'll see _Z12SparseFactorh18SparseMatrix_Float instead of SparseFactor. I imagine you probably have Apple internal tools to deal with that...but if you don't, here's the mangled-demangled pairs.

All the relevant components can be found in that same GitHub repo, in the src/libSparse folder of the libSparse branch of AppleAccelerate.jl.

Apple Accelerate libSparse performance
 
 
Q