python - Efficiently compute columnwise sum of sparse array where every non-zero element is 1
I have a bunch of data in scipy compressed sparse row (CSR) format. Of course the majority of elements are zero, and I further know that all non-zero elements have a value of 1. I want to compute sums over different subsets of rows of the matrix. At the moment I am doing the following:
```python
import numpy as np
import scipy as sp
import scipy.sparse

# create data with sparsely distributed ones
data = np.random.choice((0, 1), size=(1000, 2000), p=(0.95, 0.05))
data = sp.sparse.csr_matrix(data, dtype='int8')

# generate column-wise sums over random subsets of rows
nrand = 1000
for k in range(nrand):
    inds = np.random.choice(data.shape[0], size=100, replace=False)

    # 60% of the time is spent here
    extracted_rows = data[inds]

    # 20% of the time is spent here
    row_sum = extracted_rows.sum(axis=0)
```
The last few lines there are the bottleneck in a larger computational pipeline. As annotated in the code, 60% of the time is spent slicing the data with random indices, and 20% is spent computing the actual sum.
It seems to me that I should be able to use my knowledge about the data in the array (i.e., any non-zero value in the sparse matrix is 1; no other values are present) to compute these sums more efficiently. Unfortunately, I cannot figure out how. Dealing with `data.indices` directly, perhaps? I have tried other sparsity structures (e.g. a CSC matrix), as well as converting to a dense array first, but these approaches were all slower than the CSR matrix approach.
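One direction for the `data.indices` idea might look like this (a sketch, assuming the same `data` and `inds` setup as above): since every stored value is 1, the column-wise sum over a row subset is just a histogram of the column indices belonging to those rows' non-zeros.

```python
import numpy as np
import scipy.sparse

# same setup as in the question
data = np.random.choice((0, 1), size=(1000, 2000), p=(0.95, 0.05))
data = scipy.sparse.csr_matrix(data, dtype='int8')
inds = np.random.choice(data.shape[0], size=100, replace=False)

# gather the column indices of the non-zeros in the selected rows...
cols = np.concatenate([data.indices[data.indptr[i]:data.indptr[i + 1]]
                       for i in inds])
# ...and histogram them; every stored value is 1, so counts == sums
row_sum = np.bincount(cols, minlength=data.shape[1])
```

This gives the same result as `data[inds].sum(axis=0)`, without building an intermediate sparse matrix for the selected rows.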
It is well known that indexing of sparse matrices is relatively slow. And there have been questions about getting around that by accessing the data attributes directly.
But first some timings. Using your `data` and `inds`, I get:
```python
In [23]: datad = data.A  # dense conversion takes about 3.76 ms per loop

In [24]: timeit row_sumd = datad[inds].sum(axis=0)
1000 loops, best of 3: 529 µs per loop

In [25]: timeit row_sum = data[inds].sum(axis=0)
1000 loops, best of 3: 890 µs per loop

In [26]: timeit d = datad[inds]
10000 loops, best of 3: 55.9 µs per loop

In [27]: timeit d = data[inds]
1000 loops, best of 3: 617 µs per loop
```
The sparse version is slower than the dense one, but not by a lot. The sparse indexing is much slower, but its sum is faster.
The sparse sum is done with a matrix product:

```python
def sparse.spmatrix.sum
    ....
    return np.asmatrix(np.ones((1, m), dtype=res_dtype)) * self
```
That suggests a faster way: turn `inds` into an appropriate array of 1s and multiply.
```python
In [49]: %%timeit
   ....: b = np.zeros((1, data.shape[0]), 'int8')
   ....: b[:, inds] = 1
   ....: rowmul = b * data
   ....:
1000 loops, best of 3: 587 µs per loop
```
That makes the sparse operation about as fast as the equivalent dense one. (But converting to dense first is much slower.)
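Put together as a self-contained sketch (with made-up setup data mirroring the question's):

```python
import numpy as np
import scipy.sparse

# setup mirroring the question
data = scipy.sparse.csr_matrix(
    np.random.choice((0, 1), size=(1000, 2000), p=(0.95, 0.05)), dtype='int8')
inds = np.random.choice(data.shape[0], size=100, replace=False)

# indicator row vector: 1 for the selected rows, 0 elsewhere
b = np.zeros((1, data.shape[0]), dtype='int8')
b[:, inds] = 1

# the multiplication is delegated to the sparse matrix product,
# so the selected rows are summed without ever slicing them out
rowmul = b * data
```

`rowmul` comes back as a 2-D array whose single row equals `data[inds].sum(axis=0)`.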
==================
The last time test is missing the `np.asmatrix` that is present in the sparse `sum`. But the times are similar, and the results are the same:
```python
In [232]: timeit b = np.zeros((1, data.shape[0]), 'int8'); b[:, inds] = 1; x1 = np.asmatrix(b) * data
1000 loops, best of 3: 661 µs per loop

In [233]: timeit b = np.zeros((1, data.shape[0]), 'int8'); b[:, inds] = 1; x2 = b * data
1000 loops, best of 3: 605 µs per loop
```
One produces a matrix, the other an array. But both are doing a matrix product, the 2nd dim of `b` against the 1st of `data`. Even though `b` is an array, the task is actually delegated to `data` and its matrix product - in a not so transparent way.
```python
In [234]: x1
Out[234]: matrix([[9, 9, 5, ..., 9, 5, 3]], dtype=int8)

In [235]: x2
Out[235]: array([[9, 9, 5, ..., 9, 5, 3]], dtype=int8)
```
`b * data.A` is element-wise multiplication and raises an error (the shapes don't broadcast); `np.dot(b, data.A)` works, but is slower.
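To see the distinction concretely (a sketch using the question's shapes, with both operands dense):

```python
import numpy as np
import scipy.sparse

data = scipy.sparse.csr_matrix(
    np.random.choice((0, 1), size=(1000, 2000), p=(0.95, 0.05)), dtype='int8')
b = np.zeros((1, data.shape[0]), dtype='int8')

# with two plain arrays, `*` means element-wise multiplication;
# shapes (1, 1000) and (1000, 2000) do not broadcast, so it raises
try:
    b * data.A
except ValueError:
    print('element-wise multiply: shapes do not broadcast')

# np.dot does the matrix product on the dense array - correct, but slower
out = np.dot(b, data.A)
print(out.shape)  # (1, 2000)
```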
Newer numpy/python has the `matmul` operator, `@`. I see the same time pattern:
```python
In [280]: timeit b @ datad  # dense product
100 loops, best of 3: 2.64 ms per loop

In [281]: timeit b @ data.A  # slower due to the `.A` conversion
100 loops, best of 3: 6.44 ms per loop

In [282]: timeit b @ data  # sparse product
1000 loops, best of 3: 571 µs per loop
```
`np.dot` may also delegate the action to the sparse matrix, though you have to be careful with it. I hung my machine with `np.dot(csr_matrix(b), data.A)`.
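A safer pattern, if you want to stay with the `@` operator, is to keep both operands sparse (a sketch, using the same made-up setup as the question):

```python
import numpy as np
import scipy.sparse

data = scipy.sparse.csr_matrix(
    np.random.choice((0, 1), size=(1000, 2000), p=(0.95, 0.05)), dtype='int8')
inds = np.random.choice(data.shape[0], size=100, replace=False)

# indicator row vector for the selected rows
b = np.zeros((1, data.shape[0]), dtype='int8')
b[:, inds] = 1

# sparse @ sparse stays entirely inside scipy.sparse, so np.dot never
# sees the sparse matrix and cannot fall into its slow/hanging path
out = scipy.sparse.csr_matrix(b) @ data
row_sum = out.toarray().ravel()
```

`row_sum` again matches `data[inds].sum(axis=0)`.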