python - A quick way to compute the average of multiple timeseries?


I'm writing a k-means algorithm in Python with NumPy. The distance-to-all-centroids part is pretty optimized (I compute a matrix of distances to all centroids at once instead of handling each centroid separately), but I'm struggling with the compute-new-centroid part. Right now I copy the data belonging to each centroid into a separate array and compute the mean of that.

I think it would be faster without the copying. How do I do that in Python/NumPy?

Code snippet:

    import numpy as np

    for c_i in range(k):
        sub_data = np.zeros([n_per_c[c_i], data_width])
        sub_data_i = 0
        for data_i in range(data_length):
            if label[data_i] == c_i:
                sub_data[sub_data_i, :] = data[data_i, :]
                sub_data_i += 1
        c[c_i] = np.mean(sub_data, axis=0)

c is the list of centroids I have, data is the entire dataset, and label is the list of class labels.
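The question says the distance-to-all-centroids step is already vectorized. For context, here is one way that step might look: a minimal sketch assuming squared Euclidean distance and NumPy broadcasting. The function name assign_labels and the toy data are hypothetical, not the asker's code.

```python
import numpy as np

def assign_labels(data, c):
    """Return the index of the nearest centroid for every row of data.

    Hypothetical helper. data: (n_points, n_features), c: (k, n_features).
    Broadcasting builds an (n_points, k, n_features) difference array, so
    all point/centroid distances are computed at once instead of per centroid.
    """
    diff = data[:, None, :] - c[None, :, :]   # (n_points, k, n_features)
    dist2 = (diff ** 2).sum(axis=2)           # squared distances, (n_points, k)
    return np.argmin(dist2, axis=1)           # nearest centroid per point

data = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
c = np.array([[0.0, 0.0], [5.0, 5.0]])
label = assign_labels(data, c)
print(label)  # [0 0 1 1]
```

The intermediate difference array grows as n_points * k * n_features, so for very large datasets you may want to process the points in chunks.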

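I think the following does the same as your code, without the explicit intermediate array (label needs to be a NumPy array, not a Python list, so that label == c_i gives a boolean mask over the rows):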

    for c_i in range(k):
        c[c_i] = np.mean(data[label == c_i, :], axis=0)
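A self-contained version of the boolean-mask loop, as a minimal sketch: the function name update_centroids and the toy data are hypothetical. Note that boolean indexing still makes a temporary copy of the selected rows inside NumPy; what disappears is the Python-level loop over individual points.

```python
import numpy as np

def update_centroids(data, label, k):
    """Recompute each centroid as the mean of the rows assigned to it.

    Hypothetical helper. label is converted to a NumPy integer array so
    that label == c_i yields a boolean mask over the rows of data.
    """
    label = np.asarray(label)
    c = np.empty((k, data.shape[1]))
    for c_i in range(k):
        # boolean mask picks every row labelled c_i in one vectorized step
        c[c_i] = np.mean(data[label == c_i, :], axis=0)
    return c

data = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [12.0, 14.0]])
label = [0, 0, 1, 1]
print(update_centroids(data, label, 2))
```

If a cluster ends up empty, np.mean of the empty selection returns nan (with a RuntimeWarning), so a real k-means loop needs some policy for empty clusters.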

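Getting rid of the last loop is tougher, but this should work (here data_length is the number of rows and data_width the number of columns, as in the question):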

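    import numpy as np

    # points per cluster; minlength keeps empty trailing clusters
    label_counts = np.bincount(label, minlength=k)
    # data.ravel() is row-major, so repeat each label data_width times
    # and tile the column indices once per row
    label_sums = np.histogram2d(np.repeat(label, data_width),
                                np.tile(np.arange(data_width), data_length),
                                bins=(k, data_width),
                                range=[[-0.5, k - 0.5], [-0.5, data_width - 0.5]],
                                weights=data.ravel())[0]
    c = label_sums / label_counts[:, None]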
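A self-contained sketch of the histogram2d idea, checked against the per-cluster mean loop on random data. The function names are hypothetical. The second function swaps in a different technique, np.add.at (unbuffered in-place addition), which is not part of the answer above but computes the same per-cluster sums more directly.

```python
import numpy as np

def centroids_hist(data, label, k):
    """Vectorized centroid update using np.histogram2d (hypothetical name)."""
    label = np.asarray(label)
    n_rows, width = data.shape
    counts = np.bincount(label, minlength=k)
    # data.ravel() is row-major: row 0's `width` values, then row 1's, ...
    rows = np.repeat(label, width)            # cluster of each flattened value
    cols = np.tile(np.arange(width), n_rows)  # column of each flattened value
    # explicit range centres each integer label/column inside its own bin
    sums = np.histogram2d(rows, cols,
                          bins=(k, width),
                          range=[[-0.5, k - 0.5], [-0.5, width - 0.5]],
                          weights=data.ravel())[0]
    return sums / counts[:, None]

def centroids_add_at(data, label, k):
    """Same result via np.add.at (swapped-in technique, hypothetical name)."""
    label = np.asarray(label)
    sums = np.zeros((k, data.shape[1]))
    np.add.at(sums, label, data)  # sums[label[i]] += data[i] for every i
    counts = np.bincount(label, minlength=k)
    return sums / counts[:, None]

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))
label = rng.integers(0, 4, size=100)
reference = np.array([data[label == c_i].mean(axis=0) for c_i in range(4)])
print(np.allclose(centroids_hist(data, label, 4), reference))    # True
print(np.allclose(centroids_add_at(data, label, 4), reference))  # True
```

Both versions divide by zero for an empty cluster (giving nan), just like the loop version.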
