python - A quick way to compute the average of multiple timeseries?
I'm writing a k-means algorithm in Python with NumPy. The distance-to-all-centroids part is pretty well optimized (I compute it for a matrix of centroids at once instead of for each one separately), but I'm struggling with the compute-new-centroid part. Right now I copy the data for each centroid into a separate dataset and compute its mean.
I think it would be faster without the copying. How do I do that in Python/NumPy?
Code snippet:

    for c_i in range(k):
        # Copy every row labelled c_i into a temporary array
        sub_data = np.zeros([n_per_c[c_i], data_width])
        sub_data_i = 0
        for data_i in range(data_length):
            if label[data_i] == c_i:
                sub_data[sub_data_i, :] = data[data_i, :]
                sub_data_i += 1
        # The new centroid is the mean of those rows
        c[c_i] = np.mean(sub_data, axis=0)
Here c is the list of centroids I have, data is the entire dataset, and label is the list of class labels.
I think the following does the same as your code, without the explicit intermediate array:

    for c_i in range(k):
        c[c_i] = np.mean(data[label == c_i, :], axis=0)
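
To make the masked version concrete, here is a minimal runnable sketch on toy data. The function name `centroids_masked` and the four sample points are made up for illustration, and the sketch assumes labels are integers in 0..k-1 with no empty cluster. Note that `label` has to be a NumPy array for `label == c_i` to produce a boolean mask, so the sketch converts it first.

```python
import numpy as np

def centroids_masked(data, label, k):
    """Mean of the rows of `data` for each label 0..k-1, using boolean masks."""
    # A plain Python list would compare as a whole to c_i, so convert first.
    label = np.asarray(label)
    c = np.empty((k, data.shape[1]))
    for c_i in range(k):
        # data[label == c_i] selects only the rows assigned to cluster c_i
        c[c_i] = data[label == c_i].mean(axis=0)
    return c

# Toy data (assumed for illustration): two clusters of two points each
data = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 0.0], [12.0, 4.0]])
label = np.array([0, 0, 1, 1])

c = centroids_masked(data, label, 2)
print(c)  # centroid rows: [1, 1] and [11, 2]
```

The boolean indexing still builds a temporary array per cluster, but NumPy does it in compiled code, so the inner Python loop over rows goes away.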
Getting rid of that last loop is tougher, but this should work:

    label_counts = np.bincount(label)
    # One (label, column) pair per element of data.ravel(), which is row-major
    label_sums = np.histogram2d(np.repeat(label, data_width),
                                np.tile(np.arange(data_width), data_length),
                                bins=(k, data_width),
                                weights=data.ravel())[0]
    c = label_sums / label_counts[:, None]
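
If the histogram trick feels opaque, a different loop-free route (not the approach above, just an alternative sketch) is to accumulate per-label sums with `np.add.at` and divide by the counts. The function name `centroids_add_at` and the random test data are assumptions for illustration. The sketch assumes integer labels in 0..k-1, and an empty cluster would give a row of NaN from the 0/0 division.

```python
import numpy as np

def centroids_add_at(data, label, k):
    """Per-label row means without a Python loop over clusters."""
    label = np.asarray(label)
    sums = np.zeros((k, data.shape[1]))
    # Unbuffered in-place add: rows sharing a label all accumulate,
    # which plain `sums[label] += data` would not do.
    np.add.at(sums, label, data)
    # minlength=k keeps the shape right even if the highest label is unused
    counts = np.bincount(label, minlength=k)
    return sums / counts[:, None]

# Made-up test data: 100 rows, 3 columns, 4 clusters of 25 rows each
rng = np.random.default_rng(0)
k = 4
data = rng.normal(size=(100, 3))
label = rng.permutation(np.arange(100) % k)

c_fast = centroids_add_at(data, label, k)
# Reference result from the masked loop, for comparison
c_ref = np.array([data[label == c_i].mean(axis=0) for c_i in range(k)])
print(np.allclose(c_fast, c_ref))
```

Whether this is actually faster than the short masked loop depends on k and the data size, so it is worth timing both on your own data.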