machine learning - How to randomly split a dataset into training set, test set, and dev set in Python?


I have a large dataset and want to randomly split it into 70% train, 25% test, and 5% dev. How can I do this in Python with scikit-learn?

I wonder if this can be done with the sklearn.cross_validation.train_test_split(*arrays, **options) function, for example as in the following link?

http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html

You could use:

import numpy as np
from numpy.random import multinomial

n_total_samples = 1000  # or whatever

indices = np.arange(n_total_samples)
# draw one label per sample: 0 = train, 1 = test, 2 = dev
inds_split = multinomial(n=1,
                         pvals=[0.7, 0.25, 0.05],
                         size=n_total_samples).argmax(axis=1)

train_inds = indices[inds_split == 0]
test_inds = indices[inds_split == 1]
dev_inds = indices[inds_split == 2]

# the draw is random, so the fractions only approximate 0.7 / 0.25 / 0.05
print(len(train_inds) / float(n_total_samples))  # e.g. 0.713
print(len(test_inds) / float(n_total_samples))   # e.g. 0.24
print(len(dev_inds) / float(n_total_samples))    # e.g. 0.047

It's not as pretty as a built-in function, but I believe it does what you need.
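
For comparison, the built-in route the question asks about can probably be covered by calling train_test_split twice: once to hold out everything that is not training data, and once more to divide the held-out part into test and dev. The sketch below assumes a recent scikit-learn, where the function is imported from sklearn.model_selection rather than sklearn.cross_validation. The helper name split_train_test_dev, its parameters, and the np.arange stand-in dataset are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_train_test_dev(data, test_frac=0.25, dev_frac=0.05, seed=0):
    """Hypothetical helper: three-way split via two train_test_split calls."""
    n = len(data)
    # Convert fractions to absolute counts so the set sizes are exact.
    n_test = int(round(test_frac * n))
    n_dev = int(round(dev_frac * n))
    # First call: hold out everything that is not training data.
    train, held_out = train_test_split(
        data, test_size=n_test + n_dev, random_state=seed)
    # Second call: carve the dev set out of the held-out part.
    test, dev = train_test_split(
        held_out, test_size=n_dev, random_state=seed)
    return train, test, dev


data = np.arange(1000)  # stand-in for the real dataset
train, test, dev = split_train_test_dev(data)
print(len(train), len(test), len(dev))  # prints 700 250 50
```

Unlike the multinomial draw, this gives exact set sizes, and passing a fixed random_state makes the split reproducible.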

