Python question – Save classifier to disk in Scikit-Learn

149

How do I save a trained Naive Bayes classifier to disk and use it to predict new data?

Here is the sample program I am working with:

from sklearn import datasets
iris = datasets.load_iris()
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)
print("Number of mislabeled points : %d" % (iris.target != y_pred).sum())


5 Answers
3

sklearn estimators implement methods that make it easy to save the relevant trained properties of an estimator. Some estimators implement __getstate__ methods themselves, but others, like the GMM, just use the base implementation, which simply saves the object's inner dictionary:

    try:
        state = super(BaseEstimator, self).__getstate__()
    except AttributeError:
        state = self.__dict__.copy()

    if type(self).__module__.startswith('sklearn.'):
        return dict(state.items(), _sklearn_version=__version__)
    else:
        return state
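The version tag added by that base implementation can be inspected directly; a quick sketch, assuming a reasonably current scikit-learn install:

```python
from sklearn.naive_bayes import GaussianNB

# BaseEstimator.__getstate__ returns the instance __dict__ plus a
# _sklearn_version entry, which __setstate__ later compares against
# the running library version when the object is unpickled.
gnb = GaussianNB()
state = gnb.__getstate__()
print(sorted(state))
```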

The recommended method to save your model to disk is to use the pickle module:

from sklearn import datasets
from sklearn.svm import SVC
iris = datasets.load_iris()
X = iris.data[:100, :2]
y = iris.target[:100]
model = SVC()
model.fit(X,y)
import pickle
with open('mymodel','wb') as f:
    pickle.dump(model,f)
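Loading the dump back is symmetric; a minimal continuation of the snippet above (refitting first so the block is self-contained):

```python
import pickle

from sklearn import datasets
from sklearn.svm import SVC

# train and dump, as in the snippet above
iris = datasets.load_iris()
X = iris.data[:100, :2]
y = iris.target[:100]
model = SVC()
model.fit(X, y)
with open('mymodel', 'wb') as f:
    pickle.dump(model, f)

# later: load the persisted model and predict without retraining
with open('mymodel', 'rb') as f:
    loaded = pickle.load(f)
print((loaded.predict(X) == model.predict(X)).all())  # True
```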

However, you should save additional data so that you can retrain your model in the future, or suffer dire consequences (such as being locked into an old version of sklearn).

From the documentation:

In order to rebuild a similar model with future versions of scikit-learn, additional metadata should be saved along the pickled model:

The training data, e.g. a reference to an immutable snapshot

The python source code used to generate the model

The versions of scikit-learn and its dependencies

The cross validation score obtained on the training data
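One lightweight way to keep that metadata next to the pickle is a small JSON sidecar file; the file names and keys below are illustrative, not a scikit-learn convention:

```python
import json
import pickle

import sklearn
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
gnb = GaussianNB().fit(iris.data, iris.target)

with open('model.pkl', 'wb') as f:
    pickle.dump(gnb, f)

# sidecar recording the library version, data provenance and CV score
meta = {
    'sklearn_version': sklearn.__version__,
    'training_data': 'sklearn.datasets.load_iris()',
    'cv_score': float(cross_val_score(GaussianNB(), iris.data, iris.target).mean()),
}
with open('model.pkl.meta.json', 'w') as f:
    json.dump(meta, f)
```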

This is especially true for Ensemble estimators that rely on the tree.pyx module written in Cython (such as IsolationForest), since it creates a coupling to the implementation, which is not guaranteed to be stable between versions of sklearn.

If your model becomes very large and loading becomes a nuisance, you can also use the more efficient joblib. From the documentation:

In the specific case of the scikit, it may be more interesting to use joblib’s replacement of pickle (joblib.dump & joblib.load), which is more efficient on objects that carry large numpy arrays internally as is often the case for fitted scikit-learn estimators, but can only pickle to the disk and not to a string:
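Note that in recent scikit-learn versions joblib is a standalone dependency (sklearn.externals.joblib was removed in 0.23); a minimal sketch of the dump/load round trip:

```python
import joblib
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
gnb = GaussianNB().fit(iris.data, iris.target)

# joblib serializes the large numpy arrays inside the estimator efficiently
joblib.dump(gnb, 'gnb.joblib')
gnb_loaded = joblib.load('gnb.joblib')
print((gnb_loaded.predict(iris.data) == gnb.predict(iris.data)).all())  # True
```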

165

import pickle  # this answer originally used cPickle, which is Python 2 only

# save the classifier
with open('my_dumped_classifier.pkl', 'wb') as fid:
    pickle.dump(gnb, fid)

# load it again
with open('my_dumped_classifier.pkl', 'rb') as fid:
    gnb_loaded = pickle.load(fid)
76

Model persistence

clf = some.classifier()  # placeholder: any scikit-learn estimator, e.g. sklearn.svm.SVC()
clf.fit(X, y)

1) Using Pickle

import pickle
# now you can save it to a file
with open('filename.pkl', 'wb') as f:
    pickle.dump(clf, f)

# and later you can load it
with open('filename.pkl', 'rb') as f:
    clf = pickle.load(f)

2) Using Joblib

import joblib  # formerly from sklearn.externals import joblib, removed in scikit-learn 0.23
# now you can save it to a file
joblib.dump(clf, 'filename.pkl') 
# and later you can load it
clf = joblib.load('filename.pkl')

184

>>> import joblib  # formerly: from sklearn.externals import joblib
>>> from sklearn.datasets import load_digits
>>> from sklearn.linear_model import SGDClassifier

>>> digits = load_digits()
>>> clf = SGDClassifier().fit(digits.data, digits.target)
>>> clf.score(digits.data, digits.target)  # evaluate training accuracy
0.9526989426822482

>>> filename = '/tmp/digits_classifier.joblib.pkl'
>>> _ = joblib.dump(clf, filename, compress=9)

>>> clf2 = joblib.load(filename)
>>> clf2
SGDClassifier(alpha=0.0001, class_weight=None, epsilon=0.1, eta0=0.0,
       fit_intercept=True, learning_rate='optimal', loss='hinge', n_iter=5,
       n_jobs=1, penalty='l2', power_t=0.5, rho=0.85, seed=0,
       shuffle=False, verbose=0, warm_start=False)
>>> clf2.score(digits.data, digits.target)
0.9526989426822482
See also: scikit-learn.org/stable/tutorial/basic/…
19

In many cases, particularly with text classification, it is not enough to store the classifier: you also need to store the vectorizer so that you can vectorize new input in the future.

import pickle
with open('model.pkl', 'wb') as fout:
    pickle.dump((vectorizer, clf), fout)

# later, in another process:
with open('model.pkl', 'rb') as fin:
    vectorizer, clf = pickle.load(fin)

X_new = vectorizer.transform(new_samples)
X_new_preds = clf.predict(X_new)

Before dumping the vectorizer, you can delete its stop_words_ attribute, which is kept only for introspection and is not needed for transform:

vectorizer.stop_words_ = None

to make the dump smaller. Also, if your classifier coefficients are sparse (as in most text classification tasks), you can convert them from dense to sparse storage, which makes a big difference in memory consumption, loading and dumping:

clf.sparsify()

Alternatively, you can convert clf.coef_ to a CSR sparse matrix manually:

clf.coef_ = scipy.sparse.csr_matrix(clf.coef_)
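sparsify() changes only how coef_ is stored, not what the model predicts; a quick check with an SGDClassifier (the iris data here is just a stand-in for real text features):

```python
import scipy.sparse
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

X, y = load_iris(return_X_y=True)
clf = SGDClassifier(random_state=0).fit(X, y)
dense_preds = clf.predict(X)

clf.sparsify()  # converts clf.coef_ to a CSR matrix in place
assert scipy.sparse.issparse(clf.coef_)
print((clf.predict(X) == dense_preds).all())  # True
```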

