python - sklearn's PLSRegression: "ValueError: array must not contain infs or NaNs"


When using sklearn.cross_decomposition.PLSRegression:

    import numpy as np
    import sklearn.cross_decomposition

    pls2 = sklearn.cross_decomposition.PLSRegression()
    xx = np.random.random((5, 5))
    yy = np.zeros((5, 5))

    yy[0, :] = [0, 1, 0, 0, 0]
    yy[1, :] = [0, 0, 0, 1, 0]
    yy[2, :] = [0, 0, 0, 0, 1]
    #yy[3, :] = [1, 0, 0, 0, 0]  # uncommenting this line solves the issue

    pls2.fit(xx, yy)

I get:

    c:\anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:44: RuntimeWarning: invalid value encountered in divide
      x_weights = np.dot(X.T, y_score) / np.dot(y_score.T, y_score)
    c:\anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:64: RuntimeWarning: invalid value encountered in less
      if np.dot(x_weights_diff.T, x_weights_diff) < tol or Y.shape[1] == 1:
    c:\anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:67: UserWarning: Maximum number of iterations reached
      warnings.warn('Maximum number of iterations reached')
    c:\anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:297: RuntimeWarning: invalid value encountered in less
      if np.dot(x_scores.T, x_scores) < np.finfo(np.double).eps:
    c:\anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:275: RuntimeWarning: invalid value encountered in less
      if np.all(np.dot(Yk.T, Yk) < np.finfo(np.double).eps):
    Traceback (most recent call last):
      File "c:\svn\hw4\code\test_plsr2.py", line 8, in <module>
        pls2.fit(xx, yy)
      File "c:\anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py", line 335, in fit
        linalg.pinv(np.dot(self.x_loadings_.T, self.x_weights_)))
      File "c:\anaconda\lib\site-packages\scipy\linalg\basic.py", line 889, in pinv
        a = _asarray_validated(a, check_finite=check_finite)
      File "c:\anaconda\lib\site-packages\scipy\_lib\_util.py", line 135, in _asarray_validated
        a = np.asarray_chkfinite(a)
      File "c:\anaconda\lib\site-packages\numpy\lib\function_base.py", line 613, in asarray_chkfinite
        "array must not contain infs or NaNs")
    ValueError: array must not contain infs or NaNs

What is the issue?

I am aware of scikit-learn GitHub issue #2089, but since I use scikit-learn 0.16.1 (with Python 2.7.10 x64), that problem should already be solved (the code snippets mentioned in the GitHub issue work fine).

Please check whether any of the values being passed in are NaN or inf:

    np.isnan(xx).any()
    np.isnan(yy).any()

    np.isinf(xx).any()
    np.isinf(yy).any()
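The four checks above can be folded into one helper built on np.isfinite, which is False for NaN as well as for +inf and -inf. This is only a sketch: the name report_nonfinite is made up for illustration and is not part of NumPy or scikit-learn, and the arrays are the ones from the question.

```python
import numpy as np


def report_nonfinite(name, arr):
    """Print where arr holds NaN/inf entries and return how many there are.

    Hypothetical helper for illustration, not a NumPy or scikit-learn function.
    """
    arr = np.asarray(arr, dtype=float)
    bad = ~np.isfinite(arr)  # True where the entry is NaN, +inf or -inf
    n_bad = int(bad.sum())
    if n_bad:
        print("%s: %d non-finite entries at %s"
              % (name, n_bad, np.argwhere(bad).tolist()))
    else:
        print("%s: all entries are finite" % name)
    return n_bad


# Same inputs as in the question.
xx = np.random.random((5, 5))
yy = np.zeros((5, 5))
yy[0, :] = [0, 1, 0, 0, 0]
yy[1, :] = [0, 0, 0, 1, 0]
yy[2, :] = [0, 0, 0, 0, 1]

n_bad_x = report_nonfinite("xx", xx)
n_bad_y = report_nonfinite("yy", yy)
```

For the arrays in the question both counts are 0, so the NaNs seen in the traceback are produced inside fit rather than passed in.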

If any of these yields True, remove the NaN or inf entries. For example, np.nan_to_num replaces NaN with 0 and inf with very large finite numbers:

    xx = np.nan_to_num(xx)
    yy = np.nan_to_num(yy)
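It may help to see what np.nan_to_num actually does with its default arguments, and what dropping the affected samples looks like instead. The small arrays below are invented for illustration; the second half assumes that rows of xx and yy describe the same samples, so a row is dropped from both when either one contains a non-finite value.

```python
import numpy as np

a = np.array([1.0, np.nan, np.inf, -np.inf])
cleaned = np.nan_to_num(a)

# NaN becomes 0.0; +inf and -inf become the largest and smallest finite
# float64 values, not 0.
print(cleaned)

# Alternative: drop every sample (row) that has a non-finite value in
# either array, keeping xx and yy aligned.
xx = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0]])
yy = np.array([[0.0], [1.0], [np.inf]])
keep = np.isfinite(xx).all(axis=1) & np.isfinite(yy).all(axis=1)
xx_clean, yy_clean = xx[keep], yy[keep]
print(xx_clean.shape, yy_clean.shape)
```

Replacing inf with a huge finite number can still overflow later in the computation, so dropping or imputing the affected rows is often the safer choice.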

It's also possible that numpy is being fed such large positive, large negative, or zeroed values that the equations deep down in the library produce zeros, NaNs or infs. One workaround, oddly enough, is to send in smaller numbers (say, representative numbers between -1 and 1). One way to do this is standardization, see: https://stackoverflow.com/a/36390482/445131
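A minimal sketch of column-wise standardization in plain NumPy follows. The helper name standardize and the sample matrix are assumptions made for illustration; sklearn.preprocessing.StandardScaler is an existing alternative that does a similar job.

```python
import numpy as np


def standardize(arr):
    """Center each column to mean 0 and scale it to unit standard deviation.

    Columns with zero standard deviation are only centered, which avoids a
    division by zero. Hypothetical helper for illustration.
    """
    arr = np.asarray(arr, dtype=float)
    mean = arr.mean(axis=0)
    std = arr.std(axis=0)
    std[std == 0.0] = 1.0  # constant column: leave its scale untouched
    return (arr - mean) / std


# First column has very large values, second column is constant.
big = np.array([[1e12, 2.0],
                [3e12, 2.0],
                [5e12, 2.0]])
scaled = standardize(big)
print(scaled)
```

The guard for zero standard deviation matters here: dividing a constant column by its standard deviation is itself a way to produce NaNs.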

If none of this solves the problem, you may be dealing with a low-level bug in the library you are using, or with some sort of singularity in your data. Create an SSCCE and post it on Stack Overflow, or file a new bug report with the maintainers of the library.
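Two kinds of degeneracy are cheap to look for before filing a report: columns that never change, and a matrix whose rank is lower than its number of columns. The sketch below runs both checks on the yy from the question. It only describes the data; it does not establish that this is what makes fit fail in scikit-learn 0.16.1.

```python
import numpy as np

# yy as defined in the question (row 3 left commented out, row 4 unset).
yy = np.zeros((5, 5))
yy[0, :] = [0, 1, 0, 0, 0]
yy[1, :] = [0, 0, 0, 1, 0]
yy[2, :] = [0, 0, 0, 0, 1]

# Columns whose values never change have zero standard deviation.
constant_cols = np.where(yy.std(axis=0) == 0.0)[0].tolist()

# Rank of the column-centered matrix; a value below the number of columns
# means the centered columns are linearly dependent.
centered = yy - yy.mean(axis=0)
rank = int(np.linalg.matrix_rank(centered))

print("constant columns: %s" % constant_cols)
print("rank of centered yy: %d of %d columns" % (rank, yy.shape[1]))
```

Here columns 0 and 2 are constant and the centered matrix has rank 3 out of 5 columns, which is the sort of detail worth including in an SSCCE.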

