I'm working on a movie rating prediction problem for my own machine learning practice.
I have two CSV files: one with movies (movieId, title, genres) and one with ratings (userId, movieId, rating, timestamp).
After some preprocessing (word embeddings for the movie titles, one-hot encoding for the genres, and shuffling the final dataframe), I ended up with this:
     userId  movieId  rating                                         embeddings                                             genres
0       545     2020     5.0  [0.081246674, 0.046522498, -0.014943261, 0.025...  [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, ...
1       427     3186     2.0  [0.09334839, 0.057055157, -0.020527517, 0.0301...  [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, ...
2       102     2144     3.0  [0.062349755, 0.04466611, -0.011009981, 0.0187...  [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
3        30     5927     4.0  [0.18021354, 0.119208135, -0.036116328, 0.0466...  [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, ...
4       537     1022     3.0  [0.026805451, 0.025356086, -0.004603084, 0.013...  [0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, ...
...     ...      ...     ...                                                ...                                                ...
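For reference, the preprocessing described above could look roughly like the sketch below. It is only an illustration under several assumptions: the tiny `movies` and `ratings` frames are made-up stand-ins for the two CSV files, genres are assumed to be `|`-separated strings, and the "embedding" is a random placeholder vector rather than a real word embedding.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two CSV files (all values are made up).
movies = pd.DataFrame({
    "movieId": [1, 2],
    "title": ["Toy Story (1995)", "Heat (1995)"],
    "genres": ["Animation|Comedy", "Action|Crime"],  # assumed '|'-separated
})
ratings = pd.DataFrame({
    "userId": [10, 10, 11],
    "movieId": [1, 2, 2],
    "rating": [4.0, 3.5, 5.0],
    "timestamp": [0, 1, 2],
})

# One-hot encode genres (one 0/1 column per genre), then pack each row
# into a single array so the column holds one vector per movie.
genre_dummies = movies["genres"].str.get_dummies(sep="|")
movies["genres"] = list(genre_dummies.to_numpy())

# Placeholder "embedding": NOT a real word embedding, just a fixed-length
# random vector per title so the column has the same structure.
rng = np.random.default_rng(0)
movies["embeddings"] = [rng.normal(size=4).astype("float32") for _ in movies["title"]]

# Join ratings with the movie features, drop the timestamp, and shuffle.
df_ratings = (
    ratings.merge(movies[["movieId", "embeddings", "genres"]], on="movieId")
    .drop(columns="timestamp")
    .sample(frac=1, random_state=0)
    .reset_index(drop=True)
)
print(df_ratings.dtypes)
```

Note that the `embeddings` and `genres` columns end up with dtype `object`, because each cell stores a whole array.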
After that, I tried to split my data into X and y and apply train_test_split:
X = df_ratings.drop(['rating'], axis=1).values
y = df_ratings['rating'].values
I then tried to convert both arrays to float32, which fails with the traceback below:
X = np.asarray(X).astype('float32')
y = np.asarray(y).astype('float32')
TypeError Traceback (most recent call last)
TypeError: only size-1 arrays can be converted to Python scalars
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-46-ee1241369db7> in <module>
----> 1 X = np.asarray(X).astype('float32')
2 y = np.asarray(y).astype('float32')
ValueError: setting an array element with a sequence
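The failing cast can be reproduced on a tiny example. The sketch below assumes, as the dataframe above suggests, that each `embeddings` and `genres` cell holds a whole array; the two-row frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two made-up rows with the same structure: scalar ids plus array-valued cells.
df = pd.DataFrame({
    "userId": [545, 427],
    "movieId": [2020, 3186],
    "embeddings": [np.array([0.08, 0.05], dtype="float32"),
                   np.array([0.09, 0.06], dtype="float32")],
    "genres": [np.array([0, 1, 0]), np.array([1, 0, 0])],
})

# Mixing int columns with array-valued columns yields an object array
# whose cells are Python objects, not numbers.
X = df.values
print(X.dtype, X.shape)

# Casting to float32 has to turn every cell into one scalar, which is
# impossible for the cells that contain a whole array.
try:
    X.astype("float32")
    cast_error = None
except (TypeError, ValueError) as exc:
    cast_error = exc
print(type(cast_error).__name__, cast_error)
```

The array is two-dimensional on the outside, but two of its four columns hold nested arrays, so NumPy cannot treat it as a plain numeric matrix.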
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)
I created my model (I don't know if it is fully correct) and tried to fit it on the training data:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def movies_model():
    model = Sequential()
    # Add layers; only the first layer needs the input dimension
    model.add(Dense(512, input_dim=X_train.shape[1], activation='relu'))
    model.add(Dense(512, activation='relu'))
    model.add(Dense(256, activation='relu'))
    model.add(Dense(256, activation='relu'))
    model.add(Dense(128, activation='relu'))
    # Single linear output for the predicted rating
    model.add(Dense(1, activation='linear'))
    return model

# Build the model before compiling it
model = movies_model()
optimizer = tf.keras.optimizers.Adam()
model.compile(optimizer=optimizer, loss='mean_absolute_error')
history = model.fit(X_train, y_train, epochs=10, batch_size=1024, verbose=1)
I got this error:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).
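One possible way to obtain a purely numeric feature matrix is to stack the array-valued columns into 2-D arrays and concatenate them with the scalar columns. This is only a sketch, assuming every embedding has the same length and every genre vector has the same length; the small `df_ratings` frame here is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up frame with the same column layout as in the question.
df_ratings = pd.DataFrame({
    "userId": [545, 427],
    "movieId": [2020, 3186],
    "rating": [5.0, 2.0],
    "embeddings": [np.array([0.08, 0.05], dtype="float32"),
                   np.array([0.09, 0.06], dtype="float32")],
    "genres": [np.array([0, 1, 0]), np.array([1, 0, 0])],
})

# Scalar columns are already numeric.
ids = df_ratings[["userId", "movieId"]].to_numpy()

# Turn each column of per-row arrays into one 2-D array.
emb = np.stack(df_ratings["embeddings"].tolist())  # shape (n_rows, emb_dim)
gen = np.stack(df_ratings["genres"].tolist())      # shape (n_rows, n_genres)

# Concatenate column-wise into one flat float32 matrix.
X = np.hstack([ids, emb, gen]).astype("float32")
y = df_ratings["rating"].to_numpy().astype("float32")
print(X.shape, X.dtype, y.shape, y.dtype)
```

With `X` built this way, each row is a flat vector of floats, so `X_train.shape[1]` equals 2 plus the embedding length plus the number of genres.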
Each cell in the embeddings and genres columns is a numpy.ndarray.
I've searched for a hint but found nothing.
I would appreciate help figuring out where the error comes from. (I have also tried converting embeddings and genres to other types.)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100004 entries, 0 to 100003
Data columns (total 5 columns):
 #   Column      Non-Null Count   Dtype
---  ------      --------------   -----
 0   userId      100004 non-null  int64
 1   movieId     100004 non-null  int64
 2   rating      100004 non-null  float64
 3   embeddings  100004 non-null  object
 4   genres      100004 non-null  object
dtypes: float64(1), int64(2), object(2)
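Since `df.info()` only reports `object` for the two array-valued columns, it can help to look inside the cells. The sketch below is one way to do that on a made-up two-row frame; the column name matches the question, but the values are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up frame with an object column holding one array per row.
df = pd.DataFrame({
    "userId": [545, 427],
    "embeddings": [np.array([0.08, 0.05], dtype="float32"),
                   np.array([0.09, 0.06], dtype="float32")],
})

# Which Python types are actually stored in the cells?
cell_types = df["embeddings"].map(lambda v: type(v).__name__).value_counts()
print(cell_types)

# Do all arrays have the same length? (Needed before stacking them.)
lengths = df["embeddings"].map(len).unique()
print(lengths)
```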
question from:
https://stackoverflow.com/questions/65887340/machine-learning-prediction-failed-to-convert-a-numpy-array-to-a-tensor