Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


tensorflow - Passing a dict of tensors to a Keras model

I am trying to preprocess the infamous Titanic data (from Kaggle) by following this tutorial. Everything worked until I ran the titanic_preprocessing model on the data (titanic_features), at which point I got this error:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).

In the tutorial it is mentioned that one should transform the data into a dict of tensors, but:

  1. I don't see how the code (see the HERE1 tag in my code below) makes a dict of tensors (there is no tf.convert_to_tensor, for example).

  2. I don't understand why one should transform all the data again, as the previous code was supposed to do just that (when one creates preprocessed_inputs etc.).

Here is my code, but you can also execute it on Google Colab here.

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing


url = "https://raw.githubusercontent.com/aymeric75/IA/master/train.csv"
titanic = pd.read_csv(url)


titanic_features = titanic.copy()
titanic_labels = titanic_features.pop('Survived')


inputs = {}

for name, column in titanic_features.items():
    dtype = column.dtype
    if dtype == object:
        dtype = tf.string
    else:
        dtype = tf.float32
    inputs[name] = tf.keras.Input(shape=(1,), name=name, dtype=dtype)

numeric_inputs = {name:input for name,input in inputs.items()
                  if input.dtype==tf.float32}

x = layers.Concatenate()(list(numeric_inputs.values()))
norm = preprocessing.Normalization()
norm.adapt(np.array(titanic[numeric_inputs.keys()]))

all_numeric_inputs = norm(x)
preprocessed_inputs = [all_numeric_inputs]


for name, input in inputs.items():
    if input.dtype == tf.float32:
        continue
    
    lookup = preprocessing.StringLookup(vocabulary=np.unique(titanic_features[name].dropna()))
    one_hot = preprocessing.CategoryEncoding(max_tokens=lookup.vocab_size())

    x = lookup(input)
    x = one_hot(x)
    preprocessed_inputs.append(x)


preprocessed_inputs_cat = layers.Concatenate()(preprocessed_inputs)
titanic_preprocessing = tf.keras.Model(inputs, preprocessed_inputs_cat)

# This model just contains the input preprocessing. You can run it to see
# what it does to your data. Keras models don't automatically convert Pandas
# DataFrames because it's not clear if it should be converted to one tensor
# or to a dictionary of tensors. So convert it to a dictionary of tensors:
# HERE1

titanic_features_dict = {name: np.array(value)
                         for name, value in titanic_features.items()}

features_dict = {name:values[:1] for name, values in titanic_features_dict.items()}

titanic_preprocessing(features_dict)

Thanks a lot for your support!

Aymeric

[UPDATE] If you can answer question 2 ("I don't understand why one should transform all the data again, as the previous code was supposed to do just that (when one creates preprocessed_inputs etc.)"), I will accept your answer. I think I do need to reformat the input, but I don't see the point of all the code that comes before it.

question from:https://stackoverflow.com/questions/65882803/passing-a-dict-of-tensors-to-a-keras-model


1 Answer

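In your case, the problem is caused by the fact that your feature "Cabin" contains some NaN (Not a Number) values. TensorFlow accepts NaN in floating-point data, but not in string data: a NaN inside a column of strings is a Python float, so the resulting NumPy object array cannot be converted to a string tensor.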
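
A minimal sketch of why the error message mentions float, assuming pd.read_csv stored the missing cabins as NaN inside an object column. The array below is a made-up stand-in for a few "Cabin" values, not the real data:

```python
import numpy as np

# Hypothetical stand-in for a few "Cabin" values as pandas would hold them:
# real strings mixed with NaN for the missing entries.
cabin = np.array([np.nan, "C85", np.nan, "C123"], dtype=object)

# NaN is a Python float, so the object array mixes two element types.
element_types = sorted({type(value).__name__ for value in cabin})
print(element_types)  # ['float', 'str']
```

The float entries are presumably what "Unsupported object type float" refers to: as long as they are present, the array is not a clean array of strings.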

You can replace all those NaN values with empty strings in your pandas DataFrame:

titanic_features["Cabin"] = titanic_features["Cabin"].fillna("")
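
If other string columns also have missing values, the same fix can probably be applied to all of them at once. Here is a sketch on a small made-up frame; the values are illustrative only, and treating every non-numeric column as a string column is an assumption that happens to fit this kind of dataset:

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values; it only mimics the column types.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0],
    "Cabin": np.array([np.nan, "C85", np.nan], dtype=object),
    "Embarked": np.array(["S", "C", np.nan], dtype=object),
})

# Assumption: every non-numeric column holds strings.
string_columns = [name for name in toy.columns
                  if not pd.api.types.is_numeric_dtype(toy[name])]

# Fill missing strings only; numeric NaN values are left untouched.
toy[string_columns] = toy[string_columns].fillna("")

print(string_columns)          # ['Cabin', 'Embarked']
print(toy["Cabin"].tolist())   # ['', 'C85', '']
```

Numeric columns such as "Age" are deliberately skipped here, since NaN is a valid floating-point value and how to impute it is a separate modelling decision.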

The previous code only declares the preprocessing steps as a Keras model. No data is actually preprocessed until you call the titanic_preprocessing model.
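
The difference between declaring and running can be sketched with a much smaller model. The names toy_preprocessing and batch are hypothetical, and the single Concatenate layer merely stands in for the real preprocessing steps:

```python
import numpy as np
import tensorflow as tf

# Declaration: symbolic inputs only describe a name, shape, and dtype.
# No data is involved at this point.
inputs = {
    "Age": tf.keras.Input(shape=(1,), name="Age", dtype=tf.float32),
    "Fare": tf.keras.Input(shape=(1,), name="Fare", dtype=tf.float32),
}
outputs = tf.keras.layers.Concatenate()(list(inputs.values()))
toy_preprocessing = tf.keras.Model(inputs, outputs)

# Execution: data flows through the declared steps only when the model is
# called. The dict holds plain NumPy arrays, keyed by the input names.
batch = {
    "Age": np.array([[22.0]], dtype=np.float32),
    "Fare": np.array([[7.25]], dtype=np.float32),
}
result = toy_preprocessing(batch)
print(result.shape)  # (1, 2)
```

Note that the dict passed in contains NumPy arrays rather than tensors; the conversion to tensors happens inside the call, which is also where an array that cannot be converted would fail.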

