Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
413 views
in Technique[技术] by (71.8m points)

machine learning - Can I use autoencoders for dimensionality reduction with a very small dataset?

I have a numeric dataset with just 55 samples and 270 features. I'm trying to separate these samples into clusters, however, it is hard to perform clustering in such high-dimensional space. Thus, I'm thinking about using autoencoders for performance dimensionality reduction, but I'm not sure if it is possible to do that with such a small dataset. Notice that this application is useful because the idea is to allow it to deal with different datasets that have similar characteristics.

With the following code, using Mean Squared Error as the loss function, I have achieved a loss of 4.9, which I think that is high. Notice that the dataset is already normalized.

Is it possible to use autoencoders for dimensionality reduction in this case?

This is the source for building the autoencoder and training it:

import keras
import tensorflow as tf
from keras import layers
from keras import regularizers
from keras.datasets import mnist
import numpy as np
import matplotlib.pyplot as plt
from keras.callbacks import EarlyStopping

features = 0
preservationRatio = 0.99
epochs = 500

data = loadData("dataset.csv")
samples = len(data)
features = len(data[0])

x_train = data
x_test = data

es = EarlyStopping(monitor='val_loss', mode='min', verbose=1,min_delta=0.001, patience=50)

encoding_dim = int(features*preservationRatio)

input_number = keras.Input(shape=(features,))
# Add a Dense layer with a L1 activity regularizer
encoded = layers.Dense(encoding_dim, activation='relu',
                activity_regularizer=regularizers.l1(1e-7))(input_number)
decoded = layers.Dense(features, activation='sigmoid')(encoded)

autoencoder = keras.Model(input_number, decoded)

encoder = keras.Model(input_number, encoded)

# This is our encoded input
encoded_input = keras.Input(shape=(encoding_dim,))
# Retrieve the last layer of the autoencoder model
decoder_layer = autoencoder.layers[-1]
# Create the decoder model
decoder = keras.Model(encoded_input, decoder_layer(encoded_input))

autoencoder.compile(optimizer='adam', loss=tf.keras.losses.MeanSquaredError())

x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
print(x_train.shape)
print(x_test.shape)

history = autoencoder.fit(x_train, x_train,
                epochs=epochs,
                batch_size=20,
                callbacks=[es],
                shuffle=True,
                validation_data=(x_test, x_test),
                verbose = 1)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)
等待大神答复

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...