pandas - Standard implementation of vectorize_sequences

Question

asked Oct 17, 2021 in Technique by 深蓝 (71.8m points)

The following function appears in François Chollet's Deep Learning with Python:

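import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # One row per sequence, one column per possible index, all zeros
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # Set the columns named by this sequence's indices to 1
        results[i, sequence] = 1.
    return results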

I understand what this function does. It is asked about in this question and in this question as well, and is also mentioned here, here, here, here, here & here. Despite being so widespread, this vectorization is, according to Chollet's book, done "manually for maximum clarity." I am interested in whether there is a standard, not "manual", way of doing it.

Is there a standard Keras / TensorFlow / scikit-learn / pandas / NumPy implementation of a function that behaves very similarly to the function above?

1 Answer

深蓝 · Answer 1 · 2021-10-17T00:11:50+0000

Solution with `MultiLabelBinarizer`

Assuming sequences is a collection of integer sequences whose values are at most dimension - 1, we can use MultiLabelBinarizer from sklearn.preprocessing to replicate the behaviour of vectorize_sequences. Note that the result has an integer dtype, whereas vectorize_sequences returns float64.

from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer(classes=range(dimension))
mlb.fit_transform(sequences)
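As a rough check of the claim above, the following sketch compares MultiLabelBinarizer against the original vectorize_sequences on a small input. The sample sequences (ragged, with one repeated index) and dimension = 10 are made up for illustration, and the final cast to float is only there so that the dtypes match.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer


def vectorize_sequences(sequences, dimension=10000):
    # Reference implementation from the question
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results


# Illustrative input: rows of different lengths, one repeated index
sequences = [[4, 1, 0], [4, 0], [3, 4, 2, 2]]
dimension = 10

# Fixing classes up front gives one column per index in 0..dimension-1
mlb = MultiLabelBinarizer(classes=range(dimension))

# fit_transform returns an integer array; cast to float to match the reference
encoded = mlb.fit_transform(sequences).astype(float)
expected = vectorize_sequences(sequences, dimension)

print(np.array_equal(encoded, expected))
```

On this input the two arrays should agree element by element. An index outside range(dimension) behaves differently in the two versions: MultiLabelBinarizer warns and ignores it, while the loop version raises an IndexError.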

Solution with Numpy broadcasting

Assuming sequences is a rectangular array of integers (every sequence has the same length) with values at most dimension - 1:

(np.array(sequences)[:, :, None] == range(dimension)).any(1).view('i1')
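To make the broadcasting step easier to follow, here is a minimal sketch that spells out the intermediate shapes and compares the result with the original function. The sample input and dimension = 10 are illustrative, np.arange stands in for range, and astype(float) is used instead of view('i1') purely to match the float output of vectorize_sequences.

```python
import numpy as np


def vectorize_sequences(sequences, dimension=10000):
    # Reference implementation from the question
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results


# Illustrative input: all rows must have the same length for np.array to be 2-D
sequences = [[4, 1, 0], [4, 0, 3], [3, 4, 2]]
dimension = 10

arr = np.array(sequences)                      # shape (3, 3)
# Compare every element with every candidate index: shape (3, 3, 10)
matches = arr[:, :, None] == np.arange(dimension)
# A column is set if any element of the row equals that index: shape (3, 10)
one_hot = matches.any(axis=1)

encoded = one_hot.astype(float)
print(np.array_equal(encoded, vectorize_sequences(sequences, dimension)))
```

The intermediate boolean array has len(sequences) × sequence_length × dimension elements, so with dimension=10000 and long sequences this approach can use far more memory than the loop.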

Worked out example

>>> sequences
[[4, 1, 0], 
 [4, 0, 3],
 [3, 4, 2]]

>>> dimension = 10
>>> mlb = MultiLabelBinarizer(classes=range(dimension))
>>> mlb.fit_transform(sequences)

array([[1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
       [1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]])


>>> (np.array(sequences)[:, :, None] == range(dimension)).any(1).view('i1')

array([[1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
       [1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]], dtype=int8)
