Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Populate a Pandas SparseDataFrame from a SciPy Sparse Coo Matrix

(This question relates to "populate a Pandas SparseDataFrame from a SciPy Sparse Matrix". I want to populate a SparseDataFrame from a scipy.sparse.coo_matrix specifically; the mentioned question covers a different SciPy sparse matrix format (csr). So here it goes...)

I noticed Pandas now has support for Sparse Matrices and Arrays. Currently, I create DataFrame()s like this:

return DataFrame(matrix.toarray(), columns=features, index=observations)

Is there a way to create a SparseDataFrame() from a scipy.sparse.coo_matrix()? Converting to dense format kills RAM badly...!


1 Answer


http://pandas.pydata.org/pandas-docs/stable/sparse.html#interaction-with-scipy-sparse

A convenience method SparseSeries.from_coo() is implemented for creating a SparseSeries from a scipy.sparse.coo_matrix.
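For readers on a newer pandas: SparseSeries and SparseDataFrame were removed in pandas 1.0, and the same functionality is reached through the .sparse accessor. The sketch below assumes pandas 1.0 or later plus SciPy; the matrix contents and the observations/features labels are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Small COO matrix: 3 observations x 4 features, three non-zero entries.
rows = np.array([0, 1, 2])
cols = np.array([0, 3, 1])
vals = np.array([1.0, 2.0, 3.0])
matrix = sparse.coo_matrix((vals, (rows, cols)), shape=(3, 4))

observations = ["obs_a", "obs_b", "obs_c"]  # hypothetical row labels
features = ["f0", "f1", "f2", "f3"]         # hypothetical column labels

# DataFrame with sparse columns, built without calling matrix.toarray().
df = pd.DataFrame.sparse.from_spmatrix(
    matrix, index=observations, columns=features
)

# Series counterpart of the old SparseSeries.from_coo(): a sparse Series
# indexed by a (row, col) MultiIndex.
ss = pd.Series.sparse.from_coo(matrix)
```

Here df plays the role the question wanted from SparseDataFrame, and df.sparse.density reports the fraction of stored values.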

Within scipy.sparse there are methods that convert the formats to each other: .tocoo, .tocsr, .tocsc, etc. So you can use whichever format is best for a particular operation.
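A short sketch of those conversions, using a made-up 3x4 matrix. COO is convenient for construction, while CSR and CSC support row and column slicing respectively.

```python
import numpy as np
from scipy import sparse

data = np.array([1.0, 2.0, 3.0])
row = np.array([0, 1, 2])
col = np.array([0, 3, 1])
coo = sparse.coo_matrix((data, (row, col)), shape=(3, 4))

csr = coo.tocsr()   # good for row slicing and matrix-vector products
csc = coo.tocsc()   # good for column slicing
back = csr.tocoo()  # back to (data, row, col) triplets

row_1 = csr[1]      # sparse 1x4 row slice
col_3 = csc[:, 3]   # sparse 3x1 column slice
```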

For going the other way, I've answered

Pandas sparse dataFrame to sparse matrix, without generating a dense matrix in memory
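In pandas 1.0 and later the reverse direction is also available through the accessor. This is a sketch assuming every column of the frame is sparse with a fill value of 0 (which is what from_spmatrix produces); the matrix is made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse

original = sparse.coo_matrix(
    (np.array([1.0, 2.0, 3.0]), (np.array([0, 1, 2]), np.array([0, 3, 1]))),
    shape=(3, 4),
)

# Sparse DataFrame built from the matrix, then converted back to COO
# without materialising a dense array in between.
df = pd.DataFrame.sparse.from_spmatrix(original)
round_trip = df.sparse.to_coo()
```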

Your linked answer from 2013 iterates by row - using toarray to make the row dense. I haven't looked at what the pandas from_coo does.

A more recent SO question on pandas sparse

non-NDFFrame object error using pandas.SparseSeries.from_coo() function


From https://github.com/pydata/pandas/blob/master/pandas/sparse/scipy_sparse.py

def _coo_to_sparse_series(A, dense_index=False):
    """ Convert a scipy.sparse.coo_matrix to a SparseSeries.
    Use the defaults given in the SparseSeries constructor. """
    s = Series(A.data, MultiIndex.from_arrays((A.row, A.col)))
    s = s.sort_index()
    s = s.to_sparse()  # TODO: specify kind?
    # ...
    return s

In effect it takes the same data, i, j used to build a coo matrix, makes a series, sorts it, and turns it into a sparse series.
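The same three steps can be written against current pandas, where to_sparse() no longer exists. This is a sketch, not the library's code: it assumes pandas 1.0 or later and swaps to_sparse() for astype(pd.SparseDtype(...)); the data, i, j values are invented.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# The data, i, j triplets used to build a coo matrix (deliberately unsorted).
data = np.array([3.0, 1.0, 2.0])
i = np.array([2, 0, 1])
j = np.array([1, 0, 3])
A = sparse.coo_matrix((data, (i, j)), shape=(3, 4))

# 1. Make a Series indexed by (row, col) pairs.
s = pd.Series(A.data, index=pd.MultiIndex.from_arrays((A.row, A.col)))
# 2. Sort it by that index.
s = s.sort_index()
# 3. Turn it into a sparse Series (stand-in for the removed to_sparse()).
s = s.astype(pd.SparseDtype(s.dtype))
```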

