Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to preserve column names starting with a minus when using numpy.genfromtxt?

Similar to this question, numpy.genfromtxt modifies my columns' names:

import numpy as np
from io import BytesIO  # https://stackoverflow.com/a/11970414/321973

str = 'x,-1,1\n0,1,1\n1,2,3'
data = np.genfromtxt(BytesIO(str.encode()), delimiter=',', names=True)
print(data.dtype.names)

yields ('x', '1', '1_1') instead of the desired ('x', '-1', '1') (or even better, ('x', -1, 1)). I tried deletechars="""~!@#$%^&*()=+~|]}[{';: /?>,<""" as suggested there to no avail.



1 Answer


The behavior you're seeing occurs because np.genfromtxt uses the NameValidator class to automatically strip certain non-alphanumeric characters from the field names.
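
To see the renaming in isolation, here is a minimal sketch that calls NameValidator directly. It assumes the class can be imported from the private module numpy.lib._iotools, whose location is not a stable API and may differ between NumPy releases. The second call only shows how the class itself treats an empty deletechars; it does not establish how a given NumPy release forwards deletechars inside genfromtxt.

```python
# Private module: this import path is an assumption and may change between NumPy versions.
from numpy.lib._iotools import NameValidator

header = ['x', '-1', '1']

# Default validator: '-' is among the deleted characters, so '-1' collapses
# to '1', and the second '1' then gets a numeric suffix to stay unique.
default_names = NameValidator()(header)
print(default_names)   # ('x', '1', '1_1')

# A validator built with an empty deletechars string leaves the minus sign alone.
kept_names = NameValidator(deletechars='')(header)
print(kept_names)      # ('x', '-1', '1')
```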

It's perfectly legal for a field name to contain a '-' character, e.g.:

x = np.array((1,), dtype=[('-1', 'i')])
print(x['-1'])
# 1

In fact, two out of three of the modified field names you get back from np.genfromtxt are also not "valid Python identifiers" ('1' and '1_1', since they start with digits).
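
A quick way to check which names count as identifiers is str.isidentifier(). The sketch below also shows one practical consequence, using a small made-up array: with a recarray view, attribute access is only available for identifier-like names, while key access works for any name.

```python
import numpy as np

# Which candidate field names are valid Python identifiers?
flags = {name: name.isidentifier() for name in ('x', '-1', '1', '1_1')}
print(flags)   # {'x': True, '-1': False, '1': False, '1_1': False}

# Illustrative array (not the asker's data) with one "normal" and one odd name.
rec = np.array([(0, 1)], dtype=[('x', 'i4'), ('-1', 'i4')]).view(np.recarray)

x_by_attr = rec.x          # attribute access works for an identifier
minus_by_key = rec['-1']   # a name like '-1' can only be reached by key
```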

It's therefore possible to construct the array you describe as long as you bypass using np.genfromtxt to set the field names. One way to do it would be to initialize an empty array, specify the field names and dtypes explicitly, then fill it with the rest of the string contents:

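names = str.splitlines()[0].split(',')
types = ('i',) * 3
dtype = list(zip(names, types))  # np.empty needs a concrete list, not a zip iterator

data = np.empty(2, dtype=dtype)
# genfromtxt still sanitizes the names internally, but NumPy >= 1.14 assigns
# structured data by position, so each column lands in the intended field.
data[:] = np.genfromtxt(BytesIO(str.encode()), delimiter=',', dtype=dtype,
                        skip_header=1)
print(repr(data))
# array([(0, 1, 1), (1, 2, 3)],
#       dtype=[('x', '<i4'), ('-1', '<i4'), ('1', '<i4')])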
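
An alternative that does not depend on how structured arrays are assigned to each other is to read the rows as a plain 2-D integer array and attach the names afterwards. This is only a sketch, assuming NumPy 1.16 or later for numpy.lib.recfunctions.unstructured_to_structured; the variable name text is used here instead of str to avoid shadowing the builtin.

```python
import numpy as np
from io import BytesIO
from numpy.lib import recfunctions as rfn

text = 'x,-1,1\n0,1,1\n1,2,3'

# Take the names straight from the header line, untouched by NameValidator.
names = text.splitlines()[0].split(',')
dtype = np.dtype([(name, 'i4') for name in names])

# Read only the data rows as an ordinary (2, 3) integer array.
raw = np.genfromtxt(BytesIO(text.encode()), delimiter=',', skip_header=1, dtype=int)

# Convert each row into one record of the structured dtype.
data = rfn.unstructured_to_structured(raw, dtype=dtype)
print(data.dtype.names)   # ('x', '-1', '1')
print(data['-1'])         # [1 2]
```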

However, just because you can doesn't mean you should - there may well be other unpredictable consequences to having a '-' in one of your field names. The safest option is to stick with using only valid Python identifiers as field names.
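
If you go the identifier route, one possible sketch is to rename the columns yourself and keep the original numeric labels in a separate dict. The safe_name helper and its 'm'/'p' prefix scheme are hypothetical, chosen only for illustration, and the code assumes every header entry after the first is an integer.

```python
import numpy as np
from io import BytesIO

text = 'x,-1,1\n0,1,1\n1,2,3'

def safe_name(label):
    """Hypothetical scheme: 'm<n>' for negative integers, 'p<n>' for the rest;
    labels that are not integers are returned unchanged."""
    try:
        value = int(label)
    except ValueError:
        return label
    return ('m' if value < 0 else 'p') + str(abs(value))

header = text.splitlines()[0].split(',')
names = [safe_name(h) for h in header]

# Map each safe field name back to its numeric label (assumes integer headers after 'x').
labels = {safe_name(h): int(h) for h in header[1:]}

# Pass the names explicitly and skip the header row; default dtype is float.
data = np.genfromtxt(BytesIO(text.encode()), delimiter=',', skip_header=1, names=names)
print(data.dtype.names)   # ('x', 'm1', 'p1')
print(labels)             # {'m1': -1, 'p1': 1}
```

This keeps the field names usable everywhere while still letting you recover -1 and 1 as actual integers through labels.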

