Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Selecting multiple columns in a pandas DataFrame

I have data in different columns but I don't know how to extract it to save it in another variable.

index  a   b   c
1      2   3   4
2      3   4   5

How do I select columns 'a' and 'b' and save them in df1?

I tried

df1 = df['a':'b']
df1 = df.ix[:, 'a':'b']

Neither seems to work.

Asked by user1234440; translated from Stack Overflow.

1 Answer

Column names (which are strings) cannot be sliced with plain [] in the manner you tried: df['a':'b'] is interpreted as a slice over rows, and .ix has since been deprecated and removed from pandas.

Here you have a couple of options.

If you know from context which columns you want, you can select just those columns by passing a list of their names to the __getitem__ syntax (the []'s).

df1 = df[['a','b']]
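
For anyone who wants to try this, here is a minimal sketch of the list-based selection. The sample frame is an assumption that simply mirrors the small table in the question.

```python
import pandas as pd

# Sample frame mirroring the table in the question (assumed data).
df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

# A list of names inside [] selects those columns, in the order given.
df1 = df[["a", "b"]]

print(df1.columns.tolist())
print(df1.shape)
```

Note the double brackets: the outer pair is the indexing operator and the inner pair is the Python list of column names.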

Alternatively, if you need to index the columns by position rather than by name (say your code should pick the first two columns without knowing their names), you can do this instead:

df1 = df.iloc[:,0:2] # Remember that Python does not slice inclusive of the ending index.
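
A small sketch of the positional slice, again on an assumed sample frame that mirrors the question. For comparison it also shows a label slice with loc, which is not part of the answer above but is close to what the question attempted. Unlike positional slices, label slices include both endpoints.

```python
import pandas as pd

# Sample frame mirroring the table in the question (assumed data).
df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

# Positional slice: columns at positions 0 and 1; the stop position 2 is excluded.
by_position = df.iloc[:, 0:2]

# Label slice: both endpoint labels, 'a' and 'b', are included.
by_label = df.loc[:, "a":"b"]

print(by_position.columns.tolist())
print(by_label.columns.tolist())
```

On this frame both selections contain the same two columns.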

Additionally, you should familiarize yourself with the idea of a view into a Pandas object vs. a copy of that object.

The first of the above methods will return a new copy in memory of the desired sub-object (the desired slices).

Sometimes, however, there are indexing conventions in Pandas that don't do this and instead give you a new variable that just refers to the same chunk of memory as the sub-object or slice in the original object.

This can happen with the second way of indexing, so you can call the copy() method on the result to get an independent copy.

When this happens, changing what you think is the sliced object can sometimes alter the original object.

It is always good to be on the lookout for this.

df1 = df.iloc[:, 0:2].copy() # To avoid the case where changing df1 also changes df
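
To illustrate why the explicit copy is useful, here is a minimal sketch on an assumed sample frame: after copy(), writing to df1 leaves df untouched.

```python
import pandas as pd

# Sample frame mirroring the table in the question (assumed data).
df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

# Take the first two columns and detach them from df with an explicit copy.
df1 = df.iloc[:, 0:2].copy()

# Modify the copy only.
df1.loc[1, "a"] = 99

print(df.loc[1, "a"])   # value in the original frame
print(df1.loc[1, "a"])  # value in the copy
```

Without the copy() call, whether the write reaches df depends on the pandas version and its copy-on-write settings, so the explicit copy is the safer habit.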

To use iloc, you need to know the column positions (or indices).

Because column positions may change, instead of hard-coding the indices you can combine iloc with the get_loc method of the DataFrame's columns index to look up the positions by name.

{c: df.columns.get_loc(c) for c in df.columns}

Now you can use this dictionary to look up column positions by name and pass them to iloc.
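
A minimal sketch of that lookup, assuming unique column names (so that get_loc returns a single integer). The frame, the col_pos dictionary, and the wanted list are illustrative names, not part of the original answer.

```python
import pandas as pd

# Sample frame mirroring the table in the question (assumed data).
df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

# Map each column name to its current integer position.
col_pos = {c: df.columns.get_loc(c) for c in df.columns}

# Hypothetical list of the columns we want, given by name.
wanted = ["a", "b"]

# Translate names to positions, then select by position with iloc.
df1 = df.iloc[:, [col_pos[name] for name in wanted]].copy()

print(col_pos)
print(df1.columns.tolist())
```

If the column order changes later, rebuilding col_pos picks up the new positions without touching the rest of the code.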
