Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


pandas - Create a polynomial function using the data frame column names as terms and the column values as their powers

I want to create a polynomial function that uses the data frame column names as terms and the column values as the powers those terms are raised to. I've added an example of what I am looking for, but I'm running out of ideas on how to do it.

[image: example of the desired result]

Just to add to this: I have a separate data file with the same column names that I will be feeding in. I just want to iterate through each row and create a final function. I'm not sure if it's possible, but hardcoding this is painful and very time consuming, so any input is helpful.

question from:https://stackoverflow.com/questions/65882103/i-want-to-be-able-to-create-a-polynomial-function-using-the-data-frame-column-na


1 Answer


Suppose df holds one polynomial term per row, with its coefficient and the power of each variable, for instance:

   coefficient  Term1  Term2
0           25      1      0
1           36      2      0
2          -16      0      0
3            4      2      1

and a dataframe dfv with values:

   Term1  Term2
0      0      1
1      2      0
2      3      0

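you can evaluate the polynomial for each row of dfv by raising the row's values to the powers in df, multiplying each term's factors together with its coefficient, and summing the terms: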

dfv.apply(lambda x: (np.c_[df.coefficient, x.to_numpy()**df.iloc[:,1:]]).prod(1).sum(), 1)

to get

0    -16
1    178
2    383
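The same computation can also be written without apply, using NumPy broadcasting. This is only a sketch of an alternative, not the method used above: it rebuilds the small df and dfv from this example, introduces term_cols, coef, powers, values and raised as illustrative names, and assumes all powers are non-negative integers.

```python
import numpy as np
import pandas as pd

# Sample data from the small example above
df = pd.DataFrame({'coefficient': [25, 36, -16, 4],
                   'Term1': [1, 2, 0, 2],
                   'Term2': [0, 0, 0, 1]})
dfv = pd.DataFrame({'Term1': [0, 2, 3], 'Term2': [1, 0, 0]})

term_cols = ['Term1', 'Term2']
coef = df['coefficient'].to_numpy()   # shape (n_terms,)
powers = df[term_cols].to_numpy()     # shape (n_terms, n_vars)
values = dfv[term_cols].to_numpy()    # shape (n_rows, n_vars)

# Broadcast to (n_rows, n_terms, n_vars): each value raised to each term's power
raised = values[:, None, :] ** powers[None, :, :]

# Multiply across variables to get every monomial, then weight by coefficient and sum
result = pd.Series(raised.prod(axis=2) @ coef, index=dfv.index)
print(result)
```

For larger inputs this avoids a Python-level call per row, at the cost of an intermediate array of size n_rows * n_terms * n_vars.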

Full reproducible example with your sample data:
import pandas as pd
import numpy as np

np.random.seed(1)
term_cols = [f'Term{i}' for i in range(1,8)]
df = pd.DataFrame([[ 25,   1,   0,   0,   0,   0,   0,   0],
                   [ 36,   2,   0,   2,   0,   0,   0,   1],
                   [-16,   0,   0,   0,   0,   0,   1,   2],
                   [  4,   2,   1,   1,   0,   0,   0,   0]],
                  columns=['coefficient']+term_cols)

dfv = pd.DataFrame(np.random.randint(0, 5, (3,len(term_cols))), columns=term_cols)

print(dfv[term_cols].apply(lambda x: (np.c_[df.coefficient, x.to_numpy()**df[term_cols]]).prod(1).sum(), 1))

Result:

0       75
1      985
2    37220

Update: as requested in the comments, here is the same calculation built as a formula string:
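# Build the polynomial as a string, one term per row of df
s = ''
for _, row in df.iterrows():
    coef = row['coefficient']
    powers = row.drop('coefficient')
    powers = powers[powers.ne(0)]  # skip variables raised to the power 0
    factors = [f'(dfv.loc[i,"{k}"]**{v})' if v != 1 else f'(dfv.loc[i,"{k}"])'
               for k, v in powers.items()]
    # joining with the coefficient first keeps constant terms valid (no trailing '*')
    s += '*'.join([f'{coef:+d}'] + factors)

print(s)

# eval is only reasonable here because s is built from our own data
for i in dfv.index:
    print(eval(s))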

Output:

+25*(dfv.loc[i,"Term1"])+36*(dfv.loc[i,"Term1"]**2)*(dfv.loc[i,"Term3"]**2)*(dfv.loc[i,"Term7"])-16*(dfv.loc[i,"Term6"])*(dfv.loc[i,"Term7"]**2)+4*(dfv.loc[i,"Term1"]**2)*(dfv.loc[i,"Term2"])*(dfv.loc[i,"Term3"])

75
985
37220
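If the goal is a reusable function rather than a string, a closure can do the same job without eval. The following is a minimal sketch under stated assumptions: make_polynomial, coef_col and poly are hypothetical names introduced for illustration, and the data is the small two-variable example from the start of this answer.

```python
import pandas as pd

def make_polynomial(df, coef_col='coefficient'):
    """Return a function that evaluates the polynomial described by df.

    make_polynomial and coef_col are illustrative names, not part of pandas.
    """
    terms = []
    for _, row in df.iterrows():
        powers = row.drop(coef_col)
        # Keep only the variables that actually appear in this term
        terms.append((row[coef_col], {k: v for k, v in powers.items() if v != 0}))

    def poly(values):
        # values can be a dict or a DataFrame row keyed by column name
        total = 0
        for coef, powers in terms:
            term = coef
            for name, power in powers.items():
                term *= values[name] ** power
            total += term
        return total

    return poly

df = pd.DataFrame({'coefficient': [25, 36, -16, 4],
                   'Term1': [1, 2, 0, 2],
                   'Term2': [0, 0, 0, 1]})
dfv = pd.DataFrame({'Term1': [0, 2, 3], 'Term2': [1, 0, 0]})

poly = make_polynomial(df)
print(dfv.apply(poly, axis=1))
```

The returned poly can also be called directly on a plain dict, for example poly({'Term1': 2, 'Term2': 0}).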
