Citizens of StackOverflow,
I am currently iterating over a dataframe that can be millions of rows long. Each row has leading NaNs (which are desired), followed by values. I want to keep only X values in each row and set everything after them to NaN. Effectively, I want a window of X values that begins at the first non-NaN; every other position in the row should be NaN.
My solution is very slow. The similar questions I found were not helpful enough (most deal only with the first or last NaN).
An example where the window size is 3:
import pandas as pd
import numpy as np
x = 3
data = {'2018Q3': [0, np.nan, np.nan, np.nan, np.nan],
'2018Q4': [1, np.nan, np.nan, np.nan, 10],
'2019Q1': [2, 3, np.nan, np.nan, 12],
'2019Q2': [3, 4, np.nan, 8, 14],
'2019Q3': [4, 5, np.nan, 9, 22]}
df = pd.DataFrame.from_dict(data)
print(df)
   2018Q3  2018Q4  2019Q1  2019Q2  2019Q3
0     0.0     1.0     2.0     3.0     4.0
1     NaN     NaN     3.0     4.0     5.0
2     NaN     NaN     NaN     NaN     NaN
3     NaN     NaN     NaN     8.0     9.0
4     NaN    10.0    12.0    14.0    22.0
Results should look like this:
   2018Q3  2018Q4  2019Q1  2019Q2  2019Q3
0     0.0     1.0     2.0     NaN     NaN
1     NaN     NaN     3.0     4.0     5.0
2     NaN     NaN     NaN     NaN     NaN
3     NaN     NaN     NaN     8.0     9.0
4     NaN    10.0    12.0    14.0     NaN
MY SOLUTION:
def cut_excess_forecast(num_x, dataf):
    Total_Col = len(dataf.columns.values)  # total columns
    df_NEW = pd.DataFrame()
    for index, row in dataf.iterrows():
        nas = row.isnull().sum(axis=0)  # number of nulls in this row
        good_data = nas + num_x  # number of columns that should be left untouched
        if good_data >= Total_Col:  # window reaches the last column
            pass  # all available data is needed
        else:
            cutoff = Total_Col - good_data
            row[-cutoff:] = np.nan  # set the excess columns in this row to NaN
        # Note: DataFrame.append was removed in pandas 2.0, so this line needs pandas < 2.0
        df_NEW = df_NEW.append(row.copy())  # append the changed row to the new dataframe
    df_NEW.index = dataf.index  # carry the original index over to the new dataframe
    return df_NEW.copy()

df2 = cut_excess_forecast(x, df)
print(df2)
Sorting is allowed, so long as the index is untouched.
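For comparison, here is a minimal sketch of one vectorized direction that avoids the per-row loop. It is not the approach above, just an illustration under these assumptions: all columns are numeric, NaNs in a row only appear before the values (as in the example), and the name keep_first_window is made up for this sketch. It finds the position of the first non-NaN in each row with NumPy, builds a boolean mask for the window, and applies it with DataFrame.where.

```python
import numpy as np
import pandas as pd


def keep_first_window(dataf, num_x):
    """Keep num_x columns per row, starting at the row's first non-NaN.

    Hypothetical helper for illustration; assumes numeric columns.
    """
    values = dataf.to_numpy(dtype=float)
    valid = ~np.isnan(values)
    # Position of the first non-NaN in each row. argmax returns 0 for
    # all-NaN rows, which is harmless because those rows stay all NaN.
    first = valid.argmax(axis=1)
    cols = np.arange(values.shape[1])
    # True where a column lies inside [first, first + num_x) for its row.
    in_window = (cols >= first[:, None]) & (cols < first[:, None] + num_x)
    # where() keeps values under True and writes NaN elsewhere;
    # the index and column labels are left untouched.
    return dataf.where(in_window)


x = 3
data = {'2018Q3': [0, np.nan, np.nan, np.nan, np.nan],
        '2018Q4': [1, np.nan, np.nan, np.nan, 10],
        '2019Q1': [2, 3, np.nan, np.nan, 12],
        '2019Q2': [3, 4, np.nan, 8, 14],
        '2019Q3': [4, 5, np.nan, 9, 22]}
df = pd.DataFrame.from_dict(data)
windowed = keep_first_window(df, x)
print(windowed)
```

Because the mask is computed for the whole array at once, the cost should grow with the number of cells rather than with a Python-level loop over rows. Whether that is fast enough for millions of rows would still need to be measured on the real data.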
Cheers, and thanks in advance.
question from:
https://stackoverflow.com/questions/65836235/in-each-row-of-pandas-starting-at-the-first-non-nan-a-window-of-x-values-remain