Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to flatten a nested JSON into a pandas dataframe

I have a bit of a tricky JSON I want to put into a dataframe.

{'A': {'name': 'A',
  'left_foot': [{'toes': '5'}],
  'right_foot': [{'toes': '4'}]},
 'B': {'name': 'B',
  'left_foot': [{'toes': '3'}],
  'right_foot': [{'toes': '5'}]},
...
}

I don't need the first layer with A and B, since it is already part of name. There will always be exactly one left_foot and one right_foot.

The data I want is as follows:

  name left_foot.toes right_foot.toes
0    A              5               4
1    B              3               5

Using this post I was able to get the feet and toes, but only when I select a single key, e.g. data["A"]. Is there an easier way?

EDIT: I have something like this, but I need to specify "A" in the first line.

# Note: merge is not defined in this snippet; it presumably comes from the post mentioned above.
df = pd.json_normalize(tickers["A"]).pipe(
    lambda x: x.drop(columns='left_foot').join(
        x.left_foot.apply(lambda y: pd.Series(merge(y)))
    )
).rename(columns={"toes": "left_foot.toes"}).pipe(
    lambda x: x.drop(columns='right_foot').join(
        x.right_foot.apply(lambda y: pd.Series(merge(y)))
    )).rename(columns={"toes": "right_foot.toes"})
Question from: https://stackoverflow.com/questions/65851044/how-to-flatten-a-nested-json-into-a-pandas-dataframe

1 Answer

  • Given your data, each top-level key (e.g. 'A' and 'B') is repeated as a value in 'name', so it is easier to use pandas.json_normalize on only the values of the dict.
  • The 'left_foot' and 'right_foot' columns need to be exploded to take each dict out of its list.
  • The final step converts each column of dicts to a dataframe and joins it back to df.
  • It's not necessarily less code, but this should be significantly faster than the multiple applies used in the current code.
    • See this timing analysis comparing apply pandas.Series to just using pandas.DataFrame to convert a column.
  • If there are issues because your dataframe has NaN (e.g. missing dicts or lists) in the columns to be exploded and converted to a dataframe, see How to json_normalize a column with NaNs.
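The NaN caveat in the last bullet is easy to hit when a record lacks one of the foot keys. Below is a minimal sketch of one way to guard against it. It assumes, hypothetically, that record 'B' has no 'right_foot' entry, and the helper name first_dict is made up for illustration. This is not necessarily the approach taken in the linked post.

```python
import pandas as pd

# Hypothetical data: record 'B' has no 'right_foot' key at all
data = {
    'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]},
    'B': {'name': 'B', 'left_foot': [{'toes': '3'}]},
}

# The missing key shows up as NaN in the 'right_foot' column
df = pd.json_normalize(list(data.values()))


def first_dict(cell):
    """Return the dict inside a one-element list, or {} for NaN or empty lists."""
    if isinstance(cell, list) and cell and isinstance(cell[0], dict):
        return cell[0]
    return {}


for col, new_name in [('left_foot', 'lf_toes'), ('right_foot', 'rf_toes')]:
    # Build one frame per foot column; empty dicts become NaN in 'toes'
    expanded = pd.DataFrame([first_dict(c) for c in df.pop(col)], index=df.index)
    # .get returns None if no record had a 'toes' key at all
    df[new_name] = expanded.get('toes')

print(df)
```

Replacing missing cells with an empty dict keeps the row count intact, so the toe columns stay aligned with 'name' and the gap surfaces as NaN rather than as an error.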
import pandas as pd

# test data
data = {
    'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]},
    'B': {'name': 'B', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]},
    'C': {'name': 'C', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]},
    'D': {'name': 'D', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]},
}

# normalize data.values and explode the dicts out of the lists
df = pd.json_normalize(data.values()).apply(pd.Series.explode).reset_index(drop=True)

# display(df)
  name      left_foot     right_foot
0    A  {'toes': '5'}  {'toes': '4'}
1    B  {'toes': '3'}  {'toes': '5'}
2    C  {'toes': '5'}  {'toes': '4'}
3    D  {'toes': '3'}  {'toes': '5'}

# extract the values from the dicts and create toe columns
df = df.join(pd.DataFrame(df.pop('left_foot').values.tolist())).rename(columns={'toes': 'lf_toes'})
df = df.join(pd.DataFrame(df.pop('right_foot').values.tolist())).rename(columns={'toes': 'rf_toes'})

# display(df)
  name lf_toes rf_toes
0    A       5       4
1    B       3       5
2    C       5       4
3    D       3       5
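If the dotted column names from the question (left_foot.toes, right_foot.toes) are wanted, one alternative is to unwrap the one-element lists before normalizing, so that pandas.json_normalize flattens the nested dicts itself. This is only a sketch: it relies on the question's statement that there is always exactly one left_foot and one right_foot, and the helper name unwrap_single_lists is hypothetical.

```python
import pandas as pd

# Sample data in the shape shown in the question
data = {
    'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]},
    'B': {'name': 'B', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]},
}


def unwrap_single_lists(record):
    """Replace each one-element list in a record with the element it contains."""
    return {
        key: value[0] if isinstance(value, list) and len(value) == 1 else value
        for key, value in record.items()
    }


# Drop the outer 'A'/'B' layer and unwrap the lists, then let
# json_normalize flatten the nested dicts using '.' as the separator
records = [unwrap_single_lists(rec) for rec in data.values()]
df = pd.json_normalize(records)

print(df.columns.tolist())
# ['name', 'left_foot.toes', 'right_foot.toes']
```

The toe counts remain strings, as in the source data. Lists with more than one element are left untouched by this helper, so it is only suitable when the one-foot-per-side assumption holds.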
