Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Plotting number of texts through time

I am trying to plot how the number of texts in a dataset changes over time.

   Date        Col1                           Col2            Label
0  2020-05-28  It is not true that ...        www.love.com    COOL
1  2020-05-28  Japan, tourism ...             www.travel.com  COOL
2  2020-05-31  You are the best               loving          1
3  2020-05-31  Incredible!!! You won          who             0
4  2020-05-28  Mickey Mouse rules the world!  myphone.com     1

I would like to plot the number of texts by date. I did the following:

df_plot = df.groupby(["Date"]).count().reset_index()
df_plot

Then I used seaborn to plot the frequency as follows:

import pandas as pd
import seaborn as sns

df_plot['Date'] = pd.to_datetime(df_plot['Date'])
sns.scatterplot(x='Date', y='Col2', hue='Label', data=df_plot)

but the output is not what I expect: the x-axis does not show months, so all points end up in a single column and I cannot spot the trend.

Can you please have a look at these steps and tell me if I am doing something wrong?

[image: resulting scatterplot]



1 Answer


Update

From the OP's image, it looks like there may be some spurious or earlier dates in the df. When I try with the example data provided, it works just fine. Here is a way to ensure the data is clean:

df = df.assign(
    Date=pd.to_datetime(df['Date'])
).set_index('Date').sort_index()

# then, truncate anything before year 2020 in the plot:
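# reset_index() turns 'Date' back into a regular column, so that
# x='Date' resolves regardless of the seaborn version
ax = sns.scatterplot(
    x='Date', y='Col2', hue='Label',
    data=df.truncate(before='2020-01-01').groupby('Date').count().reset_index())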

# additionally, enforce a desired date format
from matplotlib.dates import DateFormatter

ax.xaxis.set_major_formatter(DateFormatter("%Y-%m-%d"))
ax.xaxis.set_tick_params(rotation=30)

Outcome (based on the example data):

[image: scatterplot of the example data]
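
To check whether spurious dates are really present before truncating, it may help to look at the parsed date range first. The following is only a minimal sketch: it uses a small frame modelled on the question's sample, and the 1970 row is a hypothetical stray value added for illustration, not something taken from the OP's data.

```python
import pandas as pd

# Sample modelled on the question; the last row is a hypothetical stray date.
df = pd.DataFrame({
    "Date": ["2020-05-28", "2020-05-28", "2020-05-31",
             "2020-05-31", "2020-05-28", "1970-01-01"],
    "Col1": ["a", "b", "c", "d", "e", "f"],
})

# errors="coerce" turns unparseable values into NaT instead of raising
dates = pd.to_datetime(df["Date"], errors="coerce")

# A very early minimum (or a far-off maximum) hints at stray rows
print(dates.min(), dates.max())

# Number of values that could not be parsed at all
print(dates.isna().sum())

# Keep only rows dated 2020 or later (NaT rows are dropped as well)
clean = df[dates >= "2020-01-01"]
print(len(clean))
```

If the minimum is far from the period of interest, a single stray row is enough to stretch the x-axis so that all real points collapse into one column.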

Original answer:

Why not use sns.barplot?

sns.barplot(x='Date', y='Col2', hue='Label', data=df_plot)

But personally, in that case I would prefer just making a Series and using the built-in pandas plotting:

df.assign(
    Date=pd.to_datetime(df['Date'])
).groupby(['Date']).size().plot.bar()

or, if you want a scatterplot:

df.assign(
    Date=pd.to_datetime(df['Date'])
).groupby(['Date']).size().plot(style='o')
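
If the goal is also to separate the counts by Label, one possible variation (not part of the original answer) is to group by both columns and unstack the result. This is a sketch that assumes the labels from the question's sample are stored as strings:

```python
import pandas as pd

# Sample modelled on the question's data; labels are assumed to be strings.
df = pd.DataFrame({
    "Date": ["2020-05-28", "2020-05-28", "2020-05-31",
             "2020-05-31", "2020-05-28"],
    "Label": ["COOL", "COOL", "1", "0", "1"],
})

# One row per date, one column per label, each cell holding a row count
counts = (
    df.assign(Date=pd.to_datetime(df["Date"]))
      .groupby(["Date", "Label"])
      .size()
      .unstack(fill_value=0)
)
print(counts)
```

Calling counts.plot.bar(stacked=True) on this frame would then draw one bar per date, split by label.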
