There are lots of different ways to do this.
If you are filtering the value of a single column, then you can use the .agg with a custom lambda function.
(df.groupby(["name"])
.agg(
num_dogs=("pet", lambda x: np.sum(x == "dog")),
num_cats=("pet", lambda x: np.sum(x == "cat")))
)
Or
(df
.groupby(["name", "pet"])
.size()
.unstack("pet", fill_value=0)
.add_prefix("num_").add_suffix("s")
)
You can also use a pivot table.
df.reset_index().pivot_table(index="name", columns="pet", values="index", aggfunc="count", fill_value=0)
But if you need to filter based on two columns, then that approach will not work. For example if you need to know how many old dogs.
df = pd.DataFrame({'name': ["Rashida", "Rashida", "Joe", "Joe"],
'pet': ['dog', 'cat', 'dog', 'dog'],
'age': ["old", "old", "old", "young"]})
You can use the pivot table.
df.reset_index().pivot_table(index="name", columns=["pet", "age"], values="index", aggfunc="count", fill_value=0)
Or a crosstabs.
pd.crosstab(df["name"], [df["pet"], df["age"]], dropna=False).unstack().reset_index()
Or you can use the port of Dplyr called siuba to mimic the original R syntax but I haven't used this enough to know how to use it well.
from siuba import group_by, summarize, _