I'm trying to work out how to show, in a table and a graph, the frequencies (numbers
of groups) by permutation group size in my data.
My data consists of 8 variables (dog breeds), stored in 'my_col', each taking one of 3 levels (colours), stored in 'my_lev'.
I have generated a random dataset with 50,000 rows.
So far I have calculated that it contains 6,557 unique rows (i.e. permutations of category levels), out of 3^8 = 6,561 possible.
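For reproducibility, here is a minimal sketch of how a dataset like this might be generated and its unique rows counted. The contents of 'my_col' and 'my_lev', the seed, and the names 'n_rows', 'n_possible' and 'n_observed' are assumptions for illustration; my actual generation code may differ.

```r
library(data.table)

set.seed(1)  # assumed seed, only so the sketch is reproducible

# Assumed contents of the two vectors named above
my_col <- c("Poodle", "Labrador", "Pug", "Chihuahua",
            "Collie", "Shitzu", "Bulldog", "Lurcher")
my_lev <- c("brown", "black", "white")

n_rows <- 50000  # hypothetical name for the number of random rows

# One column per breed, each drawn uniformly at random from the three colours
df <- as.data.table(
  lapply(setNames(my_col, my_col),
         function(x) sample(my_lev, n_rows, replace = TRUE))
)

n_possible <- length(my_lev)^length(my_col)  # 3^8 possible distinct rows
n_observed <- uniqueN(df)                    # distinct rows actually drawn
```

With 50,000 random draws, 'n_observed' will usually fall slightly short of 'n_possible', because a few of the possible rows never get drawn.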
As this is a random dataset, some rows are identical, as indicated by COUNT in this snapshot of the grouped table:
df[, .(COUNT = .N), by = names(df)]
      Poodle Labrador   Pug Chihuahua Collie Shitzu Bulldog Lurcher COUNT
   1:  brown    brown black     black  brown  white   black   white     8
   2:  black    white brown     white  black  brown   brown   brown     7
   3:  white    black brown     brown  black  black   black   black     6
   4:  brown    brown brown     brown  brown  black   black   white    11
   5:  brown    black black     black  white  white   brown   white    10
  ---
6553:  brown    black white     black  brown  white   black   brown     3
6554:  brown    black white     white  white  brown   black   white     1
6555:  brown    black white     white  brown  black   brown   black     1
6556:  black    white brown     brown  black  white   black   black     1
6557:  white    white white     black  brown  white   brown   white     1
I would like to end up with a new table with 2 columns, 'Group Size' and 'No. of Groups'.
How do I calculate how many groups are unique combinations (group size = 1), how many groups are made up of a pair of matching combinations (group size = 2), how many groups are made up of three identical combinations, and so on?
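To make the desired result concrete, here is a rough sketch of one possible approach on a toy dataset: group once to get the size of each set of identical rows, then group those sizes again. The toy data (two breeds, seven rows) and the names 'group_sizes' and 'size_freq' are hypothetical, and this is not necessarily the best way to do it.

```r
library(data.table)

# Toy data (hypothetical): two breeds instead of eight, seven rows
df <- data.table(
  Poodle   = c("brown", "brown", "black", "white", "white", "white", "black"),
  Labrador = c("black", "black", "white", "white", "white", "white", "brown")
)

# Step 1: size of each group of identical rows (same call as in the question)
group_sizes <- df[, .(COUNT = .N), by = names(df)]

# Step 2: count how many groups share each size; keyby sorts by group size
size_freq <- group_sizes[, .(`No. of Groups` = .N),
                         keyby = .(`Group Size` = COUNT)]

print(size_freq)

# Step 3: simple base-R bar chart of the same table
barplot(size_freq[["No. of Groups"]],
        names.arg = size_freq[["Group Size"]],
        xlab = "Group Size", ylab = "No. of Groups")
```

On the toy data, this gives two groups of size 1, one group of size 2 and one group of size 3, which is the shape of table I am after for the full 50,000-row dataset.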
He who fights too long with dragons becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…