Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
199 views
in Technique[技术] by (71.8m points)

r - How to select n rows such as the values of one given column are all distinct?

Good afternoon!

Assume I have the following matrix :

 Sensor location  Target location detection Probability
1                 7              13             0.2943036
2                21              15             0.2943036
3                16              13             0.2943036
4                18              15             0.2943036
5                21              15             0.2943036
6                 1               2             0.2943036
7                16              22             0.2943036
8                10               4             0.2943036
9                16              17             0.2943036
10                2               5             0.2943036
11               13              16             0.2943036
12                9              12             0.2943036
13                2               8             0.2943036
14                7               1             0.2943036
15                7              10             0.2943036
16                1               2             0.2943036
17               18              12             0.2943036
18               23              17             0.2943036
19               21              15             0.2943036
20               20              21             0.2943036
21                2               1             0.2943036
22               12              18             0.2943036
23               24              21             0.2943036
24               22              23             0.2943036
25                2               3             0.2943036
26               11              10             0.2943036
27                7              10             0.2943036
28                2               3             0.2943036
29               12               6             0.2943036
30                2               1             0.2943036
31               24              21             0.2943036
32               14               8             0.2943036

How can I sample from this matrix n rows such as the values of the second column are all distinct?

Example of desired output ( The Target location column values must be unique ):

with n=4 :

         Sensor location  Target location detection Probability 
4                18              15             0.2943036
7                16              22             0.2943036
8                10               4             0.2943036
9                16              17             0.2943036

Undesired output ( The value 15 is present more than once in the second column ) :

      Sensor location  Target location detection Probability 
4                18              15             0.2943036
2                21              15             0.2943036
8                10               4             0.2943036
9                16              17             0.2943036

I know that dplyr has functions like sample_n() and dplyr::distinct, I had tried :

data %>% distinct("Target location")

I hope my question is clear, Thank you a lot for your help!

question from:https://stackoverflow.com/questions/66065644/how-to-select-n-rows-such-as-the-values-of-one-given-column-are-all-distinct

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You can do the following:

n <- 25
indicesToSampleFrom <- which(!duplicated(data[["Target location"]]))
data[sample(indicesToSampleFrom,n),]

Edit: If you want to apply this logic to other columns, it would be nice to check whether there are enough ditstinct values. So instead of sampling n rows, sample min(n,length(indicesToSampleFrom)).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...