Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
223 views
in Technique[技术] by (71.8m points)

r - How to split data into training/testing sets using sample function

I've just started using R and I'm not sure how to incorporate my dataset with the following sample code:

sample(x, size, replace = FALSE, prob = NULL)

I have a dataset that I need to put into a training (75%) and testing (25%) set. I'm not sure what information I'm supposed to put into the x and size? Is x the dataset file, and size how many samples I have?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

There are numerous approaches to achieve data partitioning. For a more complete approach take a look at the createDataPartition function in the caTools package.

Here is a simple example:

data(mtcars)

## 75% of the sample size
smp_size <- floor(0.75 * nrow(mtcars))

## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(mtcars)), size = smp_size)

train <- mtcars[train_ind, ]
test <- mtcars[-train_ind, ]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

57.0k users

...