Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


ggplot2 - Regression lines for entire dataset in R (two regression lines for each group)

I have a data frame with three columns, call them (X, Y, Z), such that:

  • X is a numeric variable
  • Y is a numeric variable
  • Z is a factor variable

I want to plot (using ggplot2) Y against X and color the points by the factor variable Z. This I have managed!

Now I need to plot some regression lines. I know how to plot one regression line for each set of points belonging to the same category (i.e. the same level of the factor Z). However, what I need is TWO regression lines for each category (this might seem weird, but it is how it is always done in the problem I am dealing with). So, for each category of Z, I should have one regression line computed from the first n elements belonging to that category and a second regression line computed from the remaining points in that category. Both lines should have the same color, since they fit points in the same category (i.e. the same color group).

Any help is very much appreciated! Thank you in advance

question from:https://stackoverflow.com/questions/65863986/regressions-lines-for-entire-dataset-in-r-two-regression-lines-for-each-group


1 Answer


If the two ranges of x that you want to regress on are independent, and you want a separate regression line for each range within each group (4 lines for the two-level factor in the example below, as is my understanding of your question), then you can pass a different subset of the data to each of 2 calls to geom_smooth(). Here, head() and tail() select the rows, and therefore the values of x, used for each fit, assuming the rows of df are ordered by x. If they are not ordered, you will need to sort them first (e.g. with a call to arrange() on the x-axis variable).

library(tidyverse)

set.seed(1)  # make the random test data reproducible

# some test data with 3 variables: a random response (y), a sequence (x), and a factor (z).
df <- tibble(x = seq(0.5, 25, 0.5),
             y = rnorm(50),
             z = sample(x = c("A", "B"), replace = TRUE, size = 50))

# a plot with each factor of z coloured and 2 regression lines for each factor;
# in a layer's data formula, .x refers to the plot data (here df)
ggplot(df, aes(x, y, colour = z)) +
  geom_point() +
  geom_smooth(data = ~ head(.x, 30), method = "lm", se = FALSE) +  # first 30 rows
  geom_smooth(data = ~ tail(.x, 20), method = "lm", se = FALSE) +  # last 20 rows
  theme_minimal()
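
If "the first n elements" is meant per category, as the question describes, one possible variant is to label each row by segment within its category and let a single geom_smooth() call group on both the category and the segment. This is only a sketch under assumptions: n_first = 10 is a made-up cut-off, segment and df_seg are names introduced here for illustration, "first" is taken to mean the smallest x values within each category, and z is assigned alternately rather than at random so that the group sizes are fixed.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # reproducible random response

# Illustrative test data: x numeric, y numeric, z a factor with 25 rows per level.
df <- tibble(x = seq(0.5, 25, 0.5),
             y = rnorm(50),
             z = factor(rep(c("A", "B"), times = 25)))

n_first <- 10  # hypothetical cut-off: size of the first segment in each category

# Label each row as one of the first n_first points of its category
# (ordered by x) or as one of the remaining points of that category.
df_seg <- df %>%
  group_by(z) %>%
  arrange(x, .by_group = TRUE) %>%
  mutate(segment = if_else(row_number() <= n_first, "first", "rest")) %>%
  ungroup()

# group = interaction(z, segment) fits one line per category and segment,
# while colour = z keeps both lines of a category in the same colour.
p <- ggplot(df_seg, aes(x, y, colour = z)) +
  geom_point() +
  geom_smooth(aes(group = interaction(z, segment)),
              method = "lm", formula = y ~ x, se = FALSE) +
  theme_minimal()
p
```

With real data, replace df with your own data frame and set n_first to the cut-off you need. If the cut-off differs between categories, the segment column could be built from a per-category lookup instead of a single number.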
