This is only meant to supplement the accepted answer.
According to Hadley Wickham, the author of ggplot2, in his book 'ggplot2: Elegant Graphics for Data Analysis' (p. 91, section 5.2, 'Building a plot layer by layer'):
You only need to set one of stat and geom: every geom has a default stat, and every stat has a default geom.
The accepted answer above explains well why the two are different. This is meant to explain why they are difficult to distinguish in practice -- whenever you use a geom layer, you are also implicitly using a stat layer (even if it is just the identity transformation); likewise, whenever you use a stat layer, you are also implicitly using a geom layer.
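One way to see this pairing is to inspect the layer object that each function returns. The snippet below is a minimal sketch, assuming ggplot2 is installed and that layer objects expose their geom and stat as the fields $geom and $stat (true of the versions I have used, but an implementation detail rather than a documented guarantee):

```r
library(ggplot2)

# Build one layer from the geom side and one from the stat side.
bar_layer   <- geom_bar()    # geom chosen, stat left at its default
count_layer <- stat_count()  # stat chosen, geom left at its default
point_layer <- geom_point()  # a geom whose default stat does nothing

# The first class of each ggproto object names the geom or stat in use.
class(bar_layer$stat)[1]
class(bar_layer$geom)[1]
class(count_layer$stat)[1]
class(count_layer$geom)[1]
class(point_layer$stat)[1]
```

If that assumption holds, geom_bar() and stat_count() should report the same stat/geom pair, and geom_point() should report the identity stat.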
If you are fine with the defaults used by either layer, then it would be redundant to state both explicitly. Even if you are not fine with them, you can override the defaults through parameters to each layer (i.e. you can change the default geom by passing a geom argument to any stat_* function, and you can change the default stat by passing a stat argument to any geom_* function). In the words of Hadley Wickham (same source as above):
You can pass params in ... (in which case stat and geom parameters are automatically teased apart)
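As a rough illustration of both points, the sketch below overrides the default geom of a stat_* function and the default stat of a geom_* function, and passes extra parameters through the dots. The object names are my own, and I am assuming that bins is picked up by the bin stat while direction is picked up by the step geom:

```r
library(ggplot2)

# Override the default geom of stat_bin() ("bar") with "step".
# bins is intended for the stat, direction for the step geom;
# both are simply handed over in the same call.
step_layer <- stat_bin(geom = "step", bins = 20, direction = "vh")

# Override the default stat of geom_bar() ("count") with "identity",
# so bar heights are taken directly from a y column.
identity_bars <- geom_bar(stat = "identity")

# Use both layers in ordinary plots.
p_hist <- ggplot(diamonds, aes(x = carat)) + step_layer
totals <- data.frame(cut = c("Fair", "Good"), n = c(10, 25))
p_bars <- ggplot(totals, aes(x = cut, y = n)) + identity_bars
```

Printing p_hist or p_bars draws the plot; neither call needed a second, separate layer for the "other half" of the stat/geom pair.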
This is kind of difficult to understand conceptually, which is why I have had this question as well. In Section 4, 'A hierarchy of defaults', of his paper about the philosophy underlying ggplot2, Hadley Wickham explains the practical considerations behind this default behavior: it simplifies code that would otherwise be unnecessarily long.
For example, without default specifications, and using the grammar of graphics alone, the code for a simple scatter plot might look like:
ggplot() +
layer(
data = diamonds, mapping = aes(x = carat, y = price),
geom = "point", stat = "identity", position = "identity"
) +
scale_y_continuous() +
scale_x_continuous() +
coord_cartesian()
Using defaults for the scales and coordinates, we can instead write something like:
ggplot(data = diamonds, aes(x = carat, y = price)) +
layer(
geom = "point", stat = "identity", position = "identity"
)
But this is still annoyingly long, of course, since the values of stat and position are just "identity", which basically means 'do nothing' -- so why have to say that explicitly?
However, the layer() function does not have default values for stat or position -- they need to be specified explicitly in a call to the layer() function.
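A quick way to check this is to call layer() with only a geom and catch the failure. This is a sketch based on the ggplot2 versions I am aware of, where such a call raises an error; the exact message differs between versions, so it is not shown here:

```r
library(ggplot2)

# Leaving out stat and position: expected to signal an error.
attempt <- tryCatch(
  layer(geom = "point"),
  error = function(e) e
)
inherits(attempt, "error")

# Spelling out all three pieces gives a usable layer.
full_layer <- layer(geom = "point", stat = "identity", position = "identity")
p <- ggplot(diamonds, aes(x = carat, y = price)) + full_layer
```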
To get around this, Hadley made the geom_* functions and the stat_* functions wrappers around the layer() function that supply values for both the geom and stat parameters. The difference between the stat_* and geom_* functions is which of the two is fixed by the function itself (stat or geom) and which is only a default that you can change.
Source: http://ggplot2.tidyverse.org/reference/layer.html
So for the geom_* functions you can change the default value of the stat parameter but not the value of the geom parameter, while for the stat_* functions you can change the default value of the geom parameter but not the value of the stat parameter.
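To make the wrapper idea concrete, here is a simplified, hypothetical wrapper written in the spirit of geom_point(). The name my_geom_point and its reduced argument list are my own invention for illustration; the real ggplot2 functions take more arguments:

```r
library(ggplot2)

# Hypothetical wrapper: the geom is fixed inside the body, while stat and
# position are ordinary arguments that default to "identity".
my_geom_point <- function(mapping = NULL, data = NULL,
                          stat = "identity", position = "identity", ...) {
  layer(
    data = data, mapping = mapping,
    geom = "point",       # fixed: callers have no way to change it
    stat = stat,          # default, can be overridden by the caller
    position = position,  # default, can be overridden by the caller
    params = list(...)    # extra parameters are forwarded to layer()
  )
}

# With the defaults, this is the scatter plot from the layer() example.
default_layer <- my_geom_point()
p <- ggplot(diamonds, aes(x = carat, y = price)) + default_layer

# The stat can be swapped, but the result is still drawn with points.
unique_layer <- my_geom_point(stat = "unique")
```

A stat_* style wrapper would be the mirror image: stat hard-coded in the body, and geom exposed as an argument with a default.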
In the words of the layer() documentation linked above:
A layer is a combination of data, stat and geom with a potential position adjustment. Usually layers are created using geom_* or stat_* calls but it can also be created directly using this function [the layer() function].