| group_by {dplyr} | R Documentation |
Most data operations are done on groups defined by variables.
group_by() takes an existing tbl and converts it into a grouped tbl
where operations are performed "by group". ungroup() removes grouping.
group_by(.data, ..., add = FALSE, .drop = group_by_drop_default(.data))

ungroup(x, ...)
.data |
a tbl |
... |
Variables to group by. All tbls accept variable names. Some tbls will accept functions of variables. Duplicated groups will be silently dropped. |
add |
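When FALSE, the default, group_by() will override existing groups. To add to the existing groups, use add = TRUE. |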
.drop |
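When .drop = TRUE, empty groups are dropped. See group_by_drop_default() for what the default value is for this argument. |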
x |
A tbl() |
A grouped data frame if the combination of ... and add yields a
non-empty set of grouping columns; a regular (ungrouped) data frame
otherwise.
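The distinction between the two return types can be checked directly. The following is a minimal sketch, assuming dplyr is attached and using the built-in mtcars data; the names grouped and not_grouped are illustrative only.

```r
library(dplyr)

# At least one grouping variable: the result is a grouped data frame.
grouped <- mtcars %>% group_by(cyl)
is_grouped_df(grouped)

# No grouping variables at all: the result is a regular, ungrouped data frame.
not_grouped <- mtcars %>% group_by()
is_grouped_df(not_grouped)
```

Testing with is_grouped_df() rather than inspecting the printed output avoids relying on how a particular tbl class prints itself.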
group_by() is an S3 generic with methods for the built-in tbls
listed below. See the help for the corresponding classes and their
manip methods for more details:
data.frame: grouped_df
data.table: dtplyr::grouped_dt
SQLite: src_sqlite()
PostgreSQL: src_postgres()
MySQL: src_mysql()
The three scoped variants (group_by_all(), group_by_if() and
group_by_at()) make it easy to group a dataset by a selection of
variables.
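As a rough illustration of how the three scoped variants differ, the sketch below groups the built-in mtcars and iris data sets and inspects the resulting grouping variables. It assumes dplyr is attached; the result names (at_vars, if_vars, all_vars) are illustrative only.

```r
library(dplyr)

# group_by_at(): group by variables selected with vars()
at_vars <- mtcars %>%
  group_by_at(vars(vs, am)) %>%
  group_vars()

# group_by_if(): group by every column that satisfies a predicate
if_vars <- iris %>%
  group_by_if(is.factor) %>%
  group_vars()

# group_by_all(): group by every column of the (here, reduced) data
all_vars <- mtcars %>%
  select(vs, am) %>%
  group_by_all() %>%
  group_vars()
```

Here at_vars and all_vars both contain "vs" and "am", while if_vars contains "Species", the only factor column in iris.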
Other grouping functions: group_by_all,
group_indices, group_keys,
group_map, group_nest,
group_rows, group_size,
group_trim, groups
by_cyl <- mtcars %>% group_by(cyl)
# grouping doesn't change how the data looks (apart from listing
# how it's grouped):
by_cyl
# It changes how it acts with the other dplyr verbs:
by_cyl %>% summarise(
  disp = mean(disp),
  hp = mean(hp)
)
by_cyl %>% filter(disp == max(disp))
# Each call to summarise() removes a layer of grouping
by_vs_am <- mtcars %>% group_by(vs, am)
by_vs <- by_vs_am %>% summarise(n = n())
by_vs
by_vs %>% summarise(n = sum(n))
# To remove grouping, use ungroup()
by_vs %>%
  ungroup() %>%
  summarise(n = sum(n))
# You can group by expressions: this is just short-hand for
# a mutate/rename followed by a simple group_by
mtcars %>% group_by(vsam = vs + am)
# By default, group_by overrides existing grouping
by_cyl %>%
  group_by(vs, am) %>%
  group_vars()
# Use add = TRUE to instead append
by_cyl %>%
  group_by(vs, am, add = TRUE) %>%
  group_vars()
# When factors are involved and .drop = FALSE, groups can be empty
tbl <- tibble(
  x = 1:10,
  y = factor(rep(c("a", "c"), each = 5), levels = c("a", "b", "c"))
)
tbl %>%
  group_by(y, .drop = FALSE) %>%
  group_rows()
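
Whether an unused factor level shows up as an empty group depends on the .drop argument. The sketch below rebuilds the same small tibble and compares the group sizes under both settings. It assumes dplyr is attached; the names dropped and kept are illustrative only.

```r
library(dplyr)

tbl <- tibble(
  x = 1:10,
  y = factor(rep(c("a", "c"), each = 5), levels = c("a", "b", "c"))
)

# .drop = TRUE: only the levels that occur in the data ("a", "c") form groups
dropped <- tbl %>%
  group_by(y, .drop = TRUE) %>%
  group_size()

# .drop = FALSE: the unused level "b" is kept as a group with zero rows
kept <- tbl %>%
  group_by(y, .drop = FALSE) %>%
  group_size()
```

With .drop = TRUE there are two groups of five rows each; with .drop = FALSE there are three groups, the middle one empty.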