# How can I group by in R as in SQL?

I have this table `test`:

``````test <- matrix(c(1,1,1,1,1, 2,2,2,2,2,
                 2011,2012,2012,2013,2014, 2011,2013,2013,2014,2014,
                 1,1,3,2,1, 2,1,1,3,1), nrow = 10, ncol = 3)
colnames(test) <- c("T", "Y", "S")
test <- as.data.frame(test)  # base R; as_data_frame() needs tibble/dplyr and is deprecated
``````

I want to create a variable `x` that, for each row, is the sum of `S` over the rows with the same `T` whose year `Y` is either the same as that row's year or one year earlier.

This is what I am expecting:

``````test<-cbind(test,c(1,5,5,6,3,2,4,4,6,6))
colnames(test)[4]<-"x"
``````

I think in SQL it would be something like this (as far as I remember, at least):

``````proc sql;
create table test as select
a.T,
a.Y,
sum(case when Y eq a.Y or Y eq a.Y+1 then S else 0 end) as x
from test as a
group by T, Y;
quit;
``````

answered 4 months ago DJV #1

If I understood you correctly, you can use a `tidyverse` approach: group by `Y` and sum `S` within each group.

``````library(tidyverse)

test %>%
  group_by(Y) %>%
  mutate(x = sum(S, na.rm = TRUE)) %>%
  ungroup()

#        T     Y     S     x
#    <dbl> <dbl> <dbl> <dbl>
#  1    1. 2011.    1.    3.
#  2    1. 2012.    1.    4.
#  3    1. 2012.    3.    4.
#  4    1. 2013.    2.    4.
#  5    1. 2014.    1.    5.
#  6    2. 2011.    2.    3.
#  7    2. 2013.    1.    4.
#  8    2. 2013.    1.    4.
#  9    2. 2014.    3.    5.
# 10    2. 2014.    1.    5.
``````
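The result above sums `S` by `Y` alone, so it does not separate the `T` groups or include the previous year. A minimal sketch of one way to add both, assuming the intended rule is "same `T`, and `Y` equal to the row's year or exactly one year earlier" (the `vapply` helper and the name `result` are illustrative, not part of the original answer):

```r
library(dplyr)

# Same data as in the question, built directly as a data frame.
test <- data.frame(
  T = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Y = c(2011, 2012, 2012, 2013, 2014, 2011, 2013, 2013, 2014, 2014),
  S = c(1, 1, 3, 2, 1, 2, 1, 1, 3, 1)
)

result <- test %>%
  group_by(T) %>%
  # For each row's year y, sum S over the rows of the same T
  # whose year is y - 1 or y.
  mutate(x = vapply(Y, function(y) sum(S[Y %in% c(y - 1, y)]), numeric(1))) %>%
  ungroup()

result$x
# 1 5 5 6 3 2 2 2 6 6
```

This scans the whole group once per row, which is fine for small tables; for large ones a join on pre-aggregated yearly totals is likely to scale better.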

answered 4 months ago G. Grothendieck #2

Try the following left self join:

``````library(sqldf)

sqldf("select a.*, sum(b.S) as x
from test a
left join test b on a.T = b.T and b.Y between a.Y-1 and a.Y
group by a.rowid")
``````

giving:

``````   T    Y S x
1  1 2011 1 1
2  1 2012 1 5
3  1 2012 3 5
4  1 2013 2 6
5  1 2014 1 3
6  2 2011 2 2
7  2 2013 1 2
8  2 2013 1 2
9  2 2014 3 6
10 2 2014 1 6
``````

## Note

This was used as input to produce the output above:

``````test <- structure(list(T = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), Y = c(2011,
2012, 2012, 2013, 2014, 2011, 2013, 2013, 2014, 2014), S = c(1,
1, 3, 2, 1, 2, 1, 1, 3, 1)), row.names = c(NA, -10L), class = "data.frame")
``````
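For comparison, the same condition (`same T`, `Y` between `a.Y-1` and `a.Y`) can be sketched in base R without any extra package. This is only an illustration of the join logic, not a replacement for the `sqldf` call; the anonymous helper function is hypothetical. Note that rows 7 and 8 come out as 2, not the 4 listed in the question's expected column, because `T = 2` has no rows for 2012.

```r
# Same input as in the Note above.
test <- data.frame(
  T = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Y = c(2011, 2012, 2012, 2013, 2014, 2011, 2013, 2013, 2014, 2014),
  S = c(1, 1, 3, 2, 1, 2, 1, 1, 3, 1)
)

# For every row (t, y), sum S over all rows with the same T
# and a year equal to y - 1 or y.
test$x <- mapply(
  function(t, y) sum(test$S[test$T == t & test$Y %in% c(y - 1, y)]),
  test$T, test$Y
)

test$x
# 1 5 5 6 3 2 2 2 6 6
```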

answered 4 months ago MKR #3

One option is a self-join with `dplyr::left_join`. The idea is to join `test` with a copy of itself in which `Y` has been increased by `1`. With `left_join`, each row is then matched with the rows of the previous year (`Y - 1`) for the same `T`. Finally, sum the two columns `S.x` and `S.y` row-wise.

``````library(tidyverse)

test %>% left_join(mutate(., Y = Y+1), by=c("T", "Y")) %>%
rowwise() %>%
mutate(x = sum(S.x, S.y, na.rm = TRUE)) %>%
select(T, Y, S = S.x, x) %>%
as.data.frame()
#    T    Y S x
# 1  1 2011 1 1
# 2  1 2012 1 2
# 3  1 2012 3 4
# 4  1 2013 2 3
# 5  1 2013 2 5
# 6  1 2014 1 3
# 7  2 2011 2 2
# 8  2 2013 1 1
# 9  2 2013 1 1
# 10 2 2014 3 4
# 11 2 2014 3 4
# 12 2 2014 1 2
# 13 2 2014 1 2
``````
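The join above returns 13 rows instead of 10, because a row is repeated once for every matching row of the previous year, and `S.x` only counts the row itself rather than all rows of its year. A possible variation, sketched under the assumption that one `x` per original row is wanted, is to aggregate `S` per `(T, Y)` first and then join the yearly totals (the names `yearly`, `prev`, `S_year` and `S_prev` are made up for this example):

```r
library(dplyr)

test <- data.frame(
  T = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Y = c(2011, 2012, 2012, 2013, 2014, 2011, 2013, 2013, 2014, 2014),
  S = c(1, 1, 3, 2, 1, 2, 1, 1, 3, 1)
)

# One total per (T, Y), so the joins below cannot duplicate rows.
yearly <- test %>%
  group_by(T, Y) %>%
  summarise(S_year = sum(S), .groups = "drop")

# The same totals shifted forward one year: the "previous year" lookup.
prev <- yearly %>%
  transmute(T, Y = Y + 1, S_prev = S_year)

result <- test %>%
  left_join(yearly, by = c("T", "Y")) %>%
  left_join(prev, by = c("T", "Y")) %>%
  # A missing previous year contributes 0.
  mutate(x = S_year + coalesce(S_prev, 0)) %>%
  select(T, Y, S, x)

result$x
# 1 5 5 6 3 2 2 2 6 6
```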

answered 4 months ago Carlos Eduardo Lagosta #4

I did not exactly understand what you are trying to calculate, but you can try `data.table`. The syntax is `data.table[WHERE, SELECT, GROUP_BY]`, which should feel familiar if you're accustomed to SQL. It would be something like this:

``````library(data.table)

test.dt <- as.data.table(test)

test.dt[ Y >= Y-1, x := sum(S), by = .(T, Y) ]
``````

Here `:=` creates a new column named `x` in `test.dt` by reference; without it, the expression only returns the grouped sums and leaves `test.dt` unchanged.
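In the snippet above, the filter `Y >= Y-1` is true for every row, so `x` ends up as the sum of `S` within each `(T, Y)` pair only. If the previous year should be included as well, one option is a non-equi self-join. The following is a sketch under that assumption; the helper table `bounds` and its columns `lo` and `hi` are hypothetical names introduced for the example:

```r
library(data.table)

test.dt <- data.table(
  T = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Y = c(2011, 2012, 2012, 2013, 2014, 2011, 2013, 2013, 2014, 2014),
  S = c(1, 1, 3, 2, 1, 2, 1, 1, 3, 1)
)

# One lookup row per original row: the year window [Y - 1, Y] within T.
bounds <- test.dt[, .(T, lo = Y - 1, hi = Y)]

# Non-equi join: for each row of `bounds`, sum S over the rows of test.dt
# with the same T and lo <= Y <= hi. by = .EACHI keeps the row order of `bounds`.
agg <- test.dt[bounds, on = .(T, Y >= lo, Y <= hi), .(x = sum(S)), by = .EACHI]

# Add the result to the original table by reference.
test.dt[, x := agg$x]
```

With this data the new column should agree with the output of the `sqldf` self-join shown earlier in the thread.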