How to count number of a particular event before another event in SQL Bigquery?

VSR Source

I have a table containing date and events and users. There is event named 'A'. I want to find out how many time a particular event occurred before and after event 'A' in Sql Bigquery. for Example,

   User           Date             Events
    123          2018-02-13            X.Y.A
    123          2018-02-12            X.Y.B
    134          2018-02-10            Y.Z.A
    123          2018-02-11            A
    123          2018-02-01            X.Y.Z
    134          2018-02-05            X.Y.B
    134          2018-02-04            A

The Output would be something like this

User       Event    Before   After
123          A      1        3
134          A      0        1

The event that I have to count contains a particular prefix. Means I have to check events that start with ( X.Y.then some event name). So, X.Y.SomeEvent are the events that I have to set counter for. Any Suggestions?

sqlgoogle-bigquerylegacy-sql

Answers

answered 3 months ago Gordon Linoff #1

User window functions to find the date when "A" occurs. Then use conditional aggregation to count the events before and after:

select userid,
       sum(case when date < a_date and event like 'X.Y%' then 1 else 0 end) as before,
       sum(case when date > a_date and event like 'X.Y%' then 1 else 0 end) as before
from (select t.*,
             min(case when event = 'A' then date end) over (partition by userid) as a_date
      from t
     ) t
group by userid

answered 3 months ago Mikhail Berlyant #2

below is for BigQuery SQL

#standardSQL
SELECT user, event, before, after 
FROM (
  SELECT user, event, 
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) before,
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY dt ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING ) after
  FROM `project.dataset.events`
)
WHERE event = 'A'
-- ORDER BY user

you can test it with dummy data in your question

#standardSQL
WITH `project.dataset.events` AS (
  SELECT 123 user, '2018-02-13' dt, 'X.Y.A' event UNION ALL
  SELECT 123, '2018-02-12', 'X.Y.B' UNION ALL
  SELECT 123, '2018-02-11', 'A' UNION ALL
  SELECT 134, '2018-02-10', 'Y.Z.A' UNION ALL
  SELECT 134, '2018-02-05', 'X.Y.B' UNION ALL
  SELECT 134, '2018-02-04', 'A' UNION ALL
  SELECT 123, '2018-02-01', 'X.Y.Z' 
)
SELECT user, event, before, after 
FROM (
  SELECT user, event, 
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) before,
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY dt ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING ) after
  FROM `project.dataset.events`
)
WHERE event = 'A'
ORDER BY user

comments powered by Disqus