Cleaning origin-destination data in R

Sam Source

I have trip data that looks something like this:

ClientID <- c("45675")
Date <- c("10/10/2016")
PickUpAddress <- c("123 Street", "45 Way", "66 Blvd")
DropOffAddress <- c("45 Way", "66 Blvd", "123 Street")
PickUpTime <- c("08:00", "17:00", "18:00")
DropOffTime <- c("8:30", "17:30", "19:00")

df <- data.frame(ClientID, Date, PickUpAddress, DropOffAddress, PickUpTime, DropOffTime)

df
  ClientID       Date PickUpAddress DropOffAddress PickUpTime DropOffTime
1    45675 10/10/2016    123 Street         45 Way      08:00        8:30
2    45675 10/10/2016        45 Way        66 Blvd      17:00       17:30
3    45675 10/10/2016       66 Blvd     123 Street      18:00       19:00

The real data has thousands of records, with a varying number of trips per client throughout the year.

The third row in this example is the return trip (the trip back to the client's original origin). I would like to remove all return trips from the data.

Any suggestions?

r dplyr

Answers

answered 3 months ago ANG #1

You can try the following solution, which is based on defining each client's home address.

library(dplyr)
library(lubridate)

# create date/time format variables
df$Date_PickUpTime <- paste(df$Date, df$PickUpTime, sep = " ")
df$Date_DropOffTime <- paste(df$Date, df$DropOffTime, sep = " ")

df$Date_PickUpTime <- mdy_hm(df$Date_PickUpTime)
df$Date_DropOffTime <- mdy_hm(df$Date_DropOffTime)

str(df) # as you can see Date_PickUpTime and Date_DropOffTime are in POSIXct format

# define the client home address
df %>%
  group_by(ClientID) %>%                 # group by client
  arrange(Date_PickUpTime) %>%           # order the data by Date_PickUpTime
  mutate(HomeAddress = PickUpAddress[1]) # client home address is the first PickUpAddress

# ... then add filter to the above code

df %>%
  group_by(ClientID) %>% # group by client
  arrange(Date_PickUpTime) %>%      # order the data
  mutate(HomeAddress = PickUpAddress[1]) %>% # client home address
  filter(DropOffAddress != HomeAddress) # condition for filter:
                                        # DropOffAddress is different to HomeAddress
                                        # return trip (3rd) is not selected
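
For reference, here is a self-contained sketch of the same approach that can be run on its own. The `trips` and `one_way` names and the second day of trips (10/11/2016) are made up for illustration and are not part of the question's data. It also assumes that each client's earliest pickup in the data is from home, which may not hold for every client.

```r
library(dplyr)
library(lubridate)

# Sample data from the question, plus a hypothetical second day (assumption)
trips <- data.frame(
  ClientID       = "45675",
  Date           = c("10/10/2016", "10/10/2016", "10/10/2016",
                     "10/11/2016", "10/11/2016"),
  PickUpAddress  = c("123 Street", "45 Way", "66 Blvd", "123 Street", "45 Way"),
  DropOffAddress = c("45 Way", "66 Blvd", "123 Street", "45 Way", "123 Street"),
  PickUpTime     = c("08:00", "17:00", "18:00", "09:00", "12:00"),
  stringsAsFactors = FALSE
)

one_way <- trips %>%
  # parse date and time as month/day/year hour:minute
  mutate(Date_PickUpTime = mdy_hm(paste(Date, PickUpTime))) %>%
  # order trips chronologically within each client
  arrange(ClientID, Date_PickUpTime) %>%
  group_by(ClientID) %>%
  # assumed home address: the client's earliest pickup address
  mutate(HomeAddress = first(PickUpAddress)) %>%
  # drop every trip that ends at the assumed home address
  filter(DropOffAddress != HomeAddress) %>%
  ungroup()

one_way
```

Calling `ungroup()` at the end returns an ordinary tibble, so later steps are not silently computed per client. If a client's starting point can change from day to day, grouping by `ClientID` and `Date` together would be one option to try instead.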
