Error with memory space using read_csv in R

jesusgarciab

I am trying to analyze 6 really big CSV files (big for our computing resources, at least). I wrote a script that can analyze all of the files, but I keep getting errors like "Cannot read file Z:~/AC2561 Images for settings/EMT1.csv: Not enough storage is available to process this command." I used rm() to remove variables that were no longer needed, and I got it to run "manually" by changing the index myself, but I have not been able to run the for loop.

I will need to run this script on a large number of files in the future, so it would be really useful to get the for loop (or some alternative) to work.

#loading necessary packages
library(dplyr)
library(readr)

#making list of file names (6 csv files in this case)
#list.files() takes a regular expression, not a glob
temp <- list.files(pattern = "\\.csv$")

#starting empty list to save summary per file
summary_data <- list()


for (i in seq_along(temp)){
    df <- read_csv(temp[i])%>%
        #selecting only the columns I will work with (original file has 68 columns)
      select(`Dye 2 Positive`, `Dye 3 Positive`, `Dye 4 Positive`, `Dye 4 Cytoplasm Intensity` )

    #Each of these objects is a distinct subset of cells
    #(`Dye 2 Positive` is the FITC flag, `Dye 3 Positive` is the CD45 flag)
    #I'm trying to measure n() of `Dye 4 Positive` and mean() of `Dye 4 Cytoplasm Intensity`

    FITCPos_CD45Neg <- df %>%
        filter(`Dye 2 Positive` == 1, `Dye 3 Positive` == 0) %>%
        summarise(Total_GD2_count = n(), MFI = mean(`Dye 4 Cytoplasm Intensity`))
    FITCPos_CD45Pos <- df %>%
        filter(`Dye 2 Positive` == 1, `Dye 3 Positive` == 1) %>%
        summarise(Total_GD2_count = n(), MFI = mean(`Dye 4 Cytoplasm Intensity`))
    FITCNeg_CD45Neg <- df %>%
        filter(`Dye 2 Positive` == 0, `Dye 3 Positive` == 0) %>%
        summarise(Total_GD2_count = n(), MFI = mean(`Dye 4 Cytoplasm Intensity`))
    FITCNeg_CD45Pos <- df %>%
        filter(`Dye 2 Positive` == 0, `Dye 3 Positive` == 1) %>%
        summarise(Total_GD2_count = n(), MFI = mean(`Dye 4 Cytoplasm Intensity`))
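    #generating name for each file summary
    item_nam <- temp[i]

    #Appending the summary to initial list
    #([[ ]] stores the whole list of four summaries under one name;
    # single [ ] would keep only the first and raise a warning)
    summary_data[[item_nam]] <- list(FITCPos_CD45Neg = FITCPos_CD45Neg,
                                     FITCPos_CD45Pos = FITCPos_CD45Pos,
                                     FITCNeg_CD45Neg = FITCNeg_CD45Neg,
                                     FITCNeg_CD45Pos = FITCNeg_CD45Pos)
    #removing "objects" that are no longer needed
    rm("FITCNeg_CD45Neg", "FITCNeg_CD45Pos", "FITCPos_CD45Neg", "FITCPos_CD45Pos", "df")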

}

Answers
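
One thing that may help, since the script only uses a few of the 68 columns, is to tell read_csv to parse just those columns rather than reading everything and calling select() afterwards. The following is a minimal sketch, not a tested fix for the files in the question. The helper names summarise_file and summarise_all are made up for illustration, and it assumes the column names are exactly as written in the question. It uses readr's cols_only() with col_guess(), and groups by the two flag columns so that all four subsets come out of one pass.

```r
library(dplyr)
library(readr)

# Hypothetical helper: summarise one CSV while parsing only the needed columns.
# Assumes the file contains these three column names exactly as written.
summarise_file <- function(path) {
    needed <- cols_only(
        `Dye 2 Positive` = col_guess(),
        `Dye 3 Positive` = col_guess(),
        `Dye 4 Cytoplasm Intensity` = col_guess()
    )
    read_csv(path, col_types = needed) %>%
        # One row per combination of the two flags that occurs in the file
        group_by(`Dye 2 Positive`, `Dye 3 Positive`) %>%
        summarise(
            Total_GD2_count = n(),
            MFI = mean(`Dye 4 Cytoplasm Intensity`),
            .groups = "drop"
        )
}

# Hypothetical helper: run summarise_file over every CSV in a directory
# and stack the results, with the file path in a "file" column.
summarise_all <- function(dir = ".") {
    files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
    summaries <- lapply(files, summarise_file)
    names(summaries) <- files
    bind_rows(summaries, .id = "file")
}
```

Calling summarise_all() in the folder that holds the CSV files should return one table for all of them. Because each data frame lives only inside summarise_file, it becomes eligible for garbage collection when the function returns, so the explicit rm() calls are not needed. If a single file is still too large after dropping the unused columns, reading it in pieces with readr's read_csv_chunked may be worth trying.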
