This is for a VB.NET project. My existing method converts a comma-delimited file to a pipe-delimited file. It got a little challenging because some of the fields have commas within them, so those fields have double-quotes around the field's contents.
Here's the working code (thanks a million to The Blue Dog for the research on this):
    Private Function ConvertCommaSepToPipeSep() As Boolean
        Dim line, result As String
        Dim pattern As String = ",([^,""]*(?:""[^""]*"")?[^,""]*)(?=,|$)"
        Dim replacement As String = "|$1"
        Dim rgx As New Regex(pattern)
        'Console.WriteLine("Conversion start time: " & DateTime.Now.ToLongTimeString())
        Try
            Using sw As New StreamWriter("output.csv")
                Using sr As New StreamReader("source.csv")
                    While Not sr.EndOfStream
                        line = sr.ReadLine()
                        result = rgx.Replace(line, replacement)
                        sw.WriteLine(result.Replace(Chr(34), ""))
                    End While
                End Using
            End Using
        Catch ex As Exception
            MessageBox.Show("There was a problem converting the file." & vbCrLf & ex.Message)
            Return False
        End Try
        'Console.WriteLine("Conversion end time: " & DateTime.Now.ToLongTimeString())
        Return True
    End Function
I found out, however, that some of the fields have double-quotes within them as well.
Here are some sample lines from the source file that I am converting.
    122749,JOHN DOE,ACS155,7/5/2014,P,SCH/RC Activation Week 2,HRLY,1299577,Scheduler IT,2204,CVISA-Client Activation,1220000,Svcs Clin Implement,34
    110310,JANE DOE,ACS150,2/8/2014,P,"Developed Employee Interface""",HRLY,1267305,Project Management - Client Implementation Services,2500,PJM -Project Management,1410000,Tech Services Development,8
    110310,MARY DOE,ACS160,2/8/2014,P,EDManage+ CSV data extract,HRLY,1527401,Project Management - Client Implementation Services,2500,PJM -Project Management,1410000,Tech Services Development,8
    129084,ROBERT SMITH,ACS80,9/27/2014,P,,PTO,0,Company General Services,1030,"Time Off - PTO, Holiday, Personal Holiday, FTO",1100000,Client Services Technical,40
    117592,HARRY JOHNSON,ACS64,5/10/2014,P,"helped penny post AP ""E"" cks",HRLY,1554404,General Financials IT,2120,CCON-Client Conference Call,1100000,Client Services Technical,1.5
    110310,MARK WILSON,ACS130,2/8/2014,P,"""Charge Vs Payment""",HRLY,1267305,Project Management - Clinical Implementation Services,2500,PJM -Project Management,1410000,Tech Services Development,8
Those same rows need to be converted to look like this:
    122749|JOHN DOE|ACS155|7/5/2014|P|SCH/RC Activation Week 2|HRLY|1299577|Scheduler IT|2204|CVISA-Client Activation|1220000|Svcs Clin Implement|34
    110310|JANE DOE|ACS150|2/8/2014|P|Developed Employee Interface""|HRLY|1267305|Project Management - Client Implementation Services|2500|PJM -Project Management|1410000|Tech Services Development|8
    110310|MARY DOE|ACS160|2/8/2014|P|EDManage+ CSV data extract|HRLY|1527401|Project Management - Client Implementation Services|2500|PJM -Project Management|1410000|Tech Services Development|8
    129084|ROBERT SMITH|ACS80|9/27/2014|P||PTO|0|Company General Services|1030|Time Off - PTO, Holiday, Personal Holiday, FTO|1100000|Client Services Technical|40
    117592|HARRY JOHNSON|ACS64|5/10/2014|P|helped penny post AP E cks|HRLY|1554404|General Financials IT|2120|CCON-Client Conference Call|1100000|Client Services Technical|1.5
    110310|MARK WILSON|ACS130|2/8/2014|P|Charge Vs Payment|HRLY|1267305|Project Management - Clinical Implementation Services|2500|PJM -Project Management|1410000|Tech Services Development|8
In this CSV, fields that contain commas are wrapped in double-quotes, and the regex above accounts for that. But some fields also have double-quotes within their text (doubled in the file, as in the samples above). Any double-quotes within a field can be removed. The complication is that a field's text can itself start or end with a double-quote, which produces three double-quotes in a row, and I can't just strip every double-quote before splitting, because the enclosing quotes are what mark where a comma-containing field starts and ends.
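To pin down the behavior I'm after, here is a minimal non-regex sketch of the rule. It is an illustration only, not the existing method: `ConvertLineToPipes` and the module name are hypothetical, and it assumes a quoted field never spans more than one line and that a literal quote inside a quoted field is always doubled, as in the samples. Under those assumptions every quote character just toggles an "inside quotes" flag, and a doubled quote toggles twice and leaves the flag unchanged. Note that it drops every quote, so the JANE DOE row would come out without the trailing `""` shown in the expected output.

```vbnet
Imports System
Imports System.Text

Module QuoteAwareSplitSketch
    ' Hypothetical helper: converts one comma-delimited line to pipe-delimited,
    ' keeping commas that sit inside quoted fields and dropping every double-quote.
    Function ConvertLineToPipes(line As String) As String
        Dim sb As New StringBuilder(line.Length)
        Dim inQuotes As Boolean = False

        For Each c As Char In line
            If c = """"c Then
                ' Each quote flips the flag; a doubled quote flips it twice,
                ' so the flag is unchanged after an escaped quote.
                inQuotes = Not inQuotes
            ElseIf c = ","c AndAlso Not inQuotes Then
                ' A comma outside quotes is a real field separator.
                sb.Append("|"c)
            Else
                ' Everything else, including commas inside quotes, is kept as-is.
                sb.Append(c)
            End If
        Next

        Return sb.ToString()
    End Function

    Sub Main()
        ' Abbreviated version of the HARRY JOHNSON sample row.
        Dim sample As String = "117592,HARRY JOHNSON,P,""helped penny post AP """"E"""" cks"",HRLY"
        Console.WriteLine(ConvertLineToPipes(sample))
        ' prints: 117592|HARRY JOHNSON|P|helped penny post AP E cks|HRLY
    End Sub
End Module
```

In the existing loop this would stand in for both the `rgx.Replace` call and the `Replace(Chr(34), "")` call, but I would still prefer a regex fix if one exists.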
What needs to be added to the regex to do that?