.NET Regex To Remove Line Breaks Within Quotes

Charles Arthur

I am trying to clean up a text file so that it can be imported into Excel, but the file contains line breaks inside several of the double-quoted fields. The file is tab-delimited.

An example would be:

"12313"\t"1234"\t"123

5679"
"test"\t"test"\t"test"
"test"\t"test"\t"test"
"12313"\t"1234"\t"123

5679"

I need to remove the line breaks so that it will ultimately display like:

"12313"\t"1234"\t"1235679"
"test"\t"test"\t"test"
"test"\t"test"\t"test"
"12313"\t"1234"\t"1235679"

The "\t" is the tab delimiter.

I've looked at several other solutions on SO, but they don't seem to deal with multiple lines. We've tried several CSV parsers but couldn't get them to work for this scenario. The goal is to pass the entire string to a regular expression and get it back with all line breaks inside quotes removed, while the line breaks outside the quotes remain.

c# regex

Answers

answered 4 years ago Ulugbek Umirov #1

string output = Regex.Replace(input, @"(?<=[^""])\r\n", string.Empty);

Demo with the input provided
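
The lookbehind in this one-liner only keeps a line break when the character just before it is a double quote, so it relies on every record ending with a closing quote. The following C# sketch illustrates that on two small inputs; the class and method names (`LookbehindDemo`, `RemoveBreaksNotAfterQuote`) and both sample strings are made up for illustration and are not from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Hypothetical helper wrapping the one-liner from the answer above:
    // removes every CRLF that is not immediately preceded by a double quote.
    public static string RemoveBreaksNotAfterQuote(string input)
    {
        return Regex.Replace(input, @"(?<=[^""])\r\n", string.Empty);
    }

    public static void Main()
    {
        // Every record ends with a closing quote, so only the inner breaks go.
        string quoted = "\"1234\"\t\"123\r\n\r\n5679\"\r\n\"test\"\t\"test\"\r\n";
        Console.WriteLine(RemoveBreaksNotAfterQuote(quoted) ==
            "\"1234\"\t\"1235679\"\r\n\"test\"\t\"test\"\r\n"); // True

        // A record ending in an unquoted field loses its line break too.
        string unquoted = "\"a\"\t1\r\n\"b\"\t2\r\n";
        Console.WriteLine(RemoveBreaksNotAfterQuote(unquoted) ==
            "\"a\"\t1\"b\"\t2"); // True
    }
}
```

If some fields in the real file are not quoted, the lookahead-based pattern in the next answer is probably the safer choice.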

answered 4 years ago anubhava #2

You can use this regex:

(?!(([^"]*"){2})*[^"]*$)\n+

Working Demo

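This one matches one or more newline characters that are not followed by an even number of quotes. A newline followed by an odd number of quotes up to the end of the input must sit inside a quoted field. (It assumes there are no escaping exceptions in the data.)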
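
To see the lookahead in action, here is a minimal C# sketch that applies the pattern to a shortened version of the sample from the question. The names `QuotedNewlineDemo` and `RemoveNewlinesInsideQuotes` are made up for illustration. The pattern is also widened from `\n+` to `[\r\n]+`, on the assumption that the file may use Windows line endings; that change is not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedNewlineDemo
{
    // Hypothetical helper: removes line breaks that are followed by an odd
    // number of double quotes, i.e. breaks that sit inside a quoted field.
    // Assumes quotes are balanced and never backslash-escaped.
    public static string RemoveNewlinesInsideQuotes(string input)
    {
        // [\r\n]+ instead of \n+ is an assumption to cover CRLF files.
        const string pattern = @"(?!(([^""]*""){2})*[^""]*$)[\r\n]+";
        return Regex.Replace(input, pattern, string.Empty);
    }

    public static void Main()
    {
        string input =
            "\"12313\"\t\"1234\"\t\"123\r\n\r\n5679\"\r\n" +
            "\"test\"\t\"test\"\t\"test\"\r\n";

        // The break inside "123...5679" is removed; the record-ending
        // breaks are kept because an even number of quotes follows them.
        string output = RemoveNewlinesInsideQuotes(input);
        Console.Write(output);
    }
}
```

Because the lookahead rescans the rest of the input at every newline, the running time can grow roughly quadratically with file size, which may matter for very large files.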

answered 4 years ago Derek #3

If just removing blank lines works:

string text = Regex.Replace(inputString, @"\n\n", "");

answered 3 years ago Anton Damhuis #4

This worked for me:

var fixedCsvFileContent = Regex.Replace(csvFileContent, @"(?!(([^""]*""){2})*[^""]*$)\n+", string.Empty);

This didn't work:

var fixedCsvFileContent = Regex.Replace(csvFileContent, @"(?!(([^""]*""){2})*[^""]*$)\n+", string.Empty, RegexOptions.Multiline);

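So do not add RegexOptions.Multiline when running this pattern over the whole input string. With that option, $ also matches just before every newline, so the negative lookahead rejects each newline and nothing is removed.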
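
The difference between the two calls can be reproduced with a small C# sketch like the one below. The class name `MultilineOptionDemo`, the `Clean` helper, and the sample string are made up for illustration; only the pattern comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineOptionDemo
{
    // Same pattern as in the answer above.
    private const string Pattern = @"(?!(([^""]*""){2})*[^""]*$)\n+";

    // Hypothetical helper: applies the pattern with the given options.
    public static string Clean(string input, RegexOptions options)
    {
        return Regex.Replace(input, Pattern, string.Empty, options);
    }

    public static void Main()
    {
        string input = "\"a\"\t\"b\n\nc\"\n\"d\"\t\"e\"\n";

        // Without Multiline, $ anchors at the end of the input, so the
        // lookahead counts every remaining quote.
        string good = Clean(input, RegexOptions.None);

        // With Multiline, $ also matches before each '\n', so the negative
        // lookahead rejects every newline and the input comes back unchanged.
        string bad = Clean(input, RegexOptions.Multiline);

        Console.WriteLine(good == "\"a\"\t\"bc\"\n\"d\"\t\"e\"\n"); // True
        Console.WriteLine(bad == input);                           // True
    }
}
```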
