Why does the string "\u0022" start with the string "\u204D"?

Why, under .NET (Windows 8), does this string comparison return true?

"\u0022".StartsWith("\u204D");

It is true under all cultures, yet if you switch the StartsWith to an Equals, it returns false.

There are many different characters that return true for a StartsWith comparison. Is this part of an odd Unicode rule or does Windows have its own rules here?

c# .net unicode string-comparison

Answers

answered 4 years ago Jeppe Stig Nielsen #1

It is not easy to know what rules underlie the culture-dependent string comparisons. It does seem quite consistent, though, that the punctuation characters U+0022 " (QUOTATION MARK) and U+204D ⁍ (BLACK RIGHTWARDS BULLET) are considered "equal enough" under culture-sensitive comparisons (including those of the invariant culture). These examples all indicate that:

// culture-sensitive (StartsWith(string) and CompareTo(string) use the current culture):

Console.WriteLine("\"".StartsWith("⁍"));
Console.WriteLine("⁍".StartsWith("\""));
Console.WriteLine("\"".StartsWith("⁍", StringComparison.InvariantCulture));
Console.WriteLine("⁍".StartsWith("\"", StringComparison.InvariantCulture));

Console.WriteLine("\"".Equals("⁍", StringComparison.CurrentCulture));
Console.WriteLine("⁍".Equals("\"", StringComparison.CurrentCulture));
Console.WriteLine("\"".Equals("⁍", StringComparison.InvariantCulture));
Console.WriteLine("⁍".Equals("\"", StringComparison.InvariantCulture));

Console.WriteLine(StringComparer.CurrentCulture.Equals("\"", "⁍"));
Console.WriteLine(StringComparer.CurrentCulture.Equals("⁍", "\""));
Console.WriteLine(StringComparer.InvariantCulture.Equals("\"", "⁍"));
Console.WriteLine(StringComparer.InvariantCulture.Equals("⁍", "\""));

Console.WriteLine("\"".CompareTo("⁍"));
Console.WriteLine("⁍".CompareTo("\""));

Console.WriteLine(StringComparer.CurrentCulture.Compare("\"", "⁍"));
Console.WriteLine(StringComparer.CurrentCulture.Compare("⁍", "\""));
Console.WriteLine(StringComparer.InvariantCulture.Compare("\"", "⁍"));
Console.WriteLine(StringComparer.InvariantCulture.Compare("⁍", "\""));

Other examples could have been given, for example the static methods on string, but they are equivalent.
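
For reference, here is a minimal sketch of what those static counterparts could look like. The variable names are only illustrative, and no expected output is shown because culture-sensitive results depend on the runtime and on the collation data of the operating system:

```csharp
using System;
using System.Globalization;

string quote = "\"";       // U+0022 QUOTATION MARK
string bullet = "\u204D";  // U+204D BLACK RIGHTWARDS BULLET

// Static counterparts of the culture-sensitive instance calls shown above.
Console.WriteLine(string.Equals(quote, bullet, StringComparison.InvariantCulture));
Console.WriteLine(string.Equals(quote, bullet, StringComparison.CurrentCulture));
Console.WriteLine(string.Compare(quote, bullet, StringComparison.InvariantCulture));
Console.WriteLine(string.Compare(quote, bullet, StringComparison.CurrentCulture));

// The same comparison with the culture passed explicitly.
Console.WriteLine(string.Compare(quote, bullet, CultureInfo.InvariantCulture, CompareOptions.None));
```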

With an ordinal comparison, certainly U+0022 must be different from (less than) U+204D (that is simple!):

// ordinal (Equals(string) without a StringComparison argument is always ordinal):

Console.WriteLine("\"".StartsWith("⁍", StringComparison.Ordinal));
Console.WriteLine("⁍".StartsWith("\"", StringComparison.Ordinal));

Console.WriteLine("\"".Equals("⁍"));
Console.WriteLine("⁍".Equals("\""));

Console.WriteLine(StringComparer.Ordinal.Equals("\"", "⁍"));
Console.WriteLine(StringComparer.Ordinal.Equals("⁍", "\""));

Console.WriteLine(StringComparer.Ordinal.Compare("\"", "⁍"));
Console.WriteLine(StringComparer.Ordinal.Compare("⁍", "\""));
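
To see why the ordinal case is simple, it may help to look at the numeric values directly. The following is a small sketch (variable names are illustrative) showing that an ordinal comparison only has to compare the UTF-16 code unit values of the two characters:

```csharp
using System;

char quote = '"';        // U+0022 QUOTATION MARK
char bullet = '\u204D';  // U+204D BLACK RIGHTWARDS BULLET

// Ordinal comparison looks only at the numeric UTF-16 code unit values.
Console.WriteLine((int)quote);      // 34
Console.WriteLine((int)bullet);     // 8269
Console.WriteLine(quote < bullet);  // True

// CompareOrdinal returns a negative number when the first string sorts first.
int ordinal = string.CompareOrdinal(quote.ToString(), bullet.ToString());
Console.WriteLine(ordinal < 0);     // True
```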
