Friday, November 20, 2015

Bizarre replaceAll tricks

Imagine you want to replace all asterisk (*) characters in an input string with \*. In other words, you want to escape them. One way of doing this in Java is to use the String.replaceAll() method:
"foo*bar".replaceAll("\\*", "\\\\*");
To understand what's going on here, let's remind ourselves of what replaceAll() actually does:
/**
 * Replaces each substring of this string that matches the given regular expression
 * with the given replacement.
 * ...
 */
public String replaceAll(String regex, String replacement) {
So that already explains why the first argument to replaceAll() is "\\*": for it to be a valid regular expression that matches a literal asterisk, we need to escape the asterisk (which of course means "zero or more times" in a regular expression) using a backslash, and we all know that a backslash character in a Java String literal needs to be escaped itself.
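To make the regex side concrete, here is a minimal sketch. The class and method names (AsteriskRegexDemo, isValidRegex) are made up for this illustration; Pattern.compile(), Pattern.quote() and PatternSyntaxException are the standard java.util.regex API. It shows that a bare asterisk is rejected as a regular expression, while the escaped form compiles fine.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class AsteriskRegexDemo {
    // Hypothetical helper: true if the given string compiles as a regular expression.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A bare asterisk is a quantifier with nothing in front of it to repeat.
        System.out.println(isValidRegex("*"));   // false
        // Escaped, it matches a literal asterisk.
        System.out.println(isValidRegex("\\*")); // true
        // Pattern.quote() builds a literal-matching regex without manual escaping.
        System.out.println("foo*bar".replaceAll(Pattern.quote("*"), "X")); // fooXbar
    }
}
```

If hand-escaping the pattern feels error-prone, Pattern.quote() is probably the more readable option.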

But what about the second argument? Shouldn't that just be "\\*" also: a backslash followed by an asterisk? It turns out replaceAll() doesn't treat the replacement as a simple string literal. The Javadoc states the following:

* Note that backslashes (\) and dollar signs ($) in the
* replacement string may cause the results to be different than if it were
* being treated as a literal replacement string
So a plain "\\*" as the replacement string would put a backslash in there, which replaceAll() treats as an escape character, so we need to escape it once more! Hence the "\\\\*". It's interesting to note that, in a funny twist of fate, this actually makes the code less bizarre. If the replacement string had been treated as a simple literal, the code would have been "foo*bar".replaceAll("\\*", "\\*");. Imagine coming across that gem when trying to maintain some old piece of code... :-)
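For the curious, here is a small sketch comparing the variants side by side. The class and method names are invented for this example; replaceAll(), replace() and Matcher.quoteReplacement() are the standard JDK methods. Note what the "tempting" version does in practice: the backslash in the replacement escapes the asterisk, so each asterisk is replaced by a plain asterisk and the string comes back unchanged.

```java
import java.util.regex.Matcher;

class ReplacementEscapingDemo {
    // The version from this post: the replacement is backslash, backslash, asterisk.
    static String escapeWithReplaceAll(String input) {
        return input.replaceAll("\\*", "\\\\*");
    }

    // The tempting version: the replacement is backslash, asterisk, which
    // replaceAll() reads as "an escaped asterisk", i.e. just an asterisk.
    static String naiveAttempt(String input) {
        return input.replaceAll("\\*", "\\*");
    }

    // Let Matcher.quoteReplacement() do the escaping of the replacement.
    static String escapeWithQuoteReplacement(String input) {
        return input.replaceAll("\\*", Matcher.quoteReplacement("\\*"));
    }

    // String.replace() takes both arguments literally; no regex involved.
    static String escapeWithReplace(String input) {
        return input.replace("*", "\\*");
    }

    public static void main(String[] args) {
        System.out.println(escapeWithReplaceAll("foo*bar"));       // foo\*bar
        System.out.println(naiveAttempt("foo*bar"));               // foo*bar
        System.out.println(escapeWithQuoteReplacement("foo*bar")); // foo\*bar
        System.out.println(escapeWithReplace("foo*bar"));          // foo\*bar
    }
}
```

When no real regular expression is needed, as in this case, String.replace() is arguably the least surprising choice, since neither argument gets any special treatment.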