JavaScript Gotchas 1 – /x*$/ in global replace

Here is a sample JavaScript function:

function quote(s) {
    s = s.replace(/^\s+|\s+$/g, '');
    s = s.replace(/^"*|"*$/g, '"');
    return s;
}

The two lines of the function are intended to do the following:

  1. remove any whitespace characters at the start or end of the string (by replacing them with an empty string); and
  2. replace any number (including zero) of double quotation marks at the start or end of the string with a single double quotation mark.

For example, the string Hello world should come out as "Hello world", while the already-quoted string "Oh wow" should come out unmodified.

Unfortunately, as highlighted in this question on StackOverflow, it doesn't quite behave as expected. A pattern of zero or more [characters] followed by the end of the string, used with the global flag in String.prototype.replace, hits an unfortunate edge case: you have to understand the machinery of both the regular expression engine and the replacement algorithm before you can predict the behaviour.

Put simply: "Oh wow" comes out as "Oh wow"" (with one quotation mark too many at the end).
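
To see this for yourself, the sketch below runs the original function on both sample inputs. It is the same code as above, renamed quoteBuggy (a name chosen here only so it doesn't clash with the fixed version later on).

```javascript
// The original function, unchanged apart from its (illustrative) name.
function quoteBuggy(s) {
    s = s.replace(/^\s+|\s+$/g, '');   // trim whitespace at either end
    s = s.replace(/^"*|"*$/g, '"');    // intended: exactly one quote at each end
    return s;
}

console.log(quoteBuggy('Hello world'));  // prints "Hello world"  (as intended)
console.log(quoteBuggy('"Oh wow"'));     // prints "Oh wow""      (one quote too many)
```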

Thom Blake provided an explanation in his answer to that question.

Essentially: because there is no actual "end-of-string token", the regular expression engine can't consume the $; and because a global String.prototype.replace keeps searching until its position explicitly falls off the end of the string, our function:

  1. matches zero or more (i.e. one) quotation marks at the end of the string and replaces them with a single quotation mark;
  2. resumes searching from where that match ended (i.e. the very end of the string, just after the quotation mark it has just replaced), because the previous match consumed a character and so the search position has moved on;
  3. matches zero or more (i.e. zero) quotation marks at the end of the string, and replaces that empty match with a single quotation mark; and
  4. because that match was empty, advances one position before trying again, but this falls off the end of the string, so it stops.
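
The steps above can be replayed by hand with RegExp.prototype.exec. The helper below, listMatches (a hypothetical name, not a built-in), is a sketch of the search loop a global replace performs: it records each match and, after an empty match, bumps lastIndex forward by one position, as replace does internally for a non-unicode regexp. It assumes the regexp passed in has the g flag; without it, exec ignores lastIndex and the loop would never end.

```javascript
// Hypothetical helper: lists [index, matchedText] for every match a
// global replace would visit. `re` must have the g flag.
function listMatches(re, input) {
    const found = [];
    re.lastIndex = 0;
    let m;
    // exec() returns null (and resets lastIndex) once lastIndex is past the end.
    while ((m = re.exec(input)) !== null) {
        found.push([m.index, m[0]]);
        // An empty match leaves lastIndex where it was, so step forward one
        // position, just as replace does, to avoid matching the same spot forever.
        if (m[0] === '') re.lastIndex++;
    }
    return found;
}

const matches = listMatches(/^"*|"*$/g, '"Oh wow"');  // 8 characters, indices 0..7
console.log(matches);
// matches is [[0, '"'], [7, '"'], [8, '']]
```

The three entries are the leading quotation mark, the trailing quotation mark (the "more" case), and the empty match at the very end of the string (the "zero" case); each one gets its own replacement quotation mark.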

Note that it's not because we're replacing quotes with quotes; it's only because we're matching against zero or more [something] at the end of the string and finding "more" and then "zero".
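
One way to confirm that the quotation marks themselves don't matter is to try the same shape of pattern with another character and another replacement. In this sketch, ! and X are arbitrary stand-ins; the last line drops the g flag for comparison.

```javascript
// "more" then "zero": the two !s are matched and replaced, then the
// empty match left at the end of the string is replaced as well.
console.log('abc!!'.replace(/!*$/g, 'X'));  // prints abcXX

// Only "zero": there is nothing to consume, so the single empty match
// at the end is replaced once.
console.log('abc'.replace(/!*$/g, 'X'));    // prints abcX

// Without the g flag, replace stops after the first match, so the
// trailing empty match is never reached.
console.log('abc!!'.replace(/!*$/, 'X'));   // prints abcX
```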

So unfortunately there is no way to make this single global replacement achieve our goal. A working solution is to strip any quotation marks from the start and end of the string, and then affix a single pair. This has the added advantage of sanitising the ends of the string, and guarantees matched quotation marks.

function quote(s) {
    s = s.replace(/^\s+|\s+$/g, '');
    s = s.replace(/^"+|"+$/g, '');
    s = '"' + s + '"';
    return s;
}
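
Some sample calls for the working version, as a sketch (the inputs are illustrative only):

```javascript
// The working version from above, with comments added.
function quote(s) {
    s = s.replace(/^\s+|\s+$/g, '');   // trim whitespace at either end
    s = s.replace(/^"+|"+$/g, '');     // strip quotes at either end ("+ never matches empty)
    s = '"' + s + '"';                 // then add exactly one pair
    return s;
}

console.log(quote('Hello world'));     // prints "Hello world"
console.log(quote('"Oh wow"'));        // prints "Oh wow"
console.log(quote('  ""Oh wow"  '));   // prints "Oh wow"
console.log(quote('say "hi" now'));    // prints "say "hi" now"
```

Because "+ has to consume at least one character, the second replacement can never produce an empty match at the end of the string, which is what tripped up the original. The last call also shows that quotation marks in the middle of the string are left alone; only those at the ends are stripped.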


Update:

Apparently Ruby's regular expression engine behaves the same way:

irb(main):001:0> 'foo'.gsub(/\A#*|#*\z/, '#')
=> "#foo#"
irb(main):002:0> '#foo'.gsub(/\A#*|#*\z/, '#')
=> "#foo#"
irb(main):003:0> '##foo'.gsub(/\A#*|#*\z/, '#')
=> "#foo#"
irb(main):004:0> 'foo#'.gsub(/\A#*|#*\z/, '#')
=> "#foo##"
irb(main):005:0> 'foo##'.gsub(/\A#*|#*\z/, '#')
=> "#foo##"
irb(main):006:0> '##foo##'.gsub(/\A#*|#*\z/, '#')
=> "#foo##"
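
The same six inputs can be checked in JavaScript by swapping Ruby's \A and \z for ^ and $ (without the m flag, JavaScript's ^ and $ only match at the very start and end of the input). A sketch:

```javascript
const inputs = ['foo', '#foo', '##foo', 'foo#', 'foo##', '##foo##'];

// Same shape as the Ruby pattern: zero or more # at either end, globally.
const results = inputs.map(s => s.replace(/^#*|#*$/g, '#'));

console.log(results);
// results is ['#foo#', '#foo#', '#foo#', '#foo##', '#foo##', '#foo##'],
// matching the irb output above
```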



Matthew Kerwin

License: CC BY-SA 4.0
Tags: development, gotcha, javascript, regex, ruby, software
We have discovered that the JavaScript spec requires certain global regular expression replacements to behave in a way that looks like a bug.
