The power of backreferences in JavaScript regular expressions

Published by Joven Pancho on

Today I’ll demonstrate a use case for backreferences in JavaScript regular expressions. Some of us, especially beginners, find regular expressions hard to learn, and we don’t always appreciate writing them because of their complexity. But there are many times when a good regex saves us time while developing an application.

Okay, enough of that. Let us assume we have a problem to solve. Suppose you have the string aaaccfgzzz and you want to match every run of consecutive repeated characters in it, e.g. aaa, cc and zzz.

Well, if I were to present my algorithm to solve the problem, it would be like this:
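A manual approach could look something like the sketch below. The function name `findRepeatedRuns` is made up for illustration, and this is only one of many ways to write the loop.

```javascript
// Hypothetical helper for illustration: collect every run of two or more
// identical consecutive characters by scanning the string manually.
function findRepeatedRuns(input) {
  const runs = [];
  let start = 0; // index where the current run begins
  for (let i = 1; i <= input.length; i++) {
    // A run ends when the character changes or the string ends.
    if (i === input.length || input[i] !== input[start]) {
      if (i - start > 1) {
        runs.push(input.slice(start, i));
      }
      start = i;
    }
  }
  return runs;
}

const manualRuns = findRepeatedRuns('aaaccfgzzz');
console.log(manualRuns); // [ 'aaa', 'cc', 'zzz' ]
```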

I know there are plenty of better solutions than mine. But as you can see, it takes a lot of typing to solve that problem.

Fortunately, with a little knowledge of regex backreferences, we can solve the problem using only this code:

Of course, I have to explain how the regex works. If you have basic knowledge of regular expressions and know how capturing groups work, you will easily understand this regex, or you might already know it.
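Going by the explanation that follows, the regex is `/(.)\1+/g`. Here is a minimal sketch using `String.prototype.match`; the variable names are chosen for illustration.

```javascript
const input = 'aaaccfgzzz';

// (.) captures one character, \1+ requires that same character
// to repeat one or more times, and g finds every such run.
const repeated = input.match(/(.)\1+/g);
console.log(repeated); // [ 'aaa', 'cc', 'zzz' ]
```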

First we have the expression (.). The . matches any single character (except a line terminator), and the surrounding ( and ) capture the matched character. A capturing group stores whatever it matched so that it can be referred to later in the same regex with \N, where N is the group’s number; this is called a “backreference”. We use \1 here because there is only a single capturing group. If we had a second capturing group we could refer to it with \2, a third with \3, and so on.
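To see more than one backreference at work, here is a small sketch with two capturing groups. The sample string is made up for illustration.

```javascript
// \1 refers to the first group, \2 to the second.
// This pattern matches four characters that read the same in both directions.
const mirrored = 'abba xyyx abcd'.match(/(.)(.)\2\1/g);
console.log(mirrored); // [ 'abba', 'xyyx' ]
```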

As you can see, the regex matches ['aaa', 'cc', 'zzz'] and not any single character in the string. The . matches the first letter “a”, and the next letter “a” is matched by \1, which refers to the character captured by the group around the .. We also have the + after \1. It matches one or more occurrences of the backreference \1. If we did not put the + there, the regex would match only 2 consecutive characters at a time, e.g. ['aa', 'cc', 'zz']. We also add the g modifier, which performs a global match (finding all matches rather than stopping after the first).
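The effect of dropping the + or the g modifier can be checked with a short sketch like this one (variable names are illustrative):

```javascript
const text = 'aaaccfgzzz';

// Without +, each match is exactly two characters long.
const pairs = text.match(/(.)\1/g);
console.log(pairs); // [ 'aa', 'cc', 'zz' ]

// Without g, match() returns only the first match plus its captured groups.
const first = text.match(/(.)\1+/);
console.log(first[0]); // aaa (the whole match)
console.log(first[1]); // a (what group 1 captured)
```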

Another example:

 

With this code we perform a reverse algorithm on the string using a regex backreference. The only difference is that, when we perform a replace, the captured text is referenced as $N instead of \N, and it is available in the second parameter (the replacement string) of the replace method.
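As a rough illustration of the $N syntax, the sketch below swaps two words. The input string and pattern are assumptions made for this example, not necessarily the ones used above.

```javascript
// Assumed example: swap two words by referencing the captured groups
// as $1 and $2 in the replacement string.
const swapped = 'hello world'.replace(/(\w+) (\w+)/, '$2 $1');
console.log(swapped); // world hello
```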

 

Conclusion:

Backreferences in regular expressions are rarely used, but they are a powerful tool worth learning: they enable a lot of string manipulation, and I’m sure you will run into a need for them when developing a reasonably complex application.


2 Comments

Josiah · January 23, 2019 at 3:56 am

I enjoyed your article. I have a couple points of recommendation if you are interested.

I would suggest not including HTML code when giving examples of JavaScript. JavaScript doesn’t rely on HTML to run, so having the extra code in your examples is clutter.

I would also suggest using formatted text blocks for the code examples, so people can copy them and run them themselves. Or even better, I would recommend writing the code on codepen.io and embedding the pens on the page. That way they could be run inline.

Lastly, avoid document.write(). There are better ways to see the results of expressions. For instance, console.log() is good, or by modifying the innerHTML of a particular element.

    Joven Pancho · January 23, 2019 at 1:45 pm

    Thanks for your suggestions. I appreciated that.
