How to Replace All URLs in a String with Links (Anchor Tags) in JavaScript
How can we find and replace all URLs in a given string with a link to that URL in JavaScript?
For instance, we want to replace the URL below with an anchor tag.
What is https://logfetch.com?
What is <a href="https://logfetch.com">https://logfetch.com</a>?
This process requires a few steps:
- Finding a regular expression to match all URLs
- Replacing each URL in a string with an anchor tag (
<a href=""></a>
) - Rendering that string into the HTML DOM
1. Use a regular expression to find valid URLs
Matching URL syntax with a regular expression is a difficult task.
Almost anything can be URL. For instance, :::::
, /////
, valid://///url/////
are all technically valid URLs.
We might say that URLs at least have to begin with http
or https
, right?
In that case, we could use something as simple as /(https?:\/\/[^\s]+)/g
for our regular expression, but this literally matches anything beginning with http://
or https://
, like https://garbage/?.[]c{a][s}d[]
.
URL regular expresions will likely contain false positives and/or false negatives.
For a project I’m working on, I settled on the regular expression below, which you can analyze and test on regex101.com.
/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;()]*[-A-Z0-9+&@#\/%=~_|()])/ig
Here are a few examples of what it does match:
http://logfetch.com
https://logfetch.com
https://www.logfetch.com
https://logfetch
https://logfetch.com/hey?i=0
https://logfetch.com/hey?i=(hello)
https://l/o/g/f/e/t/c/h/./c/o/m
https://logfetch....nothing
https://+&@#/%=~_|!:,.;()
Here are a few examples of what it does not match:
logfetch.com
www.logfetch.com
https:/logfetch.com
https://logfetch.com/hey?i={hello}
https://logfetch.com/hey?i=[hello]
https://logfetch.com?
In some cases, the regular expression may match a substring of the string above but not the entire string.
While imperfect, this regular expression matches the most common use cases that I could foresee in my application. Consider your use case and what kind of tolerance you have towards false positives/negatives.
2. Replace all URLs in a string with links
We want to use the regular expression above to replace each URL in a string with an anchor tag.
First, we’ll define the regular expression as a literal:
const regex = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
Then, we’ll create a utility function to replace all URLs in a string with the respective anchor tag.
const addLinkToUrls = (text) => {
return text.replace(regex, `<a href="$1">$1</a>`);
}
If we want to open the link in a new tab, we can add this target="_blank"
attribute.
Maybe, we also want to add some classes to the anchor tag: class="text-blue-500 hover:underline"
.
3. Render the string in HTML
Now, we want to render the string with the anchor tags in the DOM.
The issue is that the string itself contains the <a>
tags. The DOM has no idea the string is meant to be an HTML element.
So unfortunately, our webpage will contain HTML as text:
What is <a href="https://logfetch.com">https://logfetch.com</a>?
We’ll want to set the innerHTML
of the target DOM element to this new string. In React, we can use dangerouslySetInnerHTML
to do this. However, as the name implies, this makes our app vulnerable to malicious cross-site scripting (XSS) attacks. This is especially an issue if we’re rendering strings created by a third party or users of our application.
A popular way to circumvent this issue is to use DOMPurify for HTML sanitization.
I used Isomorphic DOMPurify, which makes the setup of this sanitizer much more seamless for both client/server configurations.
If we’re using npm
, we can install Isomorphic DOMPurify
in our app.
npm install isomorphic-dompurify
Then, we can import DOMPurify
.
import DOMPurify from 'isomorphic-dompurify';
Finally, we can set the __html
property of dangerouslySetInnerHTML
with the sanitized output of our new string.
<div dangerouslySetInnerHTML={
{ __html: DOMPurify.sanitize(addLinkToUrls(text)) }
} />