What could go wrong when MySQL strict SQL mode is off?
This article shows some examples of attacks that can abuse MySQL behavior when the strict SQL mode is disabled, especially when string characters are invalid in the current encoding. This happens when the encoding of the application (e.g. UTF-8) is wider than that of the database (e.g. ASCII).
Looking to improve your skills? Discover our trainings sessions! Learn more.
What is MySQL strict SQL mode?
Strict mode[1] controls how MySQL handles invalid or missing values in data-change statements such as INSERT
or UPDATE
. The case detailed in this article is when the value is out of range. What happens when a value is invalid for the current encoding?
An over-simplification of the strict mode in this case:
Strict mode On |
Strict mode Off |
Error |
Warning (silent) |
Nothing modified |
Insert “adjusted” value (“closest values”) |
At this point, you guessed what could go wrong.
Set and check
In fact, there is no “strict mode” that is “on” or “off”. There is a system variable named sql_mode
that contains an array of values. If that array contains one of STRICT_TRANS_TABLES
or STRICT_ALL_TABLES
values or the combination mode TRADITIONAL
, then MySQL is in “strict mode”.
The documentation for MySQL 9.1[2] claims that the default value of sql_mode
is the following.
ONLY_FULL_GROUP_BY STRICT_TRANS_TABLES NO_ZERO_IN_DATE NO_ZERO_DATE ERROR_FOR_DIVISION_BY_ZERO NO_ENGINE_SUBSTITUTION
The default value may differ on other versions or alternative backend engines such as MariaDB or Percona Server. The @@sql_mode
system variable allows to query the current value.
[u]> SELECT @@sql_mode\G;
@@sql_mode: STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
Installation programs may configure the SQL mode during the installation process. So even if the strict mode is enabled by default, XAMPP, CMS, etc. may silently disable it.
The violent method to disable strict mode (and everything else) is:
SET sql_mode='';
Observations
Let's create a table using ASCII encoding where it is easy to insert invalid values and see what happens when the strict mode is on and off.
-- Create a table with a column using a narrow space encoding on purpose
CREATE TABLE uni_sandbox (
id INT AUTO_INCREMENT PRIMARY KEY,
data VARCHAR(255) CHARACTER SET ascii
);
-- Strict mode enabled
SET sql_mode='STRICT_ALL_TABLES';
INSERT INTO uni_sandbox (data) VALUES ('I ♥ Unicode');
-- => ERROR 1366 (22007): Incorrect string value: '\xE2\x99\xA5 Un...' for column `unicode8`.`uni_sandbox`.`data` at row 1
-- Strict mode disabled
SET sql_mode='';
INSERT INTO uni_sandbox (data) VALUES ('I ♥ Unicode');
SELECT * FROM uni_sandbox\G;
-- => data: I ? Unicode
With strict mode on, an error was triggered, with strict mode off, the data was inserted but ♥
was replaced by ?
. Oracle MySQL or MariaDB documentation says “convert the invalid value to the closest valid value” without explaining exactly how it works. In practice, strings longer than the column size will be truncated, integers rounded to the nearest values, but that is only for values that are overflowing. For values that are invalid (can't be represented in the encoding), each byte is replaced with a ?
.
Attack context
Imagine the following context: there is a web application, that is strictlly checking the user input. A generic method commonly used for validation is by using regular expressions (RegExp). Independently of the language, nearly all RegExp engines supports POSIX character classes. But years ago, with the generalization of the Unicode support, many engines extended the range of those classes to work on Unicode too (originally limited to ASCII). So for example, instead of matching alphanumeric characters only on the ASCII range (A-Za-z0-9
), it will also match characters from the corresponding Unicode categories like Letter (L
) and Number (N
) by default.
Take the following example in Ruby with the character Latin Letter Bidental Percussive ʭ
.
/[[:alnum:]]/.match('ʭ')
# => #<MatchData "ʭ">
Some other languages like JavaScript, have also implemented non-POSIX character class selectors by implementing Unicode properties and categories selectors (\p{…}
) where you can match Unicode properties directly (e.g. \p{Ll}
or Lowercase_Letter
for lowercase letters) or alias properties[3] (e.g. \p{Alpha}
for Alphabetic that matches Letter and Letter Numbers).
"ʭ".match(/\p{Alpha}/u)
// Array [ "ʭ" ]
On the other hand, if the database is a using a “narrow space” encoding such ASCII, CP-1252 or even a legacy partial implementations such as utf8mb3 in combination with the strict mode disabled (e.g. CMS installation does it automatically), that would lead to some Unicode characters passing the security check but, being invalid on the database encoding, they would end up replaced with ?
.
In this context, there are a few attack scenarios and security bypasses that could work in real life that come to mind.
Weaponizing MySQL strict SQL mode fallback
Summary of the context
All the attacks presented in this section will be placed under the following context: the application has security checks allowing only AlphaNumeric characters including Unicode ones but the MySQL database uses an ASCII encoding or similar where Unicode characters are invalid and with strict SQL mode disabled.
Shell globbing
Bash does not support Regular Expressions (RegExp) but can still carry out filename expension, which is known as globbing. Bash will then expand characters known as wild cards. The most famous one is *
matching any string of any length, but there is also ?
that matches exactly one single character (different from the ?
RegExp quantifier). For example, at the root of a Unix file system ???t
will match both boot
and root
directories.
$ ls -d /???t
/boot /root
So imagine you have a local file disclosure (LFD) thanks to an Insecure direct object references (IDOR) but the files are renamed with an UUIDv4 (e.g. 2a0f6947-bd44-449e-94fe-82ebc3ecf115.txt
). Technically you can read the files of all other users but in practice you cannot because it is not possible to brute-force the identifier.
But now, what if you ask to read ʭʭʭʭʭʭʭʭ-ʭʭʭʭ-ʭʭʭʭ-ʭʭʭʭ-ʭʭʭʭʭʭʭʭʭʭʭʭ.txt
? It would allow you to list all files, because ʭ
(Latin Letter Bidental Percussive), as well as hundread of thousand of other characters, is a Unicode character of Letter type that would pass the alphanumeric security check. After that, when stored in MySQL, it would not be recognized as a valid ASCII character, so it would fall back to ?
because of the strict SQL mode being disabled. Then, in Bash, ?
would be expanded as any single character. Thanks to that a find
command would list all files in the directory instead of just one, bypassing the application security check as well as the use of UUID identifier.
Regular expression quantifier
For RegExps, ?
is a quantifier character, it is appended after an expression to say there must be zero or one occurence.
For example, the RegExp filename\d?\.txt
will match any of the filename below with no or one digit but not filename10.txt
because there are two digits.
filename.txt
filename1.txt
…
filename9.txt
Now imagine a Python application that returns files based on a user input like follows.
import re
username = User.username # user input fetched from DB
check(username, lib.alphanum) # some Unicode Alphanum check
re.findall(f'confidential-{username}\.pdf', 'list-files-fetched-fromFS-or-DB', re.IGNORECASE)
So similarly to before, registering a username of aʭbʭ…yʭzʭ0ʭ1…8ʭ9ʭ
would be transformed to a?b?…y?z?0?1?…8?9?
which would allow matching any ASCII letter and number once interpolated into a RegExp. If targeting a pattern that is five character long, then this pattern must be repeated 5 times. The payload would be very long and inefficient, but it will work.
import re
import string
payload = "".join(map(lambda i: i + '?', string.ascii_lowercase + string.digits)) * 5
re.findall(f'confidential-{payload}\.pdf', 'confidential-noraj.pdf', re.IGNORECASE)
# => ['confidential-noraj.pdf']
Of course, it would be easier in a scenario where you are allowed to use hyphens and brackets, and where you just need to bypass the ?
restriction.
Query parameter
The ability to inject a ?
in a context where the user is supposed to only be able to write alphanumeric characters could also allow injecting a query parameter in a URL.
For example, if a web application is internally crafting a URL based on the user’s name or any filtered user input stored in database, then the user could be able to add a query parameter like debug
or admin
that would give them unauthorized access or sensitive information. The partial Python example below could give an idea of what it could look like.
from flask import Flask, redirect, url_for
import urllib.parse
app = Flask(__name__)
@app.route('/data/<username>')
@internal # Internal route to get user data by username
def profile_name(username):
# get some data from FS based on username
data = "user_secret" if request.args.get("debug") != None else "user_public_stuff"
return data
@app.route('/profile/name/<userid>')
@internal # Internal route to convert userid to username
def profile_data(userid):
user = User(userid).username # instanciate user class by fetching data from DB
return redirect(urllib.parse.unquote(url_for('profile_name', username=user)))
@app.route('/api/public/profile/<userid>')
@auth # Public route to get profile info
def api_profile(userid):
return redirect(url_for('profile_data', userid=userid))
@app.route('/api/public/profile/name/<userid>')
@auth # Public route to set user's name
def set_name(userid):
username = request.args.get("name")
check(username, lib.alphanum) # some Unicode Alphanum check
User(userid).username = username
return True
if __name__ == '__main__':
app.run(debug=True)
With what was seen earlier, the database would transform a username like norajʭdebug
into noraj?debug
. Of course, not being able to inject an =
will prevent passing a value to the parameter. However, depending on the web framework used or programming language, the application may only check that the parameter is present.
WAF bypass
Outside the previously set context, of course indirectly injecting a ?
could also help to bypass WAFs. For example, in a database storing file blobs or templates where the attacker has gained write access to, the WAF would block a payload including <?php
. In that case, sending <ʭphp
instead could bypass the detection rule and being converted to the original payload right after thanks to strict SQL mode fallback behavior.
Real life story
A long time ago (in 2012), the planets were aligned to allow crazy bugs. In those prehistoric times, Wordpress was disabling the strict SQL mode during the setup. On the other hand, MySQL was chaining the bad design flaws. The only implementation of UTF-8 in MySQL was called utf8
(which is now called utf8mb3
) that had the issue of handling only 1 to 3 bytes characters[4] (instead of 4 bytes maximum for valid UTF-8). utf8mb3
was very misleading as claiming that MySQL was supporting UTF-8 was letting people think that the UTF-8 support was 100% complete, not just a partial implementation. The other dangerous behavior was that when strict SQL mode was disabled, it was not replacing invalid characters with ?
like it is now. Instead, the invalid characters were purely and simply removed as well as the rest of the string! So, any valid UTF-8 4 bytes character was judged as invalid in utf8mb3
and was triggering a truncation of anything that was following.
The combination of those 3 major flaws was exploited in 2014 by Cedric’s Cruft to obtain stored XSS in Wordpress[5] in the core comment feature.