Writing a decent win32 keylogger [2/3]
- 21/12/2023
In this series of articles, we talk about the ins and outs of how to build a keylogger for Windows that is able to support all keyboard layouts and reconstruct Unicode characters correctly regardless of the language (excluding those using input method editors).
In the first part, after a brief introduction to the concepts of scan codes, virtual keys, characters and glyphs, we describe three different ways to capture keystrokes (GetKeyState, SetWindowsHookEx, GetRawInputData) and the differences between those techniques.
In the second part, we detail how Windows stores keyboard layout information in the kbd*.dll files and how to parse them.
In the third and last part, we go through the process of using the extracted information to convert scan codes into characters and all the tricky cases presented by different layouts, such as ligatures, dead keys, extra shift-states and SGCAPS.
Finally, we present our methodology to test and validate that our reconstruction is correct by writing a testing tool which can automate the injection of scan codes and retrieve the reference text produced by Windows which we compare with our reconstructed text.
In the previous article, we saw a few different techniques to capture key-presses on Windows. In this article, we explain how Windows translates scan-codes into characters and how we can parse keyboard layout DLLs to extract the data required to emulate that process.
Translating to characters
Now that we have seen a few ways to retrieve key-presses and their context, let’s find out how Windows converts scan codes first to virtual keys, and then to characters. You can find information on Windows' keyboard input model here.
Here is a simplified overview of the process:
As we saw earlier, the activated input language selects a keyboard layout. Windows supports more than a hundred different layouts out of the box, with the option to create or import more. For each layout you will find a DLL whose name starts with ‘KBD’ located in C:\Windows\System32\, such as KBDFR.DLL, KBDUS.DLL, and so on.
You can find the list of usable keyboard layouts by enumerating the following registry key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Keyboard Layouts
Those DLLs contain everything Windows needs to convert scan codes (from the keyboard) into characters (displayed inside application windows). We will now describe their contents and how you can go about parsing that into a more readily useable intermediary format.
KBD*.DLL structure
While keyboard layout DLLs can export two functions, only one of them is really useful for our purposes and we will not cover the second (KbdNlsLayerDescriptor) in this article. If you want to read more about it, you can go here for the definitions and here for a reference implementation.
The only function of interest to us is KbdLayerDescriptor, which is defined like this:
PKBDTABLES KbdLayerDescriptor(VOID);
By calling it, you get a pointer to a KBDTABLES structure, which is defined as:
typedef struct tagKbdLayer {
// shift states modifiers info (shift, control, alt, alt-gr, etc.)
PMODIFIERS pCharModifiers;
// virtual keys to character conversion tables
PVK_TO_WCHAR_TABLE pVkToWcharTable;
// list of supported diacritics for this layout (aka dead-keys)
PDEADKEY pDeadKey;
// names of keys (eg: RETURN => ENTRÉE)
PVSC_LPWSTR pKeyNames;
PVSC_LPWSTR pKeyNamesExt;
WCHAR *KBD_LONG_POINTER *KBD_LONG_POINTER pKeyNamesDead;
// scan code to virtual keys conversion
USHORT *KBD_LONG_POINTER pusVSCtoVK;
BYTE bMaxVSCtoVK;
PVSC_VK pVSCtoVK_E0;
PVSC_VK pVSCtoVK_E1;
// locale specific flags (ALT-GR, Left to right, ...)
DWORD fLocaleFlags;
// ligatures
BYTE nLgMax;
BYTE cbLgEntry;
PLIGATURE1 pLigature;
// type & subtype
DWORD dwType;
DWORD dwSubType;
} KBDTABLES, * PKBDTABLES;
Exploring the KBDTABLES structure
In order to understand the contents of this structure, we wrote a tool to dump the DLLs to JSON files which will be easier to work with afterwards.
You can also check out the XML files generated by kbdlayout.info for each keyboard layout. The “XML internal tables” file is basically an XML representation of the data contained in the keyboard layout DLLs. The “XML for processing” files are a higher-level view of the data, in a more structured, easier-to-use format. Those tables were very useful to us to make sure our extraction process was correct, thanks Jan!
You can find the full source code of the tool here.
The program is written in C++ (C with 1 class to be truthful ;p) and uses the JSON library by Niels Lohmann which is very useful, powerful and easy to use (<3 single header libs).
Loading the dll & getting the pointer
Let’s start by loading the DLL, then retrieve a pointer to KbdLayerDescriptor and call it to get our KBDTABLES pointer.
// load FR keyboard layout
const char * dll = "KBDFR";
HMODULE hmod = LoadLibraryA(dll);
if(!hmod)
{
    // handle error (the DLL could not be loaded)
}
// find the exported function KbdLayerDescriptor
FARPROC func = GetProcAddress(hmod, "KbdLayerDescriptor");
if(!func)
{
    // handle error (the export was not found)
}
// cast the function pointer and call it
PKBDTABLES kbd = ((PKBDTABLES(*)())func)();
Keyboard layout locale flags
Now that we have our kbd pointer, we can start to extract information. We start by parsing the keyboard layout locale flags.
// create a json object we will fill with our extracted data
j = json({});
// parse flags
j["flag_altgr"] = kbd->fLocaleFlags & KLLF_ALTGR ? 1 : 0;
j["flag_shiftlock"] = kbd->fLocaleFlags & KLLF_SHIFTLOCK ? 1 : 0;
j["flag_ltr"] = kbd->fLocaleFlags & KLLF_LRM_RLM ? 1 : 0;
There are 3 different locale flags:
- 0x1: KLLF_ALTGR, if set, indicates that for this keyboard layout, the right-hand ALT key should be handled as CONTROL + ALT
- 0x2: KLLF_SHIFTLOCK, unused, but if set, indicates that pressing the SHIFT key will reset the status of the CAPSLOCK key
- 0x4: KLLF_LRM_RLM, only used for keyboard layouts with right-to-left scripts, inserts left-to-right marker (LRM) and right-to-left marker (RLM) on specific key presses (left/right shift/control and backspace combinations)
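As a small illustration of the flag values listed above, here is a sketch that turns the bit-field into readable names. The helper `locale_flag_names` and the `k...` constants are hypothetical names introduced for this example, not part of our tool; only the three bit values come from the list above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Bit values as listed above; the k* names are made up for this sketch.
constexpr uint32_t kAltGr     = 0x1; // KLLF_ALTGR
constexpr uint32_t kShiftLock = 0x2; // KLLF_SHIFTLOCK
constexpr uint32_t kLrmRlm    = 0x4; // KLLF_LRM_RLM

// Hypothetical helper: list the names of the locale flags set in fLocaleFlags.
std::vector<std::string> locale_flag_names(uint32_t fLocaleFlags)
{
    std::vector<std::string> names;
    if (fLocaleFlags & kAltGr)     names.push_back("KLLF_ALTGR");
    if (fLocaleFlags & kShiftLock) names.push_back("KLLF_SHIFTLOCK");
    if (fLocaleFlags & kLrmRlm)    names.push_back("KLLF_LRM_RLM");
    return names;
}
```

Each flag is tested independently, so a layout that sets both KLLF_ALTGR and KLLF_LRM_RLM (value 0x5) yields both names.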
Parsing shift states & modifiers
Some layouts have more modifier keys than the common SHIFT, CONTROL, ALT and the relatively common ALT-GR. For instance, Japanese keyboards use a dedicated KANA key and Canadian Multilingual uses the right control key as an extra modifier key. So this information has to be stored in the keyboard layout. Here are the relevant structures:
typedef struct {
BYTE Vk;
BYTE ModBits;
} VK_TO_BIT, *PVK_TO_BIT;
typedef struct {
PVK_TO_BIT pVkToBit;
WORD wMaxModBits;
BYTE ModNumber[];
} MODIFIERS, *PMODIFIERS;
The KBDTABLES structure contains a pointer to this MODIFIERS struct which contains:
- a list (pVkToBit) of virtual keys which act as modifiers, with an associated modifier bit value
- a list (ModNumber) of wMaxModBits + 1 values, indexed by the modifier bit-field, which map it to a column index (the shift state)
Here are example values taken from the French keyboard layout:
MODIFIERS mods_fr = {
pVkToBit: [
{ Vk=16, ModBits=1 }, // VK_SHIFT
{ Vk=17, ModBits=2 }, // VK_CONTROL
{ Vk=18, ModBits=4 }, // VK_MENU (ALT)
],
wMaxModBits: 6,
ModNumber: [0, 1, 2, 4, 15, 15, 3],
}
Now let us see what the ModNumber list means:
Modifier | Mod value | ModNumber | Comment |
---|---|---|---|
no modifier | 0x0 | 0 | |
shift | 0x1 | 1 | |
control | 0x2 | 2 | |
shift+control | 0x3 | 4 | index 3 of the ModNumber list above |
alt | 0x4 | 15 | no valid combo with ALT only |
alt+shift | 0x5 | 15 | no valid combo with ALT+SHIFT only |
alt+control | 0x6 | 3 | ALT-GR = ALT + CONTROL |
So for the legacy AZERTY French keyboard layout there are at most 3 possible modifiers (shift, control and alt-gr) for a single key and the column order in the virtual keys to character tables (that we will describe later) will be:
- 0 = no modifier
- 1 = shift
- 2 = control
- 3 = alt-gr
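To make the mapping concrete, here is a minimal sketch of how the modifier bit-field is built from the pressed modifier keys and then turned into a column index. The `Modifiers` and `VkToBit` types are simplified stand-ins for the structures above, and `french_modifiers` and `shift_state_column` are hypothetical helpers introduced for illustration; the values are the ones from the French example. We assume the value 15 is the "invalid combination" marker, as the table suggests.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-ins for VK_TO_BIT and MODIFIERS (not the real kbd.h types).
struct VkToBit { uint8_t vk; uint8_t modBits; };

struct Modifiers {
    std::vector<VkToBit> vkToBit;   // modifier keys and their bit values
    uint16_t maxModBits;            // highest valid index into modNumber
    std::vector<uint8_t> modNumber; // modifier bit-field -> column index
};

constexpr int kShiftInvalid = 15;   // assumption: marks "no valid combination"

// Values copied from the French layout example above.
Modifiers french_modifiers()
{
    return Modifiers{
        { {16, 1}, {17, 2}, {18, 4} }, // VK_SHIFT, VK_CONTROL, VK_MENU
        6,
        { 0, 1, 2, 4, 15, 15, 3 }
    };
}

// Hypothetical helper: OR together the bits of every pressed modifier key,
// then use the result as an index into the ModNumber list.
int shift_state_column(const Modifiers& mods, const std::vector<uint8_t>& pressedVks)
{
    unsigned bits = 0;
    for (uint8_t vk : pressedVks)
        for (const VkToBit& e : mods.vkToBit)
            if (e.vk == vk)
                bits |= e.modBits;

    if (bits > mods.maxModBits || bits >= mods.modNumber.size())
        return kShiftInvalid;
    return mods.modNumber[bits];
}
```

With the French values, pressing CONTROL and ALT together gives the bit-field 0x6 and therefore column 3 (alt-gr), while ALT alone gives 15.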
Now this is the code we wrote to dump those shift states to JSON:
// shift states & modifiers
j["shiftstates"] = json::array();
for(int i=0; i<=kbd->pCharModifiers->wMaxModBits; i++)
    j["shiftstates"].push_back(kbd->pCharModifiers->ModNumber[i]);
j["modifiers"] = json::array();
for(int i=0; kbd->pCharModifiers->pVkToBit[i].Vk; i++)
{
    json o = json();
    o["modbits"] = kbd->pCharModifiers->pVkToBit[i].ModBits;
    o["vk"] = kbd->pCharModifiers->pVkToBit[i].Vk;
    o["vkn"] = VKN(kbd->pCharModifiers->pVkToBit[i].Vk);
    j["modifiers"].push_back(o);
}
The VKN macro returns the ASCII string representation of the virtual key name (see vk_names.h and vk_names.py in the repo).
Parsing VkToWcharTable
Now things get a little more dicey. The VkToWcharTable structure pointed to in KBDTABLES is defined like this:
typedef struct tagKbdLayer {
...
PVK_TO_WCHAR_TABLE pVkToWcharTable;
...
} KBDTABLES, * PKBDTABLES;
typedef struct _VK_TO_WCHAR_TABLE {
PVK_TO_WCHARS1 pVkToWchars;
BYTE nModifications;
BYTE cbSize;
} VK_TO_WCHAR_TABLE, *PVK_TO_WCHAR_TABLE;
This refers to PVK_TO_WCHARS1, a pointer to a structure defined by a macro:
#define TYPEDEF_VK_TO_WCHARS(n) typedef struct _VK_TO_WCHARS##n { \
BYTE VirtualKey; \
BYTE Attributes; \
WCHAR wch[n]; \
} VK_TO_WCHARS##n, *PVK_TO_WCHARS##n;
The macro is invoked ten times:
TYPEDEF_VK_TO_WCHARS(1)
TYPEDEF_VK_TO_WCHARS(2)
TYPEDEF_VK_TO_WCHARS(3)
TYPEDEF_VK_TO_WCHARS(4)
TYPEDEF_VK_TO_WCHARS(5)
TYPEDEF_VK_TO_WCHARS(6)
TYPEDEF_VK_TO_WCHARS(7)
TYPEDEF_VK_TO_WCHARS(8)
TYPEDEF_VK_TO_WCHARS(9)
TYPEDEF_VK_TO_WCHARS(10)
This results in 10 almost identical structures, named VK_TO_WCHARS1, VK_TO_WCHARS2, … up to VK_TO_WCHARS10, the only difference between them being the size of the wch buffer. Those structures contain:
- VirtualKey: a virtual key
- Attributes: a set of flags for this entry of the conversion table from virtual key to character
- wch: a list of characters that can be output when this virtual key is pressed (based upon the current shift state)
So to sum up, we have a pointer to multiple VK_TO_WCHAR_TABLE structures which each contain:
- a pointer to VK_TO_WCHARS structures (of a specific size)
- how many modifiers will be present for each entry (nModifications)
- the size in bytes of one entry (cbSize), which is the offset to the next entry in the VK_TO_WCHARS array
Here is our code to dump the tables to JSON. Note how we cheat by using only VK_TO_WCHARS10 pointers and go to the next entry by recasting the pointer at the right address, instead of using the proper pointer type according to the size (which would complicate the code).
j["vk_to_wchars"] = json::array();
for(int i=0; kbd->pVkToWcharTable[i].cbSize; i++)
{
    json o = json();
    o["index"] = i+1;
    o["num_mods"] = kbd->pVkToWcharTable[i].nModifications;
    o["table"] = json::array();
    PVK_TO_WCHARS10 pvk2wch = (PVK_TO_WCHARS10)kbd->pVkToWcharTable[i].pVkToWchars;
    // a zero VirtualKey marks the end of the table
    while(pvk2wch->VirtualKey)
    {
        json it = json();
        it["vk"] = pvk2wch->VirtualKey;
        it["vkn"] = VKN(pvk2wch->VirtualKey);
        it["attrs"] = pvk2wch->Attributes;
        it["wch"] = json::array();
        // loop variable named m so that it does not shadow the json object j
        for(int m=0; m<kbd->pVkToWcharTable[i].nModifications; m++)
            it["wch"].push_back(pvk2wch->wch[m]);
        // advance by cbSize bytes, the real size of one entry in this table
        pvk2wch = (PVK_TO_WCHARS10)((char*)pvk2wch + kbd->pVkToWcharTable[i].cbSize);
        o["table"].push_back(it);
    }
    j["vk_to_wchars"].push_back(o);
}
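The same stride-based walk can also be used for lookups. The following is a minimal sketch, not code from our tool: `VkToWchars<N>` is a stand-in for the VK_TO_WCHARS##n family, `lookup_wchar` is a hypothetical helper, `char16_t` stands in for the 16-bit WCHAR, and we assume 0xF000 (WCH_NONE in kbd.h) as the "no character" value. Fields are read with `memcpy` at byte offsets instead of recasting the pointer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for VK_TO_WCHARS##n: a 2-byte header followed by N characters.
template <int N>
struct VkToWchars {
    uint8_t  virtualKey;
    uint8_t  attributes;
    char16_t wch[N];
};

constexpr char16_t kNoChar = 0xF000; // assumption: mirrors WCH_NONE

// Hypothetical helper: find the character for (vk, column) in one sub-table.
// `entries` points at the first entry, `cbSize` is the stride in bytes and
// `nModifications` the number of valid columns, as in VK_TO_WCHAR_TABLE.
char16_t lookup_wchar(const void* entries, size_t cbSize, int nModifications,
                      uint8_t vk, int column)
{
    if (column < 0 || column >= nModifications)
        return kNoChar;

    const unsigned char* p = static_cast<const unsigned char*>(entries);
    for (; p[0] != 0; p += cbSize) {   // a zero VirtualKey ends the table
        if (p[0] != vk)
            continue;
        char16_t ch;
        std::memcpy(&ch,
                    p + offsetof(VkToWchars<1>, wch) + column * sizeof(char16_t),
                    sizeof ch);
        return ch;
    }
    return kNoChar;
}
```

Because the header layout is identical for every N, the offset of `wch` can be taken from `VkToWchars<1>` regardless of the real entry size; only the stride changes from one sub-table to the next.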
Parsing VSCtoVK
Now let us talk about the data structures that allow us to convert scan codes into virtual keys. The first one is pusVSCtoVK, which is just a pointer to an array of USHORT, accompanied by bMaxVSCtoVK, which gives us the number of items in the array. The index is the scan code, and each USHORT holds the virtual key in its low byte, OR'd with optional flags in its high byte.
The parsing code is straightforward:
j["vsc_to_vk"] = json::array();
USHORT * vvk = kbd->pusVSCtoVK;
if(vvk)
{
    for(int i=0; i<kbd->bMaxVSCtoVK; i++)
    {
        json o = json();
        o["sc"] = i;
        o["vk"] = vvk[i] & 0xff; // mask out the virtual key flags
        o["vkn"] = VKN(vvk[i] & 0xff); // mask out the virtual key flags
        o["flags"] = vkftos(vvk[i]); // utility function to convert the flags to json
        j["vsc_to_vk"].push_back(o);
    }
}
With our function vkftos defined like this:
json vkftos(int vk)
{
    json o = json::array();
    if(vk & KBDEXT) o.push_back("KBDEXT");
    if(vk & KBDMULTIVK) o.push_back("KBDMULTIVK");
    if(vk & KBDSPECIAL) o.push_back("KBDSPECIAL");
    if(vk & KBDNUMPAD) o.push_back("KBDNUMPAD");
    if(vk & KBDUNICODE) o.push_back("KBDUNICODE");
    if(vk & KBDINJECTEDVK) o.push_back("KBDINJECTEDVK");
    if(vk & KBDMAPPEDVK) o.push_back("KBDMAPPEDVK");
    if(vk & KBDBREAK) o.push_back("KBDBREAK");
    return o;
}
All flag values and defines can be found here.
Now there are two more tables (pVSCtoVK_E0 and pVSCtoVK_E1) which map extended scan codes to virtual keys. Both tables work the same:
typedef struct tagKbdLayer {
...
PVSC_VK pVSCtoVK_E0;
PVSC_VK pVSCtoVK_E1;
...
} KBDTABLES, * PKBDTABLES;
typedef struct _VSC_VK {
BYTE Vsc;
USHORT Vk;
} VSC_VK, *PVSC_VK;
So we have an array of VSC_VK structures with two members, one for the virtual scan code and one for the associated virtual key. You have to keep reading items from the list until you reach an entry whose virtual scan code is zero.
Here is our parsing code:
j["vsc_to_vk_e0"] = json::array();
PVSC_VK vv0 = kbd->pVSCtoVK_E0;
for(int i=0; vv0 && vv0[i].Vsc; i++)
{
    json o = json();
    o["sc"] = vv0[i].Vsc;
    o["vk"] = vv0[i].Vk & 0xff;
    o["vkn"] = VKN(vv0[i].Vk & 0xff);
    o["flags"] = vkftos(vv0[i].Vk);
    j["vsc_to_vk_e0"].push_back(o);
}
j["vsc_to_vk_e1"] = json::array();
PVSC_VK vv1 = kbd->pVSCtoVK_E1;
for(int i=0; vv1 && vv1[i].Vsc; i++)
{
    json o = json();
    o["sc"] = vv1[i].Vsc;
    o["vk"] = vv1[i].Vk & 0xff;
    o["vkn"] = VKN(vv1[i].Vk & 0xff);
    o["flags"] = vkftos(vv1[i].Vk);
    // E1 entries go into their own array
    j["vsc_to_vk_e1"].push_back(o);
}
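Putting the three tables together, a scan code lookup could look like the sketch below. This is an illustration under stated assumptions rather than our actual implementation: `VscVk` is a stand-in for VSC_VK, `VkResult` and `scan_code_to_vk` are hypothetical names, and the `prefix` parameter (0, 0xE0 or 0xE1) is our own convention for selecting the table.

```cpp
#include <cstdint>

// Stand-in for VSC_VK as shown above.
struct VscVk { uint8_t vsc; uint16_t vk; };

struct VkResult {
    bool     found;
    uint8_t  vk;    // low byte of the entry: the virtual key
    uint16_t flags; // high byte of the entry: KBDEXT, KBDMULTIVK, ...
};

// Split a table entry into its virtual key and its flags.
static VkResult split_entry(uint16_t entry)
{
    return VkResult{ true,
                     static_cast<uint8_t>(entry & 0xff),
                     static_cast<uint16_t>(entry & 0xff00) };
}

// Hypothetical helper: plain scan codes index the base array directly,
// extended ones are searched in the zero-terminated E0 or E1 list.
VkResult scan_code_to_vk(const uint16_t* base, uint8_t baseCount,
                         const VscVk* e0, const VscVk* e1,
                         uint8_t prefix, uint8_t sc)
{
    const VkResult notFound{ false, 0, 0 };

    if (prefix == 0) {
        if (base && sc < baseCount)
            return split_entry(base[sc]);
        return notFound;
    }

    const VscVk* ext = (prefix == 0xE0) ? e0 : (prefix == 0xE1) ? e1 : nullptr;
    for (; ext && ext->vsc; ++ext)
        if (ext->vsc == sc)
            return split_entry(ext->vk);
    return notFound;
}
```

The base array is bounds-checked against bMaxVSCtoVK, while the extended lists rely on their terminating zero entry, mirroring the two parsing loops above.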
Parsing dead keys
An interesting feature that is not used by all keyboard layouts is the support of “dead keys”. A dead key is a key that will not output a character when pressed but will instead wait for the next key press to output one or more characters to the screen. One such example is the key ^ (circumflex accent) on a French keyboard:
first key | second key | output |
---|---|---|
^ | e | ê |
^ | i | î |
^ | ' ' | ^ |
^ | p | ^p |
Such information is stored in the PDEADKEY pDeadKey variable, whose structure is pretty simple:
typedef struct {
DWORD dwBoth;
WCHAR wchComposed;
USHORT uFlags;
} DEADKEY, *PDEADKEY;
For each of those entries, the upper 16 bits of the dwBoth variable hold the first character (the ‘dead’ character) and the lower 16 bits hold the character that the dead key can be combined with (for example, the letter e). The wchComposed variable is the combination of both those characters (in our example, ê). This table contains the list of all valid combinations:
// list of all valid dead characters for fr_FR legacy AZERTY layout
âêîôûÂÊÎÔÛ^äëïöüÿÄËÏÖÜ¨ãÃñÑõÕ~àèìòùÀÈÌÒÙ`
We can parse all that information like this:
j["deadkeys"] = json::array();
PDEADKEY pd = kbd->pDeadKey;
for(int i=0; pd && pd[i].dwBoth != 0; i++)
{
    json o = json();
    o["vk1"] = pd[i].dwBoth >> 16;
    o["vk2"] = pd[i].dwBoth & 0xffff;
    o["combined"] = pd[i].wchComposed;
    o["flags"] = pd[i].uFlags;
    j["deadkeys"].push_back(o);
}
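To illustrate how such a table can be used, here is a minimal sketch of a composition lookup based on the dwBoth packing described above. `DeadKey` is a simplified stand-in for DEADKEY, `pack` and `compose` are hypothetical helpers, and `char16_t` stands in for WCHAR. The fallback (emitting both characters when no entry matches) follows the `^` + `p` row of the example table.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in for the DEADKEY structure.
struct DeadKey { uint32_t both; char16_t composed; uint16_t flags; };

// Pack (dead character, base character) the way dwBoth stores them:
// dead character in the upper 16 bits, base character in the lower 16 bits.
inline uint32_t pack(char16_t dead, char16_t base)
{
    return (static_cast<uint32_t>(dead) << 16) | base;
}

// Hypothetical helper: return the composed character if the pair is listed,
// otherwise both characters one after the other.
std::u16string compose(const std::vector<DeadKey>& table, char16_t dead, char16_t base)
{
    for (const DeadKey& e : table)
        if (e.both == pack(dead, base))
            return std::u16string(1, e.composed);
    return std::u16string{ dead, base };
}
```

With entries for `^` + `e` and `^` + `i`, `compose(table, u'^', u'e')` yields `ê`, while `compose(table, u'^', u'p')` yields `^p` because that pair is not in the table.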
Parsing ligatures
The only thing left that we have to parse in order to fully emulate Windows character translation from keypresses is ligatures. Ligatures are the representation of two or more characters as a single glyph. The word “cœur” (which means “heart”) contains such a ligature: the characters o and e are merged into a single glyph œ. Funnily enough, we can’t type that word with a standard French keyboard 💔. As an additional note, TTF fonts have support for ligatures, and can sometimes automatically handle such cases to display the proper joined character without requiring the input text to use the specific Unicode code points for the ligature characters.
Now let us see an example from a keyboard layout that supports ligatures: Arabic. When you press the B key, the output will be ﻻ, which is the combination of ل (Arabic letter LAM) and ا (Arabic letter ALEF). If you were to press backspace just after pressing the B key, you would only remove the ALEF character, not both, and only the LAM character would remain.
Here are the relevant data structures in the KBDTABLES struct:
typedef struct tagKbdLayer {
...
BYTE nLgMax;
BYTE cbLgEntry;
PLIGATURE1 pLigature;
...
} KBDTABLES, *PKBDTABLES;
With PLIGATURE1 defined by the following macro, for ligatures up to 5 characters long:
#define TYPEDEF_LIGATURE(n) typedef struct _LIGATURE##n { \
BYTE VirtualKey; \
WORD ModificationNumber; \
WCHAR wch[n]; \
} LIGATURE##n, *PLIGATURE##n;
TYPEDEF_LIGATURE(1)
TYPEDEF_LIGATURE(2)
TYPEDEF_LIGATURE(3)
TYPEDEF_LIGATURE(4)
TYPEDEF_LIGATURE(5)
The nLgMax variable indicates the maximum number of characters for a single ligature for the current keyboard layout. The cbLgEntry variable indicates the size in bytes of a single ligature entry. We parse the ligature table like this (using only PLIGATURE5 pointers for shorter code):
j["ligatures"] = json::array();
PLIGATURE5 lg = (PLIGATURE5)((BYTE*)kbd->pLigature);
for(int i=0; lg && lg->VirtualKey; i++, lg = (PLIGATURE5)((BYTE*)kbd->pLigature + i*kbd->cbLgEntry))
{
    json o = json();
    o["vk"] = lg->VirtualKey;
    o["modnum"] = lg->ModificationNumber;
    o["chars"] = json::array();
    for(int k=0; k<kbd->nLgMax; k++)
        o["chars"].push_back(lg->wch[k]);
    j["ligatures"].push_back(o);
}
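As an illustration of how the ligature table can be queried, here is a minimal sketch that looks up the characters for a given virtual key and modification number. `Ligature<N>` is a stand-in for the LIGATURE##n family and `ligature_for` is a hypothetical helper. We assume that entries shorter than nLgMax are padded with 0xF000 (WCH_NONE in kbd.h), which the sketch treats as the end of the ligature.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Stand-in for LIGATURE##n: virtual key, modification number, N characters.
template <int N>
struct Ligature {
    uint8_t  virtualKey;
    uint16_t modificationNumber;
    char16_t wch[N];
};

constexpr char16_t kNoChar = 0xF000; // assumption: padding value (WCH_NONE)

// Hypothetical helper: return the characters produced by (vk, modNumber),
// or an empty string when the table has no matching entry.
std::u16string ligature_for(const void* entries, size_t cbLgEntry, int nLgMax,
                            uint8_t vk, uint16_t modNumber)
{
    const unsigned char* p = static_cast<const unsigned char*>(entries);
    for (; p && p[0] != 0; p += cbLgEntry) {   // a zero VirtualKey ends the table
        uint16_t mod;
        std::memcpy(&mod, p + offsetof(Ligature<1>, modificationNumber), sizeof mod);
        if (p[0] != vk || mod != modNumber)
            continue;

        std::u16string out;
        for (int k = 0; k < nLgMax; k++) {
            char16_t ch;
            std::memcpy(&ch, p + offsetof(Ligature<1>, wch) + k * sizeof(char16_t),
                        sizeof ch);
            if (ch == kNoChar)
                break;                         // padding: the ligature is shorter
            out.push_back(ch);
        }
        return out;
    }
    return std::u16string();
}
```

For the Arabic example, an entry for the B key with modification number 0 would return the two characters LAM (U+0644) and ALEF (U+0627).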
… And that’s it, we’re done extracting data from those keyboard layout DLLs! There is more that we haven’t covered, such as key names, because we won’t need them to reconstruct our text.
In the next article, we will explain how to emulate Windows' scan code to character translation with the data we extracted from the keyboard layout DLLs; you can find it here.