Addressing The AI Elephant In The Room

Something remarkable has happened in the Cyber Security industry in the last couple of months. Talented researchers and engineers are having a serious existential crisis. There is a feeling that we aren’t really… well, special anymore. Anthropic’s wild, hype-heavy announcement regarding Mythos (a new frontier model) and a coalition of leading tech companies banding together to fix all the vulnerabilities in their products, dubbed Glasswing, has exacerbated this. Is it time to clear out our desks and find new AI-proof employment? Farming, anyone?

Of course, someone could be forgiven for believing this, based on some recent headline-grabbing quotes.

Our industry is no stranger to impostor syndrome, and statements like this certainly aren’t going to improve the situation. However, this fails to account for wider considerations, such as:

The Large Language Model (LLM) had access to full source code (we rarely do)
Frontier models can only find vulnerabilities/exploits based on existing (human) findings (i.e. not novel/groundbreaking techniques)
The LLM doesn’t look at systems as a whole (i.e. a subtle primitive in one part of a system, can unlock a much more powerful vulnerability in another part)
An LLM isn’t able to autonomously hook up hardware and test/analyse across all security boundaries (e.g. design issues between SoC firmware and kernel driver components won’t be found in isolation)

For all of these reasons (and many more), CoreTech relies on its exceptionally talented people to deliver innovative solutions to our clients, across all of our capability teams and we always will.

Despite this, we pride ourselves at working at the cutting edge of cybersecurity, and this includes using the latest techniques and tools available to us. AI is a tool and, much like those that have come before, it can augment human capabilities, to allow us to increase productivity and delivery whilst maintaining our existing high quality output. As an example, when the software reverse engineering tool, Ghidra, was released, it made reverse engineering far more accessible, as the tool was free and the decompilation view reduced (though didn’t replace) the amount of time spent staring at assembly instructions. Ultimately this significantly sped up reverse engineering tasks.

That said, AI does represent more of a fundamental shift than previous technologies. Consequently, there is a heightened risk of over-reliance: if engineers depend entirely on AI without understanding the underlying technology, it could lead to skill atrophy and degraded long-term output. Like everything, there is a balance to be struck.

With this in mind, the remainder of this post illustrates how we are using AI at CoreTech by outlining our infrastructure and exploring a few practical use cases from across the business.

Offline LLM Setup

To maintain strict data privacy and retain full control over our workflows, we have built an in-house rig for running open-source models. We are still limited on what open source models we can run, due to VRAM constraints and speed. We’ve tried brilliant models such as minimax internally, but due to our hardware it was just too slow to be our primary model.

We supplement our internal rig with access to frontier models via openrouter.ai, people can access top quality models such as Claude/Gemini/GPT when data privacy is not a concern and they feel it is appropriate.

Hardware

The following infographic provides the headlines specs for the rig.

And the real thing, in all its naked glory.

This setup served as a cost-effective way to experiment with local LLMs. The GPUs were purchased second-hand, and the RAM was acquired prior to recent market shortages.

Lessons Learned

Initially, the rig experienced some instability. We encountered PCIe error rates, which we mitigated by relaxing cable management to reduce signal interference and downgrading the PCIe speed to Gen3 in the BIOS, a change that dramatically improved stability.

We also ran into issues with the 3.3V rail. One advantage of using server hardware is comprehensive Baseboard Management Controller (BMC) monitoring, which flagged the rail as “lower critical” and triggered reboots under prolonged loads (e.g., when running agentic tools). This was resolved by swapping the motherboard power cable to the secondary power supply unit (PSU).

Overall, the rig has been excellent value for money, but for future builds we will certainly transition to a rack-mounted chassis with modern GPUs (such as the RTX 6000 Pro series).

Software

Our hosting stack runs on Docker, with Open WebUI serving as the central chat interface for the company, via single sign-on (SSO), which is particularly important to us. The interface also functions as a proxy, enabling secure access to external model services that our local inference engine cannot reach directly.

Currently, we use llamacpp in router mode, which allows us to swap models on demand. Should we consolidate to fewer concurrent models, we plan to migrate to vLLM, given its superior performance for multi-user workloads. Access to frontier models via OpenRouter.ai is also routed through Open WebUI as a proxy, as this supplies the authentication layer.

Use Cases

We have cherry picked a few use cases from the different teams here at CoreTech, to show how we are using AI across the company. In our experience AI is applicable to lots of different areas of the business and not just code generation, which is typically where most companies are using it.

TRACE – Automated PCB Scanning

We use an internal tool called TRACE, which scans printed circuit boards (PCBs) by capturing thousands of high-resolution photographs within a custom rig. These images are stitched together to create a single, ultra-high-resolution view of the board. This image kicks off an automated pipeline that ultimately generates a detailed report, identifying components, potential debug pads, and other key information. More details on TRACE will be covered in a future blog post.

Our internal LLM is used for producing the overview of the PCB, performing OCR and looking up chip information based on information extracted by the model.

One of our example reports was based on an image of the Arduino Due, a stock photo of which can be seen below.

The overview produced by the TRACE tool reported the following:

This appears to be an Arduino Due development board, a microcontroller platform based around the Atmel SAM3X8E ARM Cortex-M3 processor. The prominent Atmel SAM3X8E chip is the central processing unit, handling the core logic and computations. Surrounding the processor are supporting components like voltage regulators (providing stable power), crystal oscillators (for timing), and various passive components such as resistors and capacitors for signal conditioning and stability. The board features a variety of digital and analog input/output pins, indicated by labels like “Digital” and “Analog IN”, enabling interaction with external sensors and actuators. Communication interfaces are available in the form of UART (TX/RX), SPI, and I2C (SDA/SCL) connectors, allowing the board to communicate with other devices.

Based on the visible components and pin labels, the main function of this PCB is to provide a platform for prototyping and developing embedded systems applications, particularly those requiring more processing power and memory than traditional 8-bit microcontrollers. The board likely supports programming via USB through the dedicated USB connector.

Potential debug interfaces include a standard JTAG interface, likely accessible through the six-pin header labeled “JTAG”. This allows for low-level debugging and code tracing using a JTAG debugger. Additionally, there's a possible ICSP (In-Circuit Serial Programming) header present, enabling firmware updates directly to the microcontroller. The header labeled "DEBUG" also suggests a potential programming/debugging interface. The presence of the “RESET” button allows for manual resetting of the microcontroller during development or in case of errors.

Vulnerability Research

Unlike the TRACE example above, the VR team does not yet integrate LLMs into automated workflows. Instead, our researchers use the LLM as a general-purpose assistant to streamline daily tasks, such as:

Troubleshooting software installation errors
Summarising academic and technical research papers
Drafting code snippets for proof-of-concepts (PoCs)
Using the LLM with open source code bases to help understand the code and to prioritise where to focus efforts.
Generation of debug tooling / helper tools
Ghidra script generation
General architectural questions about particular processors or software

Ghidra Script Generation

A genuine time saver is utilising the LLM to create Ghidra scripts. While local models still struggle with this complexity, frontier models like Gemini can frequently generate functional scripts in a single attempt. For instance, given the following prompt:

I have an ARMv8 binary which is running in AARCH32 mode. I need a pyghidra script for Ghidra, the reverse engineering tool.
The script must annotate `tbb` instructions which have not been correctly interpreted by ghidra.
The script will be ran on a `tbb` instruction, the location must be fetched from the cursor position. It must read the bytes after the `tbb` instruction to determine the number of cases, it must then comment the `tbb` instruction with the destinations for each valid index, the destination must be disassembled if it has not already been.
As this is pyghidra, the header must be:
```
#<DESCRIPTION OF SCRIPT>
#@author LLM
#@category <SOME CATEGORY>
#@keybinding
#@menupath
#@toolbar
#@runtime PyGhidra
```

Gemini produced a working script in one go, at a cost of 0.0881$.

#Annotates and disassembles TBB instructions and their targets in AARCH32 (Thumb-2)
#@author LLM
#@category ARM
#@keybinding
#@menupath
#@toolbar
#@runtime PyGhidra

from ghidra.program.model.listing import CodeUnit
from ghidra.program.model.symbol import RefType, SourceType

#Annotates and disassembles TBB instructions and their targets in AARCH32 (Thumb-2)
#@author LLM
#@category ARM
#@keybinding
#@menupath
#@toolbar
#@runtime PyGhidra

from ghidra.program.model.listing import CodeUnit
from ghidra.program.model.symbol import RefType, SourceType
import ghidra.app.script.GhidraScript

def main():
    if currentAddress is None:
        print("No active address. Please place your cursor on a tbb instruction.")
        return

    instr = getInstructionAt(currentAddress)
    if instr is None:
        print(f"No instruction found at the cursor ({currentAddress}).")
        return

    mnemonic = instr.getMnemonicString().lower()
    if mnemonic != "tbb":
        print(f"Instruction at cursor is not 'tbb', it is '{mnemonic}'.")
        return

    print(f"Found {mnemonic} at {currentAddress}")

    # TBB is a 32-bit (4-byte) Thumb instruction. 
    # For TBB, PC evaluates to instruction_address + 4.
    # The jump table typically directly follows the tbb instruction in memory.
    base_pc = currentAddress.add(4)
    print(f"Jump table base (PC) evaluated at {base_pc}")

    # Dynamically determine the size of the jump table.
    # The jump table cannot physically extend past its first destination offset.
    # By tracking the minimum offset reading, we can find out how many cases exist.
    i = 0
    min_offset_bytes = 255 * 2 # 255 is the absolute max index since TBB table uses bytes
    cases = []

    # Loop until our index hits the start of our closest branch target
    while i < min_offset_bytes and i < 256: 
        table_entry_addr = base_pc.add(i)
        
        # In PyGhidra (JPype), getByte returns a signed Java byte, so we mask with & 0xFF.
        offset = getByte(table_entry_addr) & 0xFF 
        offset_bytes = offset * 2
        
        if offset_bytes < min_offset_bytes:
            min_offset_bytes = offset_bytes
            
        dest = base_pc.add(offset_bytes)
        cases.append((i, dest))
        i += 1

    print(f"Determined table has {len(cases)} cases.")
    
    # Pre-fetch the ReferenceManager to add Computed Jump references
    ref_manager = currentProgram.getReferenceManager()
    
    # Pre-fetch the T (Thumb) register context
    t_reg = currentProgram.getProgramContext().getRegister("T")

    comment_lines = ["TBB Jump Table Targets:"]
    
    for idx, dest in cases:
        comment_lines.append(f"  [{idx}]: {dest}")
        
        # Add a cross-reference from the TBB instruction to the target
        ref_manager.addMemoryReference(currentAddress, dest, RefType.COMPUTED_JUMP, SourceType.USER_DEFINED, idx)
        
        target_instr = getInstructionAt(dest)
        if target_instr is None:
            print(f"Disassembling case {idx} at {dest}")
            
            # Wipe any data/undefined blocks that might be in the way at the target
            clearListing(dest, dest)
            
            # Ensure the destination context is set to Thumb mode (T = 1)
            if t_reg is not None:
                currentProgram.getProgramContext().setValue(t_reg, dest, dest, 1)
            
            # Start disassembling from the destination address
            disassemble(dest)

    # Annotate the TBB instruction with an EOL comment listing the destinations
    final_comment = "\n".join(comment_lines)
    
    # Append to existing comment if there happens to be one
    existing_comment = instr.getComment(CodeUnit.EOL_COMMENT)
    if existing_comment:
        final_comment = existing_comment + "\n\n" + final_comment
        
    instr.setComment(CodeUnit.EOL_COMMENT, final_comment)
    print("Successfully annotated TBB instruction and disassembled targets.")

if __name__ == "__main__":
    main()

Internal Meeting Summarisation & Speaker Annotation Tool

Like many others, we have developed a meeting summarisation tool to streamline the processing of internal all-hands meetings (known internally as “company breakfasts”). The entire workflow is local, ensuring sensitive company discussions never leave our infrastructure.

Through the web interface, users upload video or audio recordings directly. These files are processed using WhisperX on a single GPU, which handles high-accuracy automatic speech recognition and initial speaker diarisation. Once transcribed, users can can manually annotate and label speaker turns (adding names to speakers) to ensure the downstream LLM receives properly contextualised input.

The annotated transcript is then forwarded to our local LLM server for summarisation. Using carefully crafted system prompts, the model generates a concise, executive-style digest that highlights key announcements, strategic updates, and action items. From there, the summary can be exported and published directly to SharePoint or distributed via internal email.

This tool has proven particularly valuable for communication within CoreTech. Rather than requiring staff to sit through hour-long recordings, anyone who missed a breakfast meeting can read a tailored summary in under two minutes. It also exemplifies our broader philosophy: by chaining open-source inference tools (WhisperX + local LLMs) with a lightweight custom interface, we solve genuine internal problems whilst maintaining strict data sovereignty and avoiding third-party SaaS dependencies.

Summary

AI is here to stay and, as one respected industry colleague likes to point out, the capabilities of the LLM you’re using today is likely to be the worst they will ever be. That said, it is important to see this as a leap forward in tooling and not a like-for-like replacement for human vulnerability researchers and cyber engineers.

If you’re working somewhere where this isn’t the predominant viewpoint, why don’t you get in touch with us using the link below.

Addressing the AI elephant in the room