NVIDIA has disclosed and patched a high-severity vulnerability in its TensorRT-LLM framework that could allow attackers with local access to execute malicious code, tamper with data, and potentially compromise AI systems. 

The vulnerability, tracked as CVE-2025-23254, affects all versions of TensorRT-LLM prior to 0.18.2 across Windows, Linux, and macOS platforms.

TensorRT-LLM Python Executor Insecure Pickle Handling

Security researchers discovered a critical flaw in the Python executor component of TensorRT-LLM, specifically in its socket-based Inter-Process Communication (IPC) system. 

The vulnerability stems from insecure handling of Python’s pickle serialization/deserialization mechanism, which is widely known for its security risks when processing untrusted data.

The CVE has been assigned a CVSS base score of 8.8, categorizing it as high severity. It falls under the Common Weakness Enumeration category CWE-502 (Deserialization of Untrusted Data), a known vulnerability class that can lead to remote code execution.

“NVIDIA TensorRT-LLM for any platform contains a vulnerability in python executor where an attacker may cause a data validation issue by local access to the TRTLLM server,” NVIDIA warned in its security bulletin.

“A successful exploit of this vulnerability may lead to code execution, information disclosure and data tampering.”

NVIDIA credited Avi Lumelsky of Oligo Security for responsibly reporting the vulnerability.

Risk Factors and Details:

- Affected Products: NVIDIA TensorRT-LLM (Windows, Linux, and macOS; versions prior to 0.18.2)
- Impact: Code execution, information disclosure, data tampering
- Exploit Prerequisites: Local access to the TRTLLM server (AV:L), low attack complexity (AC:L), low privileges (PR:L)
- CVSS 3.1 Score: 8.8 (High)

Technical Exploitation Path

The vulnerability specifically involves Python’s pickle module, which can execute arbitrary functions during deserialization via the __reduce__() method. 

In TensorRT-LLM’s case, an attacker with local access to the server could craft malicious serialized data that, when deserialized by the application, would execute arbitrary code with the privileges of the running process.
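The deserialization risk can be illustrated with a minimal, self-contained sketch (this is illustrative only, not TensorRT-LLM's code): any class can define `__reduce__` so that `pickle.loads` invokes an arbitrary callable the moment the data is deserialized.

```python
import pickle

# Hypothetical illustration of the pickle risk (not NVIDIA's code):
# __reduce__ returns a (callable, args) pair, and pickle.loads calls it
# during deserialization -- before the application ever sees the object.
class Malicious:
    def __reduce__(self):
        # Benign payload for demonstration; a real attacker would
        # substitute os.system or a similar dangerous callable here.
        return (eval, ("7 * 6",))

payload = pickle.dumps(Malicious())   # attacker-crafted bytes
result = pickle.loads(payload)        # eval("7 * 6") runs here
print(result)
```

Deserializing the payload executes the attacker-chosen callable with the privileges of the running process, which is exactly why pickle should never be fed untrusted bytes.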

The ZeroMqQueue class in TensorRT-LLM’s IPC implementation was particularly vulnerable as it used pickle for serializing and deserializing data across processes without proper validation.
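The unsafe IPC pattern can be sketched as follows (a hypothetical simplification using a plain socket pair, not the actual ZeroMqQueue source): bytes arriving on the socket are handed straight to `pickle.loads` with no integrity check.

```python
import pickle
import socket

# Hypothetical sketch of the unsafe IPC pattern (not TensorRT-LLM's
# actual implementation): two connected sockets standing in for the
# inter-process channel.
server, client = socket.socketpair()

def recv_message(sock):
    data = sock.recv(4096)
    return pickle.loads(data)  # UNSAFE: trusts whatever the peer sent

# A legitimate peer sends a pickled work item...
client.sendall(pickle.dumps({"prompt": "hello"}))
print(recv_message(server))
# ...but any local attacker able to write to this channel could send a
# payload whose __reduce__ runs arbitrary code inside this process.
```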

Patches Released

NVIDIA released version 0.18.2 on April 29, 2025, which enables HMAC (Hash-based Message Authentication Code) protection by default in the socket-based IPC system, so that serialized messages are authenticated before they are deserialized.

This security enhancement prevents the exploitation of the vulnerability by validating the integrity of serialized data before deserialization.
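The general mitigation pattern can be sketched as below. The key name and helper functions are illustrative assumptions, not TensorRT-LLM's actual API: the idea is to sign serialized bytes with an HMAC before sending and verify the tag before ever calling `pickle.loads`.

```python
import hashlib
import hmac
import pickle

# Illustrative key; in practice this would be a secret shared only
# between the trusted IPC endpoints.
SECRET_KEY = b"shared-process-key"

def sign(obj) -> bytes:
    """Serialize obj and prepend a 32-byte SHA-256 HMAC tag."""
    data = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
    return tag + data

def verify_and_load(message: bytes):
    """Verify the HMAC tag; only deserialize authenticated data."""
    tag, data = message[:32], message[32:]
    expected = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC mismatch: refusing to deserialize")
    return pickle.loads(data)  # reached only for authenticated bytes

msg = sign({"request_id": 1})
assert verify_and_load(msg) == {"request_id": 1}
```

Because an attacker without the key cannot forge a valid tag, tampered or injected messages are rejected before `pickle.loads` can run their payload.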

The company strongly advises all users to update immediately to version 0.18.2 or later, warning that “disabling this feature will make you vulnerable to the security issue”.

For users who cannot upgrade immediately, NVIDIA noted that the encryption feature can be manually disabled, although this is strongly discouraged:

- On the main branch: set use_hmac_encryption = False in the ZeroMqQueue class under tensorrt_llm/executor/ipc.py.
- In release 0.18: set use_hmac_encryption = False in the ZeroMqQueue class under tensorrt_llm/executor.py.

This vulnerability highlights the growing security challenges in AI frameworks, particularly those handling complex model operations. 

TensorRT-LLM is widely used to accelerate large language models for generative AI, delivering significantly improved performance for production applications.

Organizations using TensorRT-LLM are urged to implement the patch immediately to protect their AI infrastructures from potential exploitation.
