TensorRT-LLM (NVIDIA/TensorRT-LLM) is an open-source AI project on GitHub. Repository summary: TensorRT-LLM provides an easy-to-use Python API for defining Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components for building Python and C++ runtimes that orchestrate inference execution performantly. The project targets developer-centric engineering workflows and is suitable for extension, integration, and iterative delivery.
License
Other
Stars
13,515
Features
- Core capability: an easy-to-use Python API for defining LLMs, state-of-the-art optimizations for efficient inference on NVIDIA GPUs, and components for building Python and C++ runtimes that orchestrate inference execution (see the sketch after this list)
- Designed for engineering integration into inference-serving workflows
- Repository: NVIDIA/TensorRT-LLM
- Primary language: Python
- Open-source license: Other
- GitHub traction: about 13,515 stars
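As a hedged illustration of the Python API described above, the sketch below follows the high-level `LLM` entry point shown in the project's quick-start examples. The model identifier, sampling parameters, and result fields are assumptions and may vary by release.

```python
# Minimal sketch of offline inference with TensorRT-LLM's high-level
# Python API. Assumes tensorrt_llm is installed on a machine with a
# supported NVIDIA GPU; exact names/fields may differ by version.
from tensorrt_llm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# The model name is a placeholder; any supported checkpoint works here.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    # Each result carries the originating prompt and the generated text.
    print(f"Prompt: {output.prompt!r} -> {output.outputs[0].text!r}")
```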
Use Cases
- Supports AI engineering build-and-iterate workflows for dev teams
- Build internal AI workflow prototypes with TensorRT-LLM
- Validate TensorRT-LLM in production-like engineering scenarios
FAQ
To adopt TensorRT-LLM, teams should first define integration boundaries and call patterns, then map the repository's capabilities onto concrete interfaces, parameters, and access rules. GitHub repository: https://github.com/NVIDIA/TensorRT-LLM. Community traction is around 13,515 stars. License: Other.
In a deployment, it usually works as an execution component or capability layer. Common fits include supporting AI engineering build-and-iterate workflows for dev teams, building internal AI workflow prototypes, and validating TensorRT-LLM in production-like engineering scenarios. A minimal integration-boundary sketch follows.
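As a hedged sketch of the "concrete interfaces" point above: one common pattern is to wrap the engine behind a narrow, typed boundary so callers never depend on TensorRT-LLM directly. `CompletionService` and `CompletionRequest` are hypothetical names chosen for this sketch; only `tensorrt_llm.LLM` and `SamplingParams` come from the library, and their details may vary by version.

```python
# Illustrative integration boundary around the high-level LLM API.
# CompletionService/CompletionRequest are hypothetical wrapper types,
# not part of TensorRT-LLM itself.
from dataclasses import dataclass

from tensorrt_llm import LLM, SamplingParams


@dataclass
class CompletionRequest:
    prompt: str
    max_tokens: int = 128
    temperature: float = 0.7


class CompletionService:
    """Narrow interface: callers depend on this class, not on TensorRT-LLM."""

    def __init__(self, model: str):
        # Engine construction is the expensive step; do it once at startup.
        self._llm = LLM(model=model)

    def complete(self, request: CompletionRequest) -> str:
        params = SamplingParams(
            max_tokens=request.max_tokens,
            temperature=request.temperature,
        )
        [output] = self._llm.generate([request.prompt], params)
        return output.outputs[0].text


# Usage (model name is a placeholder):
#   service = CompletionService(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
#   text = service.complete(CompletionRequest(prompt="Hello, my name is"))
```

Keeping the boundary this small makes it straightforward to swap the backing engine or stub it out in tests without touching callers.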