Responses to AI chat prompts not snappy enough? California-based generative AI company Groq has a super quick solution in its LPU Inference Engine, which has recently outperformed all contenders in ...
NVIDIA Boosts LLM Inference Performance With New TensorRT-LLM Software Library. As companies like d-Matrix squeeze into the lucrative artificial intelligence market with ...
The AI chip giant says the open-source software library, TensorRT-LLM, will double the H100’s performance for running inference on leading large language models when it comes out next month. Nvidia ...
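As a point of reference, here is a minimal sketch of what driving the library from Python looks like via the high-level LLM class in the open-source tensorrt_llm package; the model name and sampling settings are illustrative placeholders, not details from Nvidia's announcement:

```python
from tensorrt_llm import LLM, SamplingParams

# Sketch of TensorRT-LLM's high-level Python API (current open-source
# release); the model and parameters below are placeholders, not
# details from the original announcement.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

for out in llm.generate(["What does TensorRT-LLM optimize?"], params):
    print(out.outputs[0].text)
```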
Everyone is talking about Nvidia’s jaw-dropping earnings results — up a whopping 265% from a year ago. But don’t sleep on Groq, the Silicon Valley-based company creating new AI chips for large ...
A research article by Horace He and the Thinking Machines Lab (founded by ex-OpenAI CTO Mira Murati) addresses a long-standing issue in large language models (LLMs). Even with greedy decoding by setting ...
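The crux of that issue is easy to demonstrate outside any model: floating-point addition is not associative, so the same numbers reduced in a different order (as happens when GPU kernels change their reduction strategy with batch size or load) can come out slightly different, and a near-tie in the logits then flips the greedy argmax. A toy sketch of the effect, not the paper's code:

```python
import numpy as np

# Floating-point addition is non-associative: summing the same float32
# values in a different order (as GPU reduction kernels may do when the
# batch size or launch configuration changes) can give a slightly
# different result. Toy illustration only, not Thinking Machines' code.
rng = np.random.default_rng(0)
vals = rng.standard_normal(100_000).astype(np.float32)

forward = np.float32(0.0)
for v in vals:
    forward += v

reverse = np.float32(0.0)
for v in vals[::-1]:
    reverse += v

print(forward, reverse, forward == reverse)  # typically unequal

# If two logits are nearly tied, a last-bit difference like this is
# enough to change the argmax, so even temperature-0 (greedy) decoding
# can diverge between runs.
```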
BELLEVUE, Wash.--(BUSINESS WIRE)--MangoBoost, a provider of cutting-edge system solutions designed to maximize AI data center efficiency, is announcing the launch of Mango LLMBoost™, system ...
SUNNYVALE, Calif.--(BUSINESS WIRE)--Meta has teamed up with Cerebras to offer ultra-fast inference in its new Llama API, bringing together the world’s most popular open-source models, Llama, with the ...
“Large language models (LLMs) have demonstrated remarkable performance and tremendous potential across a wide range of tasks. However, deploying these models has been challenging due to the ...
Rearranging the computations and hardware used to serve large language ...
A new technical paper titled “Efficient LLM Inference: Bandwidth, Compute, Synchronization, and Capacity are all you need” was published by NVIDIA. Abstract: “This paper presents a limit study of ...
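To make the bandwidth term in that title concrete, here is a back-of-the-envelope bound (with assumed hardware numbers, not figures from the paper): single-stream decode must stream every model weight from memory once per generated token, so memory bandwidth alone caps throughput regardless of available compute.

```python
# Roofline-style lower bound for single-stream LLM decode. Numbers are
# assumptions for illustration, not results from the NVIDIA paper.
weight_bytes = 70e9        # 70B parameters at 1 byte each (FP8)
hbm_bytes_per_s = 3.35e12  # e.g. an H100 SXM's HBM3 bandwidth

seconds_per_token = weight_bytes / hbm_bytes_per_s
print(f"{seconds_per_token * 1e3:.1f} ms/token minimum "
      f"-> at most {1 / seconds_per_token:.0f} tokens/s")
# ~20.9 ms/token -> ~48 tokens/s, however fast the compute units are
```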