GRIT LLM
The GRIT LLM server is a centrally managed, GPU-accelerated AI service designed to provide researchers and staff with secure, high-performance access to large language models. The platform is powered by dual NVIDIA L40S GPUs and delivers fast, private inference without reliance on external cloud-based AI services.
Overview of the Service
The service combines two core components:
- OpenWebUI — a web-based interface that provides a user-friendly, ChatGPT-like experience
- Ollama — a backend model runtime responsible for running and managing large language models locally
Together, these components provide a seamless environment where users can interact with AI models through a browser while computations are performed on dedicated GPU hardware within the GRIT infrastructure.
What We Provide
The GRIT LLM server offers:
- Web-based access to AI models: Users can interact with language models through an intuitive browser interface, with no local installation required. The service is accessible at llm.grit.ucsb.edu; log in with your GRIT credentials, or create an account with an email address if you don't have a GRIT account.
- Access to multiple models: A curated set of open-source models (e.g., LLaMA, Mistral, and code-focused models) is available and selectable within the interface.
- High-performance GPU acceleration: All inference workloads are executed on L40S GPUs, enabling significantly faster responses than CPU-based systems.
- Private, on-premise processing: All data and prompts remain within GRIT-managed infrastructure, supporting research workflows that require data privacy or compliance controls.
- Multi-user support: The platform is designed for shared use, allowing multiple users to access models concurrently.
- Persistent chat history and sessions: Users can maintain conversation history and revisit prior interactions within the interface.
How the Platform Works
- Ollama serves as the model engine. It handles downloading, storing, and executing language models, and exposes them through a local API.
- OpenWebUI connects to Ollama and provides the frontend experience, allowing users to select models, submit prompts, and view responses.
This separation keeps the system flexible and extensible while maintaining a consistent user experience.
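As a concrete illustration of this split, Ollama's local HTTP API (which listens on port 11434 by default) can be called directly. The helper function, model name, and prompt below are illustrative sketches, not part of the GRIT configuration:

```python
import json
import urllib.request

def ask(model, prompt, base_url="http://localhost:11434"):
    """Send a prompt to Ollama's /api/generate endpoint.

    Returns the model's reply text, or None if no server is
    reachable at base_url.
    """
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)["response"]
    except OSError:
        return None

# Model name is hypothetical; use whichever model is installed.
print(ask("llama3", "Explain what a context window is."))
```

OpenWebUI talks to this same API on the user's behalf, which is why any model Ollama can serve automatically appears in the web interface.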
Available Models
A selection of commonly used models is pre-installed and maintained by GRIT. These may include:
- General-purpose chat models
- Code generation and analysis models
- Lightweight and high-performance variants for different workloads
Additional models may be made available upon request.
Adding or Requesting New Models
Administrator-managed models
GRIT administrators can deploy new models by pulling them through Ollama:
ollama pull <model-name>
Once installed, models are automatically available in OpenWebUI.
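After a pull completes, an administrator can confirm what a local Ollama instance has installed by querying its /api/tags endpoint. This sketch assumes the default local port and is illustrative rather than part of the documented workflow:

```python
import json
import urllib.request

def installed_models(base_url="http://localhost:11434"):
    """List model names known to a local Ollama instance.

    Returns None if no server is reachable at base_url.
    """
    try:
        with urllib.request.urlopen(base_url + "/api/tags", timeout=10) as resp:
            return [m["name"] for m in json.load(resp)["models"]]
    except OSError:
        return None

print(installed_models())
```

The same information is available on the command line via `ollama list`.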
Custom model configurations
Advanced users or administrators can define custom model configurations using an Ollama Modelfile, allowing for:
- Custom system prompts
- Tuned parameters (e.g., temperature, context length)
- Derived models built on existing base models
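A minimal Modelfile covering these options might look as follows; the base model name, system prompt, and parameter values are hypothetical examples:

```
# Derive a new model from an existing base model
FROM llama3

# Custom system prompt
SYSTEM "You are a concise assistant for research questions."

# Tuned parameters: sampling temperature and context length
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
```

An administrator would then register the derived model with `ollama create <new-model-name> -f Modelfile`, after which it appears alongside the base models in OpenWebUI.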
User requests
Users may request additional models based on research or project needs. Requests are evaluated based on:
- GPU memory requirements
- Licensing and usage constraints
- Relevance to supported workloads
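To see why GPU memory is usually the first gate, a common rule of thumb is that model weights alone occupy roughly (parameters × bits per weight ÷ 8) bytes. The function below is an illustrative back-of-the-envelope estimate, not GRIT's sizing policy:

```python
def approx_weight_vram_gb(params_billion, bits_per_weight):
    """Approximate VRAM needed for model weights alone, in GB.

    Ignores the KV cache and runtime overhead, which add more
    memory on top of this figure.
    """
    return params_billion * bits_per_weight / 8

# A 7B model at 4-bit quantization needs ~3.5 GB for weights,
# while a 70B model at 16 bits needs ~140 GB -- more than a
# single 48 GB L40S can hold.
print(approx_weight_vram_gb(7, 4))    # 3.5
print(approx_weight_vram_gb(70, 16))  # 140.0
```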
Intended Use Cases
This service is suitable for:
- Research assistance and exploratory analysis
- Code generation and debugging
- Documentation and content drafting
- Internal tooling and automation workflows
- Experimentation with open-source LLMs
Summary
The GRIT LLM server provides a secure, high-performance environment for working with modern language models. By combining GPU acceleration with a flexible backend and an accessible web interface, the platform enables users to leverage AI capabilities efficiently while keeping all data within institutional infrastructure.