GRIT LLM
The GRIT LLM server is a centrally managed, GPU-accelerated AI service designed to provide researchers and staff with secure, high-performance access to large language models. The platform is powered by dual NVIDIA L40S GPUs and delivers fast, private inference without reliance on external cloud-based AI services.
Overview of the Service
The service combines two core components:
- OpenWebUI — a web-based interface that provides a user-friendly, ChatGPT-like experience
- Ollama — a backend model runtime responsible for running and managing large language models locally
Together, these components provide a seamless environment where users can interact with AI models through a browser while computations are performed on dedicated GPU hardware within the GRIT infrastructure.
What We Provide
The GRIT LLM server offers:
- Web-based access to AI models: Users can interact with language models through an intuitive browser interface, with no local installation required. The service is accessible at llm.grit.ucsb.edu; log in with your GRIT credentials, or create an account with an email address if you don't have a GRIT account.
- Access to multiple models: A curated set of open-source models (e.g., LLaMA, Mistral, and code-focused models) is available and selectable within the interface.
- High-performance GPU acceleration: All inference workloads are executed on L40S GPUs, enabling significantly faster responses than CPU-based systems.
- Private, on-premise processing: All data and prompts remain within GRIT-managed infrastructure, supporting research workflows that require data privacy or compliance controls.
- Multi-user support: The platform is designed for shared use, allowing multiple users to access models concurrently.
- Persistent chat history and sessions: Users can maintain conversation history and revisit prior interactions within the interface.
How the Platform Works
- Ollama serves as the model engine. It handles downloading, storing, and executing language models, and exposes them through a local API.
- OpenWebUI connects to Ollama and provides the frontend experience, allowing users to select models, submit prompts, and view responses.
This separation keeps the system flexible and extensible while maintaining a consistent user experience.
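As a concrete illustration of this split, Ollama's local HTTP API (which listens on port 11434 by default) can be called directly. The helper function, model name, and prompt below are illustrative sketches, not part of the GRIT configuration:

```python
import json
import urllib.request

def ask(model, prompt, base_url="http://localhost:11434"):
    """Send a prompt to Ollama's /api/generate endpoint.

    Returns the model's reply text, or None if no server is
    reachable at base_url.
    """
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)["response"]
    except OSError:
        return None

# Model name is hypothetical; use whichever model is installed.
print(ask("llama3", "Explain what a context window is."))
```

OpenWebUI talks to this same API on the user's behalf, which is why any model Ollama can serve automatically appears in the web interface.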
Available Models
A selection of commonly used models is pre-installed and maintained by GRIT. These may include:
- General-purpose chat models
- Code generation and analysis models
- Lightweight and high-performance variants for different workloads
Additional models may be made available upon request.
Adding or Requesting New Models
Administrator-managed models
GRIT administrators can deploy new models by pulling them through Ollama:
ollama pull <model-name>
Once installed, models are automatically available in OpenWebUI.
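After a pull completes, an administrator can confirm what a local Ollama instance has installed by querying its /api/tags endpoint. This sketch assumes the default local port and is illustrative rather than part of the documented workflow:

```python
import json
import urllib.request

def installed_models(base_url="http://localhost:11434"):
    """List model names known to a local Ollama instance.

    Returns None if no server is reachable at base_url.
    """
    try:
        with urllib.request.urlopen(base_url + "/api/tags", timeout=10) as resp:
            return [m["name"] for m in json.load(resp)["models"]]
    except OSError:
        return None

print(installed_models())
```

The same information is available on the command line via `ollama list`.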
Custom model configurations
Advanced users or administrators can define custom model configurations using an Ollama Modelfile, allowing for:
- Custom system prompts
- Tuned parameters (e.g., temperature, context length)
- Derived models built on existing base models
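A minimal Modelfile covering these options might look as follows; the base model name, system prompt, and parameter values are hypothetical examples:

```
# Derive a new model from an existing base model
FROM llama3

# Custom system prompt
SYSTEM "You are a concise assistant for research questions."

# Tuned parameters: sampling temperature and context length
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
```

An administrator would then register the derived model with `ollama create <new-model-name> -f Modelfile`, after which it appears alongside the base models in OpenWebUI.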
User requests
Users may request additional models based on research or project needs. Requests are evaluated based on:
- GPU memory requirements
- Licensing and usage constraints
- Relevance to supported workloads
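To see why GPU memory is usually the first gate, a common rule of thumb is that model weights alone occupy roughly (parameters × bits per weight ÷ 8) bytes. The function below is an illustrative back-of-the-envelope estimate, not GRIT's sizing policy:

```python
def approx_weight_vram_gb(params_billion, bits_per_weight):
    """Approximate VRAM needed for model weights alone, in GB.

    Ignores the KV cache and runtime overhead, which add more
    memory on top of this figure.
    """
    return params_billion * bits_per_weight / 8

# A 7B model at 4-bit quantization needs ~3.5 GB for weights,
# while a 70B model at 16 bits needs ~140 GB -- more than a
# single 48 GB L40S can hold.
print(approx_weight_vram_gb(7, 4))    # 3.5
print(approx_weight_vram_gb(70, 16))  # 140.0
```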
Intended Use Cases
This service is suitable for:
- Research assistance and exploratory analysis
- Code generation and debugging
- Documentation and content drafting
- Internal tooling and automation workflows
- Experimentation with open-source LLMs
Summary
The GRIT LLM server provides a secure, high-performance environment for working with modern language models. By combining GPU acceleration with a flexible backend and an accessible web interface, the platform enables users to leverage AI capabilities efficiently while keeping all data within institutional infrastructure.