AI Features & Privacy
Learn how AssisT handles AI processing with local LLMs and optional cloud APIs while keeping your data private.
Overview
AssisT uses a privacy-first hybrid AI system that gives you four ways to use AI features, from completely offline to cloud-powered. Your data stays on your device by default.
Four AI Modes
AssisT offers flexible AI processing with four distinct modes:
| Mode | Privacy | Cost | Performance | Requirements |
|---|---|---|---|---|
| Off | N/A | Free | Features disabled | None |
| Local AI (Ollama) | 100% Private | Free | Good (hardware-dependent) | Ollama installed |
| Cloud AI (API) | Your API only | Pay-per-use | Excellent | API key |
| Gemini Nano | 100% Private | Free | Fast, on-device | Chrome 128+ |
You can switch between modes at any time using the radio selector in AssisT settings.
Key Principles
- Your Choice: Pick the AI mode that matches your privacy and performance needs
- No Data Collection: We never see, store, or transmit your data
- Bring Your Own Key: Cloud mode uses your own API keys, not ours
- Graceful Fallback: Features work even without AI (with reduced functionality)
Local AI with Ollama
AssisT integrates with Ollama, a free, open-source tool that runs AI models directly on your computer.
Why Local AI?
| Benefit | Description |
|---|---|
| Privacy | Data never leaves your device |
| Compliance | Safe for GDPR, FERPA, and HIPAA environments |
| No Cost | No API fees or subscriptions |
| Offline | Works without internet connection |
| Speed | No network latency for requests |
Supported Models
AssisT automatically detects and uses available Ollama models:
| Model | Size | Best For |
|---|---|---|
| phi3:mini | 2GB | Fast responses, basic tasks |
| llama3.2 | 5GB | Balanced performance |
| mistral | 4GB | Complex analysis, detailed responses |
| llava | 4GB | Image understanding (vision) |
Installing Ollama
- Download Ollama from ollama.ai
- Install and run Ollama on your computer
- AssisT will automatically detect it
Installing Models
Once Ollama is running, you can install models directly from AssisT:
- Open AssisT settings
- Go to AI Settings > Local Models
- Click Install next to your preferred model
- Wait for the download to complete
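If you prefer to work outside the extension, models can also be pulled through Ollama's local HTTP API (the same effect as running `ollama pull phi3:mini` in a terminal). A minimal sketch, assuming Ollama's standard /api/pull endpoint on its default port:

```typescript
// Pull a model through Ollama's local HTTP API (equivalent to `ollama pull`).
// Assumes Ollama is running locally on its default port, 11434.
async function pullModel(model: string): Promise<void> {
  const response = await fetch("http://localhost:11434/api/pull", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // stream: false returns one final JSON object instead of progress events
    body: JSON.stringify({ model, stream: false }),
  });
  if (!response.ok) {
    throw new Error(`Pull failed: ${response.status} ${response.statusText}`);
  }
  const result = await response.json();
  console.log(`Status for ${model}:`, result.status); // "success" when complete
}

pullModel("phi3:mini").catch(console.error);
```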
Recommended Model Sets:
- Minimal (2GB): phi3:mini - Fast responses for basic tasks
- Balanced (5GB): phi3:mini + llama3.2 - Good for most users
- Full (10GB): All models including vision - Complete AI capabilities
AssisT will recommend a model set based on your system’s available memory.
How Local AI Works
```
Your Browser (AssisT)
        ↓
   Message Bridge
        ↓
Ollama (localhost:11434)
        ↓
   AI Response
        ↓
  Back to AssisT
```
All communication happens locally on your machine. Nothing is sent to external servers.
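For the curious, the entire local round trip can be reproduced with a single request to Ollama's generate endpoint. A minimal sketch of what a bridge call looks like (illustrative, not AssisT's actual internals):

```typescript
// One local round trip: prompt in, completion out, nothing leaves the machine.
async function generateLocally(prompt: string): Promise<string> {
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "phi3:mini", // any installed model
      prompt,
      stream: false,      // return one JSON object rather than a token stream
    }),
  });
  if (!response.ok) {
    throw new Error(`Ollama error: ${response.status}`);
  }
  const data = await response.json();
  return data.response;   // the generated text
}
```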
Cloud Providers (Optional)
For users who want more powerful AI capabilities, AssisT supports multiple cloud providers through API keys you provide.
Supported Providers
| Provider | Strengths | Best For |
|---|---|---|
| Anthropic (Claude) | Coding, academic writing, analysis | Text simplification, tutoring |
| OpenAI (ChatGPT) | Creative, conversational | Brainstorming, general tasks |
| Google (Gemini) | Multimodal, visual, factual | Image understanding |
| Perplexity | Real-time web, citations | Research, fact-checking |
Bringing Your Own API Key
- Get an API key from your preferred provider:
- Anthropic Console (Claude)
- OpenAI Platform (ChatGPT)
- Google AI Studio (Gemini)
- Perplexity Settings
- Open AssisT settings
- Go to AI Settings > Cloud Providers
- Select your provider and enter your API key
- Choose your preferred model
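Under the hood, cloud mode amounts to a direct HTTPS call from your browser to the provider, authenticated with your key. A hedged sketch using Anthropic's Messages API (the extra browser-access header is Anthropic's opt-in for CORS requests from browsers; the error handling is ours):

```typescript
// Direct browser-to-provider request: your key, their API, no middleman.
async function askClaude(apiKey: string, userText: string): Promise<string> {
  const response = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "anthropic-dangerous-direct-browser-access": "true", // required for browser CORS
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4-5",
      max_tokens: 1024,
      messages: [{ role: "user", content: userText }],
    }),
  });
  if (!response.ok) {
    throw new Error(`Anthropic API error: ${response.status}`);
  }
  const data = await response.json();
  return data.content[0].text; // first content block of the reply
}
```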
Cost vs Quality
| Model Type | Cost | Best For |
|---|---|---|
| Fast (Haiku, GPT-4o-mini, Flash) | Cheaper per token | Simple tasks, high volume |
| Balanced (Sonnet, GPT-4o, Pro) | Moderate | Most use cases |
| Quality (Opus, GPT-4) | Higher per token | Complex tasks, accuracy critical |
Tip: Start with faster models for simple tasks. Use larger models when you need more nuanced or accurate responses.
API Key Security
- Your API keys are stored locally in Chrome’s secure storage
- They are never sent to Fiavaion servers
- Only transmitted directly to the provider when you use cloud features
- You can remove them anytime from settings
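As an illustration of this storage model, a Chrome extension keeps such keys in chrome.storage.local, which does not sync and never leaves the machine. A hypothetical sketch (the helper names are ours, not AssisT's source):

```typescript
// Hypothetical helpers: keys live in chrome.storage.local, on-device only.
// Assumes a Chrome extension (Manifest V3) context where `chrome` is available.
async function saveApiKey(provider: string, key: string): Promise<void> {
  await chrome.storage.local.set({ [`apiKey:${provider}`]: key });
}

async function loadApiKey(provider: string): Promise<string | undefined> {
  const result = await chrome.storage.local.get(`apiKey:${provider}`);
  return result[`apiKey:${provider}`];
}

async function removeApiKey(provider: string): Promise<void> {
  await chrome.storage.local.remove(`apiKey:${provider}`);
}
```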
Claude 4.5/4.6 Models (Anthropic)
When using Cloud AI mode with an Anthropic API key, AssisT supports the latest Claude models for powerful language understanding and generation:
| Model | Model ID | Best For | Input Cost | Output Cost |
|---|---|---|---|---|
| Haiku 4.5 | claude-haiku-4-5 | Quick answers, simple tasks, high volume | $0.001/1K tokens | $0.005/1K tokens |
| Sonnet 4.5 | claude-sonnet-4-5 | Everyday tasks, balanced performance (recommended) | $0.003/1K tokens | $0.015/1K tokens |
| Opus 4.6 | claude-opus-4-6 | Complex analysis, critical work, highest quality | $0.015/1K tokens | $0.075/1K tokens |
Cost Example: A typical 500-word document summary using Sonnet 4.5 costs approximately $0.002-0.004 per request: roughly 650 input tokens × $0.003/1K ≈ $0.002, plus 100-150 output tokens × $0.015/1K ≈ $0.002.
Recommendation: Start with Sonnet 4.5 for the best balance of quality and cost. Use Haiku 4.5 for simple, high-volume tasks. Reserve Opus 4.6 for complex analysis where accuracy is critical.
Feature-Specific Defaults:
- Summarization: Haiku 4.5 (fast, sufficient for most summaries)
- Text Simplification: Sonnet 4.5 (better comprehension and clarity)
- Assignment Breakdown: Sonnet 4.5 (detailed task analysis)
- Socratic Tutor: Opus 4.6 (complex reasoning and questioning)
- Citation Analysis: Sonnet 4.5 (balanced accuracy and speed)
- Multi-Document Compare: Opus 4.6 (handles complexity well)
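To budget for your own workload, the per-1K rates above translate directly into an estimate. A small helper using the table's Sonnet 4.5 rates as defaults (rates are illustrative; check your provider's current pricing):

```typescript
// Estimate a request's cost from token counts and per-1K-token rates.
function estimateCostUSD(
  inputTokens: number,
  outputTokens: number,
  inputRatePer1K = 0.003,  // Sonnet 4.5 input rate from the table above
  outputRatePer1K = 0.015, // Sonnet 4.5 output rate from the table above
): number {
  return (inputTokens / 1000) * inputRatePer1K +
         (outputTokens / 1000) * outputRatePer1K;
}

// A 500-word document (~650 tokens) summarized into ~120 tokens:
console.log(estimateCostUSD(650, 120)); // about $0.004, matching the range above
```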
Gemini 2.0 Models
AssisT supports Google's Gemini models, up to and including the latest Gemini 2.0, when using Cloud AI mode with a Google API key:
| Model | Description | Best For |
|---|---|---|
| Gemini 1.5 Flash | Fast, efficient, affordable | Quick tasks, high volume |
| Gemini 1.5 Pro | Balanced performance and quality | General use, complex tasks |
| Gemini 2.0 Flash Experimental | Latest experimental model | Cutting-edge features, testing |
These models offer improved reasoning, multimodal understanding, and longer context windows compared to earlier versions.
Gemini Nano (Chrome Built-In AI)
NEW: AssisT now supports Chrome’s built-in Gemini Nano model for completely private, on-device AI processing without installing anything.
What is Gemini Nano?
Gemini Nano is Google’s smallest AI model, built directly into Chrome 128 and later. It runs entirely on your device using Chrome’s Prompt API (window.ai).
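A minimal sketch of probing that API, assuming the early window.ai surface from the Chrome 128 origin trial (Chrome has renamed these methods across releases, so treat this as illustrative rather than current):

```typescript
// Probe Chrome's built-in model and run one prompt entirely on-device.
// Assumes the early window.ai Prompt API (Chrome 128 origin trial);
// later Chrome versions renamed these methods, so check current docs.
async function tryGeminiNano(prompt: string): Promise<string | null> {
  const ai = (window as any).ai;
  if (!ai?.canCreateTextSession) return null;   // API not exposed at all
  const availability = await ai.canCreateTextSession();
  if (availability === "no") return null;       // flag off or unsupported device
  const session = await ai.createTextSession(); // may trigger a background download
  return session.prompt(prompt);                // inference happens locally
}
```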
Benefits
- Zero Setup: No Ollama installation required
- 100% Private: All processing happens locally in Chrome
- No API Costs: Completely free to use
- Fast: Optimized for on-device performance
- Offline: Works without internet connection
Requirements
To use Gemini Nano mode, you need:
- Chrome 128 or later (Canary, Dev, Beta, or Stable)
- Feature flag enabled: visit chrome://flags/#optimization-guide-on-device-model and set it to "Enabled"
- Model download: Chrome downloads the model automatically on first use (this happens in the background)
How to Enable
- Open AssisT settings
- Go to AI Settings
- Select Gemini Nano mode
- AssisT will check availability and show status
Status Indicators
| Status | Meaning | What to Do |
|---|---|---|
| Ready | Model downloaded and available | Start using AI features |
| Needs Download | Model downloading in background | Wait a few minutes, reload |
| Not Supported | Chrome version too old or device incompatible | Update Chrome or use different mode |
| Unavailable | Feature flag not enabled | Enable flag at chrome://flags |
Gemini Nano vs Ollama
| Feature | Gemini Nano | Ollama |
|---|---|---|
| Setup | Chrome flag only | Install separate app |
| Model Size | ~2GB (built into Chrome) | 2GB-7GB per model |
| Model Choice | Single model (Google’s) | Many models available |
| Performance | Good for basic tasks | Better for complex tasks |
| Customization | Limited | Full control |
Recommendation: Try Gemini Nano first for simplicity. Switch to Ollama if you need more powerful models or specific capabilities.
How the AI Mode System Works
AssisT routes requests based on your selected AI mode:
```
              Feature Request
                     ↓
           Check Selected AI Mode
                     ↓
          ┌──────────┴──────────┐
          │                     │
       OFF Mode          AI Enabled Mode
          │                     │
          └→ Fallback    ┌──────┴──────────┐
             behavior    │                 │
                  Local AI Mode      Cloud AI Mode
                         │                 │
                  ┌──────┴──────┐     ┌────┴────┐
                  │             │     │         │
             Ollama Mode   Gemini Nano  API Key  Fallback
                  │             │          │
             Use Ollama    Use Chrome  Use Cloud
             (private)    built-in AI  Provider
                           (private)   (your key)
```
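In code, this routing reduces to a switch over the selected mode. A simplified sketch (the types and helper names are illustrative, not AssisT's source; the stubs stand in for the earlier examples):

```typescript
type AIMode = "off" | "ollama" | "gemini-nano" | "cloud";

// Stubs standing in for the sketches above and for hypothetical helpers.
declare function generateLocally(prompt: string): Promise<string>;
declare function tryGeminiNano(prompt: string): Promise<string | null>;
declare function askCloudProvider(prompt: string): Promise<string>;
declare function fallbackBehavior(prompt: string): string;

// Illustrative router: choose a backend based on the selected AI mode.
async function routeRequest(mode: AIMode, prompt: string): Promise<string> {
  switch (mode) {
    case "off":
      return fallbackBehavior(prompt);          // e.g. first paragraph for summaries
    case "ollama":
      return generateLocally(prompt);           // localhost:11434, fully private
    case "gemini-nano": {
      const reply = await tryGeminiNano(prompt);
      return reply ?? fallbackBehavior(prompt); // degrade if Nano is unavailable
    }
    case "cloud":
      return askCloudProvider(prompt);          // your key, direct to the provider
  }
}
```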
Task-Based Model Selection
Different features use the best available model for optimal results:
| Feature | Gemini Nano | Ollama (Local) | Recommended Cloud |
|---|---|---|---|
| Summarization | ✅ Good | phi3:mini, llama3.2 | Any fast model |
| Text Simplification | ✅ Good | llama3.2, mistral | Anthropic (clarity) |
| Socratic Tutor | ⚠️ Basic | mistral | Anthropic (reasoning) |
| Assignment Breakdown | ✅ Good | llama3.2, mistral | Claude, GPT-4 |
| Image Understanding | ❌ Not supported | llava | Gemini or GPT-4o |
| Research & Citations | ❌ Not supported | ❌ No web access | Perplexity (web access) |
Fallback Behaviors
When AI isn’t available, features gracefully degrade:
| Feature | Fallback Behavior |
|---|---|
| Summarize | Shows first paragraph |
| Simplify | Feature disabled with message |
| Image Describe | Disabled; requires a vision-capable model |
| TTS Prosody | Uses neutral tone |
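The Summarize fallback, for instance, is plain text handling rather than AI. A hypothetical sketch of the degrade path:

```typescript
// Hypothetical fallback: with AI off or unavailable, "summarize" returns
// the document's first paragraph instead of calling a model.
function summarizeFallback(documentText: string): string {
  const firstParagraph = documentText.split(/\n\s*\n/)[0]?.trim() ?? "";
  return firstParagraph.length > 0
    ? firstParagraph
    : documentText.slice(0, 300); // last resort: leading characters
}
```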
Privacy Guarantees
What We Never Do
- Collect or store your data
- Send data to our servers
- Track your AI usage
- Share information with third parties
What Stays Local
- All text you process
- Documents you summarize
- Images you analyze
- Conversation history
GDPR/FERPA/HIPAA Compliance
When you use a local AI mode (Ollama or Gemini Nano), AssisT processes everything on-device:
- GDPR: No personal data is transmitted
- FERPA: Student data stays on the device
- HIPAA: Patient information never leaves the browser
This makes AssisT safe for educational institutions and healthcare settings.
Performance Tips
For Best Local AI Performance
- Use an SSD: Faster model loading
- 8GB+ RAM/VRAM: Required for larger models
- Keep Ollama Running: Faster first response
- Choose Appropriate Models: Match model size to your hardware
Why Memory Matters
- More VRAM = Better Models: With more video memory (or unified memory on Apple Silicon), you can run larger, more capable models
- More Memory = Longer Context: Additional memory allows longer context windows—the AI can “remember” more of your document
- Longer Context = Fewer Hallucinations: When AI sees more context, it makes fewer mistakes because it has more information to work with
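As a rough worked example: a 7-billion-parameter model quantized to 4 bits needs about 7B × 0.5 bytes ≈ 3.5GB for its weights alone, and the context window's cache consumes additional memory on top of that, which is why 8GB is a comfortable floor for mid-size models.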
Memory Types
| Type | What Matters | Notes |
|---|---|---|
| Dedicated GPU | VRAM (8GB good, 12GB+ great) | NVIDIA/AMD graphics cards |
| Apple Silicon | Unified memory (16GB good, 32GB+ excellent) | M1/M2/M3/M4 Macs |
| CPU-only | System RAM (16GB min, 32GB recommended) | Slower but works |
Recommended System Requirements
| Setup | RAM/VRAM | Storage | Models |
|---|---|---|---|
| Minimal | 8GB | 4GB free | phi3:mini |
| Standard | 16GB | 8GB free | phi3:mini + llama3.2 |
| Full | 32GB+ | 15GB free | All models + longer context |
Troubleshooting
Ollama Not Detected
- Ensure Ollama is installed and running
- Check that it's accessible at localhost:11434
- Restart Ollama if needed
- Refresh the AssisT extension
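To verify reachability yourself, Ollama's /api/tags endpoint lists installed models and doubles as a health check. A quick sketch:

```typescript
// Quick health check: if this resolves, Ollama is up and reachable.
async function checkOllama(): Promise<void> {
  try {
    const response = await fetch("http://localhost:11434/api/tags");
    const data = await response.json();
    const names = data.models.map((m: { name: string }) => m.name);
    console.log("Ollama is running. Installed models:", names);
  } catch {
    console.error("Ollama is not reachable at localhost:11434.");
  }
}

checkOllama();
```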
Slow Responses
- Try a smaller model (phi3:mini is fastest)
- Ensure Ollama isn’t processing other requests
- Check your system’s available memory
- Close other resource-intensive applications
Model Download Failed
- Check your internet connection
- Ensure enough disk space is available
- Try downloading a smaller model first
- Restart Ollama and try again