AI Features & Privacy
Learn how AssisT handles AI processing with local LLMs and optional cloud APIs while keeping your data private.
Overview
AssisT uses a privacy-first hybrid AI system that gives you four ways to use AI features, from completely offline to cloud-powered. Your data stays on your device by default.
Four AI Modes
AssisT offers flexible AI processing with four distinct modes:
| Mode | Privacy | Cost | Performance | Requirements |
|---|---|---|---|---|
| Off | N/A | Free | Features disabled | None |
| Local AI (Ollama) | 100% Private | Free | Good (hardware-dependent) | Ollama installed |
| Cloud AI (API) | Your API only | Pay-per-use | Excellent | API key |
| Gemini Nano | 100% Private | Free | Fast, on-device | Chrome 128+ |
You can switch between modes at any time using the radio selector in AssisT settings.
Key Principles
- Your Choice: Pick the AI mode that matches your privacy and performance needs
- No Data Collection: We never see, store, or transmit your data
- Bring Your Own Key: Cloud mode uses your own API keys, not ours
- Graceful Fallback: Features work even without AI (with reduced functionality)
Local AI with Ollama
AssisT integrates with Ollama, a free, open-source tool that runs AI models directly on your computer.
Why Local AI?
| Benefit | Description |
|---|---|
| Privacy | Data never leaves your device |
| Compliance | Safe for GDPR, FERPA, and HIPAA environments |
| No Cost | No API fees or subscriptions |
| Offline | Works without internet connection |
| Speed | No network latency for requests |
Supported Models
AssisT automatically detects and uses available Ollama models:
| Model | Size | Best For |
|---|---|---|
| phi3:mini | 2GB | Fast responses, basic tasks |
| llama3.2 | 5GB | Balanced performance |
| mistral | 4GB | Complex analysis, detailed responses |
| llava | 4GB | Image understanding (vision) |
Installing Ollama
- Download Ollama from ollama.ai
- Install and run Ollama on your computer
- AssisT will automatically detect it
Installing Models
Once Ollama is running, you can install models directly from AssisT:
- Open AssisT settings
- Go to AI Settings > Local Models
- Click Install next to your preferred model
- Wait for the download to complete
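If you prefer to work outside the extension, models can also be pulled through Ollama's local HTTP API (the same effect as running `ollama pull phi3:mini` in a terminal). A minimal sketch, assuming Ollama's standard /api/pull endpoint on its default port:

```typescript
// Pull a model through Ollama's local HTTP API (equivalent to `ollama pull`).
// Assumes Ollama is running locally on its default port, 11434.
async function pullModel(model: string): Promise<void> {
  const response = await fetch("http://localhost:11434/api/pull", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // stream: false returns one final JSON object instead of progress events
    body: JSON.stringify({ model, stream: false }),
  });
  if (!response.ok) {
    throw new Error(`Pull failed: ${response.status} ${response.statusText}`);
  }
  const result = await response.json();
  console.log(`Status for ${model}:`, result.status); // "success" when complete
}

pullModel("phi3:mini").catch(console.error);
```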
Recommended Model Sets:
- Minimal (2GB): phi3:mini - Fast responses for basic tasks
- Balanced (5GB): phi3:mini + llama3.2 - Good for most users
- Full (10GB): All models including vision - Complete AI capabilities
AssisT will recommend a model set based on your system’s available memory.
How Local AI Works
```
Your Browser (AssisT)
        ↓
   Message Bridge
        ↓
Ollama (localhost:11434)
        ↓
   AI Response
        ↓
  Back to AssisT
```
All communication happens locally on your machine. Nothing is sent to external servers.
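For the curious, the entire local round trip can be reproduced with a single request to Ollama's generate endpoint. A minimal sketch of what a bridge call looks like (illustrative, not AssisT's actual internals):

```typescript
// One local round trip: prompt in, completion out, nothing leaves the machine.
async function generateLocally(prompt: string): Promise<string> {
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "phi3:mini", // any installed model
      prompt,
      stream: false,      // return one JSON object rather than a token stream
    }),
  });
  if (!response.ok) {
    throw new Error(`Ollama error: ${response.status}`);
  }
  const data = await response.json();
  return data.response;   // the generated text
}
```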
Cloud Providers (Optional)
For users who want more powerful AI capabilities, AssisT supports multiple cloud providers through API keys you provide.
Supported Providers
| Provider | Strengths | Best For |
|---|---|---|
| Anthropic (Claude) | Coding, academic writing, analysis | Text simplification, tutoring |
| OpenAI (ChatGPT) | Creative, conversational | Brainstorming, general tasks |
| Google (Gemini) | Multimodal, visual, factual | Image understanding |
| Perplexity | Real-time web, citations | Research, fact-checking |
Bringing Your Own API Key
- Get an API key from your preferred provider:
- Anthropic Console (Claude)
- OpenAI Platform (ChatGPT)
- Google AI Studio (Gemini)
- Perplexity Settings
- Open AssisT settings
- Go to AI Settings > Cloud Providers
- Select your provider and enter your API key
- Choose your preferred model
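Under the hood, cloud mode amounts to a direct HTTPS call from your browser to the provider, authenticated with your key. A hedged sketch using Anthropic's Messages API (the extra browser-access header is Anthropic's opt-in for CORS requests from browsers; the error handling is ours):

```typescript
// Direct browser-to-provider request: your key, their API, no middleman.
async function askClaude(apiKey: string, userText: string): Promise<string> {
  const response = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "anthropic-dangerous-direct-browser-access": "true", // required for browser CORS
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4-5",
      max_tokens: 1024,
      messages: [{ role: "user", content: userText }],
    }),
  });
  if (!response.ok) {
    throw new Error(`Anthropic API error: ${response.status}`);
  }
  const data = await response.json();
  return data.content[0].text; // first content block of the reply
}
```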
Cost vs Quality
| Model Type | Cost | Best For |
|---|---|---|
| Fast (Haiku, GPT-4o-mini, Flash) | Cheaper per token | Simple tasks, high volume |
| Balanced (Sonnet, GPT-4o, Pro) | Moderate | Most use cases |
| Quality (Opus, GPT-4) | Higher per token | Complex tasks, accuracy critical |
Tip: Start with faster models for simple tasks. Use larger models when you need more nuanced or accurate responses.
API Key Security
- Your API keys are stored locally in Chrome’s secure storage
- They are never sent to Fiavaion servers
- Only transmitted directly to the provider when you use cloud features
- You can remove them anytime from settings
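As an illustration of this storage model, a Chrome extension keeps such keys in chrome.storage.local, which does not sync and never leaves the machine. A hypothetical sketch (the helper names are ours, not AssisT's source):

```typescript
// Hypothetical helpers: keys live in chrome.storage.local, on-device only.
// Assumes a Chrome extension (Manifest V3) context where `chrome` is available.
async function saveApiKey(provider: string, key: string): Promise<void> {
  await chrome.storage.local.set({ [`apiKey:${provider}`]: key });
}

async function loadApiKey(provider: string): Promise<string | undefined> {
  const result = await chrome.storage.local.get(`apiKey:${provider}`);
  return result[`apiKey:${provider}`];
}

async function removeApiKey(provider: string): Promise<void> {
  await chrome.storage.local.remove(`apiKey:${provider}`);
}
```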
Claude 4.5/4.6 Models (Anthropic)
When using Cloud AI mode with an Anthropic API key, AssisT supports the latest Claude models for powerful language understanding and generation:
| Model | Model ID | Best For | Input Cost | Output Cost |
|---|---|---|---|---|
| Haiku 4.5 | claude-haiku-4-5 | Quick answers, simple tasks, high volume | $0.001/1K tokens | $0.005/1K tokens |
| Sonnet 4.5 | claude-sonnet-4-5 | Everyday tasks, balanced performance (recommended) | $0.003/1K tokens | $0.015/1K tokens |
| Opus 4.6 | claude-opus-4-6 | Complex analysis, critical work, highest quality | $0.015/1K tokens | $0.075/1K tokens |
Cost Example: A typical 500-word document summary using Sonnet 4.5 costs approximately $0.002-0.004 per request: roughly 650 input tokens × $0.003/1K ≈ $0.002, plus 100-150 output tokens × $0.015/1K ≈ $0.002.
Recommendation: Start with Sonnet 4.5 for the best balance of quality and cost. Use Haiku 4.5 for simple, high-volume tasks. Reserve Opus 4.6 for complex analysis where accuracy is critical.
Feature-Specific Defaults:
- Summarization: Haiku 4.5 (fast, sufficient for most summaries)
- Text Simplification: Sonnet 4.5 (better comprehension and clarity)
- Assignment Breakdown: Sonnet 4.5 (detailed task analysis)
- Socratic Tutor: Opus 4.6 (complex reasoning and questioning)
- Citation Analysis: Sonnet 4.5 (balanced accuracy and speed)
- Multi-Document Compare: Opus 4.6 (handles complexity well)
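To budget for your own workload, the per-1K rates above translate directly into an estimate. A small helper using the table's Sonnet 4.5 rates as defaults (rates are illustrative; check your provider's current pricing):

```typescript
// Estimate a request's cost from token counts and per-1K-token rates.
function estimateCostUSD(
  inputTokens: number,
  outputTokens: number,
  inputRatePer1K = 0.003,  // Sonnet 4.5 input rate from the table above
  outputRatePer1K = 0.015, // Sonnet 4.5 output rate from the table above
): number {
  return (inputTokens / 1000) * inputRatePer1K +
         (outputTokens / 1000) * outputRatePer1K;
}

// A 500-word document (~650 tokens) summarized into ~120 tokens:
console.log(estimateCostUSD(650, 120)); // about $0.004, matching the range above
```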
Gemini 2.0 Models
AssisT supports Google's Gemini models, up to and including the latest Gemini 2.0, when using Cloud AI mode with a Google API key:
| Model | Description | Best For |
|---|---|---|
| Gemini 1.5 Flash | Fast, efficient, affordable | Quick tasks, high volume |
| Gemini 1.5 Pro | Balanced performance and quality | General use, complex tasks |
| Gemini 2.0 Flash Experimental | Latest experimental model | Cutting-edge features, testing |
These models offer improved reasoning, multimodal understanding, and longer context windows compared to earlier versions.
Gemini Nano (Chrome Built-In AI)
NEW: AssisT now supports Chrome’s built-in Gemini Nano model for completely private, on-device AI processing without installing anything.
What is Gemini Nano?
Gemini Nano is Google’s smallest AI model, built directly into Chrome 128 and later. It runs entirely on your device using Chrome’s Prompt API (window.ai).
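A minimal sketch of probing that API, assuming the early window.ai surface from the Chrome 128 origin trial (Chrome has renamed these methods across releases, so treat this as illustrative rather than current):

```typescript
// Probe Chrome's built-in model and run one prompt entirely on-device.
// Assumes the early window.ai Prompt API (Chrome 128 origin trial);
// later Chrome versions renamed these methods, so check current docs.
async function tryGeminiNano(prompt: string): Promise<string | null> {
  const ai = (window as any).ai;
  if (!ai?.canCreateTextSession) return null;   // API not exposed at all
  const availability = await ai.canCreateTextSession();
  if (availability === "no") return null;       // flag off or unsupported device
  const session = await ai.createTextSession(); // may trigger a background download
  return session.prompt(prompt);                // inference happens locally
}
```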
Benefits
- Zero Setup: No Ollama installation required
- 100% Private: All processing happens locally in Chrome
- No API Costs: Completely free to use
- Fast: Optimized for on-device performance
- Offline: Works without internet connection
Requirements
To use Gemini Nano mode, you need:
- Chrome 128 or later (Canary, Dev, Beta, or Stable)
- Feature flag enabled: visit chrome://flags/#optimization-guide-on-device-model and set it to "Enabled"
- Model download: Chrome downloads the model automatically on first use (this happens in the background)
How to Enable
- Open AssisT settings
- Go to AI Settings
- Select Gemini Nano mode
- AssisT will check availability and show status
Status Indicators
| Status | Meaning | What to Do |
|---|---|---|
| Ready | Model downloaded and available | Start using AI features |
| Needs Download | Model downloading in background | Wait a few minutes, reload |
| Not Supported | Chrome version too old or device incompatible | Update Chrome or use different mode |
| Unavailable | Feature flag not enabled | Enable flag at chrome://flags |
Gemini Nano vs Ollama
| Feature | Gemini Nano | Ollama |
|---|---|---|
| Setup | Chrome flag only | Install separate app |
| Model Size | ~2GB (built into Chrome) | 2GB-7GB per model |
| Model Choice | Single model (Google’s) | Many models available |
| Performance | Good for basic tasks | Better for complex tasks |
| Customization | Limited | Full control |
Recommendation: Try Gemini Nano first for simplicity. Switch to Ollama if you need more powerful models or specific capabilities.
How the AI Mode System Works
AssisT routes requests based on your selected AI mode:
```
              Feature Request
                     ↓
           Check Selected AI Mode
                     ↓
          ┌──────────┴──────────┐
          │                     │
       OFF Mode          AI Enabled Mode
          │                     │
          └→ Fallback    ┌──────┴──────────┐
             behavior    │                 │
                  Local AI Mode      Cloud AI Mode
                         │                 │
                  ┌──────┴──────┐     ┌────┴────┐
                  │             │     │         │
             Ollama Mode   Gemini Nano  API Key  Fallback
                  │             │          │
             Use Ollama    Use Chrome  Use Cloud
             (private)    built-in AI  Provider
                           (private)   (your key)
```
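In code, this routing reduces to a switch over the selected mode. A simplified sketch (the types and helper names are illustrative, not AssisT's source; the stubs stand in for the earlier examples):

```typescript
type AIMode = "off" | "ollama" | "gemini-nano" | "cloud";

// Stubs standing in for the sketches above and for hypothetical helpers.
declare function generateLocally(prompt: string): Promise<string>;
declare function tryGeminiNano(prompt: string): Promise<string | null>;
declare function askCloudProvider(prompt: string): Promise<string>;
declare function fallbackBehavior(prompt: string): string;

// Illustrative router: choose a backend based on the selected AI mode.
async function routeRequest(mode: AIMode, prompt: string): Promise<string> {
  switch (mode) {
    case "off":
      return fallbackBehavior(prompt);          // e.g. first paragraph for summaries
    case "ollama":
      return generateLocally(prompt);           // localhost:11434, fully private
    case "gemini-nano": {
      const reply = await tryGeminiNano(prompt);
      return reply ?? fallbackBehavior(prompt); // degrade if Nano is unavailable
    }
    case "cloud":
      return askCloudProvider(prompt);          // your key, direct to the provider
  }
}
```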
Task-Based Model Selection
Different features use the best available model for optimal results:
| Feature | Gemini Nano | Ollama (Local) | Recommended Cloud |
|---|---|---|---|
| Summarization | ✅ Good | phi3:mini, llama3.2 | Any fast model |
| Text Simplification | ✅ Good | llama3.2, mistral | Anthropic (clarity) |
| Socratic Tutor | ⚠️ Basic | mistral | Anthropic (reasoning) |
| Assignment Breakdown | ✅ Good | llama3.2, mistral | Claude, GPT-4 |
| Image Understanding | ❌ Not supported | llava | Gemini or GPT-4o |
| Research & Citations | ❌ Not supported | ❌ No web access | Perplexity (web access) |
Fallback Behaviors
When AI isn’t available, features gracefully degrade:
| Feature | Fallback Behavior |
|---|---|
| Summarize | Shows first paragraph |
| Simplify | Feature disabled with message |
| Image Describe | Disabled; requires a vision-capable model |
| TTS Prosody | Uses neutral tone |
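The Summarize fallback, for instance, is plain text handling rather than AI. A hypothetical sketch of the degrade path:

```typescript
// Hypothetical fallback: with AI off or unavailable, "summarize" returns
// the document's first paragraph instead of calling a model.
function summarizeFallback(documentText: string): string {
  const firstParagraph = documentText.split(/\n\s*\n/)[0]?.trim() ?? "";
  return firstParagraph.length > 0
    ? firstParagraph
    : documentText.slice(0, 300); // last resort: leading characters
}
```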
Privacy Guarantees
What We Never Do
- Collect or store your data
- Send data to our servers
- Track your AI usage
- Share information with third parties
What Stays Local
- All text you process
- Documents you summarize
- Images you analyze
- Conversation history
GDPR/FERPA/HIPAA Compliance
When you use a local AI mode (Ollama or Gemini Nano), AssisT processes everything on-device:
- GDPR: No personal data is transmitted
- FERPA: Student data stays on the device
- HIPAA: Patient information never leaves the browser
This makes AssisT safe for educational institutions and healthcare settings.
Performance Tips
For Best Local AI Performance
- Use an SSD: Faster model loading
- 8GB+ RAM/VRAM: Required for larger models
- Keep Ollama Running: Faster first response
- Choose Appropriate Models: Match model size to your hardware
Why Memory Matters
- More VRAM = Better Models: With more video memory (or unified memory on Apple Silicon), you can run larger, more capable models
- More Memory = Longer Context: Additional memory allows longer context windows—the AI can “remember” more of your document
- Longer Context = Fewer Hallucinations: When AI sees more context, it makes fewer mistakes because it has more information to work with
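As a rough worked example: a 7-billion-parameter model quantized to 4 bits needs about 7B × 0.5 bytes ≈ 3.5GB for its weights alone, and the context window's cache consumes additional memory on top of that, which is why 8GB is a comfortable floor for mid-size models.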
Memory Types
| Type | What Matters | Notes |
|---|---|---|
| Dedicated GPU | VRAM (8GB good, 12GB+ great) | NVIDIA/AMD graphics cards |
| Apple Silicon | Unified memory (16GB good, 32GB+ excellent) | M1/M2/M3/M4 Macs |
| CPU-only | System RAM (16GB min, 32GB recommended) | Slower but works |
Recommended System Requirements
| Setup | RAM/VRAM | Storage | Models |
|---|---|---|---|
| Minimal | 8GB | 4GB free | phi3:mini |
| Standard | 16GB | 8GB free | phi3:mini + llama3.2 |
| Full | 32GB+ | 15GB free | All models + longer context |
Troubleshooting
Ollama Not Detected
- Ensure Ollama is installed and running
- Check that it's accessible at localhost:11434
- Restart Ollama if needed
- Refresh the AssisT extension
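To verify reachability yourself, Ollama's /api/tags endpoint lists installed models and doubles as a health check. A quick sketch:

```typescript
// Quick health check: if this resolves, Ollama is up and reachable.
async function checkOllama(): Promise<void> {
  try {
    const response = await fetch("http://localhost:11434/api/tags");
    const data = await response.json();
    const names = data.models.map((m: { name: string }) => m.name);
    console.log("Ollama is running. Installed models:", names);
  } catch {
    console.error("Ollama is not reachable at localhost:11434.");
  }
}

checkOllama();
```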
Slow Responses
- Try a smaller model (phi3:mini is fastest)
- Ensure Ollama isn’t processing other requests
- Check your system’s available memory
- Close other resource-intensive applications
Model Download Failed
- Check your internet connection
- Ensure enough disk space is available
- Try downloading a smaller model first
- Restart Ollama and try again