Overview

TypeAuth’s LLM (Large Language Model) caching feature allows you to efficiently run language models while optimizing for performance and cost. This document explains how the feature works, its benefits, and how to configure it for your needs.

Key Features

  • Run open-source models directly through TypeAuth
  • Use proprietary models such as OpenAI's GPT or Anthropic's Claude with your own API keys
  • Efficient caching using a vector database
  • Configurable verbosity levels
  • Adjustable similarity threshold for cache hits

Supported Models

Open-Source Models

  • Meta
    • Llama 3 8B
    • Llama 3.1 8B
  • Mistral
    • Mistral 7B v0.2

Proprietary Models

  • OpenAI
    • gpt-4o
    • gpt-4o-mini
    • gpt-4-turbo
  • Claude
    • Claude 3.5 Sonnet
    • Claude 3 Opus
    • Claude 3 Sonnet
    • Claude 3 Haiku

How It Works

  1. Model Execution: You can run language models in two ways:

    • Open-source models: Executed directly by TypeAuth
    • Third-party models (e.g., OpenAI, Claude): Require your API key
  2. Caching Mechanism: TypeAuth stores each query and its response in a vector database, indexed by an embedding of the query.

  3. Cache Lookup: When a new query is received, TypeAuth searches the cache for similar existing queries.

  4. Response Serving (see the sketch after this list):

    • If a sufficiently similar query is found in the cache, the cached response is served.
    • If no similar query is found, the request is sent to the chosen model and the new response is cached for future lookups.
  5. Billing: For open-source models run by TypeAuth, billing is based on neurons used.
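
The following TypeScript sketch illustrates steps 2–4. It is a conceptual outline only: the names embed, VectorCache, and callModel are placeholders introduced for this illustration, not TypeAuth's actual API.

```typescript
// Conceptual sketch of the cache flow described above; not TypeAuth's actual API.

interface CachedEntry {
  embedding: number[]; // vector representation of the original query
  response: string;    // the model's answer to that query
}

interface VectorCache {
  // Returns the closest cached entry and its similarity to the given embedding, if any.
  nearest(embedding: number[]): { entry: CachedEntry; similarity: number } | null;
  insert(entry: CachedEntry): void;
}

async function answer(
  query: string,
  cache: VectorCache,
  embed: (text: string) => Promise<number[]>,
  callModel: (query: string) => Promise<string>,
  threshold = 0.97, // default similarity threshold (97%)
): Promise<string> {
  const queryEmbedding = await embed(query);

  // Cache lookup: find the most similar previously seen query.
  const match = cache.nearest(queryEmbedding);
  if (match && match.similarity >= threshold) {
    return match.entry.response; // cache hit: serve the stored response
  }

  // Cache miss: forward the request to the chosen model and cache the result.
  const response = await callModel(query);
  cache.insert({ embedding: queryEmbedding, response });
  return response;
}
```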

Configuration

Verbosity Levels

You can control the length and detail of responses by setting a verbosity level (see the example after the list):

  1. Concise and short
  2. Moderately explanatory
  3. Very extensive
  • Default: No specified verbosity (model default)
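
As a rough illustration, a request might carry the verbosity level as a field. The field names below are assumptions made for this sketch, not TypeAuth's documented schema.

```typescript
// Hypothetical request body; field names are illustrative, not TypeAuth's exact schema.
const request = {
  model: "gpt-4o",
  prompt: "Summarize our refund policy in one paragraph.",
  verbosity: 1, // 1 = concise and short, 2 = moderately explanatory, 3 = very extensive; omit for the model default
};
```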

Similarity Threshold

The similarity threshold determines when a cached response is served (illustrated after the list):

  • Default: 97%
  • Lower threshold (e.g., 95%): More lenient matching, fewer requests to the origin
  • Higher threshold (e.g., 99%): Stricter matching, more requests to the origin
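
The comparison itself is simple: a cached response is served only when the best match scores at or above the threshold. The similarity score below is a made-up value, used only to show how the choice of threshold shifts the lenient/strict tradeoff.

```typescript
// Illustration only: how the similarity threshold gates cache hits.
function isCacheHit(similarity: number, threshold: number): boolean {
  return similarity >= threshold;
}

const similarity = 0.96; // example score for a new query against its closest cached query

console.log(isCacheHit(similarity, 0.95)); // true  -> cached response served (more lenient)
console.log(isCacheHit(similarity, 0.97)); // false -> forwarded to the model (default)
console.log(isCacheHit(similarity, 0.99)); // false -> forwarded to the model (stricter)
```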

Benefits

  1. Cost Reduction: By serving cached responses for similar queries, you can significantly reduce the number of tokens consumed.

  2. Improved Response Time: Cached responses are served faster than generating new ones.

  3. Consistency: Similar queries receive consistent responses.

  4. Flexibility: Choose between open-source models or bring your own API key for third-party models.

Best Practices

  1. Start with the default similarity threshold and adjust based on your specific needs.
  2. Monitor cache hit rates and adjust the similarity threshold accordingly (see the sketch below).
  3. Use appropriate verbosity levels to balance between detailed responses and token consumption.
  4. Regularly review and update your cached responses to ensure information accuracy.
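
For the monitoring in point 2, the hit rate is simply the share of requests answered from the cache. A minimal sketch, assuming you track hit and miss counts yourself:

```typescript
// Illustrative only: compute the cache hit rate from counters you collect yourself.
function hitRate(hits: number, misses: number): number {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}

// e.g. 620 cached responses out of 1,000 requests -> 62% hit rate
console.log(hitRate(620, 380)); // 0.62
```

A falling hit rate may justify a slightly lower threshold; cached answers appearing for queries that should not match suggest raising it.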