This guide will help you understand how TypeAuth's LLM caching works.
TypeAuth’s LLM (Large Language Model) caching feature allows you to efficiently run language models while optimizing for performance and cost. This document explains how the feature works, its benefits, and how to configure it for your needs.
1. Model Execution: You can run language models in two ways: by using open-source models hosted by TypeAuth, or by bringing your own API key for third-party models.
2. Caching Mechanism: TypeAuth uses a vector database to cache queries and their responses.
3. Cache Lookup: When a new query is received, TypeAuth checks the cache for similar existing queries.
4. Response Serving: If a cached query is similar enough (above the configured similarity threshold), the cached response is served; otherwise, the model generates a new response, which is then cached (see the sketch after this list).
5. Billing: For open-source models run by TypeAuth, billing is based on neurons used.
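As a mental model, the lookup-and-serve flow behaves roughly like the sketch below. All names here (`embed`, `runModel`, `CacheEntry`) and the in-memory array standing in for the vector database are illustrative assumptions, not TypeAuth internals:

```ts
// Illustrative sketch of semantic-cache lookup; not TypeAuth's actual code.
interface CacheEntry {
  embedding: number[];
  response: string;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function handleQuery(
  query: string,
  cache: CacheEntry[],          // stands in for the vector database
  threshold: number,            // e.g. 0.9: serve cache only for near-identical queries
  embed: (text: string) => Promise<number[]>,
  runModel: (prompt: string) => Promise<string>,
): Promise<string> {
  const queryEmbedding = await embed(query);

  // Cache lookup: find the most similar previously cached query.
  let best: CacheEntry | null = null;
  let bestScore = -1;
  for (const entry of cache) {
    const score = cosineSimilarity(queryEmbedding, entry.embedding);
    if (score > bestScore) {
      bestScore = score;
      best = entry;
    }
  }

  // Response serving: reuse the cached response above the threshold,
  // otherwise generate a fresh one and add it to the cache.
  if (best && bestScore >= threshold) {
    return best.response;
  }
  const response = await runModel(query);
  cache.push({ embedding: queryEmbedding, response });
  return response;
}
```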
You can control the length and detail of responses by setting a verbosity level, for example:
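A request might set the verbosity level like this. The endpoint URL, field names (`verbosity` in particular), and accepted values are assumptions for illustration, not TypeAuth's documented API:

```ts
// Hypothetical request shape; check your TypeAuth configuration for the
// actual endpoint and parameter names.
const response = await fetch("https://api.typeauth.com/llm/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.TYPEAUTH_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    prompt: "Summarize our refund policy.",
    verbosity: "low", // shorter, less detailed answer; "high" for fuller responses
  }),
});
```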
The similarity threshold determines when to serve cached responses: a higher threshold serves cached answers only for near-identical queries, while a lower threshold increases cache hits at the risk of returning less relevant answers.
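To make that trade-off concrete, here is an illustrative comparison; the threshold value, the similarity scores, and the `similarityThreshold` name are all made up for the example:

```ts
// Hypothetical scores for three incoming queries, compared against a cached
// query "How do I reset my password?". The threshold decides which queries
// are served from cache and which trigger a new model run.
const similarityThreshold = 0.9;

const candidates = [
  { query: "How can I reset my password?", score: 0.97 }, // cache hit
  { query: "How do I change my password?", score: 0.91 }, // cache hit
  { query: "How do I delete my account?", score: 0.62 },  // cache miss -> run model
];

for (const c of candidates) {
  const served = c.score >= similarityThreshold ? "cached response" : "new generation";
  console.log(`${c.query} -> ${served}`);
}
```

Raising the threshold to 0.95 in this example would turn the second query into a cache miss, trading some cost savings for stricter matching.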
Cost Reduction: By serving cached responses for similar queries, you can significantly reduce the number of tokens consumed.
Improved Response Time: Cached responses are served faster than generating new ones.
Consistency: Similar queries receive consistent responses.
Flexibility: Choose between open-source models or bring your own API key for third-party models.
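As a rough illustration of the two options, compare these hypothetical configurations; the property names and model identifiers are placeholders, not TypeAuth's actual configuration schema:

```ts
// Option 1: an open-source model hosted and run by TypeAuth (billed in neurons).
const hostedConfig = {
  provider: "typeauth",
  model: "llama-3-8b-instruct", // example open-source model
};

// Option 2: bring your own API key for a third-party model; usage is billed
// by that provider under your own account.
const byokConfig = {
  provider: "openai",
  model: "gpt-4o",
  apiKey: process.env.OPENAI_API_KEY,
};
```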