How to Use Large Language Models While Reducing Cost and Improving Performance
Researchers at Stanford have proposed a method called FrugalGPT to harness the power of large language models while significantly reducing their inference cost. It can match GPT-4’s performance while reducing cost by up to 98%.
Some highlights:
- There is a rapidly growing number of large language models (LLMs) available as commercial APIs. Using them can be very expensive: running ChatGPT reportedly costs over $700k per day, and using GPT-4 to support a small business’s customer service could cost over $21k per month.
- The cost of using different LLM APIs varies by up to two orders of magnitude. For example, processing 10 million input tokens costs about $0.20 with GPT-J but $30 with GPT-4.
- The paper proposes three strategies to reduce the cost of using LLMs while maintaining performance (a minimal code sketch of each appears after this list):
  - Prompt adaptation: using shorter prompts to reduce input length and save cost. This includes prompt selection (keeping only the few-shot examples relevant to the current query) and query concatenation (batching multiple queries into one prompt so the examples are sent only once).
  - LLM approximation: approximating expensive LLMs with smaller, cheaper models for specific tasks. This includes caching previously generated answers (a completion cache) and fine-tuning cheap models on answers produced by expensive LLMs.
  - LLM cascade: selectively choosing which LLM APIs to query based on cost and reliability. Cheaper LLMs are tried first, and expensive ones are reserved for the “hard” queries they cannot handle.
- FrugalGPT’s key technique is the LLM cascade. In experiments:
  - It matched GPT-4’s performance while cutting cost by up to 98%!
  - It improved accuracy over GPT-4 by 4% at the same cost.
  - Composing the three strategies promises even greater gains.
- In a simple cascade, affordable APIs like GPT-J and J1-L answer most queries, and GPT-4 is reserved for the hardest ones. This cuts costs drastically while maintaining performance.
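Here is a minimal sketch of prompt adaptation, assuming a generic Q&A-style prompt format. The helper names (`select_examples`, `concatenated_prompt`) and the word-overlap similarity measure are illustrative assumptions, not the paper’s implementation:

```python
# Prompt adaptation sketch (hypothetical helpers, not FrugalGPT's code).
# Similarity here is crude word overlap; the paper leaves the choice of
# relevance measure open.

def select_examples(query: str, example_pool: list[tuple[str, str]], k: int = 2):
    """Prompt selection: keep only the k few-shot examples most similar
    to the query, shrinking the prompt (and the input-token bill)."""
    query_words = set(query.lower().split())

    def overlap(example: tuple[str, str]) -> int:
        return len(query_words & set(example[0].lower().split()))

    return sorted(example_pool, key=overlap, reverse=True)[:k]

def concatenated_prompt(queries: list[str], examples: list[tuple[str, str]]) -> str:
    """Query concatenation: send one prompt for many queries, so the
    few-shot examples are paid for once instead of once per query."""
    demos = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(queries, start=1))
    return f"{demos}\n\nAnswer each question on its own line:\n{numbered}"
```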
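LLM approximation can be as simple as a completion cache in front of the expensive API. In the sketch below, `expensive_llm` is a hypothetical callable standing in for a paid endpoint such as GPT-4:

```python
import hashlib

class CompletionCache:
    """Completion-cache sketch: pay the expensive LLM once per distinct
    query, then serve repeats for free. `expensive_llm` is a hypothetical
    callable standing in for a paid API such as GPT-4."""

    def __init__(self, expensive_llm):
        self.llm = expensive_llm
        self.store: dict[str, str] = {}

    def _key(self, query: str) -> str:
        # Normalize lightly so trivially different strings share an entry.
        return hashlib.sha256(query.strip().lower().encode()).hexdigest()

    def answer(self, query: str) -> str:
        key = self._key(query)
        if key not in self.store:      # cache miss: one paid API call
            self.store[key] = self.llm(query)
        return self.store[key]         # cache hit: zero marginal cost
```

The cached (query, answer) pairs also double as training data for the paper’s other approximation idea: fine-tuning a cheap model on the expensive model’s outputs.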
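The cascade itself fits in a few lines. Note that FrugalGPT learns both which APIs to include and the acceptance thresholds from data, and its scorer is a small trained model; the `scorer` below is a hypothetical stand-in that returns a reliability estimate in [0, 1]:

```python
from typing import Callable

def cascade(query: str,
            tiers: list[tuple[str, float, Callable[[str], str]]],
            scorer: Callable[[str, str], float],
            thresholds: list[float]) -> tuple[str, float]:
    """Query the cheapest model first and escalate only when the scorer
    judges its answer unreliable. Each tier is (name, cost_per_call,
    model_fn); thresholds[i] is the acceptance cutoff for tier i.
    Returns (answer, total cost spent)."""
    spent = 0.0
    answer = ""
    for (_name, cost, model), cutoff in zip(tiers, thresholds):
        answer = model(query)
        spent += cost
        if scorer(query, answer) >= cutoff:   # confident enough: stop
            return answer, spent
    return answer, spent                      # keep the last tier's answer

# Hypothetical usage mirroring the simple two-tier cascade above:
# tiers = [("gpt-j", 0.0002, call_gpt_j), ("gpt-4", 0.03, call_gpt_4)]
# answer, cost = cascade("a hard question", tiers, my_scorer, [0.9, 0.0])
```

Setting the last threshold to 0 means the most capable model’s answer is always accepted, so every query gets some response.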