Implementing LiteLLM Proxy on AWS ECS: Optimizing Quotas and Ensuring High Availability

Introduction


As an AWS partner, our expertise allows us to tackle complex challenges for our clients. Recently, we had
the opportunity to implement LiteLLM Proxy in a self-hosted environment on AWS ECS for one of our
clients. The main objectives of this project were to address quota limitations, ensure high availability, and
balance the load across multiple language model providers.


Overview of LiteLLM Proxy


LiteLLM Proxy is a powerful tool that serves as a unified interface for accessing more than 100 different
large language models (LLMs). It offers two primary usage modes:

  1. As an SDK for interacting with models via code.
  2. As a proxy server that abstracts multiple services behind a single OpenAI-compatible API.


For this project, we opted for the second approach, which provided numerous benefits, including:

- A single OpenAI-compatible endpoint for all applications, regardless of the underlying provider
- Centralised management of credentials, quotas, and usage
- The ability to balance load and fail over across multiple model deployments


Overall Architecture of the Solution


Our solution is built on a robust architecture deployed on AWS, comprising the following components:

- An ECS service running the LiteLLM Proxy containers
- An S3 bucket storing the proxy's configuration files
- Cross-account IAM roles granting access to AI models in AWS Bedrock
- Bedrock models enabled across several AWS accounts and regions


Deploying LiteLLM Proxy on ECS involved several key steps:

1. Creating an S3 bucket to store the LiteLLM Proxy configuration files (YAML format).

2. In an existing ECS cluster: defining a task definition that pairs an init container (which fetches the configuration from S3) with the LiteLLM Proxy container, and creating the associated ECS service.

3. Setting up cross-account IAM roles so the ECS tasks can invoke Bedrock models hosted in other AWS accounts.

4. Enabling various AI models in AWS accounts across different regions.

To automate these steps, we developed Terraform modules.
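As an illustration of the cross-account setup (step 3), the sketch below shows what such a role can look like. Account IDs, role names, and the exact permission scope are placeholder assumptions, not the client's actual configuration:

```hcl
# Sketch of a cross-account role assumed by the proxy's ECS task role.
# All identifiers below are illustrative placeholders.
resource "aws_iam_role" "bedrock_access" {
  name = "litellm-bedrock-access"

  # Trust policy: only the ECS task role from the proxy account may assume it.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { AWS = "arn:aws:iam::111111111111:role/litellm-task-role" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "bedrock_invoke" {
  role = aws_iam_role.bedrock_access.id

  # Permission to invoke Bedrock models from this account.
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"]
      Resource = "*"
    }]
  })
}
```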

How to Fetch the LiteLLM Proxy Configuration from S3 in ECS?


To ensure that LiteLLM Proxy loads its configuration dynamically, we set up an init container within the ECS
task definition. This container retrieves the configuration file from an S3 bucket before the main application
starts. Below is a Terraform snippet illustrating this process:
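(The snippet is simplified: the bucket name, container images, and sizing are illustrative placeholders.)

```hcl
resource "aws_ecs_task_definition" "litellm" {
  family                   = "litellm-proxy"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 1024
  memory                   = 2048
  execution_role_arn       = "arn:aws:iam::111111111111:role/ecs-execution-role"
  # Task role needs s3:GetObject on the config bucket plus Bedrock access.
  task_role_arn            = "arn:aws:iam::111111111111:role/litellm-task-role"

  # Shared volume so the init container can hand the config to the main container.
  volume {
    name = "config"
  }

  container_definitions = jsonencode([
    {
      # Init container: copies the YAML configuration from S3, then exits.
      name        = "config-init"
      image       = "public.ecr.aws/aws-cli/aws-cli:latest"
      essential   = false
      command     = ["s3", "cp", "s3://my-litellm-config-bucket/config.yaml", "/config/config.yaml"]
      mountPoints = [{ sourceVolume = "config", containerPath = "/config" }]
    },
    {
      # Main container: starts only after the init container has succeeded.
      name         = "litellm-proxy"
      image        = "ghcr.io/berriai/litellm:main-stable"
      essential    = true
      command      = ["--config", "/config/config.yaml"]
      portMappings = [{ containerPort = 4000, protocol = "tcp" }]
      mountPoints  = [{ sourceVolume = "config", containerPath = "/config" }]
      dependsOn    = [{ containerName = "config-init", condition = "SUCCESS" }]
    }
  ])
}
```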

How to Make Requests to LiteLLM Proxy?


LiteLLM Proxy is fully compatible with OpenAI's API format, making it easy to use with standard tools.

Below are examples of how to make requests to LiteLLM Proxy using curl and Python.

Using curl
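Assuming the proxy is exposed at a placeholder URL and secured with a LiteLLM virtual key:

```bash
# Standard OpenAI-style chat completion request against the proxy.
# The endpoint and virtual key below are placeholders for your deployment.
curl https://litellm.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-litellm-virtual-key" \
  -d '{
    "model": "claude-3-sonnet",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```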

Using Python (or any OpenAI compatible SDK)
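Since the proxy speaks the OpenAI API, the standard OpenAI SDK works unmodified; only the base URL and key change (placeholders below):

```python
from openai import OpenAI

# Point the standard OpenAI SDK at the LiteLLM Proxy endpoint
# instead of api.openai.com. URL and key are placeholders.
client = OpenAI(
    base_url="https://litellm.example.com/v1",
    api_key="sk-litellm-virtual-key",
)

response = client.chat.completions.create(
    model="claude-3-sonnet",  # model alias defined in the LiteLLM config
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```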

Quota Management with LiteLLM Proxy

One of the major challenges our client faced was managing the quotas imposed by LLM providers. To overcome
this limitation, we implemented an innovative strategy: declaring the same model as several deployments in the
LiteLLM configuration, each backed by a different AWS account and region, so that traffic is spread across
them and each deployment consumes its own quota.

This approach allowed the client to bypass the initial limitations and ensure service continuity even during
peak usage.
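A minimal sketch of what such a configuration can look like; the model IDs, regions, and role ARN are illustrative placeholders, not the client's actual values:

```yaml
model_list:
  # Same public alias, two deployments: quota is consumed per account/region.
  - model_name: claude-3-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
      aws_region_name: us-east-1
  - model_name: claude-3-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
      aws_region_name: eu-west-1
      # Cross-account access via STS assume-role (no static keys involved)
      aws_role_name: arn:aws:iam::222222222222:role/litellm-bedrock-access
      aws_session_name: litellm-proxy
```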


A key security advantage of this architecture is that using IAM roles to access AI models in AWS Bedrock
eliminates the need to store or transmit API keys and other static credentials.

Ensuring High Availability

To guarantee maximum service availability, we implemented several measures, notably running redundant ECS
tasks that the service scheduler replaces automatically on failure, and spreading traffic across multiple
model deployments, as described below.

Load Balancing Between Providers

LiteLLM Proxy offers various routing strategies to optimise the use of different LLM providers, including
simple shuffle (weighted random), least busy, usage-based routing, and latency-based routing.

For this implementation, we configured intelligent routing based on the "Least Busy" strategy, which sends
each request to the deployment currently handling the fewest in-flight requests. This approach ensures a
consistent and high-quality user experience.
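In the proxy configuration, this amounts to a single setting in LiteLLM's documented `router_settings` block:

```yaml
router_settings:
  # Route each request to the deployment with the fewest in-flight requests
  routing_strategy: least-busy
```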

Monitoring and Logging

To ensure optimal tracking of the solution, we configured and utilised Amazon CloudWatch for container logs
and metrics, alongside LiteLLM Proxy's built-in request and spend logging.

LiteLLM Proxy also includes alerting features (Slack, Discord, Microsoft Teams, webhooks) for notifications
regarding failed LLM calls, slow or hanging responses, and budget thresholds being crossed.
For more advanced monitoring, we recommend specialized tools such as AgentOps and LangTrace.

Advanced Features and Future Enhancements

Although our current implementation meets the client's immediate needs, LiteLLM Proxy offers many
additional advanced features, such as virtual API keys with per-key budgets and rate limits, response
caching, automatic retries and fallbacks between deployments, and per-team cost tracking. The virtual-key
workflow is sketched below.
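For instance, creating a budget-capped virtual key goes through the proxy's `/key/generate` endpoint (the URL and master key below are placeholders):

```bash
# Create a virtual key limited to one model alias and a 50 USD budget.
curl https://litellm.example.com/key/generate \
  -H "Authorization: Bearer sk-master-key" \
  -H "Content-Type: application/json" \
  -d '{
    "models": ["claude-3-sonnet"],
    "max_budget": 50
  }'
```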

Looking ahead, we plan to explore several of these features to further optimise the solution.

Conclusion

The implementation of LiteLLM Proxy on AWS ECS demonstrates how a well-architected cloud solution can
address complex challenges related to intensive LLM usage. By leveraging AWS ECS, cross-account IAM
roles, and LiteLLM Proxy, we successfully built a robust, scalable, and cost-effective solution.

This approach enabled our client to:

- Overcome the quota limitations imposed by individual LLM providers
- Maintain high availability, even during peak usage
- Balance load across multiple language model providers and regions

The lessons learned from this project highlight the importance of careful planning, flexible architecture, and
continuous monitoring to succeed in the evolving fields of AI and cloud computing.

As we continue exploring the possibilities offered by LiteLLM Proxy and AWS, we are confident that this
solution will evolve to meet the growing needs of innovative companies in web development and AI.

Thank you for reading this article. We hope you enjoyed it!

Contact us for more information about our support services and expertise!