Llama2 - Huggingface Tutorial
Huggingface is an open-source platform for deploying machine learning models.
Call Llama2 with Huggingface Inference Endpoints
LiteLLM makes it easy to call your public, private, or default Huggingface endpoints.
In this tutorial, let's try calling 3 models:
| Model | Type of Endpoint |
|---|---|
| deepset/deberta-v3-large-squad2 | Default Huggingface Endpoint |
| meta-llama/Llama-2-7b-hf | Public Endpoint |
| meta-llama/Llama-2-7b-chat-hf | Private Endpoint |
Case 1: Call default huggingface endpoint
Here's the complete example:
```python
from litellm import completion

model = "deepset/deberta-v3-large-squad2"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format

### CALLING ENDPOINT
completion(model=model, messages=messages, custom_llm_provider="huggingface")
```
What's happening?
- model: This is the name of the deployed model on huggingface
- messages: This is the input. We accept the OpenAI chat format. For Huggingface, by default we iterate through the list and add each message["content"] to the prompt (see the sketch after this list). Relevant Code
- custom_llm_provider: Optional param. This flag is needed only for Azure, Replicate, Huggingface and Together-ai (platforms where you deploy your own models); it lets LiteLLM route the request to the right provider for your model.
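For intuition, here's a minimal sketch of how a chat-format messages list could be flattened into a single prompt string. The flatten_messages helper below is hypothetical (it is not LiteLLM's internal function); it only illustrates the idea of concatenating each message's content.

```python
def flatten_messages(messages):
    # Hypothetical helper: concatenate each message's "content"
    # into one prompt string, roughly as described above.
    prompt = ""
    for message in messages:
        prompt += message["content"] + "\n"
    return prompt

messages = [{"role": "user", "content": "Hey, how's it going?"}]
print(flatten_messages(messages))  # -> "Hey, how's it going?\n"
```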
Case 2: Call Llama2 public Huggingface endpoint
We've deployed meta-llama/Llama-2-7b-hf behind a public endpoint: https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud.
Let's try it out:
```python
from litellm import completion

model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"

### CALLING ENDPOINT
completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base)
```
What's happening?
- api_base: Optional param. Since this calls a deployed endpoint (not the default Huggingface inference endpoint), we pass its URL to LiteLLM (see the sketch below).
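To inspect the output, the return value of completion() should follow the OpenAI completion format. Assuming that response shape, a quick sketch of pulling out the generated text:

```python
from litellm import completion

model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}]
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"

response = completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base)

# Assuming an OpenAI-style response shape:
print(response["choices"][0]["message"]["content"])
```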
Case 3: Call Llama2 private Huggingface endpoint
The only difference between this and the public endpoint is that you need an api_key for this one.
On LiteLLM there are 3 ways you can pass in an api_key: via environment variables, by setting it as a package variable, or when calling completion().
Setting via environment variables
Here's the 1 line of code you need to add:
```python
os.environ["HF_TOKEN"] = "..."
```
Here's the full code:
```python
import os
from litellm import completion

os.environ["HF_TOKEN"] = "..."

model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"

### CALLING ENDPOINT
completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base)
```
Setting it as a package variable
Here's the 1 line of code you need to add:
```python
litellm.huggingface_key = "..."
```
Here's the full code:
```python
import litellm
from litellm import completion

litellm.huggingface_key = "..."

model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"

### CALLING ENDPOINT
completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base)
```
Passing it in during the completion() call
```python
completion(..., api_key="...")
```
Here's the full code:
```python
from litellm import completion

model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"

### CALLING ENDPOINT
completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base, api_key="...")
```