Banana
Banana provided serverless GPU inference for AI models, including a CI/CD build pipeline and a simple Python framework (Potassium) to serve your models.
This page covers how to use the Banana ecosystem within LangChain.
It is broken into two parts:
- installation and setup
- references to the specific Banana wrappers
Installation and Setup
- Install with pip install banana-dev
- Get a Banana API key from the Banana.dev dashboard and set it as an environment variable (BANANA_API_KEY); see the snippet after this list
- Get your model's key and URL slug from the model's details page
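As a minimal sketch, you can also set the key from Python before using the wrapper (BANANA_API_KEY is the variable LangChain's Banana wrapper reads; the placeholder value is an assumption you replace with your own key):
import os

# Make the Banana API key visible to LangChain's Banana wrapper.
os.environ["BANANA_API_KEY"] = "YOUR_BANANA_API_KEY"  # placeholder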
Define your Banana Template
You'll need to set up a GitHub repo for your Banana app. You can get started in 5 minutes using this guide.
Alternatively, for a ready-to-go LLM example, you can check out Banana's CodeLlama-7B-Instruct-GPTQ GitHub repository. Just fork it and deploy it within Banana.
Other starter repos are available here.
Build the Banana app
To use Banana apps within LangChain, they must include the outputs key in the returned JSON, and the value must be a string.
# Return the results as a dictionary
result = {'outputs': result}
An example inference function would be:
@app.handler("/")
def handler(context: dict, request: Request) -> Response:
"""Handle a request to generate code from a prompt."""
model = context.get("model")
tokenizer = context.get("tokenizer")
max_new_tokens = request.json.get("max_new_tokens", 512)
temperature = request.json.get("temperature", 0.7)
prompt = request.json.get("prompt")
prompt_template=f'''[INST] Write code to solve the following coding problem that obeys the constraints and passes the example test cases. Please wrap your code answer using ```:
{prompt}
[/INST]
'''
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=temperature, max_new_tokens=max_new_tokens)
result = tokenizer.decode(output[0])
return Response(json={"outputs": result}, status=200)
This example is from the app.py file in CodeLlama-7B-Instruct-GPTQ.
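Once deployed, you can sanity-check the app outside of LangChain with the banana-dev SDK. The sketch below assumes the v6-style Client API (api_key, model_key, and the deployed model's URL) and the handler above; the exact constructor arguments are an assumption to verify against your installed SDK version, and the URL is a placeholder.
from banana_dev import Client

# Assumed v6-style banana-dev client; verify against your SDK version.
my_model = Client(
    api_key="YOUR_BANANA_API_KEY",   # from the Banana.dev dashboard
    model_key="YOUR_MODEL_KEY",      # from the model's details page
    url="https://YOUR_MODEL_URL_SLUG.run.banana.dev",  # placeholder URL
)

# Call the "/" route defined by @app.handler above.
result, meta = my_model.call("/", {"prompt": "Reverse a string in Python."})
print(result["outputs"])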
Wrappers
LLM
Within LangChain, there exists a Banana LLM wrapper, which you can access with
from langchain.llms import Banana
You need to provide a model key and model url slug, which you can get from the model's details page in the Banana.dev dashboard.
llm = Banana(model_key="YOUR_MODEL_KEY", model_url_slug="YOUR_MODEL_URL_SLUG")
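As a quick usage sketch (assuming the CodeLlama app above is deployed; the prompt text is illustrative), the wrapper can then be called like any other LangChain LLM:
from langchain.llms import Banana

llm = Banana(model_key="YOUR_MODEL_KEY", model_url_slug="YOUR_MODEL_URL_SLUG")

# The handler above wraps the prompt in its own [INST] template,
# so a plain task description is enough here.
code = llm("Write a function that checks whether a string is a palindrome.")
print(code)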