Banana
Banana provided serverless GPU inference for AI models, including a CI/CD build pipeline and a simple Python framework (Potassium) to serve your models.
This page covers how to use the Banana ecosystem within LangChain.
It is broken into two parts:
- installation and setup
- references to the specific Banana wrappers
Installation and Setup
- Install with pip install banana-dev
- Get a Banana API key from the Banana.dev dashboard and set it as an environment variable (BANANA_API_KEY); see the snippet after this list
- Get your model's key and URL slug from the model's details page
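As a minimal sketch, you can also set the key from Python before using the wrapper (BANANA_API_KEY is the variable LangChain's Banana wrapper reads; the placeholder value is an assumption you replace with your own key):
import os

# Make the Banana API key visible to LangChain's Banana wrapper.
os.environ["BANANA_API_KEY"] = "YOUR_BANANA_API_KEY"  # placeholder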
Define your Banana Template
You'll need to set up a GitHub repo for your Banana app. You can get started in 5 minutes using this guide.
Alternatively, for a ready-to-go LLM example, you can check out Banana's CodeLlama-7B-Instruct-GPTQ GitHub repository. Just fork it and deploy it within Banana.
Other starter repos are available here.
Build the Banana app
To use Banana apps within LangChain, they must include the outputs key in the returned JSON, and the value must be a string.
# Return the results as a dictionary
result = {'outputs': result}
An example inference function would be:
@app.handler("/")
def handler(context: dict, request: Request) -> Response:
"""Handle a request to generate code from a prompt."""
model = context.get("model")
tokenizer = context.get("tokenizer")
max_new_tokens = request.json.get("max_new_tokens", 512)
temperature = request.json.get("temperature", 0.7)
prompt = request.json.get("prompt")
prompt_template=f'''[INST] Write code to solve the following coding problem that obeys the constraints and passes the example test cases. Please wrap your code answer using ```:
{prompt}
[/INST]
'''
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=temperature, max_new_tokens=max_new_tokens)
result = tokenizer.decode(output[0])
return Response(json={"outputs": result}, status=200)
This example is from the app.py file in CodeLlama-7B-Instruct-GPTQ.
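Once deployed, you can sanity-check the app outside of LangChain with the banana-dev SDK. The sketch below assumes the v6-style Client API (api_key, model_key, and the deployed model's URL) and the handler above; the exact constructor arguments are an assumption to verify against your installed SDK version, and the URL is a placeholder.
from banana_dev import Client

# Assumed v6-style banana-dev client; verify against your SDK version.
my_model = Client(
    api_key="YOUR_BANANA_API_KEY",   # from the Banana.dev dashboard
    model_key="YOUR_MODEL_KEY",      # from the model's details page
    url="https://YOUR_MODEL_URL_SLUG.run.banana.dev",  # placeholder URL
)

# Call the "/" route defined by @app.handler above.
result, meta = my_model.call("/", {"prompt": "Reverse a string in Python."})
print(result["outputs"])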
Wrappers
LLM
Within LangChain, there exists a Banana LLM wrapper, which you can access with
from langchain.llms import Banana
You need to provide a model key and model url slug, which you can get from the model's details page in the Banana.dev dashboard.
llm = Banana(model_key="YOUR_MODEL_KEY", model_url_slug="YOUR_MODEL_URL_SLUG")
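As a quick usage sketch (assuming the CodeLlama app above is deployed; the prompt text is illustrative), the wrapper can then be called like any other LangChain LLM:
from langchain.llms import Banana

llm = Banana(model_key="YOUR_MODEL_KEY", model_url_slug="YOUR_MODEL_URL_SLUG")

# The handler above wraps the prompt in its own [INST] template,
# so a plain task description is enough here.
code = llm("Write a function that checks whether a string is a palindrome.")
print(code)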