About Amazon Bedrock Intelligent Prompt Routing (Preview)

Hello. I'm Fukushima, and I like the AWS CLI.

Introduction

In this article I cover Amazon Bedrock Intelligent Prompt Routing, which was released in preview at AWS re:Invent 2024.

aws.amazon.com

AWS has also published a blog post about this feature, which I used as a reference for this article.

aws.amazon.com

Introduction
What is Amazon Bedrock Intelligent Prompt Routing?
Trying it out
Conclusion

What is Amazon Bedrock Intelligent Prompt Routing?

In short, it is a feature that selects the most cost-effective model for each prompt.

At the moment it has the following characteristics and limitations. It is a very convenient feature, so I hope Japanese support arrives soon.

Two routing configurations are provided:
- Routes to Anthropic's Claude 3 Haiku or Claude 3.5 Sonnet (the default)
- Routes to Meta's Llama 3.1 8B Instruct or Llama 3.1 70B Instruct (the default)
Only English prompts are supported.
It is available only in US East (N. Virginia) and US West (Oregon).

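I had assumed that complex configuration would be required, but a model ID is provided for each Prompt Router, so you only need to specify that model ID when running inference, which makes adoption easy. It is nice that AWS takes care of the complex routing implementation for you.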

Trying it out

Management Console

In the console, a "Prompt Routers" item has been added to the navigation pane. You can choose between two routers: Anthropic Prompt Router and Meta Prompt Router.

Select one of the routers and click "Open in playground".

The usual playground opens, but the place where the model name normally appears now shows "Anthropic Prompt Router".

Next, run the following prompt, which is taken from the AWS blog post, and a response is generated.

Alice has N brothers and she also has M sisters. How many sisters does Alice’s brothers have?

Click the new "Router metrics" icon on the right to see which model the Prompt Router selected. Because this prompt was complex, Anthropic's Claude 3.5 Sonnet was used.

Next, run the following prompt.

Describe the purpose of a 'hello world' program in one line.

This time, the Prompt Router selected Anthropic's Claude 3 Haiku.

AWS CLI

Next, let's call a Prompt Router from the AWS CLI.

First, check the Prompt Router ARNs with the following command.

$ aws bedrock list-prompt-routers --region us-east-1

The output is as follows.

{
    "promptRouterSummaries": [
        {
            "promptRouterName": "Anthropic Prompt Router",
            "routingCriteria": {
                "responseQualityDifference": 0.26
            },
            "description": "Routes requests among models in the Claude family",
            "createdAt": "2024-11-20T00:00:00+00:00",
            "updatedAt": "2024-11-20T00:00:00+00:00",
            "promptRouterArn": "arn:aws:bedrock:us-east-1:123456789012:default-prompt-router/anthropic.claude:1",
            "models": [
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123456789012:inference-profile/us.anthropic.claude-3-haiku-20240307-v1:0"
                },
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123456789012:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
                }
            ],
            "fallbackModel": {
                "modelArn": "arn:aws:bedrock:us-east-1:123456789012:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
            },
            "status": "AVAILABLE",
            "type": "default"
        },
        {
            "promptRouterName": "Meta Prompt Router",
            "routingCriteria": {
                "responseQualityDifference": 0.0
            },
            "description": "Routes requests among models in the LLaMA family",
            "createdAt": "2024-11-20T00:00:00+00:00",
            "updatedAt": "2024-11-20T00:00:00+00:00",
            "promptRouterArn": "arn:aws:bedrock:us-east-1:123456789012:default-prompt-router/meta.llama:1",
            "models": [
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123456789012:inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
                },
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123456789012:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
                }
            ],
            "fallbackModel": {
                "modelArn": "arn:aws:bedrock:us-east-1:123456789012:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
            },
            "status": "AVAILABLE",
            "type": "default"
        }
    ]
}
$
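The same listing can be read from Python. The following is a minimal sketch, assuming a boto3 version recent enough to expose `list_prompt_routers` on the `bedrock` client; the helper names `summarize_routers` and `fetch_router_summaries` are hypothetical and only reduce the response shown above to the fields needed for inference.

```python
from typing import Any


def summarize_routers(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Reduce a list-prompt-routers response to name, ARN, and model IDs."""
    summaries = []
    for router in response.get("promptRouterSummaries", []):
        summaries.append(
            {
                "name": router["promptRouterName"],
                # This ARN is the value passed as the model ID at inference time.
                "arn": router["promptRouterArn"],
                # Keep only the part after the last "/" of each model ARN.
                "models": [m["modelArn"].rsplit("/", 1)[-1] for m in router["models"]],
                "fallback": router["fallbackModel"]["modelArn"].rsplit("/", 1)[-1],
            }
        )
    return summaries


def fetch_router_summaries(region: str = "us-east-1") -> list[dict[str, Any]]:
    """Call the Bedrock control-plane API and summarize the result.

    Assumes AWS credentials are configured and boto3 supports list_prompt_routers.
    """
    import boto3  # imported lazily so summarize_routers has no AWS dependency

    bedrock = boto3.client("bedrock", region_name=region)
    return summarize_routers(bedrock.list_prompt_routers())
```

Calling `fetch_router_summaries()` should return one entry per router, and the `arn` field of an entry is what goes into `--model-id` in the next step.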

Next, run inference with the following command.

aws bedrock-runtime converse \
    --model-id arn:aws:bedrock:us-east-1:123456789012:default-prompt-router/anthropic.claude:1 \
    --messages '[{ "role": "user", "content": [ { "text": "Alice has N brothers and she also has M sisters. How many sisters does Alice’s brothers have?" } ] }]' \
    --region us-east-1

The output is shown below. A trace key has been added, and it contains the model that was selected.

{
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {
                    "text": "To solve this problem, let's think through it step by step:\n\n1. Alice has N brothers.\n2. Alice has M sisters.\n3. We need to find out how many sisters Alice's brothers have.\n\nThe key point to realize is that Alice's brothers have the same sisters as Alice does, except for Alice herself.\n\nSo:\n* Alice's brothers have all of Alice's sisters (M)\n* Plus Alice herself (1)\n* The total number of sisters that Alice's brothers have is M + 1\n\nTherefore, Alice's brothers have M + 1 sisters.\n\nThis answer is true regardless of the specific values of N and M. The number of brothers (N) doesn't affect the answer in this case."
                }
            ]
        }
    },
    "stopReason": "end_turn",
    "usage": {
        "inputTokens": 27,
        "outputTokens": 160,
        "totalTokens": 187
    },
    "metrics": {
        "latencyMs": 4504
    },
    "trace": {
        "promptRouter": {
            "invokedModelId": "arn:aws:bedrock:us-east-1:123456789012:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
        }
    }
}
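If you save that JSON or receive it as a dict from the SDK, the routing decision can be read with plain dictionary access. This is a small sketch based only on the response shape shown above; `invoked_model_id` and `describe_routing` are hypothetical helper names, not part of any AWS library.

```python
from typing import Any, Optional


def invoked_model_id(response: dict[str, Any]) -> Optional[str]:
    """Return the model ARN the router picked, or None if there is no router trace."""
    return response.get("trace", {}).get("promptRouter", {}).get("invokedModelId")


def describe_routing(response: dict[str, Any]) -> str:
    """Summarize which model answered and how many tokens were used."""
    model_arn = invoked_model_id(response)
    # Strip the ARN prefix so only the inference profile ID remains.
    model = model_arn.rsplit("/", 1)[-1] if model_arn else "unknown (no router trace)"
    usage = response.get("usage", {})
    return (
        f"{model}: {usage.get('inputTokens', 0)} in / "
        f"{usage.get('outputTokens', 0)} out tokens"
    )
```

A response without a `trace` key, for example one from a call that used a regular model ID, falls through to the "unknown" branch instead of raising an error.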

AWS SDK

Let's call it from Python (boto3).

import json
import boto3

bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
)

MODEL_ID = "arn:aws:bedrock:us-east-1:123456789012:default-prompt-router/meta.llama:1"

user_message = "Describe the purpose of a 'hello world' program in one line."
messages = [
    {
        "role": "user",
        "content": [{"text": user_message}],
    }
]

streaming_response = bedrock_runtime.converse_stream(
    modelId=MODEL_ID,
    messages=messages,
)

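# Walk the event stream: print text as it arrives, then the router trace.
for chunk in streaming_response["stream"]:
    if "contentBlockDelta" in chunk:
        # Text arrives incrementally, one delta per event.
        text = chunk["contentBlockDelta"]["delta"]["text"]
        print(text, end="")
    if "messageStop" in chunk:
        # End of the message: finish the current line.
        print()
    if "metadata" in chunk:
        # The metadata event carries the trace with the model the router chose.
        if "trace" in chunk["metadata"]:
            print(json.dumps(chunk["metadata"]["trace"], indent=2))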

The output is as follows.

$ python prompt_router.py


A "Hello, World!" program is a traditional computer program used to demonstrate the most basic working code in a particular programming language, ensuring the development environment and compiler or interpreter work correctly.
{
  "promptRouter": {
    "invokedModelId": "arn:aws:bedrock:us-east-1:123456789012:inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
  }
}
$
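If you want the text and the routing decision as values rather than printed output, the loop in the script above can be folded into a function. This is a sketch under the assumption that the events have the same shape as the ones handled there; `collect_stream` is a hypothetical name.

```python
from typing import Any, Iterable, Optional


def collect_stream(events: Iterable[dict[str, Any]]) -> tuple[str, Optional[str]]:
    """Join streamed text deltas and pick up the router's chosen model ARN."""
    text_parts: list[str] = []
    invoked: Optional[str] = None
    for event in events:
        # Text deltas: skip events whose delta has no "text" key.
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            text_parts.append(delta["text"])
        # The router trace is delivered inside the metadata event.
        trace = event.get("metadata", {}).get("trace", {})
        if "promptRouter" in trace:
            invoked = trace["promptRouter"].get("invokedModelId")
    return "".join(text_parts), invoked
```

In the script above this would be called as `text, model_arn = collect_stream(streaming_response["stream"])`, which makes it easy to log which model handled each prompt when comparing several prompts.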