AWS offers a variety of managed machine learning services.

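With services such as Comprehend and Transcribe, users can of course get machine-learning predictions just by supplying input, and with SageMaker they can build and train their own models.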

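However, for a machine learning beginner like me, jumping straight into SageMaker is a high hurdle, so I end up limiting myself to the predictions that Comprehend and Transcribe can make out of the box.

So the goal of this article is to use AWS services to gradually expand what I can do with machine learning, without going too deep.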

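What is Comprehend Custom?

Amazon Comprehend is a managed natural language processing service. Give it text and it can easily perform entity recognition, sentiment analysis, key phrase extraction, and more.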
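
As a rough illustration of these built-in analyses, here is a minimal sketch that calls them through boto3. It assumes boto3 is installed and AWS credentials are configured. The region and the helper names (make_comprehend_client, analyze_text) are my own choices for illustration and do not come from the Comprehend documentation.

```python
def make_comprehend_client(region_name="us-east-1"):
    """Create a Comprehend client (the region is an assumption)."""
    import boto3  # imported lazily so analyze_text can be tried with a stub client
    return boto3.client("comprehend", region_name=region_name)


def analyze_text(client, text, language_code="en"):
    """Run entity, sentiment, and key-phrase detection on one piece of text."""
    entities = client.detect_entities(Text=text, LanguageCode=language_code)
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)
    phrases = client.detect_key_phrases(Text=text, LanguageCode=language_code)
    return {
        # Keep only the fields that are easy to eyeball.
        "entities": [(e["Text"], e["Type"]) for e in entities["Entities"]],
        "sentiment": sentiment["Sentiment"],
        "key_phrases": [p["Text"] for p in phrases["KeyPhrases"]],
    }
```

Calling analyze_text(make_comprehend_client(), "some English sentence") would return a small dictionary summarizing the three analyses.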

Amazon Comprehend also has a feature called Comprehend Custom. With it, users can create a model simply by preparing training data.

Because it requires no machine learning expertise, it is a very welcome feature for beginners.

As of 2018/04/02, Comprehend Custom offers Custom Classification and Custom Entity Recognition.

Custom Classification classifies text using a model trained on your own data. Custom Entity Recognition likewise performs entity recognition using a model trained on your own data.

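Trying it out

This time I'll use Custom Classification, since its training data is easy to prepare.

Preparing the training data

The Custom Classification documentation describes the training data. Apparently about 1,000 documents per category are recommended. Since this is just a trial, I extracted 50 passages each from the EC2, IAM, and Aurora documentation and created a CSV file like the following to use as training data (commas and periods have been removed).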


EC2,Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the Amazon Web Services (AWS) cloud Using Amazon EC2 eliminates your need to invest in hardware up front so you can develop and deploy applications faster You can use Amazon EC2 to launch as many or as few virtual servers as you need configure security and networking and manage storage Amazon EC2 enables you to scale up or down to handle changes in requirements or spikes in popularity reducing your need to forecast traffic
EC2,An instance is a virtual server in the cloud It's configuration at launch is a copy of the AMI that you specified when you launched the instance
EC2,You can launch different types of instances from a single AMI An instance type essentially determines the hardware of the host computer used for your instance Each instance type offers different compute and memory capabilities
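
A file in this shape can be produced with a few lines of Python. The following is a minimal sketch using only the standard library. The helper names (clean_text, build_training_csv) are hypothetical, and the cleaning simply drops every comma and period, which is an assumption about how the sample rows were prepared.

```python
import csv
import io


def clean_text(text):
    """Drop commas and periods and collapse runs of whitespace."""
    stripped = text.replace(",", "").replace(".", "")
    return " ".join(stripped.split())


def build_training_csv(examples):
    """Return CSV text with one 'label,document' row per (label, text) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label, text in examples:
        writer.writerow([label, clean_text(text)])
    return buffer.getvalue()
```

The returned string can then be written to a local file before uploading.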

All that's left is to save it to an S3 bucket.
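
The upload can also be scripted. This is a minimal sketch assuming boto3 and configured credentials. The helper name upload_training_data is hypothetical, and the bucket name and key are whatever you choose.

```python
def make_s3_client():
    """Create an S3 client using the default credentials and region."""
    import boto3  # imported lazily so the helper below can be tried with a stub
    return boto3.client("s3")


def upload_training_data(s3_client, local_path, bucket, key):
    """Upload a local file and return its s3:// URI for later steps."""
    s3_client.upload_file(Filename=local_path, Bucket=bucket, Key=key)
    return f"s3://{bucket}/{key}"
```

The returned URI is the value to give as the training data location.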

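Training with Comprehend Custom

This time I'll run the training from the Comprehend console.

Under "Custom Classification", click "Train Classifier".

Enter a Name and select the language. This time it is English.

Specify the location of the training data.

Since this is my first time using Comprehend Custom, I create a new IAM role.

That completes the setup. Now just wait for the status to become "Trained".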
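
I used the console, but the same steps can probably be scripted. The following is a minimal sketch with boto3's Comprehend client, not the procedure I actually ran. It assumes an IAM role that Comprehend can use to read the bucket already exists (the console creates one for you, the API does not), and the function names and polling interval are my own choices.

```python
import time


def train_classifier(client, name, training_s3_uri, role_arn, language_code="en"):
    """Start training a custom classifier and return its ARN."""
    response = client.create_document_classifier(
        DocumentClassifierName=name,
        DataAccessRoleArn=role_arn,  # assumed to exist already
        InputDataConfig={"S3Uri": training_s3_uri},
        LanguageCode=language_code,
    )
    return response["DocumentClassifierArn"]


def wait_until_trained(client, classifier_arn, poll_seconds=60, sleep=time.sleep):
    """Poll until the classifier is TRAINED; raise on any other final status."""
    while True:
        props = client.describe_document_classifier(
            DocumentClassifierArn=classifier_arn
        )["DocumentClassifierProperties"]
        status = props["Status"]
        if status == "TRAINED":
            return props
        if status not in ("SUBMITTED", "TRAINING"):
            raise RuntimeError(f"Training ended with status {status}")
        sleep(poll_seconds)
```

Here client would be boto3.client("comprehend"). The sleep parameter exists only so the polling loop can be exercised without actually waiting.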

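Running predictions

Now let's run predictions with the trained model.

I created the input data as shown below and saved it to an S3 bucket (these are EC2 questions from the AWS Support Knowledge Center).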


How do I launch an EC2 instance from a custom Amazon Machine Image (AMI)?
How do I create an EBS snapshot based on my EBS volume?
My Spot instance was terminated Can I recover it?

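Click "Create Job".

Enter a Name for the job. Analysis type and Select classifier are already filled in. Specify the input location.

Specify the output location, then specify the IAM role.

That completes job creation. Now wait for the prediction results. Once the Status becomes Completed, you're done.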
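
For reference, the job can likely be started from code as well. This is a minimal sketch using boto3's Comprehend client, assuming the input file holds one document per line as in the sample above. The function names are hypothetical, and the role ARN is assumed to grant access to both S3 locations.

```python
import time


def start_classification_job(client, job_name, classifier_arn,
                             input_s3_uri, output_s3_uri, role_arn):
    """Start an asynchronous classification job and return its job ID."""
    response = client.start_document_classification_job(
        JobName=job_name,
        DocumentClassifierArn=classifier_arn,
        InputDataConfig={"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_LINE"},
        OutputDataConfig={"S3Uri": output_s3_uri},
        DataAccessRoleArn=role_arn,
    )
    return response["JobId"]


def wait_until_completed(client, job_id, poll_seconds=30, sleep=time.sleep):
    """Poll until the job is COMPLETED; raise on any other final status."""
    while True:
        props = client.describe_document_classification_job(
            JobId=job_id
        )["DocumentClassificationJobProperties"]
        status = props["JobStatus"]
        if status == "COMPLETED":
            return props
        if status not in ("SUBMITTED", "IN_PROGRESS"):
            raise RuntimeError(f"Job ended with status {status}")
        sleep(poll_seconds)
```

The properties returned on completion include the output location, so the results can be downloaded from there.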

Results

The results were as follows. Shown here is the result for the first line of the input above.


{
    "File": "input.txt", 
    "Line": "0", 
    "Classes": [{
        "Name": "EC2", 
        "Score": 0.3337
    }, 
    {
        "Name": "Aurora", 
        "Score": 0.3332
    }, 
    {
        "Name": "IAM", 
        "Score": 0.3331
    }
    ]
}

.......

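With only 50 training examples per category, and short ones at that, the model has essentially no accuracy. All three scores sit around 0.333, which is no better than guessing uniformly among three classes...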
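
To put a number on how weak the prediction is, here is a minimal sketch that reads one result record like the JSON above and compares the top score with the runner-up and with a uniform guess. The function name and the returned field names are my own, and it assumes each record has at least two classes.

```python
import json


def summarize_prediction(json_text):
    """Summarize one result record: top label, score, and how decisive it is."""
    record = json.loads(json_text)
    classes = sorted(record["Classes"], key=lambda c: c["Score"], reverse=True)
    top, runner_up = classes[0], classes[1]
    uniform = 1.0 / len(classes)  # score a model would give if it had learned nothing
    return {
        "line": int(record["Line"]),
        "label": top["Name"],
        "score": top["Score"],
        "margin": top["Score"] - runner_up["Score"],
        "lift_over_uniform": top["Score"] - uniform,
    }
```

For the record shown above, the top label is EC2, but it leads Aurora by only about 0.0005, so the model is barely distinguishing the categories.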