2024-11-13

Running Perl CGI under IIS on Windows 11

Installing Strawberry Perl for Windows

I will skip the details; I installed ver. 5.40.

Enable IIS and CGI in "Windows Features"

Add a handler mapping from IIS Manager

Sample Perl CGI script

#!c:/Strawberry/perl/bin/perl
use strict;
use warnings;
use CGI;

my $cgi = CGI->new;
print $cgi->header;
print $cgi->start_html("hello world");
print "hello world\n";
print $cgi->end_html;

2024-10-12

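Japanese full-text search in elasticsearch with the analysis-sudachi plugin and a user dictionary

With kuromoji the other day I could not work out how to register a user dictionary, so this time I tried sudachi.

Japanese full-text search more or less worked, but the tokenization was not what I expected, and I was left wondering whether the user dictionary was really being applied.

Reference URL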

https://github.com/WorksApplications/elasticsearch-sudachi/tree/develop/docs

Install elasticsearch 8.13.4 and get the elastic user's password issued at first startup

As of 2024/10 the latest analysis-sudachi was ver. 3.2.2, and the matching elasticsearch was ver. 8.13.4, so I installed that version.

$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.13.4-linux-x86_64.tar.gz
$ tar -xzf elasticsearch-8.13.4-linux-x86_64.tar.gz
$ cd elasticsearch-8.13.4/
$ bin/elasticsearch
[omitted]
 Elasticsearch security features have been automatically configured!
 Authentication is enabled and cluster connections are encrypted.

  Password for the elastic user (reset with `bin/elasticsearch-reset-password -u elastic`):
  O_j2Ga64xnJNjwTYzOZO

Install the analysis-sudachi plugin ver. 3.2.2

$ cd elasticsearch-8.13.4/
$ bin/elasticsearch-plugin install \
  https://github.com/WorksApplications/elasticsearch-sudachi/releases/download/v3.2.2/elasticsearch-8.13.4-analysis-sudachi-3.2.2.zip
-> Installing https://github.com/WorksApplications/elasticsearch-sudachi/releases/download/v3.2.2/elasticsearch-8.13.4-analysis-sudachi-3.2.2.zip
-> Downloading https://github.com/WorksApplications/elasticsearch-sudachi/releases/download/v3.2.2/elasticsearch-8.13.4-analysis-sudachi-3.2.2.zip
[=================================================] 100%
-> Installed analysis-sudachi
-> Please restart Elasticsearch to activate any plugins installed

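Preparing the system dictionary and the user dictionary

I downloaded the system dictionary from what I believe is the location linked from the SudachiDict GitHub repository.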

$ wget https://d2ej7fkh96fzlu.cloudfront.net/sudachidict/sudachi-dictionary-latest-core.zip
$ unzip sudachi-dictionary-latest-core.zip
$ mkdir config/sudachi
$ cp sudachi-dictionary-*/system_core.dic config/sudachi/system_core.dic

For the user dictionary, I copied one I had built earlier with sudachipy into config/sudachi/.

https://end0tknr.hateblo.jp/entry/20240409/1712611803

To build a user dictionary from a user dictionary CSV with the Java version of sudachi, run the following.

$ java -Dfile.encoding=UTF-8 -cp sudachi-*.jar \
    com.worksap.nlp.sudachi.dictionary.UserDictionaryBuilder \
    -o user.dic -s system_core.dic \
    -d 'my first user dictionary' \
    user.dic.csv

user.dic.csv    ....... Done! (1256 entries, 0.067 sec)
POS table               Done! (2 bytes, 0.002 sec)
WordId table    ....... Done! (6279 bytes, 0.002 sec)
double array Trie       Done! (46084 bytes, 0.022 sec)
word parameters         Done! (7530 bytes, 0.000 sec)
word entries            Done! (40914 bytes, 0.010 sec)
WordInfo offsets        Done! (5020 bytes, 0.000 sec)

Creating the index and configuring the dictionaries

Create elasticsearch-8.13.4/sudachi.json

{"settings": { "index": { "analysis": {
    "tokenizer": { "sudachi_tokenizer": {
        "type": "sudachi_tokenizer",
        "additional_settings":
           "{\"systemDict\":\"system_core.dic\",\"userDict\":[\"user.dic\"]}"
    } },
    "analyzer": {
      "sudachi_analyzer": {"type": "custom",
                           "tokenizer": "sudachi_tokenizer"}
    }
} } } }
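The awkward part of sudachi.json is that additional_settings is itself a JSON document stored as a string, so the inner quotes have to be escaped by hand. A minimal sketch that generates the same body with Python's json module is shown below; the function name and its defaults are my own choices for illustration, not part of the plugin.

```python
import json

def build_sudachi_index_body(system_dict="system_core.dic", user_dicts=("user.dic",)):
    """Return the index-creation body as a dict (same shape as sudachi.json)."""
    # additional_settings is a JSON document embedded as a *string*,
    # so it is serialized separately before being placed in the outer body.
    additional = json.dumps(
        {"systemDict": system_dict, "userDict": list(user_dicts)},
        separators=(",", ":"))
    return {"settings": {"index": {"analysis": {
        "tokenizer": {"sudachi_tokenizer": {
            "type": "sudachi_tokenizer",
            "additional_settings": additional}},
        "analyzer": {"sudachi_analyzer": {
            "type": "custom",
            "tokenizer": "sudachi_tokenizer"}}}}}}

# json.dumps escapes the inner quotes automatically when the body is serialized.
print(json.dumps(build_sudachi_index_body(), indent=2))
```

Redirecting the printed output to sudachi.json should give a file equivalent to the hand-written one above.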

Create the index with the dictionary settings

$ curl --cacert config/certs/http_ca.crt -u elastic \
  -H "Content-Type: application/json" -X PUT \
  http://localhost:9200/test_index \
   --data-binary @sudachi.json

{"acknowledged":true,"shards_acknowledged":true,"index":"test_index"}

Check the created index and its dictionary settings

$ curl --cacert config/certs/http_ca.crt -u elastic \
  http://localhost:9200/test_index/_settings

{"test_index": {
    "settings": {
      "index": {
        "routing": {
          "allocation": {
            "include": {"_tier_preference": "data_content"}
          }
        },
        "number_of_shards": "1",
        "provided_name": "test_index",
        "creation_date": "1728588381818",
        "analysis": {
          "analyzer": {
            "sudachi_analyzer": {"type": "custom",
                             "tokenizer": "sudachi_tokenizer"
            }
          },
          "tokenizer": {
            "sudachi_tokenizer": {
              "type": "sudachi_tokenizer",
              "additional_settings":
          "{\"systemDict\":\"system_core.dic\",\"userDict\":[\"user.dic\"]}"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "qYb7XeDWSIaUOQidjbMxTg",
        "version": { "created": "8503000" }
      }
    }
  }
}

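Tokenization test

In the user dictionary I registered "快感A" as "快感エアリー", so I wanted the request below to return "快感エアリー" as well, but I could not find a way to make that happen, so I am stopping here for now.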

$ curl --cacert config/certs/http_ca.crt -u elastic \
 "localhost:9200/test_index/_analyze?pretty" \
 -H 'Content-Type: application/json' \
 -d'{"tokenizer":"sudachi_tokenizer", "text" : "快感A"}'

{
  "tokens" : [
    {
      "token" : "快感",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "A",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "word",
      "position" : 1
    }
  ]
}
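When repeating this kind of check it helps to look only at the token strings. The sketch below is a hypothetical helper (the names are mine) that pulls the tokens out of an _analyze response and reports whether the input came back as a single token, which is what I would expect to see if the user dictionary entry were being applied.

```python
import json

def surfaces(analyze_response):
    """Return the token strings of an _analyze response, in order."""
    return [t["token"] for t in analyze_response.get("tokens", [])]

def is_single_token(analyze_response, text):
    """True when the whole input came back as exactly one token."""
    return surfaces(analyze_response) == [text]

# Response copied from the output above, with offsets and types trimmed.
response = json.loads(
    '{"tokens":[{"token":"快感","position":0},{"token":"A","position":1}]}')
print(surfaces(response))
print(is_single_token(response, "快感A"))
```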

2024-10-10

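Japanese full-text search in elasticsearch with the analysis-kuromoji plugin

sudachi may be the more popular choice these days, but from what I found searching the web there is more information about kuromoji for elasticsearch, so this time I use kuromoji.

Reference URL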

https://qiita.com/mserizawa/items/8335d39cacb87f12b678

install analysis-kuromoji

$ bin/elasticsearch-plugin install analysis-kuromoji
-> Installing analysis-kuromoji
-> Downloading analysis-kuromoji from elastic
[=================================================] 100%
-> Installed analysis-kuromoji
-> Please restart Elasticsearch to activate any plugins installed

↑ After installing, ↓ check the result

$ bin/elasticsearch-plugin list
analysis-kuromoji

$ curl --cacert config/certs/http_ca.crt -u elastic \
  https://localhost:9200/_nodes/plugins?pretty
Enter host password for user 'elastic': mNXX=JAaEr+gVO5zDQhB
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elasticsearch",
  "nodes" : {
[omitted]
      "plugins" : [
        {
          "name" : "analysis-kuromoji",
          "version" : "8.15.2",
          "elasticsearch_version" : "8.15.2",
          "java_version" : "17",
          "description" : "The Japanese (kuromoji) Analysis plugin integrates Lucene kuromoji analysis module into elasticsearch.",
          "classname" : "org.elasticsearch.plugin.analysis.kuromoji.AnalysisKuromojiPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false,
          "licensed" : false,
          "is_official" : true
        }
[omitted]
}

Close the index temporarily and set kuromoji as the default tokenizer

$ curl --cacert config/certs/http_ca.crt -u elastic \
  -XPOST http://localhost:9200/test_index/_close
Enter host password for user 'elastic': mNXX=JAaEr+gVO5zDQhB

$ curl --cacert config/certs/http_ca.crt -u elastic \
  -H "Content-Type: application/json" -X PUT \
  "https://localhost:9200/_all/_settings?preserve_existing=true" \
  -d '{"index.analysis.analyzer.default.tokenizer": "kuromoji_tokenizer",
       "index.analysis.analyzer.default.type"     : "custom"}'
Enter host password for user 'elastic': mNXX=JAaEr+gVO5zDQhB

$ curl --cacert config/certs/http_ca.crt -u elastic \
  -XPOST http://localhost:9200/test_index/_open
Enter host password for user 'elastic': mNXX=JAaEr+gVO5zDQhB

Loading Japanese data

$ vi wine.json

{ "index" : {} }
{ "name": "カベルネ・ソーヴィニヨン", "description": "カベルネ・ソーヴィニヨン (Cabernet Sauvignon) は、世界的に最も有名な赤ワイン用の代表ワイン用品種の1つである。単に「カベルネ」(Cabernet) とも呼ばれることが多い。フランスではメドック地区に代表されるようにボルドーの最も重要な品種の一つであり、世界各地でも栽培されているが、比較的温暖な気候を好む。ソーヴィニヨン・ブランとカベルネ・フランの自然交配によって誕生したといわれている。 果皮のタンニン分が多く、強い渋味のある濃厚なワインとなる。雑味が多く、比較的長期の熟成を必要とする。強過ぎる渋味を緩和すべく、メルロー等の他の品種との混醸や混和も少なくない。歴史的には「ヴィドゥーレ」「ヴェデーレ」（「硬い」の意）とも呼ばれた。ソーヴィニヨン・ブラン同様メトキシピラジン(Methoxypyrazine)に由来するアロマがある。"}
{ "index" : {} }
{ "name": "メルロー", "description": "メルロー (Merlot) は、赤ワイン用の品種の中では最大の作地面積をもつ。とくにフランスのボルドーや、それを真似た「ボルドー・ブレンド」において非常に重要であり、カベルネ・ソーヴィニヨンとブレンドされることもある。カベルネ・ソーヴィニヨンに比し爽やかで、軽口である。また、ボルドーのサンテミリオン(Saint-Emilion)やポムロール(Pomerol)といった地区では、カベルネ・ソーヴィニヨンよりも多く配合され、とくにポムロール地区の「シャトー・ペトリュス」は、しばしばこの品種単独で造られる。日本でも長野県の塩尻市桔梗ヶ原地区などで栽培されている。土壌の塩分に弱い。"}
{ "index" : {} }
{ "name": "ピノ・ノワール", "description": "ピノ・ノワール (Pinot Noir) は、フランスのブルゴーニュ地方を原産とする世界的な品種で、紫色を帯びた青色の果皮を持つ。冷涼な気候を好み、特に温暖な気候では色やフレーバーが安定しないので栽培は難しい。イタリアでは「ピノ・ネロ」(Pinot Nero)、ドイツでは「シュペートブルグンダー」(Spätburgunder)の名がある。遺伝子的に不安定で変異種が少なくない。この中には、緑みを帯びた黄色の果皮を持つピノ・ブラン(Pinot Blanc)や褐色のピノ・グリ(Pinot Gris)などがあり、時には同じ樹に異なった色の果実がなるともいわれている。フランス以外では最近ニュージーランドでの栽培が盛んで、寒冷地を中心に栽培される。ワインはライトボディで、弱めの渋味、繊細なアロマとフレーバーが特徴である。シャンパンにも欠かせない品種である。"}
{ "index" : {} }
{ "name": "シラー", "description": "シラー(Syrah)は「シラーズ」(Shiraz)とも呼ばれる赤ワイン用の代表的な品種の1つである。シラーズはイランの都市名であるが、フランス・ローヌ地方が起源とされる。ローヌ地方の代表的な品種である他、オーストラリアでは最も重要な品種である。南アフリカ、チリなどでも栽培されている。ワインはフルボディで香味が強く、カベルネ・ソーヴィニヨンに比べタンニンが「新鮮」なのが特徴である。他の品種との混醸や混和も見られる。栽培される気候や風土によって味が異なる。ローヌ渓谷北部のコート・ロティやエルミタージュ、オーストラリア産が有名。果実は熟するとしなびやすい。"}
{ "index" : {} }
{ "name": "サンジョヴェーゼ", "description": "サンジョヴェーゼ (Sangiovese) は、イタリアで最も栽培面積の多い赤ワイン用の品種である。果皮の色の違いを含め数多くの亜種を持つ。中央イタリアのトスカーナ州が主産地で、イタリアで最も有名な一つである「キャンティ(Chianti)をはじめ、「ブルネッロ・ディ・モンタルチーノ」(Brunello di Montalcino) や「ヴィーノ・ノービレ・ディ・モンテプルチアーノ」(Vino Nobile di Montepulciano) 、「モレッリーノ・ディ・スカンサーノ」(Morellino di Scansano)などが生産される。コルシカ島では、「ニエルッキオ」(Nielluccio)として知られる。"}

$ curl --cacert config/certs/http_ca.crt -u elastic \
  -H "Content-Type: application/json" \
  -X POST http://localhost:9200/test_index/_bulk --data-binary @wine.json
Enter host password for user 'elastic': mNXX=JAaEr+gVO5zDQhB

{"errors":false,"took":200,
"items":[
  {"index":{"_index":"test_index","_id":"rb8gdZIBNDBl3Nb_TkmR","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":8,"status":201}},
  {"index":{"_index":"test_index","_id":"rr8gdZIBNDBl3Nb_TkmU","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":2,"_primary_term":8,"status":201}},
  {"index":{"_index":"test_index","_id":"r78gdZIBNDBl3Nb_TkmU","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":3,"_primary_term":8,"status":201}},
  {"index":{"_index":"test_index","_id":"sL8gdZIBNDBl3Nb_TkmU","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":4,"_primary_term":8,"status":201}},
  {"index":{"_index":"test_index","_id":"sb8gdZIBNDBl3Nb_TkmU","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":5,"_primary_term":8,"status":201}} ] }
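wine.json is in the newline-delimited format that the _bulk API takes: an action line followed by a source line for each document, with a newline after the last line. As a rough sketch, the same body could be generated from a list of dicts like this; the function name and sample documents are placeholders of mine.

```python
import json

def to_bulk_body(docs):
    """Serialize documents into the newline-delimited body used by _bulk."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))            # action line
        lines.append(json.dumps(doc, ensure_ascii=False))  # source line
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"

# Placeholder documents; the real descriptions are the long texts in wine.json.
docs = [{"name": "メルロー", "description": "(text)"},
        {"name": "シラー", "description": "(text)"}]
print(to_bulk_body(docs), end="")
```

ensure_ascii=False keeps the Japanese text readable in the file instead of turning it into \uXXXX escapes; either form is valid JSON.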

Search test in Japanese

$ curl --cacert config/certs/http_ca.crt -u elastic \
  -H "Content-Type: application/json" \
  http://localhost:9200/test_index/_search -d '{"query":{"match":{"description":"渋め"}}}'
Enter host password for user 'elastic': mNXX=JAaEr+gVO5zDQhB


{"took":162,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":2,"relation":"eq"},"max_score":1.2804732,"hits":[{"_index":"test_index","_id":"rb8gdZIBNDBl3Nb_TkmR","_score":1.2804732,"_ignored":["description.keyword"],"_source":{ "name": "カベルネ・ソーヴィニヨン", "description": "カベルネ・ソーヴィニヨン (Cabernet Sauvignon) は、世界的に最も有名な赤ワイン用の代表ワイン用品種の1つである。単に「カベルネ」(Cabernet) とも呼ばれることが多い。フランスではメドック地区に代表されるようにボルドーの最も重要な品種の一つであり、世界各地でも栽培されているが、比較的温暖な気候を好む。ソーヴィニヨン・ブランとカベルネ・フランの自然交配によって誕生したといわれている。 果皮のタンニン分が多く、強い渋味のある濃厚なワインとなる。雑味が多く、比較的長期の熟成を必要とする。強過ぎる渋味を緩和すべく、メルロー等の他の品種との混醸や混和も少なくない。歴史的には「ヴィドゥーレ」「ヴェデーレ」（「硬い」の意）とも呼ばれた。ソーヴィニヨン・ブラン同様メトキシピラジン(Methoxypyrazine)に由来するアロマがある。"}},{"_index":"test_index","_id":"r78gdZIBNDBl3Nb_TkmU","_score":0.84890795,"_ignored":["description.keyword"],"_source":{ "name": "ピノ・ノワール", "description": "ピノ・ノワール (Pinot Noir) は、フランスのブルゴーニュ地方を原産とする世界的な品種で、紫色を帯びた青色の果皮を持つ。冷涼な気候を好み、特に温暖な気候では色やフレーバーが安定しないので栽培は難しい。イタリアでは「ピノ・ネロ」(Pinot Nero)、ドイツでは「シュペートブルグンダー」(Spätburgunder)の名がある。遺伝子的に不安定で変異種が少なくない。この中には、緑みを帯びた黄色の果皮を持つピノ・ブラン(Pinot Blanc)や褐色のピノ・グリ(Pinot Gris)などがあり、時には同じ樹に異なった色の果実がなるともいわれている。フランス以外では最近ニュージーランドでの栽培が盛んで、寒冷地を中心に栽培される。ワインはライトボディで、弱めの渋味、繊細なアロマとフレーバーが特徴である。シャンパンにも欠かせない品種である。"}}]}}

Morphological analysis test

$ curl --cacert config/certs/http_ca.crt -u elastic \
  -H "Content-Type: application/json" \
  -X POST "http://localhost:9200/test_index/_analyze" -d '{"text":"渋め"}'
Enter host password for user 'elastic': mNXX=JAaEr+gVO5zDQhB


{"tokens":[{"token":"渋","start_offset":0,"end_offset":1,"type":"word","position":0},{"token":"め","start_offset":1,"end_offset":2,"type":"word","position":1}]}

2024-10-10

Closing an elasticsearch index: is xpack.security.enabled: false needed in elasticsearch.yml?

When I ran curl to close an elasticsearch index, it printed "curl: (52) Empty reply from server".

$ curl --cacert config/certs/http_ca.crt -u elastic \
  -XPOST http://localhost:9200/test_index/_close

curl: (52) Empty reply from server

elasticsearch_server.json also contained a log entry like the following

{"@timestamp":"2024-10-09T22:49:47.010Z", "log.level": "WARN",
"message":"received plaintext http traffic on an https channel,
           closing connection Netty4HttpChannel{
          localAddress=/127.0.0.1:9200, remoteAddress=/127.0.0.1:42808}",
 "ecs.version": "1.2.0", "service.name":"ES_ECS","event.dataset":"elasticsearch.server",
 "process.thread.name":"elasticsearch[a64][transport_worker][T#20]",
 "log.logger":"org.elasticsearch.http.netty4.Netty4HttpServerTransport",
 "elasticsearch.cluster.uuid":"_1Zg8GmwSdS8pk94jvNoPQ",
 "elasticsearch.node.id":"F5B59GreTvaM_Jz4P2wJFQ",
 "elasticsearch.node.name":"a64",
 "elasticsearch.cluster.name":"elasticsearch"}

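Changing elasticsearch.yml as follows made the error go away, although I do not yet understand elasticsearch.yml or what xpack.security.enabled really means. Judging from the log message, the request was sent as plain http:// to a port that expects https, so calling https://localhost:9200 instead should also avoid the error without disabling security.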

old) xpack.security.enabled: true
new) xpack.security.enabled: false
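The log line says that plain HTTP arrived on an HTTPS channel, so the URL scheme has to match whether TLS is enabled on the HTTP layer. The tiny helper below is only an illustration of that rule; the function and parameter names are hypothetical, and it assumes the auto-configured setup where turning security off also turns HTTP TLS off.

```python
def elasticsearch_base_url(http_tls_enabled, host="localhost", port=9200):
    """Pick the URL scheme that matches the server's HTTP-layer TLS setting."""
    scheme = "https" if http_tls_enabled else "http"
    return f"{scheme}://{host}:{port}"

# Auto-configured security (the default after first startup): use https.
print(elasticsearch_base_url(True) + "/test_index/_close")
# After setting xpack.security.enabled: false: plain http.
print(elasticsearch_base_url(False) + "/test_index/_close")
```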

2024-10-10

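install elasticsearch 8.15.2 from tar.gz to oracle linux 8.7

I have used fess for full-text search before, but never elasticsearch itself, which fess uses internally, so here is a hands-on.

The key points this time:

The installation itself is just downloading and extracting the tar.gz
Most of the work, such as configuration and index creation, is done through the REST API (curl commands)
Japanese and cluster settings are left for next time

Reference URL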

install

$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.15.2-linux-x86_64.tar.gz
$ tar -xzf elasticsearch-8.15.2-linux-x86_64.tar.gz
$ cd elasticsearch-8.15.2/

First startup and checking the elastic user's password

The password issued here is used when running the curl commands that follow.

$ bin/elasticsearch
Oct 09, 2024 6:28:03 PM sun.util.locale.provider.LocaleProviderAdapter <clinit>
WARNING: COMPAT locale provider will be removed in a future release
[omitted]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Elasticsearch security features have been automatically configured!
 Authentication is enabled and cluster connections are encrypted.

  Password for the elastic user (reset with `bin/elasticsearch-reset-password -u elastic`):
  mNXX=JAaEr+gVO5zDQhB ★
[omitted]

With bin/elasticsearch above still running, check that it started with the following curl command

$ curl --cacert config/certs/http_ca.crt -u elastic https://localhost:9200
Enter host password for user 'elastic': mNXX=JAaEr+gVO5zDQhB
{
  "name" : "a64",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "_1Zg8GmwSdS8pk94jvNoPQ",
  "version" : {
    "number" : "8.15.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "98adf7bf6bb69b66ab95b761c9e5aadb0bb059a3",
    "build_date" : "2024-09-19T10:06:03.564235954Z",
    "build_snapshot" : false,
    "lucene_version" : "9.11.1",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}
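The same check can be made from Python with only the standard library. This is a sketch under the assumption that the CA certificate sits at config/certs/http_ca.crt as in the curl command; the function names are mine, and fetch() needs a running server, so only the request construction is exercised here.

```python
import base64
import ssl
import urllib.request

def build_request(url, user, password):
    """Create a GET request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})

def fetch(url, user, password, ca_file="config/certs/http_ca.crt"):
    """Send the request, trusting the CA certificate generated by Elasticsearch."""
    context = ssl.create_default_context(cafile=ca_file)
    request = build_request(url, user, password)
    with urllib.request.urlopen(request, context=context) as res:
        return res.read().decode("utf-8")

# "secret" is a placeholder; use the password printed at first startup.
req = build_request("https://localhost:9200", "elastic", "secret")
print(req.full_url)
```

Passing cafile plays the role of curl's --cacert, and the Authorization header corresponds to -u elastic.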

Create an index and check the result (the index name this time is test_index)

Create the index

$ curl --cacert config/certs/http_ca.crt -u elastic  -X PUT https://localhost:9200/test_index
Enter host password for user 'elastic': mNXX=JAaEr+gVO5zDQhB
{"acknowledged":true,"shards_acknowledged":true,"index":"test_index"}

Check the result

$ curl --cacert config/certs/http_ca.crt -u elastic https://localhost:9200/_aliases?pretty
Enter host password for user 'elastic': mNXX=JAaEr+gVO5zDQhB
{
  ".security-7" : {
    "aliases" : {
      ".security" : {
        "is_hidden" : true
      }
    }
  },
  "test_index" : {
    "aliases" : { }
  }
}

$ curl --cacert config/certs/http_ca.crt -u elastic https://localhost:9200/_settings?pretty
Enter host password for user 'elastic': mNXX=JAaEr+gVO5zDQhB
{
  "test_index" : {
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "test_index",
        "creation_date" : "1728501776101",
        "number_of_replicas" : "1",
        "uuid" : "274wN-66TZCwly5cWHO6ZQ",
        "version" : {
          "created" : "8512000"
        }
      }
    }
  }
}

Creating a mapping by registering a document

A mapping defines the structure of an index. It is generated automatically when data is indexed, but apparently it can also be defined by hand.

$ curl --cacert config/certs/http_ca.crt -u elastic \
  -H "Content-Type: application/json" -X PUT \
  https://localhost:9200/test_index/_doc/001 -d '{
    "subject" : "Test Post No.1",
    "description" : "This is the initial post",
    "content" : "This is the test message for using Elasticsearch."
}'
Enter host password for user 'elastic': mNXX=JAaEr+gVO5zDQhB

{"_index":"test_index",
 "_id":"001",
 "_version":1,
 "result":"created",
 "_shards":{"total":2, "successful":1,"failed":0},
 "_seq_no":0,
 "_primary_term":1}

↑ Create a mapping by registering a document ↓ Check the result

$ curl --cacert config/certs/http_ca.crt -u elastic \
  https://localhost:9200/_mapping/?pretty
Enter host password for user 'elastic': mNXX=JAaEr+gVO5zDQhB
{ "test_index" : {
    "mappings" : {
      "properties" : {
        "content" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "description" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "subject" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
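For reference, defining the mapping by hand means sending a "mappings" section when the index is created instead of letting the first document decide. The sketch below builds a body that mirrors the automatically generated mapping above; the helper names are hypothetical, and I have not run it against the server in this entry.

```python
import json

def text_with_keyword(ignore_above=256):
    """A text field with a keyword sub-field, like the dynamic mapping above."""
    return {"type": "text",
            "fields": {"keyword": {"type": "keyword",
                                   "ignore_above": ignore_above}}}

def build_mapping_body(field_names):
    """Body for creating an index whose fields are declared up front."""
    return {"mappings": {
        "properties": {name: text_with_keyword() for name in field_names}}}

body = build_mapping_body(["subject", "description", "content"])
print(json.dumps(body, indent=2))
```

The printed JSON would be passed with -d to the PUT request that creates the index, in place of the empty request used earlier.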

Check the registered document

$ curl --cacert config/certs/http_ca.crt -u elastic \
  https://localhost:9200/test_index/_doc/001?pretty
Enter host password for user 'elastic': mNXX=JAaEr+gVO5zDQhB

{
  "_index" : "test_index",
  "_id" : "001",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "subject" : "Test Post No.1",
    "description" : "This is the initial post",
    "content" : "This is the test message for using Elasticsearch."
  }
}

Search test (example: the condition is that [description] contains [initial])

$ curl --cacert config/certs/http_ca.crt -u elastic \
  "https://localhost:9200/test_index/_search?q=description:initial&pretty=true"
Enter host password for user 'elastic': mNXX=JAaEr+gVO5zDQhB

{
  "took" : 58,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "test_index",
        "_id" : "001",
        "_score" : 0.2876821,
        "_source" : {
          "subject" : "Test Post No.1",
          "description" : "This is the initial post",
          "content" : "This is the test message for using Elasticsearch."
        }
      }
    ]
  }
}
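The q= parameter used above is a "field:term" pair placed in the query string. A small sketch for assembling such a URL safely, with the names chosen by me for illustration:

```python
from urllib.parse import urlencode

def search_url(base_url, index, field, term, pretty=True):
    """Build a URI-search URL of the form /<index>/_search?q=<field>:<term>."""
    query = {"q": f"{field}:{term}"}
    if pretty:
        query["pretty"] = "true"
    # urlencode percent-encodes the colon and any non-ASCII search term.
    return f"{base_url}/{index}/_search?{urlencode(query)}"

print(search_url("https://localhost:9200", "test_index", "description", "initial"))
```

Encoding matters most for Japanese search terms, which cannot be put into a URL as they are.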

2024-09-28

Verifying an LDAP user's password with the ldapwhoami command

Searching for a user in LDAP with the ldapsearch command looks like this

$ ldapsearch -h ldap.mile.sexy.co.jp \
   -b "ou=people,o=sexy-group" "(uid=end0tknr)"

Note: Japanese text in the values returned by ldapsearch is base64-encoded.
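In LDIF output a base64-encoded value is marked by a double colon after the attribute name ("cn:: ..."). A minimal sketch for decoding such lines follows; the function name and the sample value are made up, and it does not handle values folded across several lines.

```python
import base64

def decode_ldif_line(line):
    """Split one LDIF line into (attribute, value), decoding base64 values."""
    attr, sep, rest = line.partition(":")
    if not sep:
        raise ValueError(f"not an attribute line: {line!r}")
    if rest.startswith(":"):
        # "attr:: <base64>" marks a base64-encoded value.
        return attr, base64.b64decode(rest[1:].strip()).decode("utf-8")
    return attr, rest.strip()

# Made-up sample: a Japanese name encoded the way ldapsearch prints it.
sample = "cn:: " + base64.b64encode("山田太郎".encode("utf-8")).decode("ascii")
print(decode_ldif_line(sample))
print(decode_ldif_line("uid: end0tknr"))
```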

To verify an LDAP user's password, use the ldapwhoami command as follows

$ ldapwhoami -h ldap.mile.sexy.co.jp -x \
  -D "uid=end0tknr,ou=people,o=sexy-group" -w "$PASSWORD"

On success, it returns a string like "uid=end0tknr,ou=people,o=sexy-group".

2024-09-24

Trying out networkx for Python

This is my first time using networkx, so I build graph data and draw it.

# -*- coding: utf-8 -*-
import jaconv
import matplotlib.pyplot as plt
import networkx as nx
import sqlite3

# Root nodes to explore
root_nodes = ["2C0？？？","2C1Q1？？？CIZW","2C8？？？45JBZL"]

db_path = 'bom.sqlite'
font    = "MS Gothic" # needed to display Japanese text

def main():
    # Connect to the DB (sqlite)
    db_conn = sqlite3.connect( db_path )
    db_conn.row_factory = dict_factory
    db_cur  = db_conn.cursor()
    
    # Root nodes
    org_mims = root_nodes

    for org_mim in org_mims:
        root_node = select_mim_def(db_cur, org_mim)
        if root_node is None:
            print("fail select_mim_def() %s" % (org_mim))
            continue

        bom = nx.Graph()
        bom.add_node(1,**root_node) # add the root node

        # Passing data=True also returns the node attributes
        last_node = list( bom.nodes(data=True) )[-1]
        expand_bom(db_cur,bom, last_node)       # expand the BOM
        draw_bom( bom )                         # draw the BOM
    db_cur.close()
    db_conn.close()

def draw_bom( bom ):
    pos = nx.spring_layout(bom, k=0.3)
    # Draw the nodes and their labels
    nx.draw_networkx_nodes(bom,
                           pos,
                           node_color='#87cefa',
                           node_size=20)

    node_labels = nx.get_node_attributes(bom, 'label')
    nx.draw_networkx_labels(bom, pos, labels=node_labels,
                            font_size  =10,
                            font_family=font)
    # Draw the edges and their labels
    nx.draw_networkx_edges(bom, pos,
                           arrows=True,
                           arrowstyle='->',
                           edge_color='#aaa')
    edge_labels = nx.get_edge_attributes(bom, 'label')
    
    nx.draw_networkx_edge_labels(bom, pos,
                                 edge_labels=edge_labels,
                                 font_family=font)
    plt.axis('off')
    plt.show()
    
def expand_bom(db_cur, bom, parent_node):

    for child_attr in select_children(db_cur,parent_node):
        child_id = list( bom.nodes )[-1] + 1
        child_attr["label"] = child_attr["code"]+"\n"+child_attr["name"]
        child_attr["label"] = child_attr["label"].replace(" ","\n")
        child_attr["label"] = jaconv.z2h(child_attr["label"],
                                         kana=True,
                                         digit=True,
                                         ascii=True )
        # IDs passed to add_node() and add_edge() must be unique
        bom.add_node(child_id,**child_attr)
        bom.add_edge(parent_node[0],
                     child_id,
                     label =child_attr["use_amount"],
                     weight=child_attr["use_amount"] )
        child_node = list( bom.nodes(data=True) )[-1]
        expand_bom(db_cur, bom, child_node)

        
def select_children( db_cur, parent_node ):
    ret_datas = []
    ret_datas += select_mim_children(db_cur, parent_node)
    ret_datas += select_im_children(db_cur, parent_node)
    return ret_datas
    
# Return SQL SELECT results as dicts
def dict_factory(cursor, row):
   d = {}
   for idx, col in enumerate(cursor.description):
       d[col[0]] = row[idx]
   return d

def select_im_children(db_cur,parent_node):
    
    sql = """
SELECT mps.m_p_child as code,
       im.name_kanji as name,
       mps.use_amount
FROM mps
JOIN im ON (mps.m_p_child=im.part_num)
WHERE m_p_child_flag='P' AND m_p_parent=?
"""
    sql_vals = (parent_node[1]["code"],)
    db_cur.execute( sql, sql_vals )

    ret_datas = []
    for ret_data in db_cur.fetchall():
        ret_data["type"] = "im"
        ret_datas.append( ret_data )
    return ret_datas

def select_mim_children(db_cur,parent_node):
    
    sql = """
SELECT mps.m_p_child  as code,
       mim.name_kanji as name,
       mps.use_amount
FROM mps
JOIN mim ON (mps.m_p_child=mim.menu_code)
WHERE m_p_child_flag='M' AND m_p_parent=?
"""
    sql_vals = (parent_node[1]["code"],)
    db_cur.execute( sql, sql_vals )

    ret_datas = []
    for ret_data in db_cur.fetchall():
        ret_data["type"] = "mim"
        ret_datas.append( ret_data )
    return ret_datas

def select_mim_def(db_cur, menu_code):
    sql = """
SELECT menu_code as code, name_kanji as name
FROM mim
WHERE menu_code=?
"""
    sql_vals = (menu_code,)
    db_cur.execute( sql, sql_vals )
    ret_data = db_cur.fetchone()
    if not ret_data:
        return
        
    ret_data["type"] = "mim"

    ret_data["label"] = ret_data["code"]+" "+ret_data["name"]
    ret_data["label"] = ret_data["label"].replace(" ","\n")
    ret_data["label"] = jaconv.z2h(ret_data["label"],
                                   kana=True,
                                   digit=True,
                                   ascii=True )
    return ret_data

if __name__ == '__main__':
    main()
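The core of the script is the recursive expansion in expand_bom(), which can be tried without networkx or the real bom.sqlite. The sketch below uses an in-memory table with made-up rows; only the mps column names are taken from the script, everything else is hypothetical. It has no cycle detection, so it assumes the parent-child data forms a tree.

```python
import sqlite3

def expand(cur, parent_code, depth=1):
    """Yield (depth, child_code, use_amount) for every descendant, depth-first."""
    cur.execute("SELECT m_p_child, use_amount FROM mps"
                " WHERE m_p_parent=? ORDER BY rowid", (parent_code,))
    # fetchall() first: the recursive call below reuses the same cursor.
    for child_code, use_amount in cur.fetchall():
        yield depth, child_code, use_amount
        yield from expand(cur, child_code, depth + 1)

def demo_rows():
    """Build a tiny in-memory mps table (made-up data) and expand it from ROOT."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE mps (m_p_parent TEXT, m_p_child TEXT, use_amount REAL)")
    cur.executemany("INSERT INTO mps VALUES (?,?,?)",
                    [("ROOT", "A", 1), ("ROOT", "B", 2), ("A", "A1", 4)])
    rows = list(expand(cur, "ROOT"))
    conn.close()
    return rows

# Print the tree with indentation proportional to depth.
for depth, code, amount in demo_rows():
    print("  " * depth + f"{code} x{amount}")
```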

2024-09-18

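Building a user dictionary for kuromoji.js + SudachiDict

I built a sudachi user dictionary earlier; this time I use the same user dictionary to run morphological analysis in the browser.

Reference URL

Preparation 1 - installing node.js v10.16.3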

https://github.com/coreybutler/nvm-windows/releases/download/1.1.12/nvm-setup.exe

The nvm for Windows installer is at the URL above; after installing it, run the following commands

DOS> nvm --version
1.1.12

DOS> nvm install v10.16.3
DOS> nvm use 10.16.3
Now using node v10.16.3 (64-bit)
DOS> node --version
v10.16.3

Preparation 2 - installing kuromoji.js

DOS> npm install kuromoji
<omitted>
+ kuromoji@0.1.2

DOS> cd node_modules/kuromoji
DOS> npm install

Once installed as above, the dictionary files can be built as follows.

DOS> npm run build-dict
DOS> dir dict
 Volume in drive C has no label.
 Volume Serial Number is 00DD-2D3B

2024/09/18  09:09    <DIR>          .
2024/09/18  09:05    <DIR>          ..
2024/09/18  09:08        55,873,342 base.dat.gz
2024/09/18  09:08        25,614,933 cc.dat.gz
2024/09/18  09:08        48,926,448 check.dat.gz
2024/09/18  09:08        12,272,244 tid.dat.gz
2024/09/18  09:09        10,529,545 tid_map.dat.gz
2024/09/18  09:09        36,804,066 tid_pos.dat.gz
2024/09/18  09:09            10,491 unk.dat.gz
2024/09/18  09:09               320 unk_char.dat.gz
2024/09/18  09:09               338 unk_compat.dat.gz
2024/09/18  09:09             1,141 unk_invoke.dat.gz
2024/09/18  09:09             1,177 unk_map.dat.gz
2024/09/18  09:09            10,524 unk_pos.dat.gz

Building a user dictionary for kuromoji.js + SudachiDict

The procedure follows the reference URL, but doing it by hand is tedious, so I turned it into the following Python script

#!python
# -*- coding: utf-8 -*-
""" https://qiita.com/piijey/items/2517af039bbedddec7b8
"""
from datetime import datetime
from pathlib import Path
import csv
import glob
import io
import logging.config
import os
import requests
import shutil
import subprocess
import sys
import urllib.request
import zipfile
import time

CONF = {
    "sudachi":{
        "dic_src_base_url" :
        "http://sudachi.s3-website-ap-northeast-1.amazonaws.com/sudachidict-raw",
        # refer to https://github.com/WorksApplications/SudachiDict/blob/develop/build.gradle
        "dic_src_paths" : ["matrix.def.zip",
                           "20240409/small_lex.zip",
                           "20240409/core_lex.zip",
                           #"20240409/notcore_lex.zip"
                           ],
        "dic_def_base_url" :
        "https://github.com/WorksApplications/Sudachi/raw/develop/src/main/resources",
        "dic_def_paths" : ["char.def", "unk.def" ],
        "usrdic_dir":"c:/Users/xcendou/local/FIND_ZUMEN2/sudachi"
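    },
    "kuromoji":{
        "base_dir":"C:/Users/xcendou/local/FIND_ZUMEN2/kuromoji/node_modules/kuromoji",
        # Each command runs in its own shell, so NODE_OPTIONS is set on the
        # same command line as the build to make sure npm actually sees it
        "build_cmds":['set "NODE_OPTIONS=--max-old-space-size=4096" && npm run build-dict'],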
        "ipadic_src_dir":
        "c:/Users/xcendou/local/FIND_ZUMEN2/kuromoji/"+
        "node_modules/kuromoji/node_modules/mecab-ipadic-seed/lib/dict",
        "backup_dir":
        "c:/Users/xcendou/local/FIND_ZUMEN2/kuromoji/"+
        "node_modules/kuromoji/node_modules/mecab-ipadic-seed/lib/dict_bak",
    },
    "log":{
        'version': 1,
        'loggers': {"mainLogger":
                    {'level':"INFO",'handlers':["mainHandler"]},
                    },
        'handlers': { "mainHandler": {
            'formatter': "mainFormatter",
            'class'    : 'logging.handlers.RotatingFileHandler',
            'filename' : os.path.splitext(os.path.basename(__file__))[0] + \
            "_"+datetime.now().strftime("%m%d")+ ".log",
            'encoding' : 'utf-8',
            'maxBytes' : 1024*1024*10, # MB
            'backupCount': 30       # rotation
       }},
        'formatters': {  "mainFormatter":{
            "format":
            "%(asctime)s\t%(levelname)s\t%(filename)s"+
            "\tL%(lineno)d\t%(funcName)s\t%(message)s",
            "datefmt": '%Y/%m/%d %H:%M:%S'
        }},
    }
}

logging.config.dictConfig(CONF["log"])
logger = logging.getLogger('mainLogger')

def main():
    logger.info("START")
    
    init_dic_src_dir()
    download_sudachi_dic_src()
    conv_lex_csv_for_kuromoji()
    conv_sudashi_usrdic_for_kuromoji()
    build_kuromoji_dic()

def build_kuromoji_dic():

    org_dir = os.getcwd()
    os.chdir( CONF["kuromoji"]["base_dir"] )
    
    for cmd_str in CONF["kuromoji"]["build_cmds"]:
        exec_subprocess( cmd_str )
    os.chdir( org_dir )
    
def conv_sudashi_usrdic_for_kuromoji():
    for org_path in  glob.glob( CONF["sudachi"]["usrdic_dir"] + "/*.dic.csv"):
        org_rows = []
        with open(org_path, encoding='utf-8') as f:
            csvreader = csv.reader(f)
            org_rows = [row for row in csvreader]

        new_path = CONF["kuromoji"]["ipadic_src_dir"] + "/"+ os.path.basename(org_path)
        print( new_path )
        with open(new_path, "w", encoding='utf-8') as f:
            writer = csv.writer(f, lineterminator='\n')
            for org_row in org_rows:
                new_row = [org_row[0],
                           1285, 1285,  5402, "名詞","普通名詞","*","*","*","*",
                           org_row[12], "*","*"]
                writer.writerow( new_row )
    
def conv_lex_csv_for_kuromoji():
    for lex_path in  glob.glob( CONF["kuromoji"]["ipadic_src_dir"] + "/*_lex.csv"):
        org_rows = []
        with open(lex_path, encoding='utf-8') as f:
            csvreader = csv.reader(f)
            org_rows = [row for row in csvreader]
            
        with open(lex_path, "w", encoding='utf-8') as f:
            writer = csv.writer(f, lineterminator='\n')
            for org_row in org_rows:
                new_row = [
                    org_row[0], org_row[1], org_row[2], org_row[3], org_row[5],
                    org_row[6], org_row[7], org_row[9], org_row[10],"*",
                    org_row[12],org_row[11],"*"]
                writer.writerow( new_row )
                
def download_sudachi_dic_src():

    for path in CONF["sudachi"]["dic_src_paths"]:
        req_url = CONF["sudachi"]["dic_src_base_url"] +"/"+ path
        src_content = get_http_requests(req_url)
        
        zip = zipfile.ZipFile( io.BytesIO(src_content) )
        zip.extractall( CONF["kuromoji"]["ipadic_src_dir"] )
        
        
    for path in CONF["sudachi"]["dic_def_paths"]:
        req_url = CONF["sudachi"]["dic_def_base_url"] +"/"+ path
        src_content = get_http_requests(req_url)

        save_path = CONF["kuromoji"]["ipadic_src_dir"] + "/"+ path
        with open(save_path, 'w',encoding='utf-8') as f:
            f.write(src_content.decode("utf-8"))
                                  
def init_dic_src_dir():
    
    backup_dir = CONF["kuromoji"]["backup_dir"]
    if not os.path.isdir( backup_dir ):
        Path( backup_dir ).mkdir()
        
    local_src_dir = CONF["kuromoji"]["ipadic_src_dir"]
    if os.path.isdir( local_src_dir ):
        # Back up the old dictionary source if there is any
        if len( glob.glob(local_src_dir+'/**', recursive=True) ) > 0:
            bak_filename = ".".join([ os.path.split(local_src_dir)[-1],
                                      datetime.now().strftime("%m%d") ])
            bak_path = backup_dir +"/"+bak_filename
            shutil.make_archive(bak_path, format='zip', root_dir=local_src_dir)

        shutil.rmtree(local_src_dir)
    Path( local_src_dir ).mkdir()
    return local_src_dir

def get_http_requests(req_url):
    logger.info("START %s",req_url)
    
    res = None
    for i in range(1, 4): # retry up to 3 times
        try: # some servers return no HTTP response code at all, hence try-except
            res = requests.get(req_url, timeout=(5,60), stream=True,verify=False)
        except Exception as e:
            logger.warning(e)
            logger.warning("retry {} {}".format(i,req_url))
            res = None
            time.sleep(10)
            continue # no response object to inspect, so try again

        if res.status_code == 404:
            logger.error( "404 error {}".format(req_url) )
            return

        try:
            res.raise_for_status()
            break # success, so stop retrying
        except Exception as e:
            logger.warning(e)
            logger.warning("retry {} {}".format(i,req_url))
            res = None
            time.sleep(10)

    if res is None: # every attempt failed
        logger.error( "giving up {}".format(req_url) )
        return

    # urllib.request.urlopen() could not fetch the response content,
    # possibly because of its size, so read it in chunks with stream=True
    chunks = []
    for chunk in res.iter_content(chunk_size=1024*1024):
        chunks.append(chunk)
      
    content = b"".join(chunks)
    return content

def exec_subprocess(cmd:str, raise_error=True):
    child = subprocess.Popen( cmd,
                              shell=True,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE )
    stdout, stderr = child.communicate()
    rt = child.returncode
    if rt != 0 and raise_error:
        print("ERROR",stderr,file=sys.stderr)
        return (None,None,None)

    return stdout, stderr, rt

if __name__ == "__main__":
    main()

2024-09-08

LOCAL LLM via emacs29 + ellama via ollama on win11 + intel core i7 + mem:64G + rtx4090

LOCAL LLM via ollama on win11 + intel core i7 + mem:64G - end0tknr's kipple - web写経開発

This continues the entry linked above.

Since running a local LLM on the CPU alone turned out to be impractical, I connected a GeForce RTX 4090 GPU and also tried it through emacs + ellama.

Reference URLs

- 最強ローカルLLM実行環境としてのEmacs | 日々、とんは語る。
- GitHub - s-kostyaev/ellama: Ellama is a tool for interacting with large language models from Emacs.

ellama dependencies (ollama, emacs29)

ellama requires ollama, which is already installed as described in the previous entry.

I had been using emacs28 for Windows, but running ellama commands such as "M-x ellama-summarize" failed with the error below, so I switched to emacs29.

Symbol's function definition is void: setopt

installation

As the ellama documentation says, installation is just a matter of running package-install in emacs.

M-x package-install ⏎
ellama ⏎

model download

Any local LLM downloaded with ollama can be used. The models were already downloaded in the previous entry, so this step is skipped.

configuration ( .emacs.d/init.el )

I added the settings from the reference URL to init.el.

(with-eval-after-load 'llm
  (require 'llm-ollama)
  ;; target language for ellama-translate
  (setq ellama-language "Japanese")
  ;; naming rule for files generated by ellama-ask-selection and similar commands
  (setq ellama-naming-scheme #'ellama-generate-name-by-llm)
  ;; default provider
  (setq ellama-provider (make-llm-ollama
                           :chat-model "codestral:22b-v0.1-q4_K_S"
                           :embedding-model "codestral:22b-v0.1-q4_K_S"))
  ;; provider used for translation
  (setq ellama-translation-provider (make-llm-ollama
                                       :chat-model "aya:35b-23-q4_K_S"
                                       :embedding-model "aya:35b-23-q4_K_S"))
  ;; providers available to ellama; choose one with ellama-provider-select
  (setq ellama-providers
          '(("codestral" . (make-llm-ollama
                            :chat-model "codestral:22b-v0.1-q4_K_S"
                            :embedding-model "codestral:22b-v0.1-q4_K_S"))
            ("gemma2" . (make-llm-ollama
                            :chat-model "gemma2:27b-instruct-q4_K_S"
                            :embedding-model "gemma2:27b-instruct-q4_K_S"))
            ("command-r" . (make-llm-ollama
                            :chat-model "command-r:35b"
                            :embedding-model "command-r:35b"))
            ("llama3.1" . (make-llm-ollama
                                  :chat-model "llama3.1:8b"
                                  :embedding-model "llama3.1:8b"))
            )))

Tests

M-x ellama-make-table

CREATE TABLE product (
  id   int,
  name varchar(10),
  col  varchar(10) );

↑ Write this, run "M-x ellama-make-table", and ↓ this is displayed

Here's how you can represent your SQL CREATE TABLE statement as a
Markdown table:

| Column | Data Type    |
|--------|--------------|
| id     |int           |
| name   | varchar(10)  |
| col    | varchar(10)  |

This markdown table represents
the "product" table with its columns and their data types.

M-x ellama-make-format ⏎ json ⏎

CREATE TABLE product (
  id   int,
  name varchar(10),
  col  varchar(10) );

↑ Write this, run "M-x ellama-make-format ⏎ json ⏎", and ↓ this is displayed

{"table_name": "product",
 "columns": [
   { "name": "id",   "type": "int"},
   { "name": "name", "type": "varchar(10)" },
   { "name": "col",  "type": "varchar(10)" } ] }

M-x ellama-summarize 、M-x ellama-translate

CREATE TABLE product (
  id   int,
  name varchar(10),
  col  varchar(10) );

↑ Write this, run "M-x ellama-summarize", and ↓ this is displayed

The given text creates a table named "product" in a database.
This table has three columns: "id","name", and "col".
The "id" column is of integer type, while the "name" and "col" columns are of
variable character type, with maximum lengths of 10 characters each.

Running "M-x ellama-translate" on that summary then displays ↓ this

入力されたテキストは、
データベースに「product」という名前のテーブルを作成します。
このテーブルには、3つの列があります：「id」、「name」、および「col」。
「id」は整数型で、「name」と「col」は
それぞれ10文字以下の変長文字型の列です。

2024-09-07

LOCAL LLM via ollama on win11 + intel core i7 + mem:64G

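I tried a local LLM with ollama for Windows on win11 + intel core i7 + mem:64G.

(No GeForce RTX 4090 was connected.)

Getting started with an LLM in so few steps is impressive, but translating a short English text into Japanese took about 30 minutes, so I found it impractical for real use.

Reference URLs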

Step 1 - installation ollama

Download OllamaSetup.exe from https://ollama.com/download/windows.

Just running the installer makes ollama available from the command line.

DOS> ollama --version
ollama version is 0.3.9

DOS> ollama --help
Large language model runner

Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.

Step 2 - download Model

The available models are listed at https://ollama.com/library and can be downloaded with the "ollama pull" command.

I downloaded five models this time. All of them are large, and the downloads took more than 10 minutes.

C:\Users\end0t>ollama pull codestral:22b-v0.1-q4_K_S
pulling 73d8b6a3770e... 100% ▕█████████▏  12 GB
<snip>

C:\Users\end0t>ollama pull aya:35b-23-q4_K_S
pulling 0075a86dcf85... 100% ▕█████████▏  20 GB
<snip>

C:\Users\end0t>ollama pull gemma2:27b-instruct-q4_K_S
pulling 0e0956415962... 100% ▕███████▏  15 GB
<snip>

C:\Users\end0t>ollama pull command-r:35b
pulling 8e0609b8f0fe... 100% ▕███████▏  18 GB
<snip>

C:\Users\end0t>ollama pull llama3.1:8b
pulling 8eeb52dfb3bb... 100% ▕███████▏ 4.7 GB
<snip>

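Step 3 - Translation test

As a test, I had a message shown in emacs translated into Japanese. Even allowing for the missing GPU, translating two lines of English took about 30 minutes...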

DOS> ollama run gemma2:27b-instruct-q4_K_S
>>> 次の英語を和訳してください。
... Install the Codeium extension in Emacs, and start seeing
...  suggestions as you write comments and code.

↑ Enter this, and ↓ this is displayed

EmacsにCodeium拡張機能をインストールすると、コメントやコ
ードを書いているときに提案が表示されるようになります。

>>> Send a message (/? for help)
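
The same translation request can presumably also be sent from Python through the HTTP API that ollama serves locally. The following is only a sketch: it assumes the default endpoint http://localhost:11434/api/generate, and the helper names `build_generate_request` and `translate` are hypothetical, not part of ollama.

```python
import json
import urllib.request

# Assumption: ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def translate(model: str, english_text: str) -> str:
    """Send the request and return the generated text (needs a running ollama)."""
    prompt = "次の英語を和訳してください。\n" + english_text
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req) as res:
        return json.loads(res.read().decode("utf-8"))["response"]
```

For example, `translate("gemma2:27b-instruct-q4_K_S", "Install the Codeium extension in Emacs.")` would send the same kind of prompt as the interactive session above.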

2024-08-21

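Mounting aws s3 on ec2 with mount-s3

I had been using s3fs and goofys to mount aws s3 on ec2, but aws has released its official mount-s3, so I gave it a try.

As with s3fs and goofys, aws s3 can be used without handling http requests directly, so operation is easy.
For the speed test I used 1,000 files of 1MB each, and the result falls far short of EBS.

Mounting aws s3 on ec2

Run without options, mount-s3 does not allow files to be deleted or overwritten, so I mount with "--allow-delete" and "--allow-overwrite".

$ export AWS_ACCESS_KEY_ID=REDACTED
$ export AWS_SECRET_ACCESS_KEY=REDACTED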

$ mkdir mount_s3
$ mount-s3 --allow-delete --allow-overwrite end0tknrnstest mount_s3

Upload via mount-s3 - 1MB x 1,000 files → about 7 minutes

[ec2-user@ip-172-31-5-210 tmp]$ date; cp -r org_dummy_files/* mount_s3/; date
Wed Aug 21 07:14:01 AM UTC 2024
Wed Aug 21 07:21:12 AM UTC 2024

Download via mount-s3 - 1MB x 1,000 files → about 2 minutes

$ date; cp mount_s3/* new_dummy_files/; date
Wed Aug 21 07:22:35 AM UTC 2024
Wed Aug 21 07:24:15 AM UTC 2024

Delete via mount-s3 - 1MB x 1,000 files → about 1 minute

$ date; rm *; date
Wed Aug 21 07:24:43 AM UTC 2024
Wed Aug 21 07:25:41 AM UTC 2024
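
The rounded timings above can be rechecked from the `date` output. This is a small sketch that assumes the data volume is exactly 1,000 MB; the helper names `elapsed_seconds` and `throughput_mb_per_s` are made up for illustration.

```python
from datetime import datetime

# Format of the `date` output shown above, e.g. "Wed Aug 21 07:14:01 AM UTC 2024".
DATE_FMT = "%a %b %d %I:%M:%S %p UTC %Y"

def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two `date` output lines."""
    delta = datetime.strptime(end, DATE_FMT) - datetime.strptime(start, DATE_FMT)
    return delta.total_seconds()

def throughput_mb_per_s(total_mb: float, seconds: float) -> float:
    """Average throughput, assuming total_mb megabytes were transferred."""
    return total_mb / seconds

upload = elapsed_seconds("Wed Aug 21 07:14:01 AM UTC 2024",
                         "Wed Aug 21 07:21:12 AM UTC 2024")
download = elapsed_seconds("Wed Aug 21 07:22:35 AM UTC 2024",
                           "Wed Aug 21 07:24:15 AM UTC 2024")
print(upload, round(throughput_mb_per_s(1000, upload), 2))      # 431.0 2.32
print(download, round(throughput_mb_per_s(1000, download), 2))  # 100.0 10.0
```

So the upload ran at roughly 2.3 MB/s and the download at roughly 10 MB/s in this test.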

Other - version and help output of the mount-s3 used here

[ec2-user@ip-172-31-5-210 mount_s3]$ mount-s3 --version
mount-s3 1.8.0
[ec2-user@ip-172-31-5-210 mount_s3]$ mount-s3 --help
Mountpoint for Amazon S3

Usage: mount-s3 [OPTIONS] <BUCKET_NAME> <DIRECTORY>

Arguments:
  <BUCKET_NAME>  Name of bucket to mount
  <DIRECTORY>    Directory to mount the bucket at

Options:
  -f, --foreground  Run as foreground process
  -h, --help        Print help
  -V, --version     Print version

Bucket options:
      --prefix <PREFIX>
          Prefix inside the bucket to mount, ending in '/' [default: mount the entire bucket]
      --region <REGION>
          AWS region of the bucket [default: auto-detect region]
      --endpoint-url <ENDPOINT_URL>
          S3 endpoint URL [default: auto-detect endpoint]
      --force-path-style
          Force path-style addressing
      --transfer-acceleration
          Use S3 Transfer Acceleration when accessing S3. This must be enabled on the bucket.
      --dual-stack
          Use dual-stack endpoints when accessing S3
      --requester-pays
          Set the 'x-amz-request-payer' to 'requester' on S3 requests
      --bucket-type <BUCKET_TYPE>
          Type of S3 bucket to use [default: inferred from bucket name] [possible values: general-purpose, directory]
      --storage-class <STORAGE_CLASS>
          Set the storage class for new objects
      --expected-bucket-owner <AWS_ACCOUNT_ID>
          Account ID of the expected bucket owner. If the bucket is owned by a different account, S3 requests fail with an access denied error.
      --sse <SSE>
          Server-side encryption algorithm to use when uploading new objects [possible values: aws:kms, aws:kms:dsse, AES256]
      --sse-kms-key-id <AWS_KMS_KEY_ARN>
          AWS Key Management Service (KMS) key ARN to use with KMS server-side encryption when uploading new objects. Key ID, Alias and Alias ARN are all not supported.
      --upload-checksums <ALGORITHM>
          Checksum algorithm to use for S3 uploads [default: crc32c] [possible values: crc32c, off]

AWS credentials options:
      --no-sign-request    Do not sign requests. Credentials will not be loaded if this argument is provided.
      --profile <PROFILE>  Use a specific profile from your credential file.

Mount options:
      --read-only              Mount file system in read-only mode
      --allow-delete           Allow delete operations on file system
      --allow-overwrite        Allow overwrite operations on file system
      --auto-unmount           Automatically unmount on exit
      --allow-root             Allow root user to access file system
      --allow-other            Allow other users, including root, to access file system
      --uid <UID>              Owner UID [default: current user's UID]
      --gid <GID>              Owner GID [default: current user's GID]
      --dir-mode <DIR_MODE>    Directory permissions [default: 0755]
      --file-mode <FILE_MODE>  File permissions [default: 0644]

Client options:
      --maximum-throughput-gbps <N>  Maximum throughput in Gbps [default: auto-detected on EC2 instances, 10 Gbps elsewhere]
      --max-threads <N>              Maximum number of FUSE daemon threads [default: 16]
      --part-size <SIZE>             Part size for multi-part GET and PUT in bytes [default: 8388608]
      --read-part-size <SIZE>        Part size for GET in bytes [default: 8388608]
      --write-part-size <SIZE>       Part size for multi-part PUT in bytes [default: 8388608]

Logging options:
  -l, --log-directory <DIRECTORY>  Write log files to a directory [default: logs written to syslog]
      --log-metrics                Enable logging of summarized performance metrics
  -d, --debug                      Enable debug logging for Mountpoint
      --debug-crt                  Enable debug logging for AWS Common Runtime
      --no-log                     Disable all logging. You will still see stdout messages.

Caching options:
      --cache <DIRECTORY>
          Enable caching of object content to the given directory and set metadata TTL to 60 seconds
      --metadata-ttl <SECONDS|indefinite|minimal>
          Time-to-live (TTL) for cached metadata in seconds [default: minimal, or 60 seconds if --cache is set]
      --max-cache-size <MiB>
          Maximum size of the cache directory in MiB [default: preserve 5% of available space]

Advanced options:
      --user-agent-prefix <PREFIX>  Configure a string to be prepended to the 'User-Agent' HTTP request header for all S3 requests

2024-08-15

Similar image search with MobileNet, tf2onnx for python, etc. (revised)

The previous entry could only handle 224*224 image files, so larger images are now split into 224*224 tiles before the features are computed.

( This improves calc_feature(onnx_session, img_path) from the previous entry. )
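
The tiling itself comes down to generating crop boxes on a grid of 224-pixel units. The sketch below assumes a 3 x 2 grid, which is what the script in this entry resizes each image to; the helper name `tile_boxes` is hypothetical.

```python
def tile_boxes(cols: int, rows: int, unit_size: int = 224):
    """Return (left, upper, right, lower) crop boxes, column by column."""
    boxes = []
    for u_x in range(cols):
        for u_y in range(rows):
            boxes.append((unit_size * u_x,       unit_size * u_y,
                          unit_size * (u_x + 1), unit_size * (u_y + 1)))
    return boxes

boxes = tile_boxes(cols=3, rows=2)
print(len(boxes))   # 6
print(boxes[0])     # (0, 0, 224, 224)
print(boxes[-1])    # (448, 224, 672, 448)
```

Each box can be passed to PIL's `Image.crop()`, and the per-tile feature vectors are then concatenated into one vector per image.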

2. Computing the features of each image file

import glob
import os
import numpy as np
import onnxruntime
import PIL.Image

img_base_dir       = os.path.abspath( "./png" )
feature_base_dir   = os.path.abspath( "./feature" )
merged_feature_dir = os.path.abspath( "./merged_feature" )
merge_limit = 200
onnx_model_path = \
    "./mobilenet-v3-tensorflow2-large-100-224-feature-vector-v1.onnx"
#provider = ['CUDAExecutionProvider','CPUExecutionProvider']
# the CPU alone is fast enough
provider = ['CPUExecutionProvider']

def main():
    # load the onnx model
    onnx_session = onnxruntime.InferenceSession( onnx_model_path,
                                                 providers=provider )

    # compute the features of each image
    for img_path in glob.glob(img_base_dir + "/**/*.png", recursive=True):
        feature_path = img_path.replace(img_base_dir,feature_base_dir)
        # np.save() does not create missing directories
        os.makedirs(os.path.dirname(feature_path), exist_ok=True)
        feature = calc_feature(onnx_session, img_path)
        #print( len( feature ) )
        
        # np.save() appends ".npy" to feature_path
        np.save(feature_path, feature)
        
    # merge the feature files into batches of merge_limit
    os.makedirs(merged_feature_dir, exist_ok=True)
    features  = []
    img_paths = []
    i = 0
    feature_paths = glob.glob(feature_base_dir+"/**/*.npy",recursive=True)
    for n, feature_path in enumerate(feature_paths, start=1):
        feature = np.load( feature_path )
        features.append(feature)
        img_path = feature_path.replace(feature_base_dir, img_base_dir)
        img_paths.append(img_path)
        # keep collecting until the batch is full or the last file is reached
        if len(features) < merge_limit and n < len(feature_paths):
            continue

        features_path = os.path.join(merged_feature_dir,
                                     "features_{:03d}.npy".format(i) )
        np.save(features_path, features)
        features = []
        img_paths_path = os.path.join(merged_feature_dir,
                                      "img_paths_{:03d}.npy".format(i) )
        np.save(img_paths_path, img_paths)
        img_paths = []
        i += 1
        
        
def calc_feature(onnx_session, img_path):

    unit_size = 224
    
    image = PIL.Image.open( img_path )
    image = image.crop((40,25,950,610))
    image = image.resize((unit_size*3, unit_size*2))
    
    feature = []
    
    for u_x in [0,1,2]:
        for u_y in [0,1]:
            win_coord = (unit_size*u_x,    unit_size*u_y,
                         unit_size*(u_x+1),unit_size*(u_y+1))

            tmp_img = image.crop( win_coord )
            # tmp_img = tmp_img.convert("L")

            tmp_img = np.array(tmp_img, dtype=np.float32)
            tmp_img = tmp_img / 255

            # expand the 1-channel grayscale image to 3 channels to match the model input
            tmp_img = np.stack([tmp_img] * 3, axis=-1)
            tmp_feature = onnx_session.run(
                ["feature_vector"],
                {"inputs": np.expand_dims(tmp_img, 0)} )[0][0]
            feature += list(tmp_feature)
    return feature

if __name__ == "__main__":
    main()
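
When merging feature files in groups of `merge_limit`, the final group is usually smaller than the limit and still has to be written out. A minimal sketch of such batching on plain lists, with a hypothetical helper `batches` and made-up file names:

```python
def batches(items, limit):
    """Yield consecutive slices of at most `limit` items, including the last partial one."""
    for start in range(0, len(items), limit):
        yield items[start:start + limit]

# made-up file names, only to show the batch sizes
paths = ["f{:03d}.npy".format(n) for n in range(450)]
sizes = [len(b) for b in batches(paths, 200)]
print(sizes)  # [200, 200, 50]
```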

3. Similar image search

#!/usr/bin/env python3

import os
import sys
import glob
import numpy as np
import onnxruntime
import PIL.Image

merged_feature_dir = os.path.abspath( "./merged_feature" )
merge_limit = 200
onnx_model_path = \
    "./mobilenet-v3-tensorflow2-large-100-224-feature-vector-v1.onnx"
#provider = ['CUDAExecutionProvider','CPUExecutionProvider']
# the CPU alone is fast enough
provider = ['CPUExecutionProvider']

def main():
    # load the onnx model
    onnx_session = onnxruntime.InferenceSession( onnx_model_path,
                                                 providers=provider )
    
    img_path = "./png/CVAD45-06-000.60.png"
    query_feature = calc_feature(onnx_session, img_path)

    features  = load_merged_features()
    img_paths = load_img_paths()

    # repeat the query vector once per stored feature with np.tile
    query_features = np.tile(query_feature, (len(features), 1))
    #print( query_features )
    
    # compute the Euclidean distances
    distances = np.linalg.norm(query_features - features, axis=1)
    # print( distances )

    # print the similar image search results
    find_limit = 100
    distance_idxs = np.argsort(distances)[:find_limit]

    for idx in distance_idxs:
        print( img_paths[idx], distances[idx] )

    
def load_merged_features():
    ret_datas = []
    for feature_path in glob.glob(merged_feature_dir+"/**/features_*.npy",
                                  recursive=True ):
        ret_datas += list( np.load( feature_path ) )
    return ret_datas

def load_img_paths():
    ret_datas = []
    for imgs_list in glob.glob(merged_feature_dir+"/**/img_paths_*.npy",
                                   recursive=True ):
        ret_datas += list( np.load( imgs_list ) )
    return ret_datas
    
def calc_feature(onnx_session, img_path):

    unit_size = 224
    
    image = PIL.Image.open( img_path )
    image = image.crop((40,25,950,610))
    image = image.resize((unit_size*3, unit_size*2))
    
    feature = []
    
    for u_x in [0,1,2]:
        for u_y in [0,1]:
            win_coord = (unit_size*u_x,    unit_size*u_y,
                         unit_size*(u_x+1),unit_size*(u_y+1))

            tmp_img = image.crop( win_coord )
            # tmp_img = tmp_img.convert("L")

            tmp_img = np.array(tmp_img, dtype=np.float32)
            tmp_img = tmp_img / 255

            # expand the 1-channel grayscale image to 3 channels to match the model input
            tmp_img = np.stack([tmp_img] * 3, axis=-1)
            tmp_feature = onnx_session.run(
                ["feature_vector"],
                {"inputs": np.expand_dims(tmp_img, 0)} )[0][0]
            feature += list(tmp_feature)
    return feature

if __name__ == "__main__":
    main()
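
The search step is a nearest-neighbour ranking by Euclidean distance. The sketch below shows the same idea without numpy, on toy 2-dimensional vectors; the helper `nearest` and the sample data are made up for illustration.

```python
import math

def nearest(query, features, paths, limit=10):
    """Return (path, distance) pairs sorted by Euclidean distance to `query`."""
    scored = [(path, math.dist(query, feat)) for path, feat in zip(paths, features)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:limit]

features = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
paths = ["a.png", "b.png", "c.png"]
print(nearest([0.0, 0.0], features, paths, limit=2))
# [('a.png', 0.0), ('c.png', 1.0)]
```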

2024-08-15

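Similar image search with MobileNet, tf2onnx for python, etc.

Using the environment built in the previous entry, I copied out the similar image search from the reference URL.

Reference URL
TODO - similar image search for images larger than 224*224
1. Getting mobilenet_v3 and converting it to onnx format
2. Computing the features of each image file
3. Similar image search

Reference URL

類似画像検索ツールを作ってみる (2) 特徴化その1

TODO - similar image search for images larger than 224*224

The python scripts are listed below, and they worked plausibly.

The 2D CAD drawings used as search targets have to be scaled down to 224*224 to match the input of the feature extraction model, MobileNet v3.

I suspect this downscaling loses information in the line art of the CAD drawings.

What to do about similar image search for images larger than 224*224 remains open.

1. Getting mobilenet_v3 and converting it to onnx format

Getting mobilenet_v3 in TensorFlow Hub format

Use the Download → Download as tar.gz button at https://tfhub.dev/google/imagenet/mobilenet_v3_large_100_224/feature_vector/5 to download mobilenet-v3-tensorflow2-large-100-224-feature-vector-v1.tar.gz, then extract it.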

PS C:\Users\end0t\tmp\similar_img>tree /F
C:.
└─mobilenet-v3-tensorflow2-large-100-224-feature-vector-v1
    │  saved_model.pb
    └─variables
            variables.data-00000-of-00001
            variables.index

Converting to onnx format

miniconda cuda> python -m tf2onnx.convert \
   --saved-model mobilenet-v3-tensorflow2-large-100-224-feature-vector-v1 \
   --output mobilenet-v3-tensorflow2-large-100-224-feature-vector-v1.onnx

C:\Users\end0t\miniconda3\envs\cuda\lib\runpy.py:126: RuntimeWarning: 'tf2onnx.convert' found in sys.modules after import of package 'tf2onnx', but prior to execution of 'tf2onnx.convert'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
2024-08-14 16:39:01,033 - WARNING - '--tag' not specified for saved_model. Using --tag serve
2024-08-14 16:39:01,843 - INFO - Fingerprint not found. Saved model loading will continue.
2024-08-14 16:39:01,843 - INFO - Signatures found in model: [serving_default].
2024-08-14 16:39:01,843 - WARNING - '--signature_def' not specified, using first signature: serving_default
2024-08-14 16:39:01,854 - INFO - Output names: ['feature_vector']
2024-08-14 16:39:03,539 - INFO - Using tensorflow=2.13.1, onnx=1.16.2, tf2onnx=1.16.1/15c810
2024-08-14 16:39:03,539 - INFO - Using opset <onnx, 15>
2024-08-14 16:39:03,665 - INFO - Computed 0 values for constant folding
2024-08-14 16:39:03,934 - INFO - Optimizing ONNX model
2024-08-14 16:39:05,841 - INFO - After optimization: BatchNormalization -46 (46->0), Const -262 (406->144), GlobalAveragePool +8 (0->8), Identity -2 (2->0), ReduceMean -8 (8->0), Reshape +3 (15->18), Transpose -236 (237->1)
2024-08-14 16:39:06,010 - INFO -
2024-08-14 16:39:06,010 - INFO - Successfully converted TensorFlow model mobilenet-v3-tensorflow2-large-100-224-feature-vector-v1 to ONNX
2024-08-14 16:39:06,010 - INFO - Model inputs: ['inputs']
2024-08-14 16:39:06,010 - INFO - Model outputs: ['feature_vector']
2024-08-14 16:39:06,010 - INFO - ONNX model is saved at mobilenet-v3-tensorflow2-large-100-224-feature-vector-v1.onnx

Confirm the conversion by running inference

import numpy as np
import onnxruntime

print( onnxruntime.get_device() )
print( onnxruntime.get_available_providers() )

provider = ['CUDAExecutionProvider','CPUExecutionProvider']

session = onnxruntime.InferenceSession(
    "./mobilenet-v3-tensorflow2-large-100-224-feature-vector-v1.onnx",
    providers=provider)

print( session.get_providers() )

tmp_result = session.run(
    ["feature_vector"],
    {"inputs": np.zeros((1, 224, 224, 3), dtype=np.float32)} )
print( tmp_result )

tmp_result = session.run(
    ["feature_vector"],
    {"inputs": np.ones((1, 224, 224, 3), dtype=np.float32)} )

print( tmp_result )

↑ Write this, and if ↓ this is displayed, it is OK

(cuda) C:\Users\end0t\tmp\similar_img>python foo1.py
GPU
['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
['CUDAExecutionProvider', 'CPUExecutionProvider']
[array([[-0.19626826, -0.3297085 ,  0.01850062, ...,  1.1618388 ,
        -0.3663718 , -0.33905375]], dtype=float32)]
[array([[ 0.46943867,  0.20897101,  0.30629852, ..., -0.36712584,
        -0.31481627, -0.33279896]], dtype=float32)]

2. Computing the features of each image file

import glob
import os
import numpy as np
import onnxruntime
import PIL.Image

img_base_dir       = os.path.abspath( "./png" )
feature_base_dir   = os.path.abspath( "./feature" )
merged_feature_dir = os.path.abspath( "./merged_feature" )
merge_limit = 200
onnx_model_path = \
    "./mobilenet-v3-tensorflow2-large-100-224-feature-vector-v1.onnx"
#provider = ['CUDAExecutionProvider','CPUExecutionProvider']
# the CPU alone is fast enough
provider = ['CPUExecutionProvider']

def main():
    # load the onnx model
    onnx_session = onnxruntime.InferenceSession( onnx_model_path,
                                                 providers=provider )

    # compute the features of each image
    for img_path in glob.glob(img_base_dir + "/**/*.png", recursive=True):
        feature_path = img_path.replace(img_base_dir,feature_base_dir)
        # np.save() does not create missing directories
        os.makedirs(os.path.dirname(feature_path), exist_ok=True)
        feature = calc_feature(onnx_session, img_path)

        # print( feature_path )
        # np.save() appends ".npy" to feature_path
        np.save(feature_path, feature)
        
    # merge the feature files into batches of merge_limit
    os.makedirs(merged_feature_dir, exist_ok=True)
    features  = []
    img_paths = []
    i = 0
    feature_paths = glob.glob(feature_base_dir+"/**/*.npy",recursive=True)
    for n, feature_path in enumerate(feature_paths, start=1):
        feature = np.load( feature_path )
        features.append(feature)
        img_path = feature_path.replace(feature_base_dir, img_base_dir)
        img_paths.append(img_path)
        # keep collecting until the batch is full or the last file is reached
        if len(features) < merge_limit and n < len(feature_paths):
            continue

        features_path = os.path.join(merged_feature_dir,
                                     "features_{:03d}.npy".format(i) )
        np.save(features_path, features)
        features = []
        img_paths_path = os.path.join(merged_feature_dir,
                                      "img_paths_{:03d}.npy".format(i) )
        np.save(img_paths_path, img_paths)
        img_paths = []
        i += 1
        
def calc_feature(onnx_session, img_path):

    image = PIL.Image.open( img_path )
    #image = image.convert("RGB")
    #image = image.convert("L")
    image = image.resize((224, 224)) # the model input is 224*224
    image = np.array(image, dtype=np.float32)
    image = image / 255

    # expand the 1-channel grayscale image to 3 channels to match the model input
    image = np.stack([image] * 3, axis=-1)

    feature = onnx_session.run(["feature_vector"],
                               {"inputs": np.expand_dims(image, 0)})[0][0]
    return feature

if __name__ == "__main__":
    main()

3. Similar image search

import os
import sys
import glob
import numpy as np
import onnxruntime
import PIL.Image

merged_feature_dir = os.path.abspath( "./merged_feature" )
merge_limit = 200
onnx_model_path = \
    "./mobilenet-v3-tensorflow2-large-100-224-feature-vector-v1.onnx"
#provider = ['CUDAExecutionProvider','CPUExecutionProvider']
# the CPU alone is fast enough
provider = ['CPUExecutionProvider']

def main():
    # load the onnx model
    onnx_session = onnxruntime.InferenceSession( onnx_model_path,
                                                 providers=provider )
    
    img_path = "./png/CVAD45-06-000.60.png"
    query_feature = calc_feature(onnx_session, img_path)

    features  = load_merged_features()
    img_paths = load_img_paths()

    # repeat the query vector once per stored feature with np.tile
    query_features = np.tile(query_feature, (len(features), 1))
    #print( query_features )
    
    # compute the Euclidean distances
    distances = np.linalg.norm(query_features - features, axis=1)
    # print( distances )

    # print the similar image search results
    find_limit = 10
    distance_idxs = np.argsort(distances)[:find_limit]
    print( distance_idxs )
    for idx in distance_idxs:
        print( img_paths[idx], distances[idx] )

    
def load_merged_features():
    ret_datas = []
    for feature_path in glob.glob(merged_feature_dir+"/**/features_*.npy",
                                  recursive=True ):
        ret_datas += list( np.load( feature_path ) )
    return ret_datas

def load_img_paths():
    ret_datas = []
    for imgs_list in glob.glob(merged_feature_dir+"/**/img_paths_*.npy",
                                   recursive=True ):
        ret_datas += list( np.load( imgs_list ) )
    return ret_datas
    
def calc_feature(onnx_session, img_path):

    image = PIL.Image.open( img_path )
    #image = image.convert("L")
    image = image.resize((224, 224)) # the model input is 224*224
    image = np.array(image, dtype=np.float32)
    image = image / 255

    # expand the 1-channel grayscale image to 3 channels to match the model input
    image = np.stack([image] * 3, axis=-1)

    feature = onnx_session.run(["feature_vector"],
                               {"inputs": np.expand_dims(image, 0)})[0][0]
    return feature

if __name__ == "__main__":
    main()

2024-08-14

Reinstalling cuda11.8, cudnn8.5.0 and running pip install onnxruntime-gpu etc. in preparation for similar image search

This win11 PC previously had cuda 11.2 and cuDNN 8.9.2 installed, but these apparently differ from the versions onnxruntime requires, so I reinstalled cuda11.8 and cudnn8.5.0.

While I was at it, I also reinstalled the GeForce Game Ready driver.

0. Environment: win11 + miniconda24.5 + GeForce RTX 4090
1. Reinstalling the GeForce Game Ready driver, CUDA and cuDNN
2. conda create and pip install onnxruntime-gpu
3. pip install tensorflow[and-cuda] tf2onnx etc.

0. Environment: win11 + miniconda24.5 + GeForce RTX 4090

PS> systeminfo
  :
OS 名:                  Microsoft Windows 11 Pro
OS バージョン:          10.0.22621 N/A ビルド 22621
システム製造元:         LENOVO
システム モデル:        21HMCTO1WW
システムの種類:         x64-based PC
                        [01]: Intel64 Family 6 Model 186 Stepping 2
                  GenuineIntel ~1900 Mhz
Windows ディレクトリ:   C:\Windows
システム ディレクトリ:  C:\Windows\system32
物理メモリの合計:       65,193 MB
  :

PS C:\Users\end0t> conda --version
conda 24.5.0
PS C:\Users\end0t> python --version
Python 3.12.4

PS C:\Users\end0t> Get-WmiObject -Class Win32_VideoController
  :
AdapterCompatibility         : NVIDIA
AdapterDACType               : Integrated RAMDAC
AdapterRAM                   : 4293918720
Availability                 : 8
Caption                      : NVIDIA GeForce RTX 4090
DriverDate                   : 20240730000000.000000-000
DriverVersion                : 32.0.15.6081
  :

1. Reinstalling the GeForce Game Ready driver, CUDA and cuDNN

As described in reference URL 1.1, this win11 PC previously had cuda 11.2 and cuDNN 8.9.2 installed. To use onnxruntime, I reinstalled cuda11.8 and cudnn8.5.0, following reference URLs 1.3 & 4.

While I was at it, I also reinstalled the GeForce Game Ready driver.

Reference URLs

GeForce Game Ready driver 560.81 win11

Just download 560.81-desktop-win10-win11-64bit-international-dch-whql.exe from https://www.nvidia.com/ja-jp/drivers/details/230597/ and run it.

CUDA Toolkit 11.8

Download cuda_11.8.0_522.06_windows.exe from https://developer.nvidia.com/cuda-toolkit-archive and run it.

The cuda_11.8.0_522.06_windows.exe installer sets most of the environment variables, but I edited "CUDA_PATH = %CUDA_PATH_V11_8%" by hand.

Below is the PATH environment variable.

Also, reference URL 1.2 shows that the registry was edited when cuda11.2 was installed, so I added the following entry.

Item	Value
Registry key	HKEY_LOCAL_MACHINE\SOFTWARE\NVIDIA Corporation\GPU Computing Toolkit\CUDA\v11.8
Name	InstallDir
Type	REG_SZ (string)
Data	C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8

cuDNN 8.5.0.96

Download cudnn-windows-x86_64-8.5.0.96_cuda11-archive.zip from https://developer.nvidia.com/rdp/cudnn-archive, extract it, and copy the contents into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8.

Confirm that the GeForce RTX 4090 is recognized

The installation so far has placed the nvidia-smi command in C:\Windows\System32, so running it as follows confirms that the GeForce RTX 4090 is recognized.

PS C:\Users\end0t> nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-e87ef9c0-2654-6223-9868-ecc2973f1789)
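
The same check could be scripted. The sketch below assumes the `nvidia-smi -L` output keeps the one-line-per-GPU format shown above; the helpers `parse_gpu_list` and `list_gpus` are hypothetical names, not part of any NVIDIA tool.

```python
import re
import subprocess

# one line per GPU, e.g. "GPU 0: <name> (UUID: GPU-...)"
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-f-]+)\)$")

def parse_gpu_list(text: str):
    """Parse `nvidia-smi -L` output into (index, name, uuid) tuples."""
    gpus = []
    for line in text.splitlines():
        m = GPU_LINE.match(line.strip())
        if m:
            gpus.append((int(m.group(1)), m.group(2), m.group(3)))
    return gpus

def list_gpus():
    """Run nvidia-smi -L (needs the NVIDIA driver) and parse its output."""
    out = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                         text=True, check=True).stdout
    return parse_gpu_list(out)

sample = "GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-e87ef9c0-2654-6223-9868-ecc2973f1789)"
print(parse_gpu_list(sample))
# [(0, 'NVIDIA GeForce RTX 4090', 'GPU-e87ef9c0-2654-6223-9868-ecc2973f1789')]
```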

2. conda create and pip install onnxruntime-gpu

Reference URL

2.1 ONNX Runtime（GPU版）のインストール | ジコログ

miniconda base>python --version
Python 3.12.4

conda create and pip install onnxruntime-gpu

miniconda base> conda create -n cuda python=3.10
miniconda base> conda activate cuda

miniconda cuda> python -m pip install --upgrade pip setuptools

miniconda cuda> pip install onnxruntime-gpu

miniconda cuda> pip list
Package         Version
--------------- -------
coloredlogs     15.0.1
flatbuffers     24.3.25
humanfriendly   10.0
mpmath          1.3.0
numpy           1.26.4
onnxruntime-gpu 1.18.1
packaging       24.1
pip             24.2
protobuf        5.27.3
pyreadline3     3.4.1
setuptools      72.2.0
sympy           1.13.2
wheel           0.43.0

Confirm that onnxruntime recognizes the GPU

import onnxruntime
 
print( onnxruntime.get_available_providers() )

↑ Write this, and if ↓ this is displayed, it is OK

miniconda cuda> python foo.py
['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
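
A script that should run with or without a GPU could filter its preferred providers against this list before creating a session. A minimal sketch, where `pick_providers` is a hypothetical helper and not part of onnxruntime:

```python
def pick_providers(available, preferred=("CUDAExecutionProvider",
                                         "CPUExecutionProvider")):
    """Keep the preferred providers that are actually available, in order."""
    chosen = [p for p in preferred if p in available]
    if not chosen:
        raise RuntimeError("none of the preferred providers is available")
    return chosen

# the list printed by onnxruntime.get_available_providers() above
available = ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
print(pick_providers(available))                 # ['CUDAExecutionProvider', 'CPUExecutionProvider']
print(pick_providers(['CPUExecutionProvider']))  # ['CPUExecutionProvider']
```

The result can then be passed as `providers=` to `onnxruntime.InferenceSession`.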

3. pip install tensorflow[and-cuda] tf2onnx etc.

A later entry runs similar image search with a model in onnx format. In preparation, this step installs tensorflow[and-cuda], tf2onnx and related packages.

Reference URL

pip install tensorflow[and-cuda] tf2onnx etc.

miniconda cuda> pip install tensorflow[and-cuda]
miniconda cuda> pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
miniconda cuda> pip install tf2onnx

miniconda cuda> pip list

Package                      Version
---------------------------- ------------
absl-py                      2.1.0
astunparse                   1.6.3
cachetools                   5.4.0
certifi                      2024.7.4
charset-normalizer           3.3.2
coloredlogs                  15.0.1
filelock                     3.13.1
flatbuffers                  24.3.25
fsspec                       2024.2.0
gast                         0.4.0
google-auth                  2.33.0
google-auth-oauthlib         1.0.0
google-pasta                 0.2.0
grpcio                       1.65.4
h5py                         3.11.0
humanfriendly                10.0
idna                         3.7
Jinja2                       3.1.3
keras                        2.13.1
libclang                     18.1.1
Markdown                     3.6
MarkupSafe                   2.1.5
mpmath                       1.3.0
networkx                     3.2.1
numpy                        1.24.3
oauthlib                     3.2.2
onnx                         1.16.2
onnxruntime-gpu              1.18.1
opt-einsum                   3.3.0
packaging                    24.1
pillow                       10.2.0
pip                          24.2
protobuf                     3.20.3
pyasn1                       0.6.0
pyasn1_modules               0.4.0
pyreadline3                  3.4.1
requests                     2.32.3
requests-oauthlib            2.0.0
rsa                          4.9
setuptools                   72.2.0
six                          1.16.0
sympy                        1.13.2
tensorboard                  2.13.0
tensorboard-data-server      0.7.2
tensorflow                   2.13.1
tensorflow-estimator         2.13.0
tensorflow-intel             2.13.1
tensorflow-io-gcs-filesystem 0.31.0
termcolor                    2.4.0
tf2onnx                      1.16.1
torch                        2.4.0+cu118
torchaudio                   2.4.0+cu118
torchvision                  0.19.0+cu118
typing_extensions            4.9.0
urllib3                      2.2.2
Werkzeug                     3.0.3
wheel                        0.43.0
wrapt                        1.16.0

Confirming that PyTorch recognizes the GPU

# -*- coding: utf-8 -*-
import torch
from tensorflow.python.client import device_lib

print( torch.cuda.is_available() )
print( torch.version.cuda )
print( torch.cuda.device_count() )
print( torch.cuda.get_device_name() )
print( "" )

print( device_lib.list_local_devices() )

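If you write the above and get output like the below, it is OK. Note that in this run TensorFlow lists only the CPU device; the GPU is confirmed by the PyTorch lines. This is probably because TensorFlow releases after 2.10 do not support GPUs on native Windows.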

True
11.8
1
NVIDIA GeForce RTX 4090

2024-08-14 15:41:58.819659: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE SSE2 SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 1347897820107021385
xla_global_id: -1
]

Copying zlibwapi.dll

An operation in a later entry failed with an error about zlibwapi.dll, so

I copied the zlibwapi.dll found in C:\Users\end0t\miniconda3\envs\cuda\Lib\site-packages\torch\lib to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin.
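
The copy can also be scripted. The following is only a sketch: `copy_dll_if_missing` is a hypothetical helper, and it assumes the source and destination directories are the two paths given above.

```python
import shutil
from pathlib import Path


def copy_dll_if_missing(src_dir: str, dst_dir: str,
                        name: str = "zlibwapi.dll") -> bool:
    """Copy `name` from src_dir to dst_dir unless it already exists there.

    Returns True if a copy was made, False if the file was already present.
    """
    src = Path(src_dir) / name
    dst = Path(dst_dir) / name
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found")
    if dst.exists():
        return False  # leave an existing DLL untouched
    shutil.copy2(src, dst)
    return True
```

Writing under C:\Program Files normally requires an elevated prompt, so the script would have to be run as administrator.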

2024-08-11

Installing the MySQL 9.0 source distribution on Oracle Linux 8.7

Reference URLs
Install target - Oracle Linux 8.7
Dependency libraries
Adding the user & group
Obtaining and extracting the MySQL 9.0 source distribution
Configuration with cmake
make, make test, make install
Settings in my.cnf, etc.
Creating and initializing the datadir
systemd autostart
Connection test as root

Reference URLs

Install target - Oracle Linux 8.7

$ cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.7 (Ootpa)  

$ cat /etc/os-release 
NAME="Oracle Linux Server"
VERSION="8.7"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="8.7"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Oracle Linux Server 8.7"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:8:7:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://bugzilla.oracle.com/"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 8"
ORACLE_BUGZILLA_PRODUCT_VERSION=8.7
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=8.7

Dependency libraries

Referring to https://end0tknr.hateblo.jp/entry/20220108/1641624999, yum install the following.

$ sudo yum install openssl-devel
$ sudo yum install ncurses-devel
$ sudo yum install libedit-devel
$ sudo yum install boost-devel
$ sudo yum install gcc-c++
$ sudo yum install libtirpc-devel
$ sudo yum install rpcgen

I then ran cmake, which printed a message asking for the following packages to be installed with yum as well, so I added them.

$ sudo yum install \
  gcc-toolset-13-gcc gcc-toolset-13-gcc-c++ \
  gcc-toolset-13-binutils gcc-toolset-13-annobin-annocheck \
  gcc-toolset-13-annobin-plugin-gcc

Adding the user & group

$ sudo groupadd mysql
$ sudo useradd -r -g mysql mysql

Obtaining and extracting the MySQL 9.0 source distribution

$ wget https://dev.mysql.com/get/Downloads/MySQL-9.0/mysql-9.0.1.tar.gz
$ tar -xvf mysql-9.0.1.tar.gz

Configuration with cmake

For the cmake options, see the following.

https://dev.mysql.com/doc/refman/9.0/en/source-configuration-options.html

$ cd mysql-9.0.1
$ mkdir build
$ cd build
$ cmake .. \
   -DCMAKE_INSTALL_PREFIX=/usr/local/mysql \
   -DWITH_SYSTEMD=ON

make, make test, make install

A few tests failed in make test, but I ignored them and went ahead with make install.

$ make
$ sudo make test
  :
99% tests passed, 2 tests failed out of 305

Label Time Summary:
NDB    = 117.86 sec*proc (37 tests)

Total Test time (real) = 1394.25 sec

The following tests FAILED:
          8 - NdbGetInAddr-t (Failed)
        300 - routertest_integration_routing_sharing (Failed)
Errors while running CTest
Output from these tests are in: /home/end0tknr/tmp/mysql-9.0.1/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
make: *** [Makefile:91: test] Error 8

$ sudo make install
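
To keep a record of which tests failed before ignoring them, the CTest summary can be parsed. This is a rough sketch: it assumes the `N - name (Failed)` line format shown in the output above, and `failed_tests` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Matches summary lines such as "          8 - NdbGetInAddr-t (Failed)".
FAILED_LINE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\(Failed\)\s*$")


def failed_tests(ctest_output: str) -> List[Tuple[int, str]]:
    """Return (test number, test name) pairs for every failed test."""
    result = []
    for line in ctest_output.splitlines():
        m = FAILED_LINE.match(line)
        if m:
            result.append((int(m.group(1)), m.group(2)))
    return result
```

Feeding it the saved output of make test would give a list that can be compared between builds.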

Settings in my.cnf, etc.

my.cnf is the one generated by make, edited with reference to https://end0tknr.hateblo.jp/entry/20220108/1641624999.

$ sudo cp ./packaging/rpm-common/my.cnf /etc/

$ sudo vi /etc/my.cnf
[mysqld]
basedir = /usr/local/mysql
datadir = /var/mysql_data

socket=/tmp/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

skip-grant-tables
default_password_lifetime=0
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
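
As a quick sanity check of these settings, a simple my.cnf like this one can be read with Python's standard `configparser`. This is a sketch under assumptions: `load_mysqld_options` is a hypothetical helper, and `allow_no_value=True` is needed so that a bare option such as skip-grant-tables parses.

```python
import configparser

# Same content as the my.cnf shown above.
SAMPLE = """\
[mysqld]
basedir = /usr/local/mysql
datadir = /var/mysql_data

socket=/tmp/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

skip-grant-tables
default_password_lifetime=0
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
"""


def load_mysqld_options(text: str) -> dict:
    """Parse my.cnf text and return the [mysqld] options as a dict."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(text)
    return dict(parser["mysqld"])


opts = load_mysqld_options(SAMPLE)
print(opts["datadir"])
# Options without a value are stored with the value None.
print("skip-grant-tables" in opts)
```

A check like this makes it easy to notice that skip-grant-tables is still set. The option disables authentication, so it is probably best removed once the connection test is done.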

Creating and initializing the datadir

$ sudo mkdir /var/mysql_data
$ sudo chown mysql:mysql /var/mysql_data
$ sudo chmod 750 /var/mysql_data

$ sudo touch /var/log/mysqld.log
$ sudo chown mysql:mysql /var/log/mysqld.log

$ sudo mkdir /var/run/mysqld
$ sudo chown mysql:mysql /var/run/mysqld

The initial MySQL root password can be found in /var/log/mysqld.log, but because my.cnf sets skip-grant-tables, you can connect to MySQL even without a password.

$ sudo /usr/local/mysql/bin/mysqld --initialize --user=mysql

$ sudo cat /var/log/mysqld.log
2024-08-11T01:36:03.034743Z 0 [System] [MY-015017] [Server] MySQL Server Initialization - start.
2024-08-11T01:36:03.036746Z 0 [Warning] [MY-010915] [Server] 'NO_ZERO_DATE', 'NO_ZERO_IN_DATE' and 'ERROR_FOR_DIVISION_BY_ZERO' sql modes should be used with strict mode. They will be merged with strict mode in a future release.
2024-08-11T01:36:03.036792Z 0 [System] [MY-013169] [Server] /usr/local/mysql/bin/mysqld (mysqld 9.0.1) initializing of server in progress as process 271822
2024-08-11T01:36:03.060561Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2024-08-11T01:36:03.696142Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2024-08-11T01:36:05.836016Z 6 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: 9flrM#n<u:tg
2024-08-11T01:36:08.423575Z 0 [System] [MY-015018] [Server] MySQL Server Initialization - end.
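
If the temporary password is needed later, for example after removing skip-grant-tables, it can be pulled out of the log. This is a sketch that assumes the message format shown above and a password without whitespace; `find_temporary_password` is a hypothetical helper.

```python
import re
from typing import Optional

# Message text as it appears in the mysqld.log excerpt above.
TEMP_PW = re.compile(
    r"A temporary password is generated for root@localhost: (\S+)\s*$"
)


def find_temporary_password(log_text: str) -> Optional[str]:
    """Return the most recent temporary root password in the log, or None."""
    last = None
    for line in log_text.splitlines():
        m = TEMP_PW.search(line)
        if m:
            last = m.group(1)  # keep the latest match if initialized twice
    return last
```

Reading /var/log/mysqld.log itself requires the same privileges as the `sudo cat` above.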

systemd autostart

For MySQL and systemd, https://blog.s-style.co.jp/2024/07/12276/ seems to be an easy-to-follow explanation.

$ sudo cp ./scripts/mysqld.service /etc/systemd/system/

$ sudo systemctl enable mysqld.service
Created symlink /etc/systemd/system/multi-user.target.wants/mysqld.service → /etc/systemd/system/mysqld.service.

$ sudo systemctl start mysqld.service

Connection test as root

$ /usr/local/mysql/bin/mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 9.0.1 Source distribution

Copyright (c) 2000, 2024, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
4 rows in set (0.02 sec)

Creating a user dictionary in a kuromoji.js + SudachiDict environment