How to use awk

June 30, 2019


$NF: the last field

# echo "a1 b1 c1" | awk '{print $NF}'
c1
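`NF` itself holds the number of fields on the line, so you can also count fields from the end. Here is a small sketch on the same input. The parentheses in `$(NF-1)` matter.

```shell
# NF = number of fields on the current line; $(NF-1) = second-to-last field
echo "a1 b1 c1" | awk '{print NF, $(NF-1)}'    # 3 b1

# Without parentheses, $NF-1 means "last field minus 1": "c1" converts to 0, giving -1
echo "a1 b1 c1" | awk '{print $NF-1}'          # -1
```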

Lines where a numeric field is greater than 20000000

# grep -n consumed *.log | grep Lku | awk '$6 > 20000000{print $1,$2,$6}'
ABCM4110_JOBSTATUS_20120101124731.log:4758: Lku_01,0: 21838780
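You can also fold the `grep` stages into awk itself. `FILENAME` and `FNR` then rebuild the `file:line:` prefix that `grep -n` adds. This is a sketch only. The log lines below are made up for illustration, and they assume the "consumed" lines look like `name: Input 0 consumed N records.` (the same shape as the job-log lines later in this post). Because no grep prefix occupies the first field, the count moves from `$6` to `$5`.

```shell
# Hypothetical sample log, for illustration only
log=$(mktemp)
printf '%s\n' \
  ' Lku_01,0: Input 0 consumed 21838780 records.' \
  ' Lku_02,0: Input 0 consumed 150 records.' \
  ' Trf_01,0: Input 0 consumed 99999999 records.' > "$log"

# Same idea as: grep -n consumed ... | grep Lku | awk '$6 > 20000000{print $1,$2,$6}'
# FILENAME and FNR rebuild grep -n's "file:line:" prefix; the count is $5 here.
awk '/consumed/ && /Lku/ && $5 > 20000000 { print FILENAME ":" FNR ":", $1, $5 }' "$log"
# prints "<temp file path>:1: Lku_01,0: 21838780"
```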

Extracting substrings

# A="ABCSO2210 16時23分28秒 平成31年 6月26日（水）"
# echo $A | awk '{print $2}'
16時23分28秒
# echo $A | awk '{print(substr($2,1,2)":"substr($2,4,2)":"substr($2,7,2))}'
16:23:28
# A="21001 17:00:30 24 MAY 2019"
# echo $A | awk '{print(substr($2,1,2)":"substr($2,4,2)":"substr($2,7,2))}'
17:00:30
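The `substr()` offsets above count characters, so they depend on a multibyte-aware awk such as gawk running in a UTF-8 locale. A byte-oriented awk such as mawk would count each 3-byte UTF-8 kanji as three positions. Below is a sketch of an offset-free alternative. It assumes the time is its own field, as the `print $2` output shows, and replaces the unit markers with `gsub()`. That matches the same byte sequence either way, as long as the script and the data share an encoding. The function name `to_hms` is hypothetical.

```shell
# Turn the time field into HH:MM:SS by replacing the kanji unit markers.
# An already "HH:MM:SS" value contains no markers and passes through unchanged.
to_hms() {
  awk '{ t = $2; gsub(/時/, ":", t); gsub(/分/, ":", t); sub(/秒/, "", t); print t }'
}

echo "ABCSO2210 16時23分28秒 平成31年 6月26日（水）" | to_hms   # 16:23:28
echo "21001 17:00:30 24 MAY 2019" | to_hms                     # 17:00:30
```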

Converting timestamp formats (Japanese and English)

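When you extract records by timestamp, plain grep often falls short if the timestamps mix Japanese (era-based, such as 平成 or 令和) and English formats.
awk can normalize both formats with relatively little code.
The test data is as follows.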
$ cat sample.txt
test1 21時02分39秒 平成30年 6月20日（木）
test2 21時02分39秒 令和元年 6月20日（木）
test3 21時02分39秒 令和12年 6月20日（木）
test4 16:11:01 29 MAY 2019
test5 09:59:24 5 JUN 2016

The conversion result is as follows.
$ cat sample.txt | awk -f sample1.awk
test1 20180620 21:02:39
test2 20190620 21:02:39
test3 20300620 21:02:39
test4 20190529 16:11:01
test5 20160605 09:59:24

$ cat sample1.awk
# sample1.awk: print "name YYYYMMDD HH:MM:SS" for Japanese (era) and English timestamps.
# The substr() offsets count characters, so this needs a multibyte-aware awk
# (e.g. gawk) in a locale that matches the data's encoding (e.g. ja_JP.UTF-8).
{
    if (substr($2, 3, 1) == "時") {
        # Japanese: $2 = 21時02分39秒, $3 = 平成30年, $4 = 6月20日（木）
        time = substr($2, 1, 2) ":" substr($2, 4, 2) ":" substr($2, 7, 2)
        if (substr($3, 1, 4) == "平成元年") {
            year = 1989
        } else if (substr($3, 1, 2) == "平成") {
            year = substr($3, 3, 2) + 1988    # "5年" converts to 5, "30" to 30
        } else if (substr($3, 1, 4) == "令和元年") {
            year = 2019
        } else {                               # 令和2年 and later
            year = substr($3, 3, 2) + 2018
        }
        # month is the text before 月, day the text between 月 and 日
        m = index($4, "月")
        d = index($4, "日")
        month = substr($4, 1, m - 1)
        day = substr($4, m + 1, d - m - 1)
    } else {
        # English: $2 = 16:11:01, $3 = day, $4 = month name, $5 = year
        time = $2
        day = $3
        year = $5
        # "JAN" -> 1 ... "DEC" -> 12 (assumes a valid upper-case abbreviation)
        month = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", $4) + 2) / 3
    }
    # %02d zero-pads single-digit months and days
    printf "%s %d%02d%02d %s\n", $1, year, month, day, time
}
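For comparison, here is a hypothetical variant (not the author's script) that avoids fixed character offsets entirely. It strips the kanji markers with `gsub()`/`split()` and looks up English month names with `index()`, so it should also run under byte-oriented awks. It assumes the same field layout as `sample.txt`, handles only 平成 and 令和, and skips lines it cannot parse. The shell function name `conv_ts` is illustrative.

```shell
# Hypothetical alternative to sample1.awk: no fixed substr() offsets.
# Output format matches sample1.awk: "name YYYYMMDD HH:MM:SS".
conv_ts() {
  awk '
  $2 ~ /時/ {                                  # Japanese: 21時02分39秒 平成30年 6月20日（木）
    t = $2
    gsub(/時/, ":", t); gsub(/分/, ":", t); sub(/秒/, "", t)
    era = $3
    sub(/年.*/, "", era)                         # "平成30年" -> "平成30"
    if (era ~ /^平成/)      { base = 1988; sub(/^平成/, "", era) }
    else if (era ~ /^令和/) { base = 2018; sub(/^令和/, "", era) }
    else next                                   # other eras are not handled
    year = (era == "元" ? 1 : era) + base        # 元年 is year 1 of the era
    md = $4
    sub(/日.*/, "", md)                          # "6月20日（木）" -> "6月20"
    split(md, p, "月")                           # p[1] = month, p[2] = day
    printf "%s %d%02d%02d %s\n", $1, year, p[1], p[2], t
    next
  }
  {                                             # English: 16:11:01 29 MAY 2019
    m = index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", toupper($4))
    if (length($4) != 3 || m == 0 || (m - 1) % 3 != 0) next   # unknown month name
    printf "%s %d%02d%02d %s\n", $1, $5, (m + 2) / 3, $3, $2
  }'
}

# Usage: conv_ts < sample.txt
```

This version also maps 平成元年 to 1989.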

Extracting data on or after a given date with AWK

$ REQUIRED_DATE=20190620
$ cat sample.txt | awk -f sample1.awk | awk -v DATE="${REQUIRED_DATE}" '$2 >= DATE'
test2 20190620 21:02:39
test3 20300620 21:02:39
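Passing the date with `-v` sets the variable before any input is read, and the same pattern extends to a date range. Below is a sketch with a hypothetical `filter_range` function, fed the converted rows shown above. The column is fixed-width `YYYYMMDD`, so numeric and string comparison give the same order here.

```shell
# Print rows whose YYYYMMDD column ($2) lies between FROM and TO, inclusive.
filter_range() {
  awk -v from="$1" -v to="$2" '$2 >= from && $2 <= to'
}

printf '%s\n' 'test1 20180620 21:02:39' 'test2 20190620 21:02:39' 'test3 20300620 21:02:39' |
  filter_range 20190101 20291231    # test2 20190620 21:02:39
```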

Summing a column with AWK

# grep "produced" ABC1100_JOBLOG_20170709193241.log | grep "Trf_02"
Trf_02,0: Output 0 produced 137502 records.
Trf_02,1: Output 0 produced 36238573 records.
Trf_02,2: Output 0 produced 6479675 records.
Trf_02,7: Output 0 produced 109750 records.
Trf_02,3: Output 0 produced 1711917 records.
Trf_02,5: Output 0 produced 367398 records.
Trf_02,4: Output 0 produced 720240 records.
Trf_02,6: Output 0 produced 213023 records.
# grep "produced" ABC1100_JOBLOG_20170709193241.log | grep "Trf_02" | awk '{sum+=$5}END{print sum}'
45978078
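You can move the `grep` filter into the awk pattern, and an associative array then gives one total per stage instead of a single grand total. Below is a sketch with a hypothetical `sum_by_stage` function, fed the eight `Trf_02` lines from the output above. On a full log it would print one line per stage name. `for (s in sum)` visits keys in unspecified order, so pipe the result through `sort` if order matters.

```shell
# Total the "produced" counts ($5) per stage name, e.g. "Trf_02,0:" -> "Trf_02".
sum_by_stage() {
  awk '/produced/ {
         split($1, k, ",")        # k[1] = stage name without the ",partition:" suffix
         sum[k[1]] += $5
       }
       END { for (s in sum) print s, sum[s] }'
}

printf '%s\n' \
  'Trf_02,0: Output 0 produced 137502 records.' \
  'Trf_02,1: Output 0 produced 36238573 records.' \
  'Trf_02,2: Output 0 produced 6479675 records.' \
  'Trf_02,7: Output 0 produced 109750 records.' \
  'Trf_02,3: Output 0 produced 1711917 records.' \
  'Trf_02,5: Output 0 produced 367398 records.' \
  'Trf_02,4: Output 0 produced 720240 records.' \
  'Trf_02,6: Output 0 produced 213023 records.' |
  sum_by_stage    # Trf_02 45978078
```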

Regular expressions in AWK

# seq 1000 | awk '/[123][5][1]/{print}'
151
251
351
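A bare `/regex/` matches anywhere in the line. It returns only three hits here because `seq 1000` stops before 1151. `[5]` is a one-character class, equivalent to a plain `5`. Anchoring with `^` and `$` limits the match to the whole line, and `~` applies a regex to a single field. A sketch:

```shell
# Unanchored: 1151, 1251, 1351, 1510 ... also match once the range goes past 1000
seq 2000 | awk '/[123]51/' | head -n 5

# Anchored: only lines that are exactly 151, 251 or 351
seq 2000 | awk '/^[123]51$/'

# ~ tests one field instead of the whole line
echo "Trf_02,0: Output 0 produced 137502 records." | awk '$1 ~ /^Trf_02,/ {print $5}'   # 137502
```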

Blog: 山と自然に帰るブログ

awkの使い方

$NF 最後の文字列

数値が20000000より大きい

文字列抜き出し

タイムスタンプ(日本、英語）のフォーマット変換

AWKで指定日付より後のデータを抽出する

AWKでSUMする

AWKで正規表現

コメント

コメントを投稿

人気の投稿

How to Manually Stop/Start IBM InfoSphere Information Server services on Unix/Linux

Snowflake SnowSQL使用手順