awk to reformat text

aydj · December 19, 2013, 5:45am

I have the input below and want the output shown after it. How can I achieve that with awk?

Input:

CAT1 FRY-01
CAT1 FRY-04
CAT1 DRY-03
CAT1 FRY-02
CAT1 DRY-04
CAT2 FRY-03
CAT2 FRY-02
CAT2 DRY-01
FAT3 DRY-12
FAT3 FRY-06

Output:

category CAT1
item FRY-01
item FRY-04
item DRY-03
item FRY-02
item DRY-04
ok

category CAT2
item FRY-03
item FRY-02
item DRY-01
ok

category FAT3
item DRY-12
item FRY-06
ok

RavinderSingh13 · December 19, 2013, 6:33am

Hello,

Not a one-liner, but it may help you.

# Collect each distinct category once, in order of first appearance
value=`awk '!a[$1]++ {print $1}' check_remove_dupli_1st`
set -A array_value $value
for i in "${array_value[@]}"
do
echo "category" $i
# Items whose first field is exactly this category
str=`awk -v c="$i" '$1 == c {print $2}' check_remove_dupli_1st`
set -A array_str $str
for j in "${array_str[@]}"
do
echo item $j
done
echo "ok"
done

Output will be as follows.

$ ksh check_remove_dupli_1st.ksh
category CAT1
item FRY-01
item FRY-04
item DRY-03
item FRY-02
item DRY-04
ok
category CAT2
item FRY-03
item FRY-02
item DRY-01
ok
category FAT3
item DRY-12
item FRY-06
ok

NOTE: check_remove_dupli_1st is the name of the file that holds the input.

Thanks,
R. Singh

Franklin52 · December 19, 2013, 6:41am

Or try:

awk 'cat!=$1{if(cat){print "ok" RS} cat=$1; print "category", $1} {print "item", $2}END{print "ok"}' file

Akshay_Hegde · December 19, 2013, 6:47am

If field1 is sorted (all lines of a category are adjacent), this works:

$  cat <<test | awk '$1!=p{print NR == 1 ? "category" OFS $1 : "ok" RS RS "category" OFS $1}{print "item",$2;p=$1}END{print "ok"}'
CAT1 FRY-01
CAT1 FRY-04
CAT1 DRY-03
CAT1 FRY-02
CAT1 DRY-04
CAT2 FRY-03
CAT2 FRY-02
CAT2 DRY-01
FAT3 DRY-12
FAT3 FRY-06
test

category CAT1
item FRY-01
item FRY-04
item DRY-03
item FRY-02
item DRY-04
ok

category CAT2
item FRY-03
item FRY-02
item DRY-01
ok

category FAT3
item DRY-12
item FRY-06
ok
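
If the first field is not guaranteed to be sorted, one option is to collect the items per category in arrays and print everything in the END block. This is only a minimal sketch, assuming the same two-column input; the function name group_items and the array names order and items are made up for illustration and do not come from the thread.

```shell
# group_items: hypothetical helper; reads "CATEGORY ITEM" lines from stdin or files
group_items() {
  awk '
    # First time a category is seen: remember its position to keep input order
    !($1 in items) { order[++n] = $1 }
    # Append one "item ..." line to the text collected for this category
    { items[$1] = items[$1] "item " $2 "\n" }
    END {
      for (i = 1; i <= n; i++) {
        if (i > 1) print ""            # blank line between groups
        printf "category %s\n%sok\n", order[i], items[order[i]]
      }
    }
  ' "$@"
}

# Unsorted sample: CAT1 lines are interleaved with CAT2 lines
group_items <<'EOF'
CAT1 FRY-01
CAT2 FRY-03
CAT1 DRY-03
CAT2 DRY-01
EOF
```

Categories come out in order of first appearance. The trade-off is that the whole input is held in memory until END, which the streaming one-liners avoid.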

mstafreshi · December 19, 2013, 6:55am

awk '$1 != last && NR != 1 { print "ok\n"; } $1 != last { print "category " $1; last = $1;}{print "item " $2}END{print "ok"}' filename

zozoo · December 19, 2013, 7:08am

Can you explain what that means? It would be helpful.

akshay hegde:

If field1 is sorted then this would work

awk '$1!=p{print NR == 1 ? "category" OFS $1 : "ok" RS RS "category" OFS $1}{print "item",$2;p=$1}END{print "ok"}'

Can you explain how this code works? It would be helpful.

Franklin52 · December 19, 2013, 7:47am

awk 'cat!=$1{if(cat){print "ok" RS} cat=$1; print "category", $1} {print "item", $2}END{print "ok"}' file

Explanation:

if variable cat != $1 then
  if variable cat is not empty then
    print "ok" followed by RS (the extra newline produces the blank line)
  end if
  variable cat = $1
  print "category" and $1, separated by OFS
end if
print "item" and $2, separated by OFS

END {print "ok"}   (runs once, after the last input line)
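
The same logic may be easier to follow when written as a multi-line awk program with comments. This is only a reformatted sketch of the one-liner above, with "category" and $1 separated by a comma so that a single space is printed between them; the wrapper name format_groups is illustrative and not from the thread.

```shell
# format_groups: illustrative wrapper name; reads "CATEGORY ITEM" lines
format_groups() {
  awk '
    # Runs only when the category differs from the one on the previous line
    cat != $1 {
      if (cat) {            # not the first group: close the previous one
        print "ok" RS       # "ok" plus an extra newline gives the blank line
      }
      cat = $1              # remember the current category
      print "category", $1  # the comma inserts OFS (a space by default)
    }
    # Runs for every input line
    { print "item", $2 }
    # Runs once after the last line: close the final group
    END { print "ok" }
  ' "$@"
}

format_groups <<'EOF'
CAT1 FRY-01
CAT1 FRY-04
CAT2 FRY-03
EOF
```

Like the one-liner, this relies on all lines of a category being adjacent in the input.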

Akshay_Hegde · December 19, 2013, 8:18am

 awk '$1!=p{print NR == 1 ? "category" OFS $1 : "ok" RS RS "category" OFS $1}{print "item",$2;p=$1}END{print "ok"}'

$1!=p ---> true if field 1 is not equal to p

p differs from field 1 in two cases:

1 ---> awk is reading the first line, so p is not set yet

2 ---> p is set (a line has already been read), but field 1 of the current line is not equal to p (p holds field 1 of the previous line)

NR == 1 ? ---> if the current line is the first line, print only
the string "category", OFS (a space by default) and field 1, $1

otherwise (any later line), print
the string "ok", two record separators RS (default "\n", so a blank line follows "ok"), then the string "category", OFS and field 1, $1

print "item",$2 ---> print the string "item" and field 2, $2, separated by the default output field separator (a space)

p=$1 ---> set p to field 1 of the current line, so that the next line can be compared against it with $1!=p

END{print "ok"} ---> this block runs once, after all input lines have been processed; here it prints the string "ok" that closes the last group
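
To watch how p changes from line to line, a small trace can help. This is a sketch for illustration only: the function name trace_p and the "(unset)" label are invented here, and the program prints the state instead of the formatted output.

```shell
# trace_p: hypothetical helper that shows p and the result of $1 != p per line
trace_p() {
  awk '
    {
      printf "NR=%d  $1=%s  p=%s  %s\n", NR, $1,
             (p == "" ? "(unset)" : p),
             ($1 != p ? "new group" : "same group")
      p = $1    # remember field 1 for the comparison on the next line
    }
  ' "$@"
}

trace_p <<'EOF'
CAT1 FRY-01
CAT1 FRY-04
CAT2 FRY-03
EOF
```

For this sample, line 1 reports "new group" because p is unset, line 2 reports "same group", and line 3 reports "new group" because p still holds CAT1.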