In the f1 file below I am trying to clean it up by removing lines that have _tn_ in them. Next, I remove the characters in $2 before the ninth /-delimited field of the line. Then I remove the ID_ prefix and its digits (always 4). Finally, I remove the characters after and including the first remaining _. It is currently doing most of it, but the cut is removing $1 and I'm sure there is a better way. Thank you :).
f1
1112233 /xxxx/xxxx/xxxx/xxxx/yyy_yyyy_yy-yyyy-yyy-yyy_yyyy_yyyy_yyyy_yyyy_yyy_yyy_yyy_000_000/yyy/yyy/ID_1234_000000-Control_z_zzzz_zz_zz_zz_zz_zz_zzz_zz-zzzz-zzz-zzz_zzzz_zzzz_zzz_zzz_zzz_zzz_zzz.txt
1112231 /xxxx/xxxx/xxxx/xxxx/yyy_yyyy_yy-yyyy-yyy-yyy_yyyy_yyyy_yyyy_yyyy_yyy_yyy_yyy_000_000/yyy_tn_yyy/yyy/ID_1234_000000-Control_z_zzzz_zz_zz_zz_zz_zz_zzz_zz-zzzz-zzz-zzz_zzzz_zzzz_zzz_zzz_zzz_zzz_zzz.txt
current
000000-Control_z_zzzz_zz_zz_zz_zz_zz_zzz_zz-zzzz-zzz-zzz_zzzz_zzzz_zzz_zzz_zzz_zzz_zzz.txt
desired
1112233 000000-Control
sed '/_tn_/d' f1 | cut -d/ -f9 | awk '{ gsub(/ID_[0-9][0-9][0-9][0-9]_/, "", $2); print }' | cut -d_ -f1- > out
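
Two things in the pipeline above explain the result. cut -d/ -f9 keeps only the ninth /-delimited field of the whole line, and $1 sits in the first field, so it is thrown away. cut -d_ -f1- selects every field from the first onward, so it trims nothing (-f1 alone would keep just the part before the first _). One possible alternative is to do all the steps in a single awk call. This is only a sketch: it assumes the fields are whitespace-separated, that the path in $2 contains no spaces, and that the wanted part is always the last path component. The function name clean_paths is made up for illustration.

```shell
# clean_paths: hypothetical helper name, not part of the original post.
# Reads the named file(s) and prints "$1 <trimmed name>" for each kept line.
clean_paths() {
    awk '!/_tn_/ {                                   # skip lines containing _tn_
        name = $2
        sub(/.*\//, "", name)                        # drop everything up to the last /
        sub(/^ID_[0-9][0-9][0-9][0-9]_/, "", name)   # drop ID_, four digits and the _
        sub(/_.*/, "", name)                         # drop from the first remaining _
        print $1, name
    }' "$@"
}
```

Called as clean_paths f1 > out, this should print 1112233 000000-Control for the sample above, since the 1112231 line contains _tn_ and is skipped.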