AWK / SED, replace non alphanumeric characters in specific field

Hi,
I have a tab delimited file. Some times we get user input notes in the field number 18 and I want to replace only the non alphanumeric values with some exceptions. I've written the following sed command but it applies on all the columns. Can we do it on column 18th only? I am fine with either AWK or SED. Here I let the characters like "@ + . ' ( ) / $ - " to not get replaced.

sed "s/[^[:alnum:]\t.@+:'()/$-]/ /g" file_nm

awk '{gsub(/[^[:alnum:]\t.@+:'\(\)/$-]/ , " ",$18}1'
or if your awk is actually a gawk, use one of the predefined character classes:

[[:alpha:]]  = A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z
[[:digit:]]  = 0 1 2 3 4 5 6 7 8 9
[[:alnum:]]  = 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z
[[:lower:]]  = a b c d e f g h i j k l m n o p q r s t u v w x y z
[[:upper:]]  = A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
[[:xdigit:]] = 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f
[[:punct:]]  = ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
[[:cntrl:]]  = NUL SOH STX ETX EOT ENQ ACK BEL BS TAB LF VT FF CR SO SI DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US DEL
[[:graph:]]  = ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~
[[:print:]]  = Space ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~
[[:blank:]]  = TAB Space
[[:space:]]  = TAB LF VT FF CR Space

When I try the following, it asks for input. Is there a syntax error?

awk '{gsub(/[^[:alnum:]\t.@+:'\(\)/$-]/ , " ",$18}1' test.txt
>

  1. :alnum: ↩︎

Ah, those pesky quotes and (). If your awk is actually a gawk:

exc="[^[:alnum:]\t.@+-:'()/$]" awk '{gsub(ENVIRON["exc"], " ", $18)}1' myFile

See if that helps.

Data has characters like ":package: :sleeping: :dog: :blush:" They are not getting removed. The command works.

cannot see the non-workign chars - they're being converted to emotion icons :wink:
copy/paste the non working chars and surround them with no-parse markdown tag,e.g. ````

Also provide a sample data set and the expected output

There is a weak point in your problem statement.

If the field separator is a tab character, then it cannot be present in the exception pattern.

Perhaps it is necessary to add a space character to the exception pattern, and in the awk command to define only tabulation as a separator?

awk -F'\t'

and

[^[:alnum:] .@+-:'()/$]

  1. :alnum: ↩︎

Data is like below. Want to preserve the words like "don't" with

FreeEconomy James Smith 123 MAIN ST LONDON CA 92562-2205 US +1 222-222-2222 ext. 93120 0.00 0.00 Sleeping. Shhhhh. :sleeping: Bonus points if you don’t a
lert the dogs. :dog: Please do not knock or ring doorbell. It’s a bit loud when you open, but please try your best. Thanks. :crazy_face: true

Not very clear as a quote. Try to output this example via cat -vet file and take the output in code tags.

FreeEconomy^IJames Smith^I123 MAIN ST^I^I^ILONDON^ICA^I92562-2205^IUS^I+1 222-222-2222 ext. 93120^I0.00^I^I0.00^I^I^I^I^ISleeping. Shhhhh. M-pM-^_M-^XM-4 Bonus points if you donM-bM-^@M-^Yt alert the dogs. M-pM-^_M-^PM-6 Please do not knock or ring doorbell. ItM-bM-^@M-^Ys a bit loud when you open, but please try your best. Thanks. M-pM-^_M-$M-*^I^Itrue^I$

I might be wrong, but it sounds like you want to remove all control characters in the TAB separated fields?
previously quoted character classes:

[[:alpha:]]  = A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z
[[:digit:]]  = 0 1 2 3 4 5 6 7 8 9
[[:alnum:]]  = 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z
[[:lower:]]  = a b c d e f g h i j k l m n o p q r s t u v w x y z
[[:upper:]]  = A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
[[:xdigit:]] = 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f
[[:punct:]]  = ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
[[:cntrl:]]  = NUL SOH STX ETX EOT ENQ ACK BEL BS TAB LF VT FF CR SO SI DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US DEL
[[:graph:]]  = ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~
[[:print:]]  = Space ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~

either gsub(/[[:cntrl:]]/, " ",$18) or gsub(/[^[:print:]]/, " ",$18) (don't forget -F'\t' as noted by @nezabudka and also OFS='\t')
I guess you'll have to try and see...

See if it works?

awk -v OFS='\t' -F'\t' '{gsub(/\xf0\S*/, " ", $18)}1'

If emoticons are not separated by a space:

awk -v OFS='\t' -F'\t' '{gsub(/\xf0[^[:print:]]*/, "  ", $0)}1'

And it should be noted that emoticons occupy 2 columns if it's important. By attention, it is better to use two spaces to replace.

@nezabudka don't forget the OFS to match FS

awk -v OFS='\t' -F'\t' '{gsub(/\xf0[^[:cntrl:]]*/, "  ", $33)}1' 

seems to work on part of a file, but not sure why the following data is still an issue. Here i am looking at field #33

    333-3333-22444632^I694484434494344238^I2020-30-33T32:32:49-044:00^I2020-30-33T32:32:49-044:00^Ih34n9drnbp84qbr@marketplace.gmail.com^IDon^I+3 333-436-3333 ext. 38339^ISTZZ434^IRescue - 2 Year Data Recovery Plan for Flash Memory Devices ($20-$90)^I3^IUSD^I3.22^I0.40^I0.00^I0.00^IFreeEconomy^IDon W Ross^I3423 SW MAIN ST APT 3^I^I^ISEATTLE^IWA^I28326-4363^IUS^I+3 333-436-3333 ext. 38339^I0.00^I^I0.00^I^I^I^I^IPlease call apartment 406 or Ross or Kawt from the call box. If customer is not home there are two different numbers on a list next to the call box to call to have package delivered. Please call customer if having any problems delivering package. 333-239-3333 or 206344344682.^I^Ifalse^I^I $
    333-2222-64488229^I380044998226466^I2020-30-08T32:23:93-044:00^I2020-30-08T32:23:93-044:00^I3wl3mzt44y6w24gj@marketplace.gmail.com^ISusan DAVID^I+3 333-6443-2222 ext. 02220^ISTZZ902^IRescue - 3 Year Data Recovery Plan for Flash Memory Devices ($20-$42.22)^I2^IUSD^I2.28^I0.60^I0.00^I0.00^IFreeEconomy^ISusan DAVID^I330A JOHNS RD^I^I^IGREER^ISC^I22690-44433^IUS^I+3 333-222-6630 ext. 02220^I0.00^I^I0.00^I^I^I^I^IWe are open 8:30 to 9:00.  No one tried to deliver the package.......   We have been in this  building for 38 years......  WE ARE AT 330A JOHNS ROAD    LAZY DRIVERS DO NOT WANT TO GET OUT TO WALK THRU LTHE DOOR TO DELIVER  IT NEEDS TO BE DELIVER TODAY       ^I^Ifalse^I^I $
    334-3333-44394632^I3326304436044066^I2020-33-30T38:40:32-08:00^I2020-33-30T38:40:32-08:00^Iswkw0ktmns9s3bb@marketplace.gmail.com^IBrent DON^I+3 333-432-3333 ext. 238442^ISTZZ332^IRescue - 3 Year Data Recovery Plan for External Hard Drives^I3^IUSD^I34.22^I3.24^I0.00^I0.00^IFreeEconomy^IJoseph SMITH Dalley^I344449 SH 26^ISuite 3433^I^IGrapevine^ITX^I446093^IUS^I+3 333-432-3333 ext. 238442^I0.00^I^I0.00^I^I^I^I^IPlease put the item in the lockers provided. I can allow you access if you call https://drive.google.com/file/d/3ms-tsDN308jqDhEXa6zwI0t309xSMXKb/view usp drivesdk n nI have lots and lots of deliveries and you are the only driver that has an issue w access^I^Ifalse^I^I $