Searching for distinct special characters in a file

rohit_shinez · September 3, 2013, 1:37pm

Hi,

I need to find the distinct special characters in a specific column of a pipe-delimited file.

For example:

Input file, file.txt:
aa|bb|cc|$abc
aa|bb|ccc|#abol
bb|xss|ddd|$xyz
nn|yyy|qqq|=qqqq
abe|qqq|yyy|=aaa
aaa|yyy|zzzz|#aaaa
.
.
.


My desired output:
$
#
=

I know which column will contain the special characters; in this example it is the 4th column.

shamrock · September 3, 2013, 2:37pm

Which distinct characters are you looking for?

learnbash · September 3, 2013, 2:41pm

I think the command below will give you an idea:

cut -f4 -d'|' filename | cut -c1

For uniqueness:

cut -f4 -d'|' filename | cut -c1 | sort -u
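To see what the pipeline does, here is a small self-contained run on the sample rows from the question. The helper names sample and first_chars are made up for this sketch, and the LC_ALL=C in front of sort is my addition (not part of learnbash's command) so that the punctuation comes out in plain byte order on any system.

```shell
#!/bin/sh
# Hypothetical helper: prints the six sample rows from the question.
sample() {
cat <<'EOF'
aa|bb|cc|$abc
aa|bb|ccc|#abol
bb|xss|ddd|$xyz
nn|yyy|qqq|=qqqq
abe|qqq|yyy|=aaa
aaa|yyy|zzzz|#aaaa
EOF
}

# Hypothetical helper wrapping the pipeline: field 4 (-f4, split on '|'),
# its first character (-c1), then one copy of each character (sort -u).
# LC_ALL=C makes sort use byte order instead of locale collation.
first_chars() {
  cut -f4 -d'|' | cut -c1 | LC_ALL=C sort -u
}

sample | first_chars
```

In the C locale this prints #, $ and = on separate lines, i.e. sorted rather than in the order they appear in the file.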

Don_Cragun · September 3, 2013, 3:10pm

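If what you want is the first occurrence of the first character of field 4 in your file, the following should be a little more efficient than learnbash's suggestion, since it runs as a single awk process and needs no sort (it prints each character the first time it is seen):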

awk -F'|' '!((c = substr($4,1,1)) in s) {s[c];print c}' file.txt
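Here is the same awk logic spread over several lines with comments; only the layout differs from the one-liner. The function names sample and first_special_awk, and the reuse of the question's sample rows, are my own additions for the sketch.

```shell
#!/bin/sh
# Hypothetical helper: the sample rows from the question.
sample() {
cat <<'EOF'
aa|bb|cc|$abc
aa|bb|ccc|#abol
bb|xss|ddd|$xyz
nn|yyy|qqq|=qqqq
abe|qqq|yyy|=aaa
aaa|yyy|zzzz|#aaaa
EOF
}

# Hypothetical helper: print the first character of field 4, once per
# distinct character, in the order the characters first appear.
first_special_awk() {
  awk -F'|' '
    {
      c = substr($4, 1, 1)   # first character of field 4
      if (!(c in s)) {       # "in" tests membership without creating s[c]
        s[c]                 # referencing s[c] creates it: c is now "seen"
        print c
      }
    }'
}

sample | first_special_awk
```

This prints $, # and = on separate lines, in order of first appearance.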

Otherwise, as shamrock said, we need to know what you mean by special, where in field 4 the special character can appear, and whether there can be more than one special character in field 4 in any line in your input file.

If you want to try the above awk script on a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of awk.

rohit_shinez · September 3, 2013, 3:17pm

Hi,

I would like to get the distinct special characters in the 4th field, where they can occur anywhere in that field.

For example, for these values of the fourth field:
$aab$c#
$$abc.c=
.
.
.

The output should be the distinct special characters, meaning anything other than digits and letters (for example ! @ # $ % ^ & * ( ) _ " , :):

$#.=

RudiC · September 3, 2013, 3:36pm

Try this for non-alphanumeric characters in field 4:

sed -r 's/([^|]*\|){3}//;s/[[:alnum:]]*$//' file
$
#
$
=
=
#

EDIT: Or, if any position in field 4 is possible:

sed -r 's/([^|]*\|){3}//;s/^[[:alnum:]]*//;s/[[:alnum:]]*$//' file
$
#
$
=
=
#
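Note that both sed commands only remove letters and digits at the start or end of field 4, so a value such as $aab$c# (from rohit_shinez's second post) passes through unchanged. If every non-alphanumeric character is wanted, one variant (my assumption, not RudiC's command) deletes all letters and digits with the g flag. It uses -E instead of -r because current GNU and BSD versions of sed both accept -E, and [|] for a literal pipe; like the original, it assumes field 4 is the last field on the line. The function name strip_alnum_f4 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: drop the first three fields, then delete every
# letter and digit (g flag) from what is left, i.e. from field 4.
# Assumes field 4 is the last field on the line, as in the sed commands above.
strip_alnum_f4() {
  sed -E 's/^([^|]*[|]){3}//;s/[[:alnum:]]//g'
}

printf '%s\n' 'aa|bb|cc|$aab$c#' 'aa|bb|ccc|$$abc.c=' | strip_alnum_f4
```

This prints $$# and $$.=, so the characters still need de-duplicating, which the awk replies in this thread handle.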

MadeInGermany · September 3, 2013, 3:40pm

Like Don's sample, this works on |-separated field #4, but it deletes the characters in the character set [a-zA-Z0-9]:

awk -F"|" '{gsub("[a-zA-Z0-9]","",$4); print $4}' file

It is also possible to negate the character set with a leading ^ followed by the special characters you want to keep:

awk -F"|" '{gsub("[^-=!@#$%^&*()_]","",$4); print $4}' file

NB: a - in the character set should come first (right after the ^); otherwise it would be interpreted as a range.
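A quick way to check the negated set and the placement of the dash: the first command keeps only the listed characters, and the second is a deliberately wrong counter-example (made up for this sketch) that writes !-= so the dash becomes a range from ! to =, which in the C locale also covers the digits 0-9. The helper name keep_specials_f4 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep only the listed special characters in field 4.
# The '-' sits right after '^', so it is a literal dash, not a range.
keep_specials_f4() {
  awk -F'|' '{ gsub("[^-=!@#$%^&*()_]", "", $4); print $4 }'
}

printf '%s\n' 'aa|bb|cc|$a-b=c' | keep_specials_f4        # prints $-=

# Counter-example: '!-=' is a range from '!' to '=' (0x21-0x3D in the C
# locale), which includes the digits, so the 5 survives.
printf '%s\n' 'aa|bb|cc|a5-' |
  LC_ALL=C awk -F'|' '{ gsub("[^!-=]", "", $4); print $4 }'   # prints 5-
```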

ahamed101 · September 3, 2013, 3:46pm

awk -F\| '{gsub(/[a-zA-Z0-9]/,"",$4)}!c[$4]++{print $4}' infile
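The !c[$4]++ pattern is true only the first time a given value of field 4 is seen, because c[$4] is 0 (false) before the post-increment. It de-duplicates the whole leftover string, not individual characters, so when specials can appear anywhere in the field you get distinct combinations such as $$# rather than distinct characters. The sketch below spreads the one-liner out with comments; the function name distinct_residue_f4 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: the one-liner above, one rule per line.
distinct_residue_f4() {
  awk -F'|' '
    { gsub(/[a-zA-Z0-9]/, "", $4) }   # strip letters and digits from field 4
    !c[$4]++ { print $4 }             # c[$4] is 0 the first time, so print once
  '
}

# Specials only at the start: one line per distinct character.
printf '%s\n' 'aa|bb|cc|$abc' 'bb|xss|ddd|$xyz' 'nn|yyy|qqq|=qqqq' | distinct_residue_f4

# Specials anywhere: one line per distinct leftover string.
printf '%s\n' 'aa|bb|cc|$aab$c#' 'aa|bb|ccc|$$abc.c=' | distinct_residue_f4
```

The first run prints $ and =; the second prints $$# and $$.=.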

--ahamed

rohit_shinez · September 3, 2013, 3:53pm

Hi,

I am getting the error
sed: illegal option -- r
when I use

sed -r 's/([^|]*\|){26}//;s/^[[:alnum:]]*//;s/[[:alnum:]]*$//' file.txt
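-r is a GNU sed extension, so a sed without it rejects the option. One way around this, sketched under the assumption that your sed follows POSIX, is to write the same edits as a basic regular expression: \( \) for the group, \{26\} for the repeat count and [|] for a literal pipe. On Solaris, /usr/xpg4/bin/sed may be needed for the [[:alnum:]] classes. The function name strip_edges is hypothetical, and like the original it assumes the wanted field is the last one on the line.

```shell
#!/bin/sh
# Hypothetical helper: RudiC's second command as a POSIX basic regex, so
# no -r option is needed. $1 is the number of leading fields to skip
# (3 for the sample data, 26 in the command above).
strip_edges() {
  sed "s/^\([^|]*[|]\)\{$1\}//;s/^[[:alnum:]]*//;s/[[:alnum:]]*\$//"
}

printf '%s\n' 'aa|bb|cc|$abc' 'nn|yyy|qqq|=qqqq' 'aa|bb|cc|ab$cd' | strip_edges 3
```

This prints $, = and $ on separate lines; for the real file it would be called as strip_edges 26 < file.txt.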

Don_Cragun · September 3, 2013, 4:02pm

With the following input file, file.txt:

aa|bb|cc|$aab$c#
aa|XX|ZZ|NothingSpecialHere0123456789
aa|bb|ccc|$$abc.c=
bb|xss|ddd|!xyz
nn|yyy|qqq|qq qq
abe|qqq|yyy|=a,(){}aa
aaa|yyy|zzzz|#aaaa

The awk script:

awk -F'|' '
{       gsub(/[[:alnum:]]/, "", $4)
        for(i = 1; i <= length($4); i++) {
                if((c = substr($4, i, 1)) in s) continue
                s[c]
                printf("%s", c)
        }
}
END {   print ""
}' file.txt

seems to produce the unique set of non-numeric, non-alphabetic characters found in any position in field 4, in the format shown in message #5 in this thread:

$#.=! ,(){}
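Since the real data seems to have the column further along (the sed attempt above skips 26 fields), one possible generalization of Don's script passes the field number in with awk's -v option. The variable col and the function names unique_specials and don_sample are hypothetical. The field is copied into f so the record is not rebuilt, and [A-Za-z0-9] stands in for [[:alnum:]] in case an older awk lacks POSIX character classes; the two are equivalent only in the C locale, hence LC_ALL=C.

```shell
#!/bin/sh
# Hypothetical helper: distinct non-alphanumeric characters found anywhere
# in field number $1 of '|'-separated input, printed on one line in order
# of first appearance (same logic as the script above).
unique_specials() {
  LC_ALL=C awk -F'|' -v col="$1" '
    {
      f = $col                      # copy the field; $0 stays untouched
      gsub(/[A-Za-z0-9]/, "", f)    # drop letters and digits
      for (i = 1; i <= length(f); i++) {
        c = substr(f, i, 1)
        if (!(c in seen)) {         # first time this character shows up
          seen[c]
          printf "%s", c
        }
      }
    }
    END { print "" }'
}

# Hypothetical helper: the test data from the post above.
don_sample() {
cat <<'EOF'
aa|bb|cc|$aab$c#
aa|XX|ZZ|NothingSpecialHere0123456789
aa|bb|ccc|$$abc.c=
bb|xss|ddd|!xyz
nn|yyy|qqq|qq qq
abe|qqq|yyy|=a,(){}aa
aaa|yyy|zzzz|#aaaa
EOF
}

don_sample | unique_specials 4
```

With field 4 this prints the same line as Don's script; for the real file it would be unique_specials 27 < file.txt.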

ahamed101 · September 3, 2013, 4:24pm

Another one

awk -F\| '{gsub(/[[:alnum:]]| /,"",$4)}$4{gsub(".","&\n",$4);print $4}' infile | sort -u

You can strip the newlines from the output if you want the characters on one line.
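As one way to do the newline stripping, and to avoid depending on awk or sed options at all, here is a sketch using only the POSIX tools cut, tr, fold and sort (a different technique from the awk script above). Like that version it drops spaces, and the final tr -d '\n' puts the result on one line. The function names specials_in_field and don_sample are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: distinct non-alphanumeric, non-space characters of
# field $1 in '|'-separated input, joined on one line in byte order.
specials_in_field() {
  cut -d'|' -f"$1" |
    tr -d '[:alnum:] \n' |   # drop letters, digits, spaces and line breaks
    fold -w1 |               # one character per line
    LC_ALL=C sort -u |       # distinct characters, byte order
    tr -d '\n'               # join them back onto one line
}

# Hypothetical helper: Don's test data from earlier in the thread.
don_sample() {
cat <<'EOF'
aa|bb|cc|$aab$c#
aa|XX|ZZ|NothingSpecialHere0123456789
aa|bb|ccc|$$abc.c=
bb|xss|ddd|!xyz
nn|yyy|qqq|qq qq
abe|qqq|yyy|=a,(){}aa
aaa|yyy|zzzz|#aaaa
EOF
}

don_sample | specials_in_field 4; echo
```

With Don's test data this prints !#$(),.={}, the same characters as Don's script minus the space, in byte order rather than order of appearance.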

--ahamed