Stripping unwanted characters in field

dagamier · July 8, 2015, 4:14pm

I wrote a small shell script to clean up a file I have been having issues with. Specifically, it strips a fully qualified host/domain name in one field down to just the hostname. The script works, but it is slow, and I will be working with large data sets.

Here is a sample dataset:

Field1|hostname|field3|field4.....
Field1|hostname.f.q.d.n|field3|field4......

My code is below:

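while read -r LINE
do
        # Split the record on "|" and keep only the host part of field 2
        CUST=`echo "$LINE" | cut -d\| -f1`
        SERVER=`echo "$LINE" | cut -d\| -f2 | sed 's/\..[^.]*//g'`
        REST=`echo "$LINE" | cut -d\| -f3-`
        echo "$CUST|$SERVER|$REST" >> tmp1
done < "$1"
# Replace the input only after the whole file has been read
mv tmp1 "$1"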

As you can see, it is not an elegant solution, but it produces the desired output (the FQDN stripped from field 2). My awk is a bit rusty and my Perl is basic. If someone has a faster, cleaner way of doing this, I'm all ears.

Aia · July 8, 2015, 4:31pm

while IFS='|' read -r f1 f2 rest; do
    echo "$f1|${f2%%.*}|$rest"
done < "$1"

Save that as prog.sh and run it with the output redirected to a new file:

prog.sh dataset.txt > saved_result.txt

Once saved_result.txt is what you want, you can rename it over the original.
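
For readers unfamiliar with the two pieces doing the work here, the following is a minimal sketch of the same loop wrapped in a function. The name strip_fqdn is hypothetical and not from the thread; the sample lines are the two record shapes from the question with the trailing dots left out.

```shell
#!/bin/sh
# Sketch only: strip_fqdn is a hypothetical name, not part of the thread.
strip_fqdn() {
    # IFS='|' makes read split each line on the pipe character.
    # -r keeps any backslashes in the data intact.
    while IFS='|' read -r f1 f2 rest; do
        # ${f2%%.*} removes the longest suffix that starts with a dot,
        # i.e. everything from the first dot onward.
        printf '%s|%s|%s\n' "$f1" "${f2%%.*}" "$rest"
    done
}

# Demo on the two record shapes from the question.
printf '%s\n' 'Field1|hostname|field3|field4' \
              'Field1|hostname.f.q.d.n|field3|field4' | strip_fqdn
```

Because the splitting and the trimming are both done by the shell itself, no cut or sed process is started per line, which is most likely where the original script spent its time.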

RudiC · July 8, 2015, 4:39pm

Also try:

awk '{sub(/\..*$/,"",$2)}1' FS=\| OFS=\| file
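
The one-liner above is compact, so here is the same idea spelled out as a commented sketch. The wrapper name strip_fqdn_awk is hypothetical; the awk program itself is unchanged apart from setting the separators with -F and -v instead of trailing assignments.

```shell
#!/bin/sh
# Sketch only: strip_fqdn_awk is a hypothetical wrapper name.
strip_fqdn_awk() {
    # -F'|'       split input fields on a literal pipe
    # -v OFS='|'  join fields with a pipe when awk rebuilds the record
    # sub(...)    delete everything from the first dot in field 2
    # 1           an always-true pattern, so every record is printed
    awk -F'|' -v OFS='|' '{ sub(/\..*$/, "", $2) } 1' "$@"
}

# Demo: only field 2 is touched; dots in other fields are left alone.
printf '%s\n' 'Field1|hostname.f.q.d.n|1.5|field4' | strip_fqdn_awk
```

To update a file in place, one option is to write to a temporary file and rename it only if awk succeeded, for example: tmp=$(mktemp) && strip_fqdn_awk dataset.txt > "$tmp" && mv "$tmp" dataset.txt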

dagamier · July 8, 2015, 4:45pm

Both solutions worked, but RudiC's solution gave me the "just a few seconds" kind of result I was looking for.

Thanks to you as well, Aia; you taught me a different way to handle "while read" loops that is cleaner than my old way.

Aia · July 8, 2015, 7:06pm

perl -pe 's/(|\w+)\.[^|]*/$1/' dataset.txt
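
One caveat worth noting: as written, the | inside the group is an alternation rather than a literal pipe, so the substitution applies to the first dotted word on the line. That is field 2 only when field 1 contains no dot. A possible alternative that is anchored to field 2 is sketched below with sed; the function name strip_fqdn_sed is hypothetical, and this is a swapped-in technique rather than a correction of the Perl.

```shell
#!/bin/sh
# Sketch only: strip_fqdn_sed is a hypothetical name.
strip_fqdn_sed() {
    # In a basic regular expression an unescaped | is a literal pipe.
    # \([^|]*|[^.|]*\)  captures field 1, the first pipe, and field 2
    #                   up to (not including) its first dot
    # [^|]*             consumes the rest of field 2, which is dropped
    sed 's/^\([^|]*|[^.|]*\)[^|]*/\1/' "$@"
}

# Demo: dots in field 1 and field 3 survive; only field 2 is trimmed.
printf '%s\n' 'cust.a|hostname.f.q.d.n|1.5|field4' | strip_fqdn_sed
```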