Need advice/tips: is there a more efficient way of doing this cut/paste/awk after changing a field?

Hi,

This is the current script, and it works as required. I just thought there might be a better or easier way of doing what I am trying to do.

$ cat x.ksh
#!/bin/ksh
#

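# field 1 as-is
cut -d"|" -f1 x.txt > x1.txt
# field 2 (USER=...) with the user name lower-cased
cut -d"|" -f2 x.txt | awk -F"=" '{ print "USER="tolower($2) }' > x2.txt
# fields 3 to the end as-is
cut -d"|" -f3- x.txt > x3.txt

# rejoin the pieces with "|", then sort and drop duplicate lines
paste -d "|" x1.txt x2.txt x3.txt | sort | uniq > x4.txt

# show the input, then the result
cat x.txt
echo
cat x4.txt
echo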

Below is an excerpt of the file I want to change (x.txt); the real file I want to run this on is 1000+ lines. It is several log files merged into one, and I want to change the USER=<username> field so that <username> is in lower case. I am working on the assumption that USER=<username> is always field 2.

PROGRAM=JDBC Thin Client|USER=MICKEY|HOST=11.123.12.123|testmachine.xyz.com.zz
PROGRAM=JDBC Thin Client|USER=mickey|HOST=11.123.12.123|testmachine.xyz.com.zz
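
If the assumption that USER= is always field 2 might not hold, one option (a sketch, not something from the thread) is to test every field for a USER= prefix instead of relying on its position. The file name sample.txt is hypothetical; it just holds the two excerpt lines.

```shell
#!/bin/sh
# Hypothetical sample file holding the two excerpt lines from the post.
printf '%s\n' \
  'PROGRAM=JDBC Thin Client|USER=MICKEY|HOST=11.123.12.123|testmachine.xyz.com.zz' \
  'PROGRAM=JDBC Thin Client|USER=mickey|HOST=11.123.12.123|testmachine.xyz.com.zz' > sample.txt

# Lower-case the value of whichever field starts with "USER=",
# then print each resulting line only once (first occurrence wins).
out=$(awk -F'|' -v OFS='|' '
  {
    for (i = 1; i <= NF; i++)
      if (substr($i, 1, 5) == "USER=")
        $i = "USER=" tolower(substr($i, 6))
  }
  !seen[$0]++
' sample.txt)
printf '%s\n' "$out"
```

A field such as USERNAME=... is left alone, because its first five characters are not "USER=".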

Here is a sample run of the script:

$ ./x.ksh
PROGRAM=JDBC Thin Client|USER=MICKEY|HOST=11.123.12.123|testmachine.xyz.com.zz
PROGRAM=JDBC Thin Client|USER=mickey|HOST=11.123.12.123|testmachine.xyz.com.zz

PROGRAM=JDBC Thin Client|USER=mickey|HOST=11.123.12.123|testmachine.xyz.com.zz

I could simply do

tr '[:upper:]' '[:lower:]' < x.txt | sort -u

but for clarity I prefer to change only USER=<username> to USER=<lowercase_username> and leave the rest of the line as it is. I couldn't work out which awk or sed options would do that, hence the shell script. Maybe there is an awk one-liner that does what I am trying to achieve :o
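
Since sed was mentioned: one possible sed route, assuming GNU sed (the \L case-conversion escape in the replacement is a GNU extension, not POSIX sed), lower-cases whatever follows |USER= up to the next |. sample.txt is a hypothetical file holding the two excerpt lines.

```shell
#!/bin/sh
# Hypothetical sample file holding the two excerpt lines from the post.
printf '%s\n' \
  'PROGRAM=JDBC Thin Client|USER=MICKEY|HOST=11.123.12.123|testmachine.xyz.com.zz' \
  'PROGRAM=JDBC Thin Client|USER=mickey|HOST=11.123.12.123|testmachine.xyz.com.zz' > sample.txt

# GNU sed only: \L lower-cases the replacement text up to \E.
# [^|]* stops the captured user name at the next field separator.
out=$(sed 's/|USER=\([^|]*\)/|USER=\L\1\E/' sample.txt | sort -u)
printf '%s\n' "$out"
```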

Please advise. Thanks in advance.

Try

$ awk -F\| '{split ($2, T, "="); $2 = T[1] "=" tolower(T[2])} !a[$0]++' OFS=\| file
PROGRAM=JDBC Thin Client|USER=mickey|HOST=11.123.12.123|testmachine.xyz.com.zz

It converts just the username to lower case and prints only the first occurrence of each resulting line. No sorting is done, so lines keep their input order.
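
If the sorted output of the original script (its paste ... | sort | uniq step) is what is wanted, a sketch is to let awk do only the lower-casing and pipe the result through sort -u, which gives the same lines as sort | uniq. sample.txt is hypothetical, and its third line (USER=DONALD) is made up purely to show the sorting.

```shell
#!/bin/sh
# Two excerpt lines from the post plus one made-up line (not from the post).
printf '%s\n' \
  'PROGRAM=JDBC Thin Client|USER=MICKEY|HOST=11.123.12.123|testmachine.xyz.com.zz' \
  'PROGRAM=JDBC Thin Client|USER=mickey|HOST=11.123.12.123|testmachine.xyz.com.zz' \
  'PROGRAM=JDBC Thin Client|USER=DONALD|HOST=11.123.12.124|testmachine2.xyz.com.zz' > sample.txt

# awk lower-cases the user name in field 2 and prints every line;
# sort -u then sorts and removes duplicate lines in one step.
out=$(awk -F'|' -v OFS='|' '
  { split($2, T, "="); $2 = T[1] "=" tolower(T[2]); print }
' sample.txt | sort -u)
printf '%s\n' "$out"
```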


Hi RudiC

Thanks, the one-liner works like a charm. I wouldn't have figured out for ages that this does what I want :o It doesn't even require any intermediate files.

awk -F\| '{split ($2, T, "="); $2 = T[1] "=" tolower(T[2])} !a[$0]++' OFS=\| file

I'm trying really hard to understand what's happening, though.

{split ($2, T, "=")

I believe this splits $2 and assigns the pieces to array T, and then some more processing happens.

Then

$2 = T[1] "=" tolower(T[2])} !a[$0]++

changes $2 to

USER=<lower_case_of_username_string>

and prints the line only if it is not already in a[$0]? Is that what

!a[$0]++

means, and is this the part that prevents the duplicates? Do I understand it correctly?

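Yes, $2 (USER=MICKEY) is split at the equals sign into array T, and then rebuilt from T[1], "=" and the lower-cased username in T[2]. Because a field was assigned, awk rebuilds $0 from all the fields joined with OFS, which is why OFS=\| is set on the command line.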
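
To see both steps in isolation, here is a small sketch on one of the excerpt lines: what split() returns for field 2, and how $0 looks after $2 is assigned, with and without OFS='|'.

```shell
#!/bin/sh
line='PROGRAM=JDBC Thin Client|USER=MICKEY|HOST=11.123.12.123|testmachine.xyz.com.zz'

# 1) split() returns the number of pieces and fills T[1], T[2], ...
parts=$(printf '%s\n' "$line" | awk -F'|' '{ n = split($2, T, "="); print n, T[1], T[2] }')

# 2) Assigning to $2 makes awk rebuild $0 from the fields joined with OFS.
with_ofs=$(printf '%s\n' "$line" |
  awk -F'|' -v OFS='|' '{ split($2, T, "="); $2 = T[1] "=" tolower(T[2]); print }')

# 3) Without OFS='|' the default OFS (a single space) is used for the rebuild.
without_ofs=$(printf '%s\n' "$line" |
  awk -F'|' '{ split($2, T, "="); $2 = T[1] "=" tolower(T[2]); print }')

printf '%s\n' "$parts" "$with_ofs" "$without_ofs"
```

The third result shows the separators turning into spaces, which is what happens if OFS=\| is left off the one-liner.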

The !a[$0]++ is a trick that is independent of the split; it simply operates on $0 after $2 has been rebuilt. The first time a line is seen, a[$0] doesn't exist yet and is created by this very reference with an empty (zero) value, which counts as FALSE, so its negation is TRUE and triggers the default action, print. Then it is post-incremented and will never trigger again for that line (awk numbers are floating point, so the counter does not wrap back to zero). So any further occurrences of $0 are suppressed.
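
The trick works on any input, not just this file. A minimal sketch with made-up words shows that it keeps the first occurrence of each line in input order, whereas sort -u also removes duplicates but reorders the lines:

```shell
#!/bin/sh
# First occurrence of each line, original order kept (no sorting).
out=$(printf '%s\n' apple pear apple fig pear | awk '!seen[$0]++')

# For comparison: sort -u also de-duplicates, but sorts the lines.
sorted=$(printf '%s\n' apple pear apple fig pear | sort -u)

printf '%s\n' "$out" "" "$sorted"
```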

A more detailed explanation follows.
The main awk code runs for each input line.
!a[$0]++ is ultra-condensed, quick and dirty.
A slightly more explicit form is !($0 in A) { A[$0]; print }:
If $0 is not in array A (A[$0] not defined), then define A[$0] (no A[$0]=value is needed; merely referencing it creates it) and print $0.
The array A is associative (string-addressed), so if the same $0 occurs in a later input line, A[$0] is already defined and the line is not printed.
If a condition (pattern) has no { action code } following it, the default action for a true condition is { print }, and print without arguments means print $0.
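
Putting the explicit form into the complete command gives a sketch like this, which should behave the same as the original one-liner. sample.txt is a hypothetical file holding the two excerpt lines.

```shell
#!/bin/sh
# Hypothetical sample file holding the two excerpt lines from the post.
printf '%s\n' \
  'PROGRAM=JDBC Thin Client|USER=MICKEY|HOST=11.123.12.123|testmachine.xyz.com.zz' \
  'PROGRAM=JDBC Thin Client|USER=mickey|HOST=11.123.12.123|testmachine.xyz.com.zz' > sample.txt

out=$(awk -F'|' -v OFS='|' '
  # No pattern: runs for every line and lower-cases the user name in field 2.
  { split($2, T, "="); $2 = T[1] "=" tolower(T[2]) }
  # Explicit de-duplication: remember the line and print it on first sight.
  !($0 in A) { A[$0]; print }
' sample.txt)
printf '%s\n' "$out"
```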

Now to the quick and dirty !A[$0]++:
If A[$0] is undefined, it is created with the value 0 (empty); the negated value is 1 (true), so the line is default-printed. A[$0] is then post-incremented to 1.
If the same $0 occurs again, A[$0] is 1, negated 0 (false), so it won't print, but it is post-incremented to 2.
If the same $0 occurs yet again, A[$0] is 2, negated 0 (false), so it won't print, but it is post-incremented to 3.
...
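
The walk-through above can be traced directly. This sketch prints, for each input line, the value A[$0] has before the post-increment and the result of the !A[$0]++ test (1 means the line would be printed). The input words are made up.

```shell
#!/bin/sh
# Columns: the line, A[$0] before the increment, and the value of !A[$0]++.
out=$(printf '%s\n' mickey mickey mickey donald |
  awk '{ before = A[$0]; shown = !A[$0]++; print $0, before + 0, shown }')
printf '%s\n' "$out"
```

The first "mickey" and the first "donald" get 1 in the last column; the repeats get 0 while their counter keeps growing.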
