I am trying to count the bytes in a line using awk, but the length function returns 1 for a 2-byte or 3-byte character, unlike the wc -c command.
I have tried to use the commands below within an awk function, but they do not seem to work.
I need the byte count of a line (ignoring the newline character). Currently wc -c gives me the byte count and the length function within awk gives me the character count.
The description of your problem is extremely confusing. UTF-8, UTF-16, and UTF-32 are completely different encodings of Unicode, and if you have a single file that mixes all three, determining which bytes in that file represent a <newline> character may be impossible unless you can clearly describe the byte offsets in your file where it shifts from one codeset to another and clearly describe how any program reading this file can determine which codeset is in use for any particular byte in that file.
If you are reading a file that is entirely encoded in UTF-8 (in which characters are encoded with one to four bytes), you could tell your script that the UTF-8 input file was instead a file encoded in ISO 8859-1 (in which all characters are one byte). The length() function in awk then counts every byte as one character, so it returns the byte count of each line, and line splitting still works because the <newline> character is encoded the same way in both codesets.
But, since you haven't described what the rest of your awk program is doing, we have no way to guess at whether or not this option might work for you and no way to guess if there might be other options.
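A minimal sketch of that idea, assuming the file is entirely UTF-8 and that your awk honors the locale environment variables (not every awk implementation does): running awk under the C locale, a single-byte locale like ISO 8859-1, makes length() return bytes rather than characters. The function name byte_lengths is made up for this illustration.

```shell
#!/bin/sh
# Print "line number: byte count" for each line of a file.
# LC_ALL=C applies only to this awk invocation; under it awk treats
# each byte as one character, so length($0) is the byte count of the
# line, excluding the newline.
byte_lengths() {
    LC_ALL=C awk '{ print NR ": " length($0) }' "$1"
}

# Run on the file named on the command line, if one was given.
if [ "$#" -gt 0 ]; then
    byte_lengths "$1"
fi
```

If the rest of the awk program needs character semantics elsewhere, this approach will not fit as-is, because the locale applies to the whole awk run.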
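#!/bin/bash
# blc.sh '/full/path/to/filename'
# Print the byte count of each line, excluding the trailing newline.
linenumber=1
# IFS= and -r keep leading/trailing whitespace and backslashes intact.
while IFS= read -r line
do
	# printf '%s' writes the line without a newline, so wc -c counts only its bytes.
	length=$( printf '%s' "$line" | wc -c )
	echo "Line $linenumber length less newline = $((length))..."
	linenumber=$((linenumber+1))
done < "$1"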
3) Fit this inside an awk script.
#!/bin/sh
# blc_awk.sh /full/path/to/filename
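# Note: the space after blc.sh separates the script path from its argument.
# The escaped quotes keep a path containing spaces as a single argument.
awk -v arg="$1" 'BEGIN { system("~/Desktop/Code/Shell/blc.sh \"" arg "\"") }'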
Run in this case as blc_awk.sh /tmp/text
(In this example my blc.sh sits in the directory shown in the script; just change the path to suit yours.)
Hope this is useful...
Results:-
Last login: Tue Dec 1 21:08:13 on ttys000
AMIGA:barrywalker~> cd Desktop/Code/Shell
AMIGA:barrywalker~/Desktop/Code/Shell> echo "���������asdfh����
> ����������������AJHKK����������
> �����������������Ԉ�1234567�����������" > /tmp/text
AMIGA:barrywalker~/Desktop/Code/Shell> ./blc.sh /tmp/text
Line 1 length less newline = 62...
Line 2 length less newline = 94...
Line 3 length less newline = 111...
AMIGA:barrywalker~/Desktop/Code/Shell> ./blc_awk.sh /tmp/text
Line 1 length less newline = 62...
Line 2 length less newline = 94...
Line 3 length less newline = 111...
AMIGA:barrywalker~/Desktop/Code/Shell> _
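The loop in blc.sh starts one wc process per line, which can be slow on large files. A possible alternative, sketched here under the assumption that bash is the shell, is to let bash count the bytes itself: with LC_ALL set to C inside a function, ${#line} reports bytes instead of characters. The function name line_bytes is hypothetical, and the output format simply mirrors blc.sh.

```shell
#!/bin/bash
# Count the bytes in each line without forking wc for every line.
line_bytes() {
    # Byte semantics for ${#...}; local, so the caller's locale is untouched.
    local LC_ALL=C
    local line n=0
    # The || test also picks up a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        printf 'Line %d length less newline = %d...\n' "$n" "${#line}"
    done < "$1"
}

# Run on the file named on the command line, if one was given.
if [ "$#" -gt 0 ]; then
    line_bytes "$1"
fi
```

Saved as a script, it would be run the same way as blc.sh, with the file path as its only argument.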