awk with tab delimiter

bmk123 · January 8, 2019, 5:27am

Hi Team

The test file below contains tab-delimited data, and I am expecting the number of fields to be 3.

File: test.txt

a	b	c

awk -vFPAT='\t' -vOFS="\t" -v a="0" -v b="10"  ' NR>a {if (NF != b ) print NR"@"NF }' test.txt

Current output:
1@2
Required output:
1@3

Could you please help me with this one?

RavinderSingh13 · January 8, 2019, 5:50am

Hello bmk123,

Could you please try the following?

awk -F'[[:space:]]+' '{for(i=1;i<=NF;i++){if($i!="b"){val=val?val OFS i:i}};print val;val=""}'  OFS="@" Input_file

Thanks,
R. Singh

bmk123 · January 8, 2019, 5:56am

Thanks, Singh.
I don't want to change the code again; I need to implement it in the same code.

awk -vFPAT='([^,]*)|("[^"]+")| [[\t]] ' -vOFS="\t" -v a="0" -v b="10"  ' NR>a {if (NF != b ) print NR"@"NF }' test.txt

I tried the above, but it is not working.

Scrutinizer · January 8, 2019, 11:12am

@OP You are using -vFPAT='\t', which means that the fields are single TAB characters and the fields are separated by anything else.
If I change -vFPAT='\t' to -vFS='\t', then I get:

1@3

-vFS='\t' can be shortened to -F'\t', so it would become:

awk -F'\t' -v a="0" -v b="10"  ' NR>a {if (NF != b ) print NR"@"NF }' test.txt

or

awk -F'\t' -v a=0 -v b=10  'NR>a && NF!=b {print NR"@"NF}' test.txt
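To see the FS/FPAT difference side by side, here is a small sketch. It assumes a POSIX-compatible awk; since FPAT is a GNU awk (gawk) extension, the sketch does not use it and instead counts the TAB characters with gsub(), which reproduces the number that -vFPAT='\t' reported. The two-line sample file (one 10-field line, one 3-field line) is made up for illustration.

```shell
# Hypothetical sample: line 1 has 10 TAB-separated fields, line 2 has 3.
printf 'f1\tf2\tf3\tf4\tf5\tf6\tf7\tf8\tf9\tf10\na\tb\tc\n' > test.txt

# FS (set via -F) describes the SEPARATOR: the text between TABs becomes the fields.
awk -F'\t' '{ print NR "@" NF }' test.txt
# -> 1@10
#    2@3

# FPAT (gawk only) describes the FIELDS themselves, so FPAT='\t' made each TAB a field.
# Portable stand-in: gsub() returns how many TABs the line contains.
awk '{ print NR "@" gsub(/\t/, "&") }' test.txt
# -> 1@9
#    2@2

# The filter from above: report only lines whose field count differs from b.
awk -F'\t' -v a=0 -v b=10 'NR>a && NF!=b {print NR"@"NF}' test.txt
# -> 2@3
```

With a=0 and b=10, only the 3-field line is reported, which matches the "1@3" the original question was after for its single-line file.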

bmk123 · January 10, 2019, 2:05am

Thanks, Scrutinizer.
I tried it as per your comment, and it is not working.

awk -vFS='([^,]*)|("[^"]+")| [[\t]] ' -vOFS="\t" -v a="0" -v b="10"  ' NR>a {if (NF != b ) print NR"@"NF }' test.txt

sadique.manzar · January 10, 2019, 6:48am

Dear,
Please try the following:

awk -F '[\t]' -v a="0" -v b="10"  ' NR>a {if (NF != b ) print NR"@"NF }' test.txt
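For what it's worth, '[\t]' is a bracket expression that contains only a TAB, so as a field separator it should split exactly like '\t'. A quick check, assuming any POSIX awk that processes the \t escape in the -F value:

```shell
# Both separators split on a single TAB, so each prints 3 (the field count).
printf 'a\tb\tc\n' | awk -F'\t'   '{ print NF }'
printf 'a\tb\tc\n' | awk -F'[\t]' '{ print NF }'

# The fields themselves are the text between the TABs: this prints "b".
printf 'a\tb\tc\n' | awk -F'[\t]' '{ print $2 }'
```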

durden_tyler · January 10, 2019, 5:21pm

That's because you are using a regular expression for FS that works differently than you (probably) think it does.
This part:

[^,]*

matches "0 or more occurrences of a character other than comma".
So, let's say I have this as my data:

a,b,c

Then each one of "a", "b" and "c" matches "0 or more occurrences of a character other than comma".
Hence awk considers each of "a", "b" and "c" to be a delimiter, and the rest (i.e. the commas) to be the data tokens.
So it sees this:

data        delimiter   data        delimiter   data        delimiter   data
NULL        a           ,           b           ,           c           NULL

which explains the output below:

$
$ printf "a,b,c\n" | awk -vFS='([^,]*)' '{ print NF; print "{"$0"}"; for (i=1; i<=NF; i++){print "{"$i"}"} }'
4
{a,b,c}
{}
{,}
{,}
{}
$
$

If you have tabs instead of commas, then the entire string is a "delimiter" and it is sandwiched between two "data" tokens that are both NULL, as seen below:

$
$ printf "a\tb\tc\n" | awk -vFS='([^,]*)' '{ print NF; print "{"$0"}"; for (i=1; i<=NF; i++){print "{"$i"}"} }'
2
{a      b       c}
{}
{}
$
$
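By contrast, when FS names only the separator character, both samples split into three fields. A minimal sketch, assuming the intent was "split on commas" for the first sample and "split on TABs" for the second:

```shell
# Comma-separated sample: FS is just the comma, so a, b and c are the data.
# Prints 3, then {a}, {b}, {c} on separate lines.
printf 'a,b,c\n' | awk -F',' '{ print NF; for (i = 1; i <= NF; i++) print "{" $i "}" }'

# TAB-separated sample: FS is just the TAB. Same output as above.
printf 'a\tb\tc\n' | awk -F'\t' '{ print NF; for (i = 1; i <= NF; i++) print "{" $i "}" }'
```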

For the regular expression:

[[\t]]

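awk reads the leading [[\t] as a bracket expression (a character class matching either "[" or a TAB), followed by a literal "]":

[[\t]   then   ]

Hence it treats both of the following two-character sequences as delimiters:

[]
\t]   (a TAB followed by "]")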

You can test this by running the following one-liner:

$
$ printf "aaaa\t]bbbb[]cccc\n" | awk -vFS='[[\t]]' '{ print NF; print "{"$0"}"; for (i=1; i<=NF; i++){print "{"$i"}"} }'
3
{aaaa   ]bbbb[]cccc}
{aaaa}
{bbbb}
{cccc}
$
$
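If the goal is simply "split on a TAB", the bracket expression has to close right after the TAB. A sketch on the same input, comparing the regex above with '[\t]' (assuming an awk that processes the \t escape in -F, as gawk, mawk and BusyBox awk do):

```shell
# [[\t]] = bracket expression [[\t] ("[" or TAB) followed by a literal "]": prints 3
printf 'aaaa\t]bbbb[]cccc\n' | awk -F'[[\t]]' '{ print NF }'

# [\t] = a bracket expression holding only a TAB: one split, so NF is 2
# and the second field keeps the "]", "[" and "]" as data.
printf 'aaaa\t]bbbb[]cccc\n' | awk -F'[\t]' '{ print NF; print $2 }'
```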

If you can explain what exactly you want as the delimiter, then maybe we can help you come up with a better regular expression.

bmk123 · January 11, 2019, 1:49am

Thanks, all. I have handled the tab delimiter independently.