awk and substr

Hello All;

I have an input file 'abc.txt' with below text:

512345977,213458,100021
512345978,213454,100031
512345979,213452,100051
512345980,213455,100061
512345981,213456,100071
512345982,213456,100091
512345983,213457,100041
512345984,213451,100011

I need to paste the first field and the third field in different files depending on the last two digits of first field.

--> If last two digit of first field in between 0 and 15
then the output will be '

General,<Field1>,123,0,<Field3>,0

in File1.txt
--> If last two digit of first field in between 16 and 25
then the output will be '

General,<Field1>,123,0,<Field3>,0

in File2.txt
--> If last two digit of first field in between 26 and 40
then the output will be '

General,<Field1>,123,0,<Field3>,0

in File3.txt
and so on...

Now, I can do it using awk command followed by multiple if loops - but it will take a lot of processing time as the input file is huge with millions of records.

So, how can i do it using 1 single awk command?:confused:

Please use code tags as required by forum rules!

And, show us your attempts.

---------- Post updated at 11:35 ---------- Previous update was at 11:33 ----------

There's no match for your criteria in ALL first fields of your sample input (assumed the field separator is , ).

Hello mystition,

Could you please try following and let me know if this works for you, though I haven't seen anything matching from your shown Input_file here.

awk -F, '{len=length($1);Q=substr($1,len-1);if(Q>=0 && Q<=15){print "General" OFS $1 OFS "123,0" OFS $3 OFS "0" >> "File1.txt"};if(Q>=16 && Q<=25){print "General" OFS $1 OFS "123,0" OFS $3 OFS "0" >> "File2.txt"};if(Q>=26 && Q<=40){print "General" OFS $1 OFS "123,0" OFS $3 OFS "0" >> "File3.txt"}}'  Input_file

As your Input_file is not having any of the content matching to your requirement so I couldn't test it, if your requirement is not matching to above code please show us more Input_file with expected output too.
EDIT: Adding a non-one liner form for above solution as follows.

awk -F, '{
                len=length($1);
                Q=substr($1,len-1);
                if(Q>=0 && Q<=15){
                                        print "General" OFS $1 OFS "123,0" OFS $3 OFS "0" >> "File1.txt"
                                 };
                if(Q>=16 && Q<=25){
                                        print "General" OFS $1 OFS "123,0" OFS $3 OFS "0" >> "File2.txt"
                                 };
                if(Q>=26 && Q<=40){
                                        print "General" OFS $1 OFS "123,0" OFS $3 OFS "0" >> "File3.txt"
                                  }
         }
        '   Input_file
 

Thanks,
R. Singh

You say "and so on..." after specifying the matching criteria, which leads us to believe that a similar logic applies for numbers greater than 40 (and less than 99 I assume). However, I can't see what increment pattern is being applied here as the first match is a range of 16 numbers (including 0), the second being 10 numbers and the third being 15 numbers. Can you please either confirm all ranges up till 99 (if applicable) or confirm that only these 3 ranges apply?

I added few records to your file just to test command that is as follows

[akshay@localhost tmp]$ cat abc.txt
512345977,213458,100021
512345978,213454,100031
512345979,213452,100051
512345980,213455,100061
512345981,213456,100071
512345982,213456,100091
512345983,213457,100041
512345984,213451,100011
512345910,213451,100011
512345909,213451,100011
512345917,213451,100011
512345922,213451,100011
512345927,213451,100011
512345939,213451,100011
[akshay@localhost tmp]$ cat map.txt
0,15,file1.txt
16,25,file2.txt
26,40,file3.txt

One liner command as you expected :slight_smile:

[akshay@localhost tmp]$ awk -F, 'FNR==NR{a[$3]=$1 FS $2;next}{for(i in a){c=substr($1,length($1)-1);split(a,d); if(c>=d[1]&& c<=d[2])printf("General,%s,123,0,%s,0\n",$1,$3)>i }}' map.txt abc.txt 

Output

file1.txt

[akshay@localhost tmp]$ cat file1.txt 
General,512345910,123,0,100011,0
General,512345909,123,0,100011,0

file2.txt

[akshay@localhost tmp]$ cat file2.txt 
General,512345917,123,0,100011,0
General,512345922,123,0,100011,0

file3.txt

[akshay@localhost tmp]$ cat file3.txt 
General,512345927,123,0,100011,0
General,512345939,123,0,100011,0

Readable version

awk -F, '
      # Read map.txt set array a where index of array being column3, and element being
      # column1 and column2 separated by input field separator
      # FNR==NR is true only when awk reads first file
      FNR==NR{
                   a[$3]=$1 FS $2
                   next
             }

      # Read second file abc.txt
             {
                     # Loop through array elements
                     for(i in a)
                     {
                           # Extract last 2 digits from column1 of current line read
                           c=substr($1,length($1)-1);
                      
                           # split array value a into array d where separator being FS
                           # where d[1] is min value, d[2] is max value
                           split(a,d) 

                           # if last 2 digits lies withing range, write to file i 
                           if(c>=d[1]&& c<=d[2])
                                  printf("General,%s,123,0,%s,0\n",$1,$3) > i 
                     }
              }
        ' map.txt abc.txt 

In current context close(file) that is close(i) is not incorporated, you should make some provision to close file,
if you are writing to many files, as you may receive makes too many open files error.

Sorry for being abstract. I have the ranges with different lengths to imply that there can be 'n' number of ranges - which I can specify manually. I need help with the logic to do so.

---------- Post updated at 05:12 PM ---------- Previous update was at 05:08 PM ----------

I am a member since past 7 years - for sure it's not a home-work. I did not post the attempts as I am working on a script after several years and getting some syntax errors.

Sorry for missing on the Code tags. I tried using NP - but it did not do the job.

---------- Post updated at 05:21 PM ---------- Previous update was at 05:12 PM ----------

Thanks Mr Singh,

I tried your code, but it's not working as expected...

If I use the first part itself:

awk -F, '{
                len=length($1);
                Q=substr($1,len-1);
                if(Q>=0 && Q<=07){
print "General" OFS $1 OFS "123,0" OFS $3 OFS "0" >> "File1.txt"}}' abc.txt

But somehow it is even taking values other than 00 to 07

$ cat File1.txt
General 512345700 123,0 10001 0
General 512345701 123,0 10001 0
General 512345702 123,0 10001 0
General 512345703 123,0 10001 0
General 512345704 123,0 10001 0
General 512345705 123,0 10001 0
General 512345706 123,0 10001 0
General 512345707 123,0 10001 0
General 512345708 123,0 10001 0
General 512345709 123,0 10001 0
General 512345710 123,0 10001 0
General 512345711 123,0 10001 0
General 512345712 123,0 10001 0
General 512345713 123,0 10001 0
General 512345714 123,0 10001 0
General 512345715 123,0 10001 0
General 512345716 123,0 10001 0
General 512345717 123,0 10001 0
General 512345718 123,0 10001 0
General 512345719 123,0 10001 0
awk -F, '
T = (T = $1%100)<16?1:T<26?2:T<41?3:""  {print "General", $1, 123, 0, $3, 0  > ("FILE" T ".txt")}
' OFS="," file
1 Like

I am trying to understand your code - its a really good piece of work.

But I am getting an error.

awk -F, 'FNR==NR{a[$3]=$1 FS $2;next}{for(i in a){c=substr($1,length($1)-1);split(a,d); if(c>=d[1]&& c<=d[2])printf("GENERAL,%s,,,,,%s,,,,,0\n",$1,$3)>i }}' map.txt abc.txt

awk: cmd. line:1: (FILENAME=abc.txt FNR=31) fatal: expression for `>' redirection has null string value

Sharing the first few lines of my sample input file abc.txt

512345678,200001,10234
512345679,200001,10234
512345680,200001,10234
512345681,200001,10234
512345682,200001,10234
512345683,200001,10234
512345684,200001,10234
512345685,200001,10234
512345686,200001,10234
512345687,200001,10234
512345688,200001,10234
512345689,200001,10234
512345690,200001,10234
512345691,200001,10234
512345692,200001,10234
512345693,200001,10234
512345694,200001,10234
512345695,213456,10001
512345696,213456,10001
512345697,213456,10001
512345698,213456,10001
512345699,213456,10001
512345700,213456,10001
512345701,213456,10001
512345702,213456,10001
512345703,213456,10001
512345704,213456,10001
512345705,213456,10001
512345706,213456,10001
512345707,213456,10001
512345708,213456,10001
512345709,213456,10001
512345710,213456,10001
512345711,213456,10001
512345712,213456,10001
512345713,213456,10001
512345714,213456,10001
512345715,213456,10001
512345716,213456,10001
512345717,213456,10001
512345718,213456,10001
512345719,213456,10001
512345720,213456,10001
512345721,213456,10001
512345722,213456,10001
512345723,213456,10001
512345724,213456,10001
512345725,213456,10001
512345726,213456,10001
512345727,213456,10001
512345728,213456,10001
512345729,213456,10001
512345730,213456,10001
512345731,213456,10001
512345732,213456,10001
512345733,213456,10001
512345734,213456,10001
512345735,213456,10001
512345736,213456,10001
512345737,213456,10001
512345738,213456,10001
512345739,213456,10001
512345740,213456,10001
512345741,213456,10001
512345742,213456,10001
512345743,213456,10001
512345744,213456,10001
512345745,213456,10001
512345746,213456,10001
512345747,213456,10001
512345748,213456,10001
512345749,213456,10001
512345750,213456,10001
512345751,213456,10001
512345752,213456,10001
512345753,213456,10001
512345754,213456,10001
512345755,213456,10001
512345756,213456,10001
512345757,213456,10001
512345758,213456,10001
512345759,213456,10001
512345760,213456,10001
512345761,213456,10001
512345762,213456,10001
512345763,213456,10001
512345764,213456,10001
512345765,213456,10001
512345766,213456,10001
512345767,213456,10001
512345768,213456,10001
512345769,213456,10001
512345770,213456,10001
512345771,213456,10001
512345772,213456,10001
512345773,213456,10001
512345774,213456,10001
512345775,213456,10001
512345776,213456,10001
512345777,213456,10001
512345778,213456,10001
512345779,213456,10001
512345780,213456,10001
512345781,213456,10001
512345782,213456,10001
512345783,213456,10001
512345784,213456,10001
512345785,213456,10001
512345786,213456,10001
512345787,213456,10001
512345788,213456,10001
512345789,213456,10001
512345790,213456,10001
512345791,213456,10001
512345792,213456,10001
512345793,213456,10001
512345794,213456,10001
512345795,213456,10001
512345796,213456,10001
512345797,213456,10001
512345798,213456,10001
512345799,213456,10001
512345800,213456,10001

Basically, it is giving an issue when the last two digits of input Field1 are starting to get repeated. I am going to prepare a more randomized file to try it now.

fatal: expression for `>' redirection has null string value

above error occurs if variable i is not set or defined

For example : see below

No error since variable is defined

[akshay@localhost tmp]$ awk 'BEGIN{s=123; print 12345>s}'
[akshay@localhost tmp]$ cat 123 
12345

Error since variable s is not defined

[akshay@localhost tmp]$ awk 'BEGIN{print 12345>s}'
awk: fatal: expression for `>' redirection has null string value

Please check map.txt file,

I get result without any error like this

[akshay@localhost tmp]$ awk -F, 'FNR==NR{a[$3]=$1 FS $2;next}{for(i in a){c=substr($1,length($1)-1);split(a,d); if(c>=d[1]&& c<=d[2])printf("GENERAL,%s,,,,,%s,,,,,0\n",$1,$3)>i }}' map.txt abc.txt

[akshay@localhost tmp]$ for i in file*.txt; do echo $i; cat $i; done
file1.txt
GENERAL,512345700,,,,,10001,,,,,0
GENERAL,512345701,,,,,10001,,,,,0
GENERAL,512345702,,,,,10001,,,,,0
GENERAL,512345703,,,,,10001,,,,,0
GENERAL,512345704,,,,,10001,,,,,0
GENERAL,512345705,,,,,10001,,,,,0
GENERAL,512345706,,,,,10001,,,,,0
GENERAL,512345707,,,,,10001,,,,,0
GENERAL,512345708,,,,,10001,,,,,0
GENERAL,512345709,,,,,10001,,,,,0
GENERAL,512345710,,,,,10001,,,,,0
GENERAL,512345711,,,,,10001,,,,,0
GENERAL,512345712,,,,,10001,,,,,0
GENERAL,512345713,,,,,10001,,,,,0
GENERAL,512345714,,,,,10001,,,,,0
GENERAL,512345715,,,,,10001,,,,,0
GENERAL,512345800,,,,,10001,,,,,0
file2.txt
GENERAL,512345716,,,,,10001,,,,,0
GENERAL,512345717,,,,,10001,,,,,0
GENERAL,512345718,,,,,10001,,,,,0
GENERAL,512345719,,,,,10001,,,,,0
GENERAL,512345720,,,,,10001,,,,,0
GENERAL,512345721,,,,,10001,,,,,0
GENERAL,512345722,,,,,10001,,,,,0
GENERAL,512345723,,,,,10001,,,,,0
GENERAL,512345724,,,,,10001,,,,,0
GENERAL,512345725,,,,,10001,,,,,0
file3.txt
GENERAL,512345726,,,,,10001,,,,,0
GENERAL,512345727,,,,,10001,,,,,0
GENERAL,512345728,,,,,10001,,,,,0
GENERAL,512345729,,,,,10001,,,,,0
GENERAL,512345730,,,,,10001,,,,,0
GENERAL,512345731,,,,,10001,,,,,0
GENERAL,512345732,,,,,10001,,,,,0
GENERAL,512345733,,,,,10001,,,,,0
GENERAL,512345734,,,,,10001,,,,,0
GENERAL,512345735,,,,,10001,,,,,0
GENERAL,512345736,,,,,10001,,,,,0
GENERAL,512345737,,,,,10001,,,,,0
GENERAL,512345738,,,,,10001,,,,,0
GENERAL,512345739,,,,,10001,,,,,0
GENERAL,512345740,,,,,10001,,,,,0

Works like magic. I only edited the range and it worked smoothly for me.

Can you please explain the code?:b:

----------------------------

My understanding is that you are dividing the Field1 by 100 to get the remainder value, and mapping it to a file-number accordingly.
But, What does the '?' mark do in this case?

% is not a division operator, but the modulo or remainder operator, so the result of above are the last two digits.
expr?expr:expr is the conditional operator; it evaluates the first expression, and, if TRUE, executes the second else the third. Above actually is three condops stacked.