How to put a word starting at a particular position in a file using shell scripting

subhrap.das · November 26, 2009, 1:46pm

Hi all,
I'm new to shell scripting, hence this question.
I have two files: temp.txt and config.txt.
The values in temp.txt are tab-separated.

Example temp.txt:

AB   CDE  GHIJ    OPQRS   WXY

Example config.txt (the 1st line is for the 1st element of temp.txt, and so on):

start = '1' end='5'
start = '6' end = '10'
start= '11' end ='15'

and so on.

I have to create an output file, main.txt, in which each word of temp.txt starts at the position specified in config.txt.
For example, positions 1-5 should hold the 1st word, i.e. position 1 is 'A', position 2 is 'B', and positions 3-5 are blank. 'C' should start at position 6.
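A minimal sketch of this requirement, assuming temp.txt is tab-separated and that line N of config.txt holds the quoted start and end columns for word N. Each word is left-justified in a column `end - start + 1` characters wide (both ends are inclusive), and the printf format is built by string concatenation rather than with the `*` width specifier, which some older awk implementations may not support. The function name `place_by_range` is hypothetical.

```shell
#!/bin/sh
# place_by_range CONFIG DATA  (hypothetical helper name)
# Pads each tab-separated field of DATA to the width given by the
# matching "start = 'N' end = 'M'" line of CONFIG (line 1 -> field 1, ...).
place_by_range() {
  awk -F'\t' '
    NR == FNR {                        # first file: the config
      n = split($0, t, "\047")         # split on single quotes: t[2] = start, t[4] = end
      if (n >= 4)
        width[++c] = t[4] - t[2] + 1   # the range is inclusive
      next
    }
    {                                  # second file: the data
      line = ""
      for (i = 1; i <= NF; i++)        # build "%-5s" etc. instead of using "%-*s"
        line = line sprintf("%-" width[i] "s", $i)
      print line
    }' "$1" "$2"
}

# Usage: place_by_range config.txt temp.txt > main.txt
```

With the three config lines above, "AB", "CDE" and "GHIJ" each get a 5-character column, so 'C' lands at position 6 and 'G' at position 11.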

radoulov · November 26, 2009, 3:50pm

awk > main.txt 'NR == FNR { 
  n = split($0, t, "\047")
  fmt[NR] = t[4]; next 
  }
{ 
  for (i=1; i<=NF; i++) 
    printf "%-*s", fmt[i], $i
  print ""    
  }' config.txt temp.txt

summer_cherry · November 26, 2009, 9:54pm

use strict;
use warnings;

my $str = "AB   CDE  GHIJ    OPQRS   WXY";
my @tmp = split(/\s+/, $str);          # the words to place
while (<DATA>) {
	if (/start\s*=\s*'([0-9]+)'\s*end\s*=\s*'([0-9]+)'/) {
		my $len = $2 - $1 + 1;             # start and end are both inclusive
		printf "%-${len}s", $tmp[$. - 1];  # $. is the current DATA line number
	}
}
print "\n";
__DATA__
start = '1' end='5'
start = '6' end = '10'
start= '11' end ='18'
start= '19' end ='25'
start= '26' end ='28'

subhrap.das · November 27, 2009, 1:52am

The output is as follows:

A    AB        ABC            ABDC

The 1st and 2nd words are in the correct positions, but from the 3rd word onwards the logic goes wrong.

radoulov · November 27, 2009, 4:21am

Yes, the code is wrong.

Try this instead:

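awk > main.txt 'NR == FNR {          # first file: config.txt
  split($0, t, "\047")               # split on single quotes: t[2] = start, t[4] = end
  fmt[NR] = t[4] - (t[2] - 1); next  # column width = end - start + 1
  }
{                                    # second file: temp.txt
  for (i=1; i<=NF; i++)
    printf "%-*s", fmt[i], $i        # left-justify field i in its column
  print x                            # x is unset, so this just ends the line
  }' config.txt temp.txt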

subhrap.das · November 27, 2009, 4:59am

Works like a charm, radoulov, but there is a slight change in the requirement.
The config file now looks like this:

{
            'name'  :   'field1',
            'type'  :   'input',
            'spos'  :   1,
            'size'  :   2,
        },
        {
            'name'  :   'field2',
            'type'  :   'input',
            'spos'  :   3,
            'size'  :   1,
        },
        {
            'name'  :   'field3',
            'type'  :   'input',
            'spos'  :   4,
            'size'  :   8,
        },

The values have to be placed based on 'spos' (the starting position).
Please help me with this.
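A minimal sketch that places each value by 'spos' directly, assuming the config layout shown above (one 'name' line opens each record, followed by 'spos' and 'size' lines) and a tab-separated data file. Field i is written starting at column spos, padded with blanks from the end of the previous field, and cut to at most 'size' characters, so gaps between fields come from spos rather than from the sizes alone. The function name `place_by_spos` is hypothetical; overlapping ranges are not detected.

```shell
#!/bin/sh
# place_by_spos CONFIG DATA  (hypothetical helper name)
# Writes field i of DATA (tab-separated) so that it starts at column spos
# of the i-th config record and is cut to at most size characters.
place_by_spos() {
  awk -F'\t' '
    NR == FNR {                              # first file: the config
      if ($0 ~ /\047name\047/) c++           # a name line starts a new record
      if ($0 ~ /\047(spos|size)\047/) {
        split($0, t, ":")
        v = t[2]
        gsub(/[^0-9]/, "", v)                # keep only the digits
        if ($0 ~ /spos/) spos[c] = v + 0
        else             size[c] = v + 0
      }
      next
    }
    {                                        # second file: the data
      line = ""
      for (i = 1; i <= NF && i <= c; i++) {
        while (length(line) < spos[i] - 1)   # pad with blanks up to the start column
          line = line " "
        line = line substr($i, 1, size[i])   # cut the value to the field size
      }
      print line
    }' "$1" "$2"
}

# Usage: place_by_spos config.txt temp.txt > main.txt
```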

radoulov · November 27, 2009, 5:49am

Something like this:

awk 'NR == FNR {
  /^[ \t]*\47name/ && c++      # get the field number
  if (/^[ \t]*\47size/) {
    split($0, t, ":")          
    gsub(/[ \t\47,]/, x, t[2]) # strip punctuation
    fmt[c] = t[2]              # get the size 
    }
  next                         # run the above actions                                     
  }                            # + only for the first input file
{ 
  for (i=1; i<=NF; i++)        # output the strings in the correct format
    printf "%" (length($i) > fmt[i] ? "." : "-" ) fmt[i] "s", $i
  print x
  }' config.txt temp.txt
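The format string in the loop above is assembled at run time: for a width of 5 it becomes either `%-5s` (left-justify and pad with blanks) or `%.5s` (print at most 5 characters). A minimal way to see the two cases, assuming any POSIX awk; `fit` is a hypothetical helper that mirrors that choice for a single string.

```shell
#!/bin/sh
# fit WIDTH STRING  (hypothetical helper name)
# Pads short strings with %-Ns and truncates long ones with %.Ns,
# the same choice the loop above makes per field.
fit() {
  awk -v w="$1" -v s="$2" 'BEGIN {
    fmt = "%" (length(s) > w + 0 ? "." : "-") w "s"
    printf "[" fmt "]\n", s
  }'
}

fit 5 AB        # prints [AB   ]
fit 2 ABCDEF    # prints [AB]
```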

subhrap.das · November 27, 2009, 6:33am

It needs a small modification, I guess.
The input temp.txt:

1       12      123     1234    1
A       AB      ABC     ABDC    e

config.txt

{
                        'name'  :       'Field1',
                        'type'  :       'input',
                        'spos'  :       1,
                        'size'  :       2,
                },
                {
                        'name'  :       'Field2',
                        'type'  :       'input',
                        'spos'  :       3,
                        'size'  :       2,
                },
                {
                        'name'  :       'Field3',
                        'type'  :       'input',
                        'spos'  :       5,
                        'size'  :       8,
                },
                {
                        'name'  :       'Field4',
                        'type'  :       'input',
                        'spos'  :       13,
                        'size'  :       11,
                },
                {
                        'name'  :       'Field5',
                        'type'  :       'input',
                        'spos'  :       24,
                        'size'  :       11,
                },

The output is as follows:

1 12123     12341
A ABABC     ABDCe

The expected output is:

1 12123     1234       1
A ABABC     ABDC       e

radoulov · November 27, 2009, 6:47am

What version of AWK/OS are you using? I get the expected result (I added cat -e to show the line endings):

% cat config.txt
{
                        'name'  :       'Field1',
                        'type'  :       'input',
                        'spos'  :       1,
                        'size'  :       2,
                },
                {
                        'name'  :       'Field2',
                        'type'  :       'input',
                        'spos'  :       3,
                        'size'  :       2,
                },
                {
                        'name'  :       'Field3',
                        'type'  :       'input',
                        'spos'  :       5,
                        'size'  :       8,
                },
                {
                        'name'  :       'Field4',
                        'type'  :       'input',
                        'spos'  :       13,
                        'size'  :       11,
                },
                {
                        'name'  :       'Field5',
                        'type'  :       'input',
                        'spos'  :       24,
                        'size'  :       11,
                },
% cat temp.txt 
1       12      123     1234    1
A       AB      ABC     ABDC    e
% awk 'NR == FNR {
  /^[ \t]*\47name/ && c++      # get the field number
  if (/^[ \t]*\47size/) {
    split($0, t, ":")
    gsub(/[ \t\47,]/, x, t[2]) # strip punctuation
    fmt[c] = t[2]              # get the size
    }
  next                         # run the above actions
  }                            # + only for the first input file
{
  for (i=1; i<=NF; i++)        # output the strings in the correct format
    printf "%" (length($i) > fmt[i] ? "." : "-" ) fmt[i] "s", $i
  print x
  }' config.txt temp.txt|cat -e
1 12123     1234       1          $
A ABABC     ABDC       e          $

subhrap.das · November 27, 2009, 7:08am

You are correct. It works on AIX Version 5 (IBM machine)
but not on HP-UX.
Can I please have machine-independent code?

Edited by radoulov: Sorry, I edited your code by mistake; see my answer below.

radoulov · November 27, 2009, 7:24am

Try nawk instead of awk on HP-UX.

Also try setting the field separator explicitly:

awk -F'\t' ...

subhrap.das · November 30, 2009, 6:32am

Hi radoulov,
Your command works in all scenarios, but only on AIX.
nawk doesn't work on HP-UX either.

Any help in this regard will be great.
Thanks for all your help.

radoulov · November 30, 2009, 6:35am

Could you please post (copy/paste) the real input data (only if it differs from what you already provided), the exact command you're executing on HP-UX, and the output you're getting?

subhrap.das · November 30, 2009, 6:41am

This is the real input and output I'm trying right now.

Nothing has changed from what I pasted earlier.

Example 1:
Input:

1       12      123     12 34   1
A       AB      ABC     ABDC    e

Output on HP-UX:

1 12123     12341
A ABABC     ABDCe

Expected/IBM output:

1 12123     1234       1
A ABABC     ABDC       e

Example 2:

Input:

1               123     12 34   1
A       AB      ABC     ABDC    e
cat -t tab.txt
1^I^I123^I12 34^I1
A^IAB^IABC^IABDC^Ie

HP-UX output:
1 1212      341
A ABABC     ABDCe

IBM/expected output:
1               123     12 34   1
A       AB      ABC     ABDC    e

radoulov · November 30, 2009, 6:52am

It's because implicit data type conversion differs between AWK implementations. Try this code:

awk 'NR == FNR {
  /^[ \t]*\47name/ && c++      # get the field number
  if (/^[ \t]*\47size/) {
    split($0, t, ":")          
    gsub(/[ \t\47,]/, x, t[2]) # strip punctuation
    fmt[c] = t[2]              # get the size
    }
  next                         # run the above actions                                     
  }                            # + only for the first input file
{ 
  for (i=1; i<=NF; i++)        # output the strings in the correct format 
    printf "%" (length($i) > fmt[i] + 0 ? "." : "-" ) fmt[i] "s", $i
  print x
  }' config.txt temp.txt
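A minimal illustration of the comparison rule behind the `+ 0`, assuming a POSIX-style awk: when a number is compared with a string constant, awk compares them as strings, so `4 > "11"` is true because "4" sorts after "1". Adding `+ 0` forces a numeric comparison. In the loop above, a string comparison would pick the `.` (truncate) format for "1234" against a width of 11, which fits the missing padding in the HP-UX output; whether a value left by `split()`/`gsub()` is treated as a string or a number is the part that seems to vary between implementations. `compare_demo` is a hypothetical name.

```shell
#!/bin/sh
# compare_demo (hypothetical name): the same width held as a string
# constant compares differently with and without "+ 0".
compare_demo() {
  awk 'BEGIN {
    w = "11"                                             # string constant
    print "string compare: " (length("1234") > w)        # "4" vs "11" as strings -> 1
    print "numeric compare: " (length("1234") > w + 0)   # 4 vs 11 as numbers -> 0
  }'
}

compare_demo
```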

subhrap.das · November 30, 2009, 6:59am

Sir, it's still not working.

Input

1               123     12 34   1
A       AB      ABC     ABDC    e

Output on HP-UX:
1 1212      34         1
A ABABC     ABDC       e

Expected/AIX output:
1           123        12 34      1
A AB        ABC        ABDC       e

radoulov · November 30, 2009, 7:13am

As I said, you need to set the field separator explicitly if it's different from the default one:

awk -F'\t' 'NR == FNR {
  /^[ \t]*\47name/ && c++      # get the field number
  if (/^[ \t]*\47size/) {
    split($0, t, ":")          
    gsub(/[ \t\47,]/, x, t[2]) # strip punctuation
    fmt[c] = t[2]              # get the size 
    }
  next                         # run the above actions                                     
  }                            # + only for the first input file
{ 
  for (i=1; i<=NF; i++)        # output the strings in the correct format
    printf "%" (length($i) > fmt[i] + 0 ? "." : "-" ) fmt[i] "s", $i
  print x
  }' config.txt temp.txt

subhrap.das · November 30, 2009, 7:16am

I don't understand which separator we're talking about here. How do I find that?

radoulov · November 30, 2009, 7:19am

The file temp.txt/tab.txt seems to contain tab-separated fields.
Did you try the last command I posted (the one right above your last post)?
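One way to check which separator a file uses, and what awk makes of it, sketched with standard tools: `od -c` prints a tab as `\t`, and `show_fields` (a hypothetical helper) brackets every field awk sees. With the default field separator, runs of blanks and tabs collapse, so the empty 2nd field disappears and "12 34" is split in two; with `-F'\t'` both survive.

```shell
#!/bin/sh
# show_fields [FS]  (hypothetical helper name)
# Brackets every field awk sees, so empty fields and embedded spaces
# become visible. With no argument awk uses its default field splitting.
show_fields() {
  if [ $# -gt 0 ]; then
    awk -F "$1" '{ for (i = 1; i <= NF; i++) printf "[%s]", $i; print "" }'
  else
    awk '{ for (i = 1; i <= NF; i++) printf "[%s]", $i; print "" }'
  fi
}

line=$(printf '1\t\t123\t12 34\t1')
printf '%s\n' "$line" | od -c             # tabs appear as \t
printf '%s\n' "$line" | show_fields       # prints [1][123][12][34][1]
printf '%s\n' "$line" | show_fields '\t'  # prints [1][][123][12 34][1]
```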

subhrap.das · December 1, 2009, 10:53am

Current output on HP-UX:

1   123     12 34      1
A ABABC     ABDC       e

Expected/AIX output:
1           123        12 34      1
A AB        ABC        ABDC       e