RJG
November 1, 2016, 9:16am
1
I have input data that looks like this; it is part of a CSV file:
7,1265,76548,"0102:04"
8,1266,76545,"0112:04"
The output should look like this and will be part of a text file:
7|1265000 |7654899 |A|
8|12660000 |76545999 |B|
The logic behind the output data:
1st output field = 1st input field
2nd output field = 2nd input field + (1st input field - length(2nd input field)) zeros + (1st input field - length(3rd input field)) spaces
3rd output field = 3rd input field + (1st input field - length(3rd input field)) nines + (1st input field - length(2nd input field)) spaces
4th output field:
If the 3rd field of the quoted string is 0, the 4th output field will be A
If the 3rd field of the quoted string is 1, the 4th output field will be B
How can I do it using UNIX shell scripting?
RudiC
November 1, 2016, 9:41am
2
That problem can certainly be solved in *nix shell or text utilities. What escapes me is the logic defining the composition of the second and third output fields. Why does the first record have three zeroes and two nines, and the second four zeroes and three nines? And, if you are talking of "the 3rd field of the quoted string", is that the third character?
RJG
November 1, 2016, 10:02am
3
Hi Rudi,
The first record has three zeros because 7 - length(1265) = 7 - 4 = 3 zeros.
The first record has two nines because 7 - length(76548) = 7 - 5 = 2 nines.
And for the 2nd record:
The second record has four zeros because 8 - length(1266) = 8 - 4 = 4 zeros.
The second record has three nines because 8 - length(76545) = 8 - 5 = 3 nines.
And yes, I was trying to say "the 3rd character of the quoted string",
i.e. for the 1st record it will be 0,
for the 2nd record it will be 1.
I am currently using the bash shell. I tried the printf command for padding with zeros and nines, but it does not work.
I calculated the number of zeros in the variable diff_in_length_zero and the number of nines in the variable diff_in_length_nine, and tried the following commands:
$(printf '%*s' {diff_in_length_zero} '0')
$(printf '%*s' {diff_in_length_nine} '9')
But it does not work. Previously I used the same command to add spaces to a line.
Thanks,
RJG
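A note on the printf attempt above: `{diff_in_length_zero}` lacks its `$`, so printf receives the literal text rather than a number, and `%*s` always pads with spaces; the '0' or '9' is only the string being padded, not the fill character. One way to get a run of a given character in the shell is to print that many spaces and translate them with tr. A minimal sketch, assuming a helper named `repeat_char` (hypothetical, not from the thread) and the first sample record:

```shell
# repeat_char CHAR COUNT: print CHAR repeated COUNT times (hypothetical helper).
repeat_char() {
    # printf emits COUNT spaces (nothing when COUNT is 0); tr swaps each space for CHAR.
    printf '%*s' "$2" '' | tr ' ' "$1"
}

# Values of the first sample record: 7,1265,76548
width=7 second=1265 third=76548
diff_in_length_zero=$(( width - ${#second} ))   # 7 - 4 = 3
diff_in_length_nine=$(( width - ${#third} ))    # 7 - 5 = 2

zeros=$(repeat_char 0 "$diff_in_length_zero")
nines=$(repeat_char 9 "$diff_in_length_nine")
echo "${second}${zeros}"   # 1265000
echo "${third}${nines}"    # 7654899
```

The trailing spaces of each field could be produced the same way with `printf '%*s' "$count" ''` alone.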
RudiC
November 1, 2016, 10:21am
4
OK, understood; your syntax was a bit difficult to interpret. Mind an awk solution?
awk -F, -vOFS="|" '
{$2 = sprintf ("%s%0*d%*s", $2, $1-length($2), 0, $1-length($3), " ")
$3 = sprintf ("%s%d%*s", $3, 10^($1-length($3))-1, $1-length($2), " ")
$4 = substr($4, 4, 1)=="0"?"A":"B"
$5 = ""
}
1
' file
7|1265000 |7654899 |A|
8|12660000 |76545999 |B|
RudiC
November 1, 2016, 10:37am
5
In fact, the modification of $2 prevents the correct calculation of $3. Try instead:
awk -F, -vOFS="|" '
{L2 = $1-length($2)
L3 = $1-length($3)
$2 = sprintf ("%s%0*d%*s", $2, L2, 0, L3, "")
$3 = sprintf ("%s%d%*s", $3, 10^L3-1, L2, "")
$4 = substr($4, 4, 1)=="1"?"B":"A"
$5 = ""
}
1
' file
7|1265000 |7654899 |A|
8|12660000 |76545999 |B|
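One edge case may be worth checking: when a field is already as long as the width given in the first field, L2 or L3 is 0. With a width of 0, `%0*d` still prints a single 0, and `10^0-1` also prints 0, so the field would grow by one digit. A sketch that avoids this by building the padding in a loop, assuming a hypothetical helper `rep()` and a wrapper function `pad_fields` (neither is from the thread), with the sample data fed from a here-document instead of `file`:

```shell
pad_fields() {
    awk -F, -v OFS='|' '
    # rep(ch, n): return ch repeated n times; empty string when n <= 0
    function rep(ch, n,    s) { while (n-- > 0) s = s ch; return s }
    {
        L2 = $1 - length($2)                         # padding needed by field 2
        L3 = $1 - length($3)                         # padding needed by field 3
        $2 = $2 rep("0", L2) rep(" ", L3)            # zeros, then spaces
        $3 = $3 rep("9", L3) rep(" ", L2)            # nines, then spaces
        $4 = (substr($4, 4, 1) == "1") ? "B" : "A"   # position 4 skips the opening quote
        $5 = ""                                      # empty field gives the trailing "|"
        print
    }'
}

# First record from the thread, plus one made-up record that needs no padding.
pad_fields <<'EOF'
7,1265,76548,"0102:04"
4,1265,7654,"0112:04"
EOF
```

The second record here is invented purely to exercise the zero-padding case; it should come out as `4|1265|7654|B|` with no extra digits.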
RJG
November 2, 2016, 7:09am
6
Thanks Rudi for your reply
Hi Rudi,
I tried your code; it worked fine with the given input set.
But when the positions of the output fields change, I run into a problem.
I tried with the following input data set:
7,1265,76548,"0102:04"
8,1266,76545,"0112:04"
And the modified output will be:
E|1265000 |7654899 |7|A|
E|12660000 |76545999 |8|B|
where
1st field 'E' is common for both lines
4th field will be the 1st input field
The logic for the remaining fields is unchanged.
The first output field is now the constant E, and the first input field is reused as the 4th output field.
I have not worked much with awk. Could you please help me?
May I?
With a little change, and without an interim step, you can use RudiC's solution. The following might work:
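awk -F, -v OFS="|" '
{L2 = $1-length($2)                     # zeros for $2, trailing spaces for $3
L3 = $1-length($3)                      # nines for $3, trailing spaces for $2
$5 = substr($4, 4, 1)=="1"?"B":"A"      # read the flag before $4 is overwritten
$4 = $1                                 # save the width before $1 is overwritten
$1 = "E"
$2 = sprintf ("%s%0*d%*s", $2, L2, 0, L3, "")
$3 = sprintf ("%s%d%*s", $3, 10^L3-1, L2, "")
$6 = ""                                 # empty last field gives the trailing "|"
}
1
' file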
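If the output layout is likely to change again, an alternative is to leave the input fields untouched and print the output line explicitly, so that the order of the assignments no longer matters. A sketch that reuses the sprintf formats from the thread, assuming a wrapper function named `reorder_fields` (hypothetical) and the sample data fed from a here-document instead of `file`:

```shell
reorder_fields() {
    awk -F, '
    {
        L2 = $1 - length($2)                           # zeros for field 2, spaces for field 3
        L3 = $1 - length($3)                           # nines for field 3, spaces for field 2
        flag = (substr($4, 4, 1) == "1") ? "B" : "A"   # position 4 skips the opening quote
        # Output order: constant E, padded field 2, padded field 3, width, flag, trailing "|"
        printf "E|%s%0*d%*s|%s%d%*s|%s|%s|\n", $2, L2, 0, L3, "", $3, 10^L3-1, L2, "", $1, flag
    }'
}

reorder_fields <<'EOF'
7,1265,76548,"0102:04"
8,1266,76545,"0112:04"
EOF
```

Moving a column then only means moving its format specifier and argument within the printf call.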