I have some bitrate data in a csv which is in an odd format and is difficult to process in Excel when I have thousands of rows. Therefore, I was thinking of doing this in bash and using awk as the primary application except that due to its complication, I'm a little stuck.
Data____Output_____Calculations
968_____968________no K or M so do nothing
31K7____31700______31*K + 7*100
69K5____69500______69*K + 5*100
1M07____1070000____1*M + 5*10,000
842K____842000_____842*K
5M99____5990000____5*M + 99*10,000
Note: K = 1000, M = 1,000,000
As you can see, it's a bit of a nuisance. The file only has the Data column. What I need the script to do is the calculations and spit out the Output column. Please ignore the underscores as I couldn't get it to indent properly with the editor. Any help would be much appreciated.
Sorry, maybe I didnt make it clear. The file only has the Data column. What I need the script to do is the calculations and spit out the Output column.
Both very clever indeed. Output is correct if theres only one column. However, a little more assistance please. If the csv is as follow, how do I get it to convert the bitrate columns (data1, data2, data3) without having to parse the file first?
Additionally, I'd like to be able to call the script and give it an input file from the shell prompt rather than explicitly calling the file in the script. For example:
This applies the requested transformations to fields 2, 3, and 4. However, I have no idea why:
timestamp,data1,data2,data3
in your input file changed to:
timestamp,wan1,wan2,wan3
I don't know whether you were saying the other transformations were the incorrect values produced by Chubler_XL's script or if those were the values you want to get from the input you gave.
is in a file named infile , that you save my script above in a file named awkscript , adjust the /bin/ksh in the first line of my script to be the absolute pathname of the Korn shell on your system, and that you make awkscript executable by running the command:
PS Note that for the last three lines of this input file, this script and the script provided by Chubler_XL in message #7 in this thread produce different results. If I understand you input formats, I think this script does what you want.
I will agree that none of the sample input used xMxxx or xKxxx, but we also provide different output for the input 2K44 . My script produces 2440 , your script produces 24400 . Both of our scripts produce 4340000 for the input 4M34 and I can't believe the intent was for xMxx and xKxx to differ by a factor of 100 instead of 1000 in the way they handle two digits after the K and M multiplier codes. Both of these appear in the lines provided in the sample input (not just in my extended test cases).
We'll have to let shadyuk tell us which one of us made the right assumption for the desired behavior given that the specification didn't cover any of these cases explicitly.
I was originally using excel to do this but it continuously freezes when my worksheet contains tens of thousands of rows and becomes impossible to work with. Awk is far more efficient. I can then graph up what I need when I have the script output.
Works just as i'd like it to. I amended one part so that I can define which columns it should translate as my csv may have additional ones before and after the bit rate data. Please let me know if there's a more efficient way of doing this.
Obviously, the way I set up your output heading is incorrect since the sample input you gave us didn't match your input.
Assuming that instead of fields 2, 3, and 4 containing slash separated values that you want to convert, you now have fields 5, 6, 7, 8, 9, and 10 containing slash separated values you want to convert and that each of these fields contains a single slash character, then the 2nd for loop needs to be:
for(i = 5; i <= 16; i++){
instead of:
for(i = 5; i <= 10; i++){
since you're adding 6 new fields to the line in the first loop. If some of these fields don't have slashes or some of these fields contain more than one slash, you need to add the number of slashes in fields 5 through 10 to 10 for the end point for the loop. If there are a variable number of slashes in fields 5 through 10, additional logic is needed to determine the end point.
If you could set up your input so the fields that need to be modified are all at the end of the line, going back to using NF as the end point will be easier.
This would have been easier on all of us if you had given us a representative sample of what you wanted done originally instead of changing requirements every time we give you something that does what you requested!