Extract header data from one file and combine it with data from another file

Hi, great minds. I have some CTD profiler files (header files, in fact). I tried hard to do this in C but could not get the output I expected, because my programming skills are poor. I finally joined this Unix forum hoping that you can help me get what I want.

I have attached some text files that show the output I am looking for.

Thanks,

Cheers..

Try

awk -F ":" 'NR==FNR{gsub(/\*/,"");if($0 ~ /UpLoad Time/){split($0,P,"=")}else if(P[1]){A[++x]=$1;B[x]=$2};next}{if(FNR==1){printf "%s", P[1];FNV=P[2];
for(i=1;i<=x;i++){printf "\t%s", A[i];FNV=FNV"\t"B[i]};print $0}
if(FNR>1){print FNV"\t"$0}}' file2 file1

How do I apply a for loop to this?
file2 is the .hdr file
file1 is the .asc file
for file in *.txt; do
for file in *.hdr; do
---
---
---
done
Will this work? I don't know much about shell scripting.

Use a simple for loop.

for file in *.hdr
do
    awk ...   "$file"
done
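
If every .hdr file has a .asc file with the same base name (an assumption based on the file names described above), the loop can derive one name from the other with the shell's suffix-removal expansion. What follows is only a minimal sketch: the pair_files function name and the .out output suffix are made up for illustration, and the awk program is a placeholder for the real header-merging script.

```shell
#!/bin/sh
# Sketch: run a command once per matching .hdr/.asc pair in a directory.
# pair_files and the .out suffix are illustrative names, not from the thread.
pair_files() {
    dir=$1
    for hdr in "$dir"/*.hdr
    do
        [ -f "$hdr" ] || continue        # no .hdr files: the glob stays literal
        base=${hdr%.hdr}                 # strip the .hdr suffix
        asc=$base.asc
        if [ ! -f "$asc" ]
        then
            printf 'skipping %s: no matching .asc file\n' "$hdr" >&2
            continue
        fi
        # Placeholder awk program: replace with the real header-merging script.
        awk 'FNR == 1 { print FILENAME }' "$hdr" "$asc" > "$base.out"
    done
}

# Demo on throwaway files
demo=$(mktemp -d)
printf 'h\n' > "$demo/a.hdr"; printf 'd\n' > "$demo/a.asc"
printf 'h\n' > "$demo/b.hdr"             # b has no .asc partner
pair_files "$demo" 2>/dev/null
made=$(cd "$demo" && ls *.out)
echo "$made"
```

Header files without a partner are skipped with a message on standard error, so one missing .asc file does not stop the whole run.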

from 2nd post,

The base names of the .asc and .hdr files must match; only then should the latitude, longitude, etc. be added to the main file. The solution given by pamu works fine, but to process a number of files I have to include a for loop, which is why I asked. If anybody knows a good book or link for learning shell scripting, please post it. I am very poor at shell scripting; I only know C.

---------- Post updated at 03:28 AM ---------- Previous update was at 03:07 AM ----------

Actually, the header file contains more information, and when I tried the script with another file I got the wrong result.

Pamu's script reads the full header file. It should not; it has to filter out only particular fields.

They are SHIP, CRUISE, STATION, LAT, LONG, DEPTH(SONIC), DEPTH(CAST), SST, and cast:

* Sea-Bird SBE19plus Data File: 
* FileName = D:\SK-296_CTD_portable\Hex_data\n12a001001.hex 
* Software Version 1.59 
* Temperature SN =  6336 
* Conductivity SN =  6336 
* System UpLoad Time = Jul 14 2012 18:12:59 
** SHIP : India-ship 
** CRUISE : 500
** STATION : n12a001 
** LAT : 21 12.1985 N 
** LONG : 89 27.016 E 
** DEPTH(SONIC) : 93 m 
** DEPTH(CAST) : 77.7 m 
** SST : 29 degC 
* ds 
* SBE 19plus V 2.3  SERIAL NO. 6336    14 Jul 2012 17:58:03 
* vbatt = 12.2, vlith =  8.6, ioper =  61.8 ma, ipump =  39.9 ma, 
* iext01 =   4.6 ma, iext2345 =  27.1 ma 
* status = not logging 
* number of scans to average = 1 
* samples = 103375, free = 3767104, casts = 11 
* mode = profile, minimum cond freq = 2700, pump delay = 60 sec 
* autorun = no, ignore magnetic switch = no 
* battery type = alkaline, battery cutoff =  7.5 volts 
* pressure sensor = strain gauge, range = 5076.0 
* SBE 38 = no, WETLABS = no, OPTODE = no, Gas Tension Device = no 
* Ext Volt 0 = yes, Ext Volt 1 = no 
* Ext Volt 2 = yes, Ext Volt 3 = no 
* Ext Volt 4 = yes, Ext Volt 5 = no 
* echo characters = yes 
* output format = raw HEX 
 
* S> 
* 19plus 
* dh 
* cast   1 13 Jul 2012 07:31:59 samples 1 to 8619, avg = 1, stop = mag switch 

-----
The header contains more text after this, which I have not included.

Please help.

You can do something like this...

if ($0 ~ /SHIP/){A[++x]=$1;B[x]=$2}
if ($0 ~ /CRUISE/){A[++x]=$1;B[x]=$2}

Use the same approach for your other header fields.

pamu
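
One way to generalize this is to compare the field name exactly against a list of wanted names instead of writing one regular expression per field. This also sidesteps the parentheses in names such as DEPTH(SONIC), which a pattern like /DEPTH(SONIC)/ would treat as regex grouping rather than literal characters. A minimal sketch follows, assuming the wanted lines all have the form "** NAME : value" as in the sample header; the sample lines and the NAME=value output format are chosen only for illustration.

```shell
#!/bin/sh
# Sketch: keep only selected "** NAME : value" lines from a header file.
hdr=$(mktemp)
printf '%s\n' \
    '* Software Version 1.59 ' \
    '** SHIP : India-ship ' \
    '** CRUISE : 500' \
    '** DEPTH(SONIC) : 93 m ' \
    '* status = not logging ' > "$hdr"

picked=$(awk -F ' : ' '
    BEGIN {
        # Wanted field names, compared as plain strings (no regex involved).
        n = split("SHIP,CRUISE,STATION,LAT,LONG,DEPTH(SONIC),DEPTH(CAST),SST", w, ",")
        for (i = 1; i <= n; i++) want[w[i]] = 1
    }
    /^\*\* / {
        key = $1; sub(/^\*\* /, "", key)     # drop the leading "** "
        val = $2; sub(/[ \r]+$/, "", val)    # trim trailing blanks and any CR
        if (key in want) printf "%s=%s\n", key, val
    }' "$hdr")
echo "$picked"
rm -f "$hdr"
```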

Try saving the following script in a file (I used the name runner while testing it, but choose any name you like). Make it executable using chmod +x file_name and run it using ./runner header.txt main.txt to test it with the sample files you provided. The output this awk script produces matches the output you requested (as long as you remove the trailing empty line at the end of main.txt), except that there are no spaces before tabs in the header line:

#!/bin/ksh
# Usage: runner hdr main [hdr main]...
if [ $# -lt 2 ] || [ $(($# % 2)) -ne 0 ]
then    printf "%s: Odd number of operands or less than two operands:\n" "$0" >&2
        printf "Usage: %s hdr main [hdr main]...\n" "$0" >&2
        exit 1
fi
while [ $# -ge 2 ]
do      hf="$1"
        mf="$2"
        shift 2
        awk 'BEGIN{
                m["Jan"] = 1; m["Feb"] = 2;  m["Mar"] = 3;  m["Apr"] = 4
                m["May"] = 5; m["Jun"] = 6;  m["Jul"] = 7;  m["Aug"] = 8
                m["Sep"] = 9; m["Oct"] = 10; m["Nov"] = 11; m["Dec"] = 12
                FS = "[:=] "
                printf("System UpLoad\tTime\tSHIP\tCRUISE\tSTATION\tLAT\t%s",
                        "LON\tNATURE_OF_PR\tPROJ_NO\tINSTITUTE_CD\tST_DEPTH\n")
        }
        FNR>1 && FNR!=NR{
                printf("%s\t%s\t%s\t%s\t%s\n", hdr, $1, $2, $3, $4)
                next
        }       
        FNR==NR && /System UpLoad/{
                split($2, a, "  *")
                dt=sprintf("%02d/%02d/%02d", a[2], m[a[1]], a[3] % 100)
                tm=a[4]
                next
        }
        FNR==NR && /SHIP/{sh = $2;next}
        FNR==NR && /CRUISE/{cr = $2;next}
        FNR==NR && /STATION/{st = $2;next}
        FNR==NR && /LAT/{la = $2;next}
        FNR==NR && /LONG/{lo = $2;next}
        FNR==1 && NR>1{
                FS = "\t\t*"
                hdr = sprintf("%s\t%s\t%s\t%s\t%s\t%s\t%s",
                        dt, tm, sh, cr, st, la, lo)
                next;
        }' "$hf" "$mf"
done

You can invoke this script with any even number of files >=2 where the 1st file in each pair is in the format of header.txt and the 2nd file in each pair is in the format of main.txt.

Hi Don Cragun,

I tried your script with a new file. It reads the header file as well as the main file, and I am happy with your solution.

How do I save it as a file?

What command should I give?

Hi.
How do you save what as a file? It isn't clear what "it" refers to in your question.

If you mean that you want to save the output produced by the script for all input files as a single output file try:

./runner header.txt main.txt > output.txt

If you mean that you want the output produced by each set of input files to be stored in different output files, we can modify the script to take sets of three files (header file, main file, and output file) as operands.

Sir, I have one small problem in appending the header; please help.

I modified the script as per my requirement:

#!/bin/ksh
# Usage: runner hdr main [hdr main]...
if [ $# -lt 2 ] || [ $(($# % 2)) -ne 0 ]
then    printf "%s: Odd number of operands or less than two operands:\n" "$0" >&2
        printf "Usage: %s hdr main [hdr main]...\n" "$0" >&2
        exit 1
fi
while [ $# -ge 2 ]
do      hf="$1"
        mf="$2"
        shift 2
        awk    'BEGIN{
                m["Jan"] = 1; m["Feb"] = 2;  m["Mar"] = 3;  m["Apr"] = 4
                m["May"] = 5; m["Jun"] = 6;  m["Jul"] = 7;  m["Aug"] = 8
                m["Sep"] = 9; m["Oct"] = 10; m["Nov"] = 11; m["Dec"] = 12
                FS = "[:=] "
  printf("Date\tTime\tLatitude\t%s","Longitude\tPrDM\tT090C\tC0S/m\tDepSM\tSal00\tDensity00\tSigma-é00\tOxsolML/L\tSvCM\n")
        }

    FNR>1 && FNR!=NR{printf("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n", hdr, $1, $2, $3, $4, $5, $6, $7, $8, $9);next}
               
        FNR==NR && /System UpLoad/{split($2, a, "  *");dt=sprintf("%02d/%02d/%02d", a[2], m[a[1]], a[3] % 100);tm=a[4];next}

    FNR==NR && /NMEA Latitude/{lat = $2;next}

        FNR==NR && /NMEA Longitude/{lon = $2;next}

        FNR==1  && NR>1{FS = "\t\t";hdr = sprintf("%s\t%s\t%s\t%s",dt, tm, lat, lon)
    next
        }' "$hf" "$mf"
done

I have also attached the data and header files, but the latitude and longitude do not appear in the output. If I change

 FNR==NR && /NMEA Longitude/{lon = $2;next}

to

FNR==NR && /^NMEA Longitude/{lon = $2;next}

I can see the time; otherwise I cannot.

Please also tell me where I am going wrong.

If you need this in a hurry (as indicated by the three private messages you sent me), you need to find someone else to help you. I just found out that I'm going to be tied up most of the rest of this weekend and at least until Wednesday afternoon next week.

I did look at your data enough to know that one of your problems is that the files you attached to message #10 in this thread are all terminated by carriage-return and newline characters instead of just a newline character.

If you change:

    FNR>1 && FNR!=NR{printf("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n", hdr, $1, $2, $3, $4, $5, $6, $7, $8, $9);next}

in your script to:

                gsub(/]r/, "")
    FNR>1 && FNR!=NR{printf("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n", hdr, $1, $2, $3, $4, $5, $6, $7, $8, $9);next}

that may get rid of the carriage return characters that are causing the latitude and longitude data to overwrite earlier fields in the output.

Another significant change is that in your first main.txt file, you had tabs separating fields. In your new main.txt file, there are no tab characters so all of your input lines in that file probably have only one field when you set FS to "\t\t".

Good luck...

gsub(/]r/, "")

Sir, I didn't see any change from this.

Try gsub(/\r/, "") instead (typo?).

And, as Don Cragun stated, your files are contaminated with windows style characters, repeatedly. Why don't you clean them before working on them or, even more important, posting them here? And, see same link, post consistent samples.
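
One detail worth checking when adding the gsub call: in awk, an expression written outside braces is a pattern, and a pattern with no action prints the record whenever the expression is true. Placed on a line of its own, gsub(/\r/, "") therefore prints every line it changed, which may account for unexpected extra lines in the output. The sketch below, using made-up two-line input, contrasts that with the same substitution wrapped in an action block.

```shell
#!/bin/sh
# Sketch: the same CR removal written as a bare pattern and as an action.
crlf=$(mktemp)
printf 'lat 21.2\r\nlon 89.4\r\n' > "$crlf"

# Bare pattern: gsub returns the number of replacements, which is non-zero
# (true) on each CRLF line, so awk also prints the whole record.
as_pattern=$(awk 'gsub(/\r/, "")
                  { n++ }
                  END { print "records:", n }' "$crlf")

# Action block: the CR is removed silently and later rules see a clean $0.
as_action=$(awk '{ sub(/\r$/, "") }
                 { n++ }
                 END { print "records:", n }' "$crlf")

printf 'bare pattern:\n%s\n\naction block:\n%s\n' "$as_pattern" "$as_action"
rm -f "$crlf"
```

Putting the action-block rule first in the script, before any other rule, means every later rule works on records without the carriage return.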

Thanks, but when I redirect the output, say

sh script.sh file1 file2 > output

the full header also comes through.

---------- Post updated at 05:25 AM ---------- Previous update was at 05:12 AM ----------

awk '{for(i=4;i<=NF;i++){printf "\t""%s", $i}printf "\n"}'

I tried to make the file purely tab-separated with this, but I still have a problem. I understand what you said about what I need to do before using the files on Unix, but the cleaning part is complex for me. Can you please tell me how you clean Windows-style characters?

---------- Post updated at 05:48 AM ---------- Previous update was at 05:25 AM ----------

Sometimes I use the command below to convert DOS files to Unix format, but I don't trust it because it sometimes gives wrong results.

If I need to modify it, please tell me.

cat dosfile | tr -d '\r' > unixfile && mv unixfile dosfile

For "cleaning" DOS files, use the dos2unix command, if available. Other very powerful and helpful programs are iconv and recode. For just removing the <CR> chars, tr -d "\r" <ifile >ofile will do.
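
The tr approach can be wrapped so that a file is rewritten only when it actually contains carriage returns, and only if tr succeeded. This is a minimal sketch; strip_cr is a made-up function name, not an existing command, and the demo file contents are invented.

```shell
#!/bin/sh
# Sketch: remove carriage returns from a file in place, only when needed.
# strip_cr is an illustrative name, not an existing command.
strip_cr() {
    f=$1
    # Count lines ending in CR; awk prints 0 when there are none.
    n=$(awk '/\r$/ { c++ } END { print c + 0 }' "$f")
    [ "$n" -gt 0 ] || return 0
    tmp=$(mktemp) || return 1
    if tr -d '\r' < "$f" > "$tmp"
    then
        cat "$tmp" > "$f"        # overwrite contents, keeping the file's permissions
    fi
    rm -f "$tmp"
}

demo=$(mktemp)
printf 'a\tb\r\nc\td\r\n' > "$demo"
strip_cr "$demo"
left=$(awk '/\r/ { c++ } END { print c + 0 }' "$demo")
echo "lines still containing CR: $left"
```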

Back to your problem: I do not understand what you are missing in the output, or what should be removed. Please post a few lines of the existing output and the same few lines of the correct output, all based on the sample files in post #10.

Sir, I found a new way with the existing code: I separate the variables with commas and then use sed to convert the commas to spaces or tabs as required. Your gsub(/\r/, "") printed all the variables.

Sorry, the /]r/ should have been /\r/. I don't have time to work out what you need for FS and OFS; you might try just removing the place where you set FS; the default value may work with your latest input file.

The key point is that you need to tailor your script to match your input files AND YOU DO NOT HAVE A CONSISTENT INPUT FILE FORMAT.

When you tell us to come up with a solution for input files that have one or more tabs as field separators and no carriage returns in the file, you got a solution that worked for files in that format. Now you give us a pair of input files with lots of spaces as field separators and carriage returns. It is no surprise that the previous solution doesn't work with files that have a completely different input format.
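
The effect of the field separator on the two formats can be checked directly by counting fields. In this minimal sketch the data line is a made-up, space-separated sample, not taken from the attached files.

```shell
#!/bin/sh
# Sketch: count fields in the same space-separated line under two FS settings.
line='  1.000  29.10   5.43'

# FS set to two tabs: the line contains no tabs, so it is a single field.
with_tabs=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\t\t" } { print NF }')

# Default FS: runs of blanks and tabs separate fields, leading blanks ignored.
with_default=$(printf '%s\n' "$line" | awk '{ print NF }')

echo "FS=two tabs: $with_tabs field(s); default FS: $with_default field(s)"
```

With only one field, $2 through $9 are empty, so a printf that expects nine columns prints the whole line in the first column and nothing in the rest.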

Yes, I agree. RudyC has explained the problem to me, and I finally got the result. Thank you both.