Date from filename inserted into records

Hi Folks,

I have the files below in one directory:

Spiross-MBP:AIRTEMP spirospap$ ls -1
CPK2004001
CPK2004002
CPK2004003
etc...
JFK2003001
JFK2003002
JFK2003003
etc...
TEB1999001
TEB1999002
TEB1999003
etc...

The month and year are in the file name and also in the first line of the file header.
I only need 3 fields (out of several) and managed to filter a single file and export the result to a new file with the awk one-liner below:

spirospap$ awk -F"," '{ print $1, $2, $9 }' CPK2004001 >CPK2004001.txt
Month:  01/2004 
Station Name:  "CENTRAL PARK" Call Sign: 
Day Time Dry Bulb Cel
01 0051    5.0
01 0151    5.0
01 0251    5.0
01 0351    5.6
01 0451    5.0
01 0551    5.0
01 0651    3.9
etc...
31 1851   -3.9
31 1951   -4.4
31 2051   -4.4
31 2151 -
31 2251    -.6
31 2351   -4.4

Tasks:

  1. Combine the date from the header (or the file name) with the first field, so that the first field is expressed as a Julian day instead of the day of the month.
  2. Do this in one go for all files in the directory, ending up with a single file for each station/year where each record looks like
Julianday Time Temperature

Any advice, even for partial task(s), will be greatly appreciated.

Thanks

Not that I understood half of what you wrote, but for your printing problem you might use

awk '{print $1, $2, sprintf ("%5.1f", $9?$9:-11.1)}' file

, checking if $9 IS an empty string. Please be aware that in 31 2151 - there's NO missing value.

I simplified the request. Thanks for the comment and the printing suggestion although it doesn't produce the desired result.

If you're trying to compute the Julian day, it would help to know what operating system and shell you're using. And, if you don't have access to the GNU date utility, do you have access to a 1993 or later version of the Korn shell?

Please show us the output you are trying to produce. From your description, I am not at all sure where you want the Julian date(s) to appear in your output.

Hello,

Re kernel please see below:

Spiross-MBP:AIRTEMP spirospap$ uname -a
Darwin Spiross-MBP.fios-router.home 15.5.0 Darwin Kernel Version 15.5.0: Tue Apr 19 18:36:36 PDT 2016; root:xnu-3248.50.21~8/RELEASE_X86_64 x86_64

The file contents look like this:

Spiross-MBP:AIRTEMP spirospap$ head CPK2004001
Month:, 01/2004
Station Name:, "CENTRAL PARK" Call Sign:, NYC
Day,Time,StationType,Maint Indic,SkyConditions,Visibility,Weather Type,Dry Bulb Faren,Dry Bulb Cel,Wet Bulb Faren,Wet Bulb Cel,Dew Point Faren,Dew Point Cel,Rel Humd,Wind Speed,Wind Dir,Wind Char Gusts,Val. for Wind Char,Station Pressure,Press Tend,Sea Level Pressure,Report Type,Precip Total
01,0051,AO2 ,-,CLR                                          ,10SM   ,-,41   ,   5.0,-,   2.1,28   ,  -2.2, 60 , 6   ,VRB,-,0  ,29.92,5,180,AA,-
01,0151,AO2 ,-,CLR                                          ,10SM   ,-,41   ,   5.0,-,   2.2,28   ,  -2.2, 

I only need fields 1, 2 and 9 which contain Day of the Month, Hour and Temperatures in Degrees Celsius (labeled Dry Bulb Cel). I can do this by using the code below:

Spiross-MBP:AIRTEMP spirospap$ awk -F"," '{ print $1, $2, $9 }' CPK2004001 >CPK2004001.txt
Spiross-MBP:AIRTEMP spirospap$ head CPK2004001.txt 
Month:  01/2004 
Station Name:  "CENTRAL PARK" Call Sign: 
Day Time Dry Bulb Cel
01 0051    5.0
01 0151    5.0
01 0251    5.0
01 0351    5.6
01 0451    5.0
01 0551    5.0
01 0651    3.9

Eventually, I would like to have the first field (DAY) in Julian days like this:

Month:  01/2004 
Station Name:  "CENTRAL PARK" Call Sign: 
Julian Time Dry Bulb Cel
001 0051    5.0
001 0151    5.0
001 0251    5.0
001 0351    5.6
001 0451    5.0
001 0551    5.0
001 0651    3.9
......
001 2351
002 0051   4.0
002 0151   5.0
etc...
002 0151

I hope this clarifies
Thanks

Making some wild guesses about what you really want on the first three lines of each of your output files (i.e., all of the data with <comma>s changed to <space>s instead of throwing away the call signs), the following seems to work using ksh on OS X to calculate the Julian dates from the first day of the month for the year and month specified by the filename:

#!/bin/ksh
for file in [A-Z][A-Z][A-Z][0-9][0-9][0-9][0-9]0[01][0-9]
do	m=${file#????????}
	y=${file#???}
	y=${y%???}
	jd1=$(printf '%(%j)T' "$m/01/$y")
	printf '%s,%s\n' "$file" "$jd1"
done | awk -F ' *, *' '
FNR == NR {
	jdb[ARGV[ARGC++] = $1] = $2 - 1
	next
}
FNR == 1 {
	if(ofn)	close(ofn)
	ofn = FILENAME ".txt"
	gsub(/,/, " ")
	print > ofn
	next
}
FNR == 2 {
	gsub(/,/, " ")
	print > ofn
	next
}
FNR == 3 {
	printf("Julian  %s  %s\n", $2, $9) > ofn
	fmt = sprintf("   %%03d  %%4.4s  %%%d.%ds\n", length($9), length($9))
	next
}
{	printf(fmt, $1 + jdb[FILENAME], $2, $9 ? $9 : "-") > ofn
}' -

With the following input files:
File: CPK2004001

Month:, 01/2004
Station Name:, "CENTRAL PARK" Call Sign:, NYC
Day,Time,StationType,Maint Indic,SkyConditions,Visibility,Weather Type,Dry Bulb Faren,Dry Bulb Cel,Wet Bulb Faren,Wet Bulb Cel,Dew Point Faren,Dew Point Cel,Rel Humd,Wind Speed,Wind Dir,Wind Char Gusts,Val. for Wind Char,Station Pressure,Press Tend,Sea Level Pressure,Report Type,Precip Total
01,0051,AO2 ,-,CLR                                          ,10SM   ,-,41   ,   5.0,-,   2.1,28   ,  -2.2, 60 , 6   ,VRB,-,0  ,29.92,5,180,AA,-
01,0151,AO2 ,-,CLR                                          ,10SM   ,-,41   ,   5.0,-,   2.2,28   ,  -2.2,

File: JFK2003003

Month:, 03/2003
Station Name:, "JFK AIRPORT" Call Sign:, JFK
Day,Time,StationType,Maint Indic,SkyConditions,Visibility,Weather Type,Dry Bulb Faren,Dry Bulb Cel,Wet Bulb Faren,Wet Bulb Cel,Dew Point Faren,Dew Point Cel,Rel Humd,Wind Speed,Wind Dir,Wind Char Gusts,Val. for Wind Char,Station Pressure,Press Tend,Sea Level Pressure,Report Type,Precip Total
01,0051,AO2 ,-,CLR                                          ,10SM   ,-,41   ,      ,-,   2.1,28   ,  -2.2, 60 , 6   ,VRB,-,0  ,29.92,5,180,AA,-
10,0151,AO2 ,-,CLR                                          ,10SM   ,-,41   ,      ,-,   2.2,28   ,  -2.2,

File: TEB2000012

Month:, 12/2000
Station Name:, "TETERBORO AIRPORT" Call Sign:, TEB
Day,Time,StationType,Maint Indic,SkyConditions,Visibility,Weather Type,Dry Bulb Faren,Dry Bulb Cel,Wet Bulb Faren,Wet Bulb Cel,Dew Point Faren,Dew Point Cel,Rel Humd,Wind Speed,Wind Dir,Wind Char Gusts,Val. for Wind Char,Station Pressure,Press Tend,Sea Level Pressure,Report Type,Precip Total
01,0051,AO2 ,-,CLR                                          ,10SM   ,-,41   ,   5.0,-,   2.1,28   ,  -2.2, 60 , 6   ,VRB,-,0  ,29.92,5,180,AA,-
31,0151,AO2 ,-,CLR                                          ,10SM   ,-,41   ,  -5.0,-,   2.2,28   ,  -2.2,

it produces the following output files:
File: CPK2004001.txt

Month:  01/2004
Station Name:  "CENTRAL PARK" Call Sign:  NYC
Julian  Time  Dry Bulb Cel
   001  0051           5.0
   001  0151           5.0

File: JFK2003003.txt

Month:  03/2003
Station Name:  "JFK AIRPORT" Call Sign:  JFK
Julian  Time  Dry Bulb Cel
   060  0051             -
   069  0151             -

File: TEB2000012.txt

Month:  12/2000
Station Name:  "TETERBORO AIRPORT" Call Sign:  TEB
Julian  Time  Dry Bulb Cel
   336  0051           5.0
   366  0151          -5.0

Note that this code only works with a 1993 or later version of the Korn shell. (OS X comes with a BSD-based date utility, not a GNU date utility. And, although BSD date has a -d option, it sets the kernel's idea of daylight saving time offsets; it does not have a way to use the GNU date -d option to specify an alternative date to process.)
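
For anyone without ksh93, the day of the year can also be worked out inside awk itself. The following is only a sketch and is not the script above: doy_from_name, leap and doy are made-up names, and it assumes file names laid out as in this thread (three letters, a four-digit year, a three-digit month).

```shell
# Hypothetical helper: print the day of the year for a given file name
# (e.g. CPK2004001) and day of the month, using nothing but awk.
doy_from_name() {
	awk -v name="$1" -v day="$2" '
	# Gregorian leap-year rule
	function leap(y) { return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 }
	# Add the lengths of all months before month m to day d
	function doy(y, m, d,    i, n, len) {
		split("31 28 31 30 31 30 31 31 30 31 30 31", len, " ")
		if (leap(y)) len[2] = 29
		n = d
		for (i = 1; i < m; i++) n += len[i]
		return n
	}
	BEGIN {
		y = substr(name, 4, 4) + 0	# characters 4-7: year
		m = substr(name, 9, 2) + 0	# characters 9-10: month
		printf("%03d\n", doy(y, m, day))
	}'
}

doy_from_name CPK2004001 1
doy_from_name JFK2003003 10
doy_from_name TEB2000012 31
```

For the three sample files above this gives 001, 069 and 366, the same day numbers that appear in the output files.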

Oh, this is perfection beyond belief! Thank you!

Is it possible to concatenate all the .txt files of a single year into a single file?
For example, all below files into one CPK2004.txt file?

Spiross-MBP:AIRTEMP spirospap$ ls -1 *.txt
CPK2004001.txt
CPK2004002.txt
CPK2004003.txt
CPK2004004.txt
CPK2004005.txt
CPK2004006.txt
CPK2004007.txt
CPK2004008.txt
CPK2004009.txt
CPK2004010.txt
CPK2004011.txt
CPK2004012.txt

Thank you

I'm not sure what you want:

  1. After the awk script creates one text file for each month, do you also want to create a file containing all of the monthly output in a separate yearly output file?
  2. Do you just want the awk script to produce yearly output files instead of monthly output files?
  3. Or, do you want the awk script to create yearly and monthly output files?

A yearly output file in addition to the monthlies would be fantastic. I believe it is Option #3.

Thank you

Try:

#!/bin/ksh
for file in [A-Z][A-Z][A-Z][0-9][0-9][0-9][0-9]0[01][0-9]
do	m=${file#????????}
	y=${file#???}
	y=${y%???}
	jd1=$(printf '%(%j)T' "$m/01/$y")
	printf '%s,%s\n' "$file" "$jd1"
done | awk -F ' *, *' '
FNR == NR {
	jdb[ARGV[ARGC++] = $1] = $2 - 1
	next
}
FNR == 1 {
	nofny = substr(FILENAME, 1, 7) ".txt"
	if(ofny && ofny != nofny)
		close(ofny)
	ofny = nofny
	if(ofnm)
		close(ofnm)
	ofnm = FILENAME ".txt"
	gsub(/,/, " ")
	print > ofnm
	print > ofny
	next
}
FNR == 2 {
	gsub(/,/, " ")
	print > ofnm
	print > ofny
	next
}
FNR == 3 {
	printf("Julian  %s  %s\n", $2, $9) > ofnm
	printf("Julian  %s  %s\n", $2, $9) > ofny
	fmt = sprintf("   %%03d  %%4.4s  %%%d.%ds\n", length($9), length($9))
	next
}
{	printf(fmt, $1 + jdb[FILENAME], $2, $9 ? $9 : "-") > ofnm
	printf(fmt, $1 + jdb[FILENAME], $2, $9 ? $9 : "-") > ofny
}' -
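
The yearly file name in this script is simply the first seven characters of the input file name with .txt appended, and every record is written to both files. A small sketch of just the naming step (out_names is a made-up helper for illustration only):

```shell
# Hypothetical helper: show the monthly and yearly output names
# that correspond to one input file name.
out_names() {
	awk -v f="$1" 'BEGIN {
		print f ".txt"			# monthly file
		print substr(f, 1, 7) ".txt"	# yearly file: station + year
	}'
}

out_names CPK2004001
```

Because the shell expands the file name pattern in sorted order, all months of one station and year arrive one after another, so the yearly file is opened once and keeps receiving records until a different yearly name shows up.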

Awesome!

Thanks again

---------- Post updated 06-19-16 at 09:50 AM ---------- Previous update was 06-18-16 at 11:37 PM ----------

An extra out-of-sequence record appears at the end of every monthly file and in the yearly file as well (305 in the example). Is there a way I can get rid of it?

   335  1851           8.3
   335  1951           8.3
   335  2051           8.3
   335  2151           8.9
   335  2251           9.4
   335  2351           9.4
   305                   -
Month:  12/2004
Station Name:  "CENTRAL PARK" Call Sign:  NYC
Julian  Time  Dry Bulb Cel
   336  0051           8.9
   336  0151           9.4
   336  0251          10.0
   336  0351           9.4

Thanks

Could it be there's a trailing extra empty line in your input file(s)? Then $1 , $2 , and $9 are empty, and it prints jdb, nothing, "-" ...

...and there is..

If you haven't already figured it out, to ignore empty lines in your input files, insert three more lines in the awk script I suggested:

NF == 0 {
	next
}

just before the line:

FNR == NR {
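
A minimal sketch of what that rule does, using made-up two-field input rather than the real data files (skip_blank is a hypothetical name):

```shell
# Hypothetical helper: print the record number and first two fields,
# skipping records that have no fields at all.
skip_blank() {
	awk -F ' *, *' '
	NF == 0 { next }		# empty line: nothing to print
	{ print NR ": " $1 " " $2 }'
}

printf '01,0051\n\n02,0151\n' | skip_blank
```

Only records 1 and 3 are printed. Note that with this field separator a line holding nothing but spaces still counts as one field, so it would not be skipped; a pattern such as /^[ \t]*$/ would be needed for that case.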

Honestly, I didn't but I found a perl one-liner to recursively delete the unnecessary lines before running the script you suggested

perl -i -pe "chomp if eof" */*

Now I am trying to replace - with -9, decimals of the form .x with 0.x, and negative decimals such as -.x with -0.x.
For example:

 
   020  1751           1.7
   020  1851           1.7
   020  1951           1.1
   020  2051            .6
   020  2151             -
   020  2251             -
   020  2351             -
   021  0051           -.6
   021  0151          -1.1

should end up like this

 
   020  1751           1.7
   020  1851           1.7
   020  1951           1.1
   020  2051           0.6
   020  2151          -9.0
   020  2251          -9.0
   020  2351          -9.0
   021  0051          -0.6
   021  0151          -1.1

Any suggestions would be greatly appreciated.

Thanks

Instead of searching the web for explicit ways to perform very particular formatting requests, try reading the awk man page on your system and see if you can figure out how to modify the scripts you have been given to meet each of your new requirements. We are happy to help you learn how to use awk if there is something you can't figure out; but we are not here to act as your unpaid programming staff for countless minor changes...

I will give you these two new additional trivial changes this time. Next time, we will expect you to show us what changes you want to your output and show us how you have tried to modify this awk script to meet your new requirements.

#!/bin/ksh
for file in [A-Z][A-Z][A-Z][0-9][0-9][0-9][0-9]0[01][0-9]
do	m=${file#????????}
	y=${file#???}
	y=${y%???}
	jd1=$(printf '%(%j)T' "$m/01/$y")
	printf '%s,%s\n' "$file" "$jd1"
done | awk -F ' *, *' '
FNR == NR {
	jdb[ARGV[ARGC++] = $1] = $2 - 1
	next
}
NF == 0 {
	next
}
FNR == 1 {
	nofny = substr(FILENAME, 1, 7) ".txt"
	if(ofny && ofny != nofny)
		close(ofny)
	ofny = nofny
	if(ofnm)
		close(ofnm)
	ofnm = FILENAME ".txt"
	gsub(/,/, " ")
	print > ofnm
	print > ofny
	next
}
FNR == 2 {
	gsub(/,/, " ")
	print > ofnm
	print > ofny
	next
}
FNR == 3 {
	printf("Julian  %s  %s\n", $2, $9) > ofnm
	printf("Julian  %s  %s\n", $2, $9) > ofny
	fmt = sprintf("   %%03d  %%4.4s  %%%d.1f\n", length($9))
	next
}
{	printf(fmt, $1 + jdb[FILENAME], $2, ($9 != "") ? $9 : -9) > ofnm
	printf(fmt, $1 + jdb[FILENAME], $2, ($9 != "") ? $9 : -9) > ofny
}' -

Note that the format string for printing field 9 has changed from %x.xs (where x is the width of the header for field 9), which displayed the contents of field 9 as a right-justified string, to %x.1f, which displays the contents of field 9 as an x-character floating-point value with 1 digit after the decimal point. Also, $9 ? $9 : "-" in the printf statements (which printed field 9 if it was neither the empty string nor zero, and printed the string "-" otherwise) has become ($9 != "") ? $9 : -9, which prints field 9 if it is not an empty string and prints -9 if it is.
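
The difference between the two tests described in the note is easiest to see on a handful of made-up records. This is only a sketch: fmt_temp and the variables loose, strict and safe are hypothetical names, and the third test is an assumption for data that marks a missing reading with a literal - (as the sample line 31 2151 - in the first post suggests).

```shell
# Hypothetical helper: print time, then field 2 under three different
# "is it missing?" tests, each formatted with %6.1f.
fmt_temp() {
	awk -F ' *, *' '{
		loose  = $2 ? $2 : -9				# zero or empty -> -9
		strict = ($2 != "") ? $2 : -9			# only empty -> -9
		safe   = ($2 == "" || $2 == "-") ? -9 : $2	# empty or "-" -> -9
		printf("%s %6.1f %6.1f %6.1f\n", $1, loose, strict, safe)
	}'
}

printf '0051,.6\n0151,-.6\n0251,0.0\n0351,\n0451,-\n' | fmt_temp
```

With the loose test a genuine 0.0 reading is reported as -9.0, while the strict test keeps it. A literal - is a non-empty string, so both of the first two tests pass it on to %f, which converts it to zero; only the third test turns it into -9.0. The %6.1f format also takes care of printing .6 as 0.6 and -.6 as -0.6.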

You are very right. Perfectly understood; I will do that in the future. Not that it makes any difference, but I didn't even know that something like that was possible. Unix for DUMMIES, remember? I will do my best though. Thanks for the input and the comments.

I am perfectly aware that this is the "UNIX for Dummies Questions & Answers" forum. (It is not the "UNIX for Dummies Get Free Code Here" forum.) We are here to help you learn how to use UNIX and UNIX-like system tools, not to use those tools to do your job for you. We expect that when we make suggestions that help you do something you didn't know how to do before, that you will look at that code and learn from it. If you have trouble figuring out how code that was suggested works, read the manual page for that utility and see if you can figure it out. If you can't, ask questions and we'll be happy to explain how it works (i.e., give answers). Or, someone else might even suggest alternative code that might work better and explain why.

We are here to help you learn. Take advantage of the decades of experience provided free to you by the volunteers who are here to answer your questions.

Noted and agreed. It may not be apparent but I am working very hard on this. And your experience and guidance is not taken for granted or unappreciated. I will keep all suggestions in mind.

Thanks