How to create files and file content based on existing information?

Torhong · March 16, 2017, 7:06pm

Hi Gurus,
I am an SQL developer and a new Unix user.
I need to create some files, and their content, based on information in two files.

I have one file containing basic information (file1 below) and another exception file (file2). The rule is: if "zone" and "cd" in file1 exist in file2, then the file name is FILENAME in file2 and the file content is CONTENT in file2. If "zone" and "cd" don't exist in file2, then the file name is ZONE_CD.file and the file content is DT_dat. Below are sample files and the expected result.

I want to use a while loop to read file1, then somehow check file2 for the zone and cd, but I don't know how to check two items at once. I want to use "grep", but I need to compare two items.

Any input is welcome.

Thanks in advance.

file1

 
  10 ABC 20170202 
 10 AAA 20170101 
 20 BBB 20170303

file2

 
 10 ABC SPECIALFILENAME.sh SPECIAL_CONTENT

expected result:

 
  SPECIALFILENAME.sh SPECIAL_CONTENT 1
 10_AAA.file 20170101_dat 
 20_BBB.file 20170303_dat

Don_Cragun · March 17, 2017, 1:53am

torhong:

file1
 
 ZONE CD DT 
 10 ABC 20170202 
 10 AAA 20170101 
 20 BBB 20170303
 
file2
 
 ZONE CD FILENAME CONTENT 
 10 ABC SPECIALFILENAME.sh SPECIAL_CONTENT

 
expected result:
 
 filename filecontent 
 SPECIALFILENAME.sh SPECIAL_CONTENT 1
 0_AAA.file 20170101_dat 
 20_BBB.file 20170303_dat
 

What operating system are you using?

What shell are you using?

What have you tried to solve this problem on your own?

How does combining the two lines:

 ZONE CD DT 
 ZONE CD FILENAME CONTENT

produce the output:

 filename filecontent

? How does combining the two lines:

 10 ABC 20170202 
 10 ABC SPECIALFILENAME.sh SPECIAL_CONTENT

produce the output:

 SPECIALFILENAME.sh SPECIAL_CONTENT 1

? Why does converting the line:

 10 AAA 20170101

produce the output:

 0_AAA.file 20170101_dat

?

Why would you want to use a shell while loop and grep (multiple times) when the REs needed to isolate fields in a file are much more complicated than using awk to perform field splitting and record joining operations?

Is this a homework assignment? Homework and coursework questions can only be posted in the Homework & Coursework forum under special homework rules. If this is not homework, please explain the company you work for and the nature of the problem you are working on.

If you did post homework in the main forums, please review the guidelines for posting homework and repost.

Torhong · March 17, 2017, 7:21am

Hi Don,

I updated the file content. The first line holds the field names, for reference. My OS is Solaris and my shell is ksh.

This is not homework. The reason I want to use a while loop is that I am not familiar with awk.

Thanks.

RudiC · March 17, 2017, 8:20am

Having assumed that

you, being an SQL developer as explained in post #1, are NOT posting homework questions
the original headers of both files shall be eliminated
the 1 trailing in output line 2 and missing in line 3 is a hyphenation error
usage of shell and grep is NOT compulsory

you might want to try

awk '
                {IX = $1 "," $2
                }
NR == FNR       {FN[IX] = $3
                 CT[IX] = $4
                 next
                }
FNR == 1        {print "filename filecontent"
                 next
                }
IX in FN        {print FN[IX], CT[IX]
                 next
                }
                {print $1 "_" $2 ".file", $3 "_dat"
                }
' file2 file1
filename filecontent
SPECIALFILENAME.sh SPECIAL_CONTENT
10_AAA.file 20170101_dat
20_BBB.file 20170303_dat
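
For readers new to awk, the NR == FNR test above is what separates the two input files: NR counts records across all files, FNR restarts at 1 for each file, so the two are equal only while the first file (file2 here) is being read. A minimal sketch with two throwaway files (the names first and second and their contents are made up for illustration):

```shell
# Throwaway input files; names and contents are hypothetical.
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/first"
printf 'c\n'    > "$workdir/second"

# Print both counters and which branch the NR == FNR test would take.
nr_fnr_demo=$(awk '{ print NR, FNR, (NR == FNR ? "first file" : "later file") }' \
    "$workdir/first" "$workdir/second")
echo "$nr_fnr_demo"
```

This should print "1 1 first file", "2 2 first file" and "3 1 later file". One known caveat: if the first file is empty, NR == FNR is also true while the second file is read, so the idiom assumes a non-empty lookup file.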

Torhong · March 17, 2017, 12:00pm

Thanks, guys.

The requirement changed a little: we will read from file1 and use field1 and field2 to find the record in file2, then get the file name and file content from file2.

I tried to write it with a while loop, and it works. I am just wondering if the code can be improved.
I saw that all the answers use awk.

I know a little bit of awk. I will check the documentation to fully understand the syntax; otherwise I will have problems if I just copy and paste.

 
#!/bin/ksh
# Input files are comma-separated.
IFS=","
error_cnt=0
while read zone feed date
do
        echo "$zone, $feed, $date"
        exist_flag=0
        # Scan the meta file for a record with the same zone and feed.
        while read zone1 feed1 fname fct
        do
                if [ "$zone" = "$zone1" ] && [ "$feed" = "$feed1" ]; then
                        echo "export ${fct}=${date}" > "${fname}"
                        exist_flag=1
                        break
                fi
        done < metapull.txt
        # No match: log the record and remember that an error occurred.
        if [ "$exist_flag" -eq 0 ]; then
                echo "$zone, $feed doesn't exist in meta file" >> miss_records.txt
                error_cnt=1
        fi
done < sample.txt
if [ "$error_cnt" -ne 0 ]; then
        echo "Some feeds missing detail information, please check file miss_records.txt"
        exit 1
fi

RudiC · March 17, 2017, 12:33pm

With just a glance at your script, I see that

both input files being read via what seems to be the same file descriptor - might work by accident but is sloppy programming; use different fd explicitly
if it would work, for every line in file1 the entire file2 is being read - highly inefficient
every time fname is found in the input files, its hitherto existing contents will be overwritten - mayhap undesired
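
The "different fd" point can be sketched as follows. This is only an illustration, assuming comma-separated input as the script's IFS="," implies; the sample records, the found flag and the result.txt file are made up for the example. The outer loop reads on descriptor 3 and the inner loop on descriptor 4, so neither depends on standard input:

```shell
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
printf '10,ABC,20170202\n20,BBB,20170303\n' > "$workdir/sample.txt"
printf '10,ABC,special.sh,SPECIAL_CONTENT\n' > "$workdir/metapull.txt"

# Outer loop reads fd 3, inner loop reads fd 4.
while IFS=, read -r zone feed date <&3
do
    found=0
    while IFS=, read -r zone1 feed1 fname fct <&4
    do
        if [ "$zone" = "$zone1" ] && [ "$feed" = "$feed1" ]; then
            found=1
            break
        fi
    done 4< "$workdir/metapull.txt"
    # Report zone, feed and whether a matching meta record was found.
    echo "$zone,$feed,$found"
done 3< "$workdir/sample.txt" > "$workdir/result.txt"

cat "$workdir/result.txt"
```

Note that this only addresses the descriptor point: the meta file is still reread for every line of the sample file, so the inefficiency remains.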

Torhong · March 17, 2017, 3:40pm

Thanks RudiC.

You are right, the second loop is very inefficient. Is there any way to fix it without using awk, as I am not familiar with awk?

Thanks

RudiC · March 17, 2017, 6:41pm

Weren't it time to become familiar? With three to four line files you won't notice a difference between messing around with shell scripts and tools designed for text processing ( awk is not the only example, look at perl , or sed , or other). Have a look at post1 and post8: 45 min to process a huge file in shell compared to less than 1 min using awk ! And that's not the only thread recently citing tremendous performance improvements.

Torhong · March 20, 2017, 8:08pm

Hi RudiC,

I modified the code you provided, and it works fine.
I tried to add one more piece of logic: if zone and cd don't exist in file2, create an error file. I got an error. Would you please take a look? Thanks.

My files are:
FILE1

AAA FIRST 20170203
BBB SECOND 20170204
CCC THIRD 20170205

FILE2

AAA     FIRST   file1.txt       content1=
BBB     SECOND  file2.txt       content2=

Below is my code:

nawk '{IX = $1 "," $2} NR == FNR{FN[IX] = $3; CT[IX] = $4; next}; {if(IX in FN) print CT[IX]$3 > FN[IX]; else {print "error" > abc}}
' FILE2 FILE1

When I run the above code, it creates two files as expected, but it throws an error for the error record:

nawk: null file name in print or getline
 input record number 3, file FILE1
 source line number 1

I use nawk because my OS is Solaris. Below is my OS info:

SunOS xxxxxxxx 5.10 Generic_150400-04 sun4v sparc SUNW,SPARC-Enterprise-T5220

Don_Cragun · March 20, 2017, 9:00pm

torhong:

One more thing I want to add: if zone and cd don't exist in file2, create an error file and exit with 1.


In awk (and nawk), abc is a variable name, and since you haven't assigned anything to that variable, its value is an empty string (if a string is expected) or zero (if a number is expected). Assuming that you want the error log file to be named abc, try changing:

print "error" > abc

to:

print "error" > "abc"
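
The difference between the bare name and the quoted string can be checked in isolation. A small sketch, assuming any POSIX awk (on Solaris 10 that would be nawk or /usr/xpg4/bin/awk); the scratch directory is only there to keep the example from writing into the current directory:

```shell
workdir=$(mktemp -d)

# An unassigned awk variable: empty as a string, 0 as a number.
unset_demo=$(awk 'BEGIN { printf "[%s] %d\n", abc, abc + 0 }')
echo "$unset_demo"

# Quoted, "abc" is a string constant, so this writes to a file named abc.
( cd "$workdir" && awk 'BEGIN { print "error" > "abc" }' )
cat "$workdir/abc"
```

The first command should print "[] 0", which is why the unquoted redirection target was reported as a null file name.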

And, to cover the exit status issue:

nawk '
{	IX = $1 "," $2
}
NR == FNR {
	FN[IX] = $3
	CT[IX] = $4
	next
}
{	if(IX in FN)
		print CT[IX]$3 > FN[IX]
	else {	print "error" > "abc"
		ev = 1
	}
}
END {	exit ev
}' FILE2 FILE1

Or, if you want to exit immediately if an error is found, change the:

		ev = 1

to:

		exit 1

and remove the END clause.
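
On the calling side, the exit status set in the END clause can be picked up like any other command status. A hedged sketch using cut-down copies of the sample files (two records only) and awk in place of nawk; the lookup_status variable and the message text are made up for the example:

```shell
# Cut-down sample data in a scratch directory.
workdir=$(mktemp -d)
printf 'AAA FIRST file1.txt content1=\n' > "$workdir/FILE2"
printf 'AAA FIRST 20170203\nCCC THIRD 20170205\n' > "$workdir/FILE1"

lookup_status=0
( cd "$workdir" && awk '
{ IX = $1 "," $2 }
NR == FNR { FN[IX] = $3; CT[IX] = $4; next }
{ if (IX in FN) print CT[IX] $3 > FN[IX]
  else { print "error" > "abc"; ev = 1 } }
END { exit ev }' FILE2 FILE1 ) || lookup_status=$?

# React to a non-zero status in the calling script.
if [ "$lookup_status" -ne 0 ]; then
    echo "lookup failed with status $lookup_status; see $workdir/abc"
fi
```

Here the CCC record has no match in FILE2, so the status should be 1 while file1.txt is still written for the AAA record.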

Torhong · March 21, 2017, 12:38pm

Thank you, Don Cragun. This is exactly what I want.

---------- Post updated 03-21-17 at 12:38 PM ---------- Previous update was 03-20-17 at 09:48 PM ----------

Hi Don Cragun,

The code works perfectly. I need to add some parameters for the file path and file name.

I tried the script below, but it gives me the error shown below.

Would you please take a look at the issue if you are free?

Thanks in advance.

 error message:
 nawk: division by zero
 input record number 1, file sample.txt
 source line number 10

 
 tfiledir=/home
errdir=/home
errfilename=miss_records.txt
 nawk -F","  -v tdir="$tfiledir" -v edir="$errdir" -v efilename="$errfilename" '
{       IX = $1 "," $2
}
NR == FNR {
        FN[IX] = $3
        CT[IX] = $4
        next
}
{       if(IX in FN)
                print CT[IX]$3 > tdir/FN[IX]
        else {  print "error" > edir/efilename
                ev = 1
        }
}
END {   exit ev
}' $1 $2

RudiC · March 21, 2017, 1:18pm

You fell into the same trap as you did in post#9. In programming, you need to differentiate between strings, variables, and operators. And you need to know how e.g. awk deals with variables' types, converting strings into numerics and vice versa if need be. Alphabetic-only strings result in a numeric value of 0.

When you're writing tdir/FN[IX] in line 10 (as indicated in the error message), awk tries to divide tdir (a string, i.e. 0 which doesn't hurt) by FN[IX] (another string, i.e. 0, but hurts in this case).
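
The conversion rule can be seen without any division. A minimal sketch, with made-up values standing in for tdir and FN[IX]: in numeric context a string without a leading number becomes 0, while placing strings next to each other concatenates them.

```shell
coercion_demo=$(awk 'BEGIN {
    tdir = "/home"; name = "file1.txt"      # hypothetical values
    print tdir + 0, name + 0, "12abc" + 0   # numeric context
    print tdir "/" name                     # string concatenation
}')
echo "$coercion_demo"
```

This should print "0 0 12" followed by "/home/file1.txt", which shows why an unquoted / between two such variables ends up as 0 divided by 0.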

Try, as you've been told before, to enclose the / in double quotes: "/".

Torhong · March 21, 2017, 1:42pm

Hi RudiC,
thanks for help.

I tried "/", but it failed with the error message below.

Thanks.

nawk: syntax error at source line 10
 context is
                        print CT[IDX]$3 > >>>  tdir"/" <<< 
nawk: illegal statement at source line 10
nawk: syntax error at source line 11

RudiC · March 21, 2017, 3:17pm

Some awk versions don't like concatenated redirection targets - try parenthesizing them.
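
A sketch of what parenthesizing could look like, assuming a POSIX awk and using a scratch directory plus a fixed name in place of FN[IX] (both are stand-ins for illustration):

```shell
workdir=$(mktemp -d)
awk -v tdir="$workdir" 'BEGIN {
    name = "file1.txt"                           # stands in for FN[IX]
    print "content1=20170203" > (tdir "/" name)  # parenthesized target
    close(tdir "/" name)
}'
cat "$workdir/file1.txt"
```

The parentheses make the whole concatenation the redirection target, so the parser does not have to guess where the file name expression ends. The same form would apply to the error file, i.e. (edir "/" efilename).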