Reading a file and matching its records against a set of files

arunkumar_mca · October 3, 2017, 11:10am

Hi All,

I have a file with a set of records. I have to find out whether each record is present in a set of files and, if so, get the names of the files that contain it.

Example

head unmatch
02000417
02001855
02004639
02005420
02005440
02005454
02006624
02006743
02007097
02008770

-rw-rw-r-- 1 ftpuser pfus 131868931 Oct  3 10:45 50.h
-rw-rw-r-- 1 ftpuser pfus 135538052 Oct  3 10:46 51.h
-rw-rw-r-- 1 ftpuser pfus 123051798 Oct  3 10:47 52.h
-rw-rw-r-- 1 ftpuser pfus 123583711 Oct  3 10:48 01.h

A record in unmatch can be in any of the files 50.h, 51.h, 52.h, 01.h. The search should look only at positions 1-8. I have to take each record from unmatch, compare it with positions 1-8 of every record in 50.h, 51.h, 52.h and 01.h, and print the result wherever it matches.

RudiC · October 3, 2017, 11:28am

Some samples of matching and non-matching data lines in the files, please.

Corona688 · October 3, 2017, 11:36am

I think I understand what you're getting at... Each line of unmatch is the first 8 characters of a record you want to retrieve.

awk 'NR==FNR { A[$1] ; next } # Store matching record IDs in A
substr($0,0,8) in A # Check first 8 chars of record and print if in A
        ' unmatch 50.h 51.h 52.h 01.h

Use nawk on Solaris.

arunkumar_mca · October 3, 2017, 12:29pm

Yes, each line of unmatch will be in one of the *.h files. I have to take a record from unmatch, check whether it appears in positions 1-8 of a record in the *.h files, and print the record along with the name of the *.h file where it was found.

I executed the awk command and it gives me the record. How can I get the file name too?

Don_Cragun · October 3, 2017, 12:47pm

Change the line:

substr($0,0,8) in A # Check first 8 chars of record and print if in A

in Corona688's suggestion to:

substr($0,0,8) in A { print FILENAME, $0 } # Check first 8 chars of record and print if in A

assuming that you want the file name printed before the contents of the line.

Showing us sample input and the output you wanted from that sample input (and telling us what operating system and shell you're using) would have gotten you a working suggestion much sooner.
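
Since no sample data was posted, here is a minimal sketch of the whole command run against made-up input. The contents of unmatch, 50.h and 51.h below are assumptions for illustration only (the real record layout was never shown), and the array name ids is arbitrary. It uses a start index of 1 for substr, since awk strings are indexed from 1.

```shell
# Hypothetical sample data; the real *.h record layout was never posted.
dir=$(mktemp -d)
printf '%s\n' 02000417 02001855 > "$dir/unmatch"
printf '%s\n' '02000417 first payload' '09999999 no match here' > "$dir/50.h"
printf '%s\n' '02001855 second payload' > "$dir/51.h"

# Run inside the directory so FILENAME prints as a bare file name.
result=$(cd "$dir" && awk '
    NR == FNR { ids[$1]; next }                       # first file: remember each ID
    substr($0, 1, 8) in ids { print FILENAME, $0 }    # other files: compare columns 1-8
' unmatch 50.h 51.h)

printf '%s\n' "$result"
# 50.h 02000417 first payload
# 51.h 02001855 second payload
```

The file holding the IDs has to come first on the command line, because NR == FNR is only true while awk is reading its first input file.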

arunkumar_mca · October 3, 2017, 12:54pm

Thanks, that worked. Thanks a lot.

RudiC · October 3, 2017, 12:57pm

Shouldn't that be substr($0,1,8) in A ?

I wasn't sure whether the substring to be compared had to occupy chars 1 - 8 or merely start somewhere in 1 - 8, i.e. possibly run from 8 - 15.

Try also

sed 's/^/^/' unmatch | grep -f- *.h
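
To spell out what that pipeline does: sed prefixes every ID with ^, turning it into a regular expression anchored at the start of the line, and grep reads those patterns from standard input. A small sketch with made-up data follows; the file contents are assumptions, and reading patterns with -f - is supported by GNU and BSD grep but may not work with every grep.

```shell
# Hypothetical sample data for illustration only.
dir=$(mktemp -d)
printf '%s\n' 02000417 02001855 > "$dir/unmatch"
printf '%s\n' '02000417 first payload' 'xx02001855 ID not at column 1' > "$dir/50.h"
printf '%s\n' '02001855 second payload' > "$dir/51.h"

# sed turns each ID into an anchored pattern such as ^02000417;
# grep then reports file:line for every line that starts with one of them.
result=$(cd "$dir" && sed 's/^/^/' unmatch | grep -f - 50.h 51.h)

printf '%s\n' "$result"
# 50.h:02000417 first payload
# 51.h:02001855 second payload
```

grep prints the file name prefix only when it is given more than one file, so with a single *.h file the name would be missing from the output.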

Don_Cragun · October 3, 2017, 1:04pm

Yes. But on most versions of awk that I have used, both produce the same results.

Me either. That is why telling us what operating system and shell are being used and sample input and desired output are so important.

RudiC · October 3, 2017, 1:11pm

I'm afraid they don't:

awk ' {print FILENAME, substr($0,0,8), substr($0,1,8)}  ' unmatch 50.1 51.1
50.1 1234567 12345678
50.1 0123456 01234567

0,8 yields 7 chars, not 8, at least with my Linux mawk. In FreeBSD's awk, it seems to yield 8 chars in either case.
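
A quick way to check what a given awk does, without needing any data files, is to compare the lengths of the two results. This is only a probe sketch; the test string is arbitrary and the first number depends on the awk implementation, so no particular value is claimed for it here.

```shell
# Probe how the local awk treats a start index of 0 versus 1.
probe=$(awk 'BEGIN {
    s = "12345678ABCD"
    # Print the result lengths for start=0 and start=1, in that order.
    printf "%d %d\n", length(substr(s, 0, 8)), length(substr(s, 1, 8))
}')

printf 'lengths for start=0 and start=1: %s\n' "$probe"
```

If the two numbers differ, substr($0,0,8) compares only 7 characters on that system and will never match an 8-character ID, so substr($0,1,8) is the safe form everywhere.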

Scrutinizer · October 3, 2017, 2:29pm

FWIW I could not find a single awk that showed a difference, including all the awks on Solaris, AIX and HP-UX. Even my version of mawk (1.3.4 20100625) worked well.
On CentOS, mawk 1.3.4 20161120 worked fine as well.

So maybe it is a bug in that particular version?

From the change log:

20090726
[..]
	+ modify workaround for (incorrect) scripts which use a zero-parameter
	  for substr to ensure the overall length of the result stays the same.
	  For example, from makewhatis:
		filename_no_gz = substr(filename, 0, RSTART - 1);