Compare 2 files, awk maybe?

I have 2 files,
file1:

alfa     numbers numbers 
vita     numbers numbers
gama   numbers numbers
delta    numbers numbers
epsilon numbers numbers
zita      numbers numbers
...

file2:

'zita'    keepnumbers keepnumbers keepnumbers
'gama' keepnumbers keepnumbers keepnumbers
'misc'  keepnumbers keepnumbers keepnumbers
'alfa'    keepnumbers keepnumbers keepnumbers
...

I want to print the lines of file2 whose first word (in the first column) matches a first word of file1 (in its first column), BUT in the order of file1.
The output should look like:

'alfa'    keepnumbers keepnumbers keepnumbers
'gama' keepnumbers keepnumbers keepnumbers
'zita'    keepnumbers keepnumbers keepnumbers

I have already tried:

awk 'NR==FNR{a[$1]++;next}a[$1]' file1 file2 > file3

but the lines in file3 come out in file2's order.
Moreover, awk trips over the single quote ('). Is there a way to ignore it and read only the name inside the quotes?
Thanks in advance for your help and time.

Would that work?

awk 'NR==FNR{gsub("\47", "", $1); a[$1]=$0;next} {if( $1 in a) {print a[$1]}}' file2 file1 > file3
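One awk detail that matters for the command above: assigning to a field (here through `gsub(..., $1)`) makes awk rebuild `$0` from the fields joined by `OFS`, so the stored line loses its quotes and any extra spacing. A minimal sketch comparing that with stripping the quotes from a copy of `$1`; the function names `strip_field` and `strip_copy` are hypothetical, for illustration only:

```shell
#!/bin/sh
# Hypothetical helper: strip the quotes from $1 in place.
# Modifying a field makes awk rebuild $0 with OFS (a single space),
# so the printed line loses its quotes and runs of spaces collapse.
strip_field() {
    printf '%s\n' "$1" | awk '{ gsub("\47", "", $1); print }'
}

# Hypothetical helper: strip the quotes from a copy of $1 instead.
# $0 is left untouched, so the original line is printed unchanged
# (prefixed by the unquoted key for comparison).
strip_copy() {
    printf '%s\n' "$1" | awk '{ t = $1; gsub("\47", "", t); print t ": " $0 }'
}

strip_field "'alfa'    keepnumbers keepnumbers"
strip_copy  "'alfa'    keepnumbers keepnumbers"
```

The first call prints `alfa keepnumbers keepnumbers`; the second prints `alfa: 'alfa'    keepnumbers keepnumbers`. Working on a copy is the idea behind the later `t=$1` variants in this thread.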

To exactly match the requested output, you could try:

awk -v sq="'" '
FNR == NR {
	d[$1] = $0
	next
}
(sq $1 sq) in d {
	print d[sq $1 sq]
}' file2 file1 > file3
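A minimal sketch that recreates the sample files from the question in a temporary directory and runs the command above on them. The column spacing is normalised here, which awk's default field splitting ignores anyway, and the wrapper function `join_in_file1_order` is hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper around the "exact match" command above.
# $1 = file that gives the order (file1), $2 = file whose lines are printed (file2)
join_in_file1_order() {
    awk -v sq="'" '
    FNR == NR {            # first file read (file2): key = quoted first word
        d[$1] = $0
        next
    }
    (sq $1 sq) in d {      # second file read (file1): output follows its order
        print d[sq $1 sq]
    }' "$2" "$1"
}

dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT

cat > "$dir/file1" <<'EOF'
alfa numbers numbers
vita numbers numbers
gama numbers numbers
delta numbers numbers
epsilon numbers numbers
zita numbers numbers
EOF

cat > "$dir/file2" <<'EOF'
'zita' keepnumbers keepnumbers keepnumbers
'gama' keepnumbers keepnumbers keepnumbers
'misc' keepnumbers keepnumbers keepnumbers
'alfa' keepnumbers keepnumbers keepnumbers
EOF

join_in_file1_order "$dir/file1" "$dir/file2"
```

This prints the 'alfa', 'gama' and 'zita' lines in that order. Names in file1 with no partner (vita, delta, epsilon) produce nothing, and 'misc' is never printed because file1 never asks for it.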

It works, but I have a problem: in some lines the first word has a trailing space inside the quotes, like

 'alfa '   keepnumbers keepnumbers 

Do you know a way to overcome this? Thank you very much, and for the recommendations too!

Assuming that you want to keep 'alfa ' as it is in the output:

awk 'NR==FNR{t=$1; gsub("\47", "", t); a[t]=$0; next} {if( $1 in a) {print a[$1]}}' file2 file1 > file3

Or you could try:

awk -F " *'" '
FNR == NR {
	d[$2] = $0
	next
}
$1 in d {
	print d[$1]
}' file2 FS=' ' file1 > file3
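To see why `-F " *'"` copes with the trailing space, it helps to look at how a file2 line is split when FS is that regular expression (any run of spaces followed by a single quote). A minimal sketch; the function name `show_fields` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print every field of a line split with FS = " *'",
# wrapped in brackets so that empty fields are visible.
show_fields() {
    printf '%s\n' "$1" | awk -F " *'" '{
        for (i = 1; i <= NF; i++)
            printf "[%s]", $i
        print ""
    }'
}

show_fields "'alfa '    'keepnumbers '"
```

This prints `[][alfa][][keepnumbers][]`: the leading quote produces an empty `$1`, and the separator swallows the space before the closing quote, so `$2` is the bare name that is compared with `$1` of file1.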

One last question: what can I do if I want to remove a space that is followed by a single quote, wherever it occurs in the file?
The single quotes themselves should stay on the words in each column.
e.g.

 'numbers1' 'te1 ' text 
 'numbers2' 'te2 ' text 
...

should produce the output:

 'numbers1' 'te1' text
 'numbers2' 'te2' text
...

Note that the problematic quoted fields contain only 4 characters, including the space (like 'tes ').

I wonder if this would work:

awk 'NR==FNR{t=$1; gsub("\47", "", t); a[t]=$0; next} {if( $1 in a) {gsub(" \47", "\47", a[$1]);print a[$1]}}' file2 file1

If we keep file1 as it was in the original question and change file2 to contain:

'zita'    'keepnumbers '   'keepnumbers '   'keepnumbers'
'gama' keepnumbers  'keepnumbers '	'keepnumbers'
'misc'  'keepnumbers ' keepnumbers  'keepnumbers'
'alfa '    'keepnumbers '  'keepnumbers '   'keepnumbers'

(note that there is a tab between the last two fields instead of spaces on the line containing "gama"), Aia's code in message #8 in this thread produces:

'alfa'   'keepnumbers' 'keepnumbers'  'keepnumbers'
'gama' keepnumbers 'keepnumbers'	'keepnumbers'
'zita'   'keepnumbers''keepnumbers''keepnumbers'

which I think got rid of too many spaces.

If I understand the third set of requirements properly (only remove a single space at the end of fields between pairs of single quotes; keep spaces between fields as they were), I think this does what you want:

awk -v sq="'" '
BEGIN {	FS = OFS = sq
}
FNR == NR {
	for(i = 2; i <= NF; i +=2)
		if(substr($i, length($i)) == " ")
			$i = substr($i, 1, length($i) - 1)
	d[$2] = $0
	next
}
$1 in d {
	print d[$1]
}' file2 FS=" " file1 > file3

which produces the following output:

'alfa'    'keepnumbers'  'keepnumbers'   'keepnumbers'
'gama' keepnumbers  'keepnumbers'	'keepnumbers'
'zita'    'keepnumbers' 'keepnumbers' 'keepnumbers'
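The program above relies on one property of setting FS to a single quote: the text inside each pair of quotes lands in the even-numbered fields ($2, $4, ...), while the text between pairs, including its original spacing and tabs, lands in the odd-numbered ones. A minimal sketch that numbers the fields of one line; the function name `number_fields` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: split a line on single quotes and print each
# field with its number, in brackets so spaces stay visible.
number_fields() {
    printf '%s\n' "$1" | awk -v sq="'" 'BEGIN { FS = sq }
    { for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
}

number_fields "'gama' keepnumbers  'keepnumbers '"
```

This prints `1:[]`, `2:[gama]`, `3:[ keepnumbers  ]`, `4:[keepnumbers ]` and `5:[]` on separate lines, which is why the loop only touches `$i` for even `i`, and why rebuilding `$0` with `OFS = sq` puts every quote back where it was.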

Due to FS. ;)
For the sake of continuing the gsub() saga:

awk 'NR==FNR {gsub(" \47", "\47");gsub("\47\47", "\47 \47"); t=$1; gsub("\47", "", t); s[t]=$0; next} $1 in s {print s[$1]}' file2 file1

If you have an empty quoted field in the input such as in:

'alfa '    'keepnumbers '  'keepnumbers '   ''

this removes the trailing spaces from the quoted fields that have them, but it also adds a space to the empty quoted field. It still removes one space between fields wherever there are multiple spaces between fields in the input, as in:

'alfa'   'keepnumbers' 'keepnumbers'  ' '

instead of:

'alfa'    'keepnumbers'  'keepnumbers'   ''

But, of course, we have no way of knowing whether or not this matters to the OP since requirements for these cases were not specified.
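For comparison, a minimal sketch of the field-based trimming from the `FS = OFS = sq` answer, reduced to a filter on a single line so it can be tried on the empty-field example. It uses `sub(/ $/, "", $i)` to drop one trailing space in place of that answer's `substr()` test, and the function name `trim_quoted` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: remove one trailing space from every quoted field
# of a line, leaving the spacing between fields untouched.
trim_quoted() {
    printf '%s\n' "$1" | awk -v sq="'" 'BEGIN { FS = OFS = sq }
    {
        for (i = 2; i <= NF; i += 2)   # even fields hold the quoted text
            sub(/ $/, "", $i)          # drop a single trailing space, if any
        print
    }'
}

trim_quoted "'alfa '    'keepnumbers '  'keepnumbers '   ''"
```

This prints `'alfa'    'keepnumbers'  'keepnumbers'   ''`: the empty quoted field stays empty and the runs of spaces between fields are kept, because the quotes are only ever used as field separators and never edited.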


Thank you both very much! Both of your solutions worked out fine! You are an inspiration!