Return alias with pattern

jalaj841 · February 13, 2015, 2:07am

Hello all,

I have ids that repeat, with alias names starting with different patterns such as 'ab', 'fg', etc. Sometimes names with the same pattern are repeated for the same id; for example, id=1 has both ab1 and ab3.

lookup.txt

name id
ab1 1 
fg22 1
ab3 1
er2  1
fgh1 1
fg21 2
ab2 2
ab31 2
ab4 3
fgh4 3
fgh5 4
er3 4

Given a list of names

list.txt

fg22
fg21
er3

and a starting pattern, say 'ab', I would like to return the corresponding alias name (with the same id), if one is available.

I am looking for a script (lookup.sh, awk, or Perl) that is invoked with these parameters:

./script list.txt pattern

In this case,

./lookup.sh list.txt ab

should return the name with that pattern, where available. Where an id has multiple ab names, any one of them can be returned.

output.txt

ab1
ab2
er3

Don_Cragun · February 13, 2015, 2:35am

How big are your files?

What have you tried? Where are you stuck?

RudiC · February 13, 2015, 4:36am

Why would er3 show up in your output?

Don_Cragun · February 13, 2015, 5:03am

Because there is no name in lookup.txt with the abbreviation ab (ab1, ab3, ab2, or ab4) that has the same id as er3 (4).

I have a working awk script for this problem, but I'm waiting for the submitter to explain what has been tried (to show that we aren't just being used as an unpaid programming staff).
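Don's reasoning can be checked directly against the sample data: listing every name that starts with ab together with its id shows that no such name carries id 4, the id of er3. This is a minimal sketch; the function name `ab_ids` is hypothetical, and the file defaults to the lookup.txt used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print "id name" for every name starting with "ab".
# NR > 1 skips the "name id" header line of the lookup file.
ab_ids() {
    awk 'NR > 1 && $1 ~ /^ab/ { print $2, $1 }' "${1:-lookup.txt}"
}
```

Run as `ab_ids lookup.txt`; with the sample data the ids printed are 1, 1, 2, 2 and 3, so id 4 has no ab alias.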

RudiC · February 13, 2015, 5:11am

Wide range for interpretation, no?

I'd read this as suppressing names that don't start with "ab", but I'm not a native speaker.

Don_Cragun · February 13, 2015, 5:24am

Hi Rudi,
There is also the later statement in post #1:

That at least narrows the range of interpretations that fit the requirements. My interpretation fits the requested output, but I agree that without the sample output the requirements could be read differently.

Anyway, I'm going to bed now; it is way past my bedtime...

Don

jalaj841 · February 13, 2015, 9:52am

I'm sorry if I made this post ambiguous. Reading through your replies, I think Don understands the problem correctly.

Here is my attempt, which gives the wrong output:

  awk  '{if(a[$2]){a[$2]=a[$2]","$1} else { a[$2]=$1}} END {for (i in a) {print a}}' lookup.txt > temp
  
  
  awk -F,  'FNR == NR {a[$0];next} {split($0,x,","); for (i=1;i<=length(x);i++) { if ($i~/^ab/) { $1=$i; break } } ; print $1}'  list.txt  temp
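The attempt above has three problems: `print a` prints the whole array instead of the element `a[i]` (most awks reject that outright), the second script tests the field `$i` instead of the split element `x[i]`, and it never uses the names it loaded from list.txt. Below is a minimal sketch of the same two-step idea with those fixed. As assumptions of this sketch, it pipes the grouped names into the second awk instead of writing a temp file, skips the "name id" header, treats the pattern as a plain prefix (like `/^ab/`), and keeps the listed name when its id has no match. The function name `group_lookup` and its argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: group_lookup LIST PATTERN [LOOKUP]
group_lookup() {
    list=$1 pat=$2 lookup=${3:-lookup.txt}
    # Step 1: one comma-separated line of names per id. NR > 1 skips the
    # "name id" header, and a[i] (not the whole array a) is printed.
    # Step 2: read those groups from stdin ("-"), then the list file.
    awk 'NR > 1 { a[$2] = ($2 in a) ? a[$2] "," $1 : $1 }
         END    { for (i in a) print a[i] }' "$lookup" |
    awk -F, -v pat="$pat" '
        FNR == NR {
            n = split($0, x, ",")
            pick = ""
            for (i = 1; i <= n; i++)        # first member starting with pat
                if (pick == "" && index(x[i], pat) == 1)
                    pick = x[i]
            for (i = 1; i <= n; i++)        # map every member to that pick
                map[x[i]] = pick
            next
        }
        {
            out = $1                        # default: keep the listed name
            if (($1 in map) && map[$1] != "")
                out = map[$1]
            print out
        }' - "$list"
}
```

Invoked as `group_lookup list.txt ab` next to the thread's files, it prints ab1, ab2 and er3, one per line. Using `index(x[i], pat) == 1` instead of a regex keeps any special characters in the pattern literal.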

---------- Post updated at 10:51 AM ---------- Previous update was at 10:50 AM ----------

You are right: suppress the non-"ab" names and return any one of the "ab" names, where available.

---------- Post updated at 10:52 AM ---------- Previous update was at 10:51 AM ----------

The file is not that big: 120k records.

RudiC · February 13, 2015, 2:07pm

Try this

awk     'FNR==NR        {T[$1]; next}
         FNR==1         {FILE++}
         FILE==1        {if ($1 in T) ID[$2]
                         next}
         $2 in ID &&
         $1 ~ "^" PAT    # {print; delete ID[$2]}
        ' PAT="ab"  list lookup lookup
ab1 1 
ab3 1
ab2 2
ab31 2

Remove the # before the last action to get output like this:

ab1 1
ab2 2
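For reference, RudiC's command with the # removed can be wrapped in a small function. Like the original, it reads the lookup file twice: the first pass collects the ids of the listed names, the second prints the first name matching the pattern for each of those ids and then deletes the id so it is reported only once. Printing only `$1` instead of the whole line is a hypothetical tweak so the output matches the requested output.txt format; the function name `ab_only` and its argument order are also assumptions.

```shell
#!/bin/sh
# Hypothetical helper: ab_only LIST [PATTERN] [LOOKUP]
# Names whose id has no match (such as er3) are dropped, per this reading.
ab_only() {
    lookup=${3:-lookup.txt}
    awk 'FNR == NR { T[$1]; next }          # list file: names wanted
         FNR == 1  { FILE++ }               # count passes over lookup
         FILE == 1 { if ($1 in T) ID[$2]    # pass 1: ids of wanted names
                     next }
         $2 in ID && $1 ~ "^" PAT {         # pass 2: first match per id
             print $1
             delete ID[$2]
         }' PAT="${2:-ab}" "$1" "$lookup" "$lookup"
}
```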

Don_Cragun · February 13, 2015, 2:40pm

You could also try something like this:

#!/bin/ksh
awk -v abbrev="${2:-ab}" '
# Process 1st file (lookup.txt)...
FNR == NR {
	if(NR > 1) {	# Skip over heading in file lookup.txt.
		# Build the arrays a[abbreviation, id] = name for the 1st
		# abbreviation found for a given id, and n2id[name] = id for all
		# names.
		if(!(((ab = substr($1, 1, match($1, /[[:digit:]]/) - 1)), $2) in a))
			a[ab, $2] = $1
		n2id[$1] = $2
#printf("in=%s, n2id[%s]=%s, a[%s,%s]=%s\n", $0, $1, n2id[$1], ab, $2, a[ab,$2])
	}
	next
}
# Process 2nd file (1st script operand)...
(abbrev, n2id[$1]) in a {
	$1 = a[abbrev, n2id[$1]]
}
# Print original or converted input line...
1' lookup.txt "${1:-list.txt}"

This was written and tested using the Korn shell, but it will also work with any other shell that performs POSIX-standard variable expansions (such as bash, ash, or dash).

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

It is intended to be invoked with two operands. If only one operand (the list file) is provided, the abbreviation defaults to "ab"; if no operands are provided, the list file defaults to "list.txt" and the abbreviation defaults to "ab".
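These defaults come from the POSIX `${parameter:-word}` expansion, which substitutes word when the parameter is unset or empty. A tiny demonstration using the same two expansions as the script (the function name `show_defaults` is hypothetical):

```shell
#!/bin/sh
# Hypothetical demo: the same default expansions lookup.sh uses.
show_defaults() {
    printf 'list=%s abbrev=%s\n' "${1:-list.txt}" "${2:-ab}"
}

show_defaults mine.txt fgh   # list=mine.txt abbrev=fgh
show_defaults mine.txt       # list=mine.txt abbrev=ab
show_defaults                # list=list.txt abbrev=ab
show_defaults "" fgh         # list=list.txt abbrev=fgh (empty counts as unset)
```

Because `:-` also treats an empty operand as missing, passing "" as the first operand is a way to choose a different abbreviation while keeping the default list.txt.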

(If you uncomment the printf statement, you can watch it build the arrays it will be using as it reads lookup.txt.)

With your sample input, the output produced when it is invoked as:

./lookup.sh list.txt ab
        or
./lookup.sh

is:

ab1
ab2
er3

and when invoked as:

./lookup.sh list.txt fgh

it produces the output:

fgh1
fg21
fgh5
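fg21 comes back unchanged because the script keys its table on the abbreviation, which it takes to be everything before the first digit of the name (`substr($1, 1, match($1, /[[:digit:]]/) - 1)`). So fgh is matched as an exact abbreviation rather than as a prefix, and id 2 has no name abbreviated fgh. (A name with no digit at all would get an empty abbreviation, since `match()` then returns 0.) The same expression applied to a few sample names, wrapped in a hypothetical helper `show_abbrev`:

```shell
#!/bin/sh
# Hypothetical helper: print each input name and the abbreviation
# (text before the first digit) that the script would derive from it.
show_abbrev() {
    awk '{ print $1, substr($1, 1, match($1, /[[:digit:]]/) - 1) }'
}

printf '%s\n' ab1 fg22 fgh1 fg21 ab31 | show_abbrev
```

This prints `ab1 ab`, `fg22 fg`, `fgh1 fgh`, `fg21 fg` and `ab31 ab`, so fg21 and fgh1 land under different keys.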