Awk or Sed: find match in column, then edit column.

glev2005 · February 4, 2011, 3:08pm

FILE A:

9780743551526,(Abridged)
9780743551779,(Unabridged)
9780743582469,(Abridged)
9780743582483,(Unabridged)
9780743563468,(Abridged)
9780743563475,(Unabridged)

FILE B:

c3saCandyland    9780743518321    "CANDYLAND"    "MCBAIN, ED"    2001 
c3sbCandyland    9780743518321    "CANDYLAND"    "MCBAIN, ED"    2001 
c4saCandyland    9780743518321    "CANDYLAND"    "MCBAIN, ED"    2001 
c4sbCandyland    9780743518321    "CANDYLAND"    "MCBAIN, ED"    2001 
c1sA Fourth Mega Market    9780743518482    "FOURTH MEGA  MARKET"    "ACAMPORA, RALPH"    2004

When the 13-digit number in FILE A matches the 13-digit number in the second column of FILE B, I need to append the label that follows that number in FILE A, (Abridged) or (Unabridged) with the parentheses, to the end of the title. The title is the first quoted string on each line of FILE B, and the label goes just inside its closing quote, as in the example below.

Example:

FILE A:
9780743551526,(Abridged)

FILE B:
c3sbCandyland    9780743551526    "CANDYLAND"    "MCBAIN, ED"    2001  

Desired Output:
c3sbCandyland    9780743551526    "CANDYLAND (Abridged)"    "MCBAIN, ED"    2001

bartus11 · February 4, 2011, 5:36pm

awk 'NR==FNR{split($0,s,",");a[s[1]]=s[2];next}$2 in a{sub("\"$"," "a[$2]"\"",$3)}1' a b
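If FILE B is tab-delimited (the post does not say so; this is an assumption), the same approach can keep the tabs and cope with multi-word first columns such as "c1sA Fourth Mega Market" by setting both FS and OFS to a tab. With the default whitespace FS, that line would put "Fourth" in $2 and the ISBN lookup would miss. The sketch below assumes that layout; the function name tag_titles and the sample files are hypothetical.

```shell
# Sketch: assumes FILE B is tab-separated and FILE A is "ISBN,(Label)".
# tag_titles, a.txt and b.txt are hypothetical names used for illustration.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' '9780743551526,(Abridged)' '9780743551779,(Unabridged)' > "$dir/a.txt"
printf 'c3sbCandyland\t9780743551526\t"CANDYLAND"\t"MCBAIN, ED"\t2001\n' > "$dir/b.txt"
printf 'c1sA Fourth Mega Market\t9780743518482\t"FOURTH MEGA  MARKET"\t"ACAMPORA, RALPH"\t2004\n' >> "$dir/b.txt"

tag_titles() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR {                          # first file: FILE A
      split($0, s, ",")                  # s[1] = ISBN, s[2] = (Abridged)/(Unabridged)
      label[s[1]] = s[2]
      next
    }
    $2 in label {                        # second file: ISBN is column 2
      sub(/"$/, " " label[$2] "\"", $3)  # insert label before the closing quote
    }
    1                                    # print every line of FILE B, tabs kept
  ' "$1" "$2"
}

tag_titles "$dir/a.txt" "$dir/b.txt"
```

Lines whose ISBN is not in FILE A (the second sample line) are printed unchanged.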

glev2005 · February 4, 2011, 7:37pm

gregg@ubuntu1:~$ cat filea
9780743551526,(Abridged)
gregg@ubuntu1:~$ cat fileb
c3sbCandyland 9780743551526 CANDYLAND MCBAIN, ED 2001
gregg@ubuntu1:~$ awk 'NR==FNR{split($0,s,",");a[s[1]]=s[2];next}$2 in a{sub("\"$"," "a[$2]"\"",$3)}1' filea fileb
c3sbCandyland 9780743551526 CANDYLAND MCBAIN, ED 2001
gregg@ubuntu1:~$

Doesn't seem to work. Am I missing something?
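In the cat output above, fileb has no double quotes around the title. bartus11's one-liner anchors on the closing quote of $3 (the pattern "\"$"), so on an unquoted line sub() finds nothing and the line comes back unchanged. This assumes the cat output reflects the real file, since forum software can also strip characters. The sketch below runs the unchanged one-liner on a quoted and an unquoted copy of the line; the file names and the function name tag are hypothetical.

```shell
# Sketch: same one-liner as in the post, run on two hypothetical versions of fileb.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
echo '9780743551526,(Abridged)' > "$dir/filea"
echo 'c3sbCandyland 9780743551526 CANDYLAND MCBAIN, ED 2001' > "$dir/unquoted"
echo 'c3sbCandyland 9780743551526 "CANDYLAND" "MCBAIN, ED" 2001' > "$dir/quoted"

tag() {
  # Only edits $3 when $3 ends in a double quote
  awk 'NR==FNR{split($0,s,",");a[s[1]]=s[2];next}$2 in a{sub("\"$"," "a[$2]"\"",$3)}1' "$1" "$2"
}

tag "$dir/filea" "$dir/unquoted"   # no closing quote: line printed unchanged
tag "$dir/filea" "$dir/quoted"     # closing quote found: label inserted
```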

yinyuemi · February 4, 2011, 8:56pm

try:

awk 'NR==FNR{sub(","," ");a[$1]=$2}NR>FNR{for(i=1;i<NF;i++) {if($i~/^[0-9]../){$(i+1)="\""$(i+1)" "a[$i]"\""}printf $i"\t"}print ""}'  file1 FS="\t"  file2

rdcwayx · February 6, 2011, 4:04am

awk -F, '
NR==FNR{a[$1]=$2;next} 
{for (i in a) {if ($0~i) $2=$2 " " a[i]}}1
' FileA FS="\"" OFS="\"" FileB

glev2005 · February 7, 2011, 9:47am

Yinyuemi, your code is fast, but it is not completely correct: it strips the date at the end and adds many extra quotes. Rdcwayx, it appears you have found the solution! I am checking it now. Your code runs much slower than Yinyuemi's, but it gives the correct output. Thank you!

---------- Post updated at 09:47 AM ---------- Previous update was at 09:43 AM ----------

RDC, I would love it if you could explain your code a bit. I know basic awk, but this is beyond my level.
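An annotated copy of rdcwayx's command follows, with the array reference written as a[i] (the per-key lookup the loop needs). The sample FileA and FileB contents are hypothetical, and wrapping the command in a shell function named tag is only for reuse.

```shell
# Annotated sketch of rdcwayx's approach; sample files are hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' '9780743551526,(Abridged)' > "$dir/FileA"
printf '%s\n' 'c3sbCandyland    9780743551526    "CANDYLAND"    "MCBAIN, ED"    2001' > "$dir/FileB"

tag() {
  awk -F, '
  # While reading FileA (NR==FNR), -F, splits "ISBN,(Label)" on the comma:
  # $1 is the ISBN, $2 the label. Store the label by ISBN and skip ahead.
  NR == FNR { a[$1] = $2; next }

  # FileB is read with FS set to a double quote (assigned on the command
  # line just before FileB), so $2 is the text inside the first quotes,
  # i.e. the title.
  {
    for (i in a)                 # try every ISBN from FileA
      if ($0 ~ i)                # as a regex against the whole line
        $2 = $2 " " a[i]         # append " (Label)" to the title
  }
  1                              # print; OFS puts the quotes back
  ' "$1" FS="\"" OFS="\"" "$2"
}

tag "$dir/FileA" "$dir/FileB"
```

Because every FileB line is matched against every FileA key, the work grows with the product of the two file sizes, which would explain the slowdown reported above.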

yinyuemi · February 7, 2011, 2:52pm

Hi Glev, you can try this:

awk 'NR==FNR{sub(","," ");a[$1]=$2}
NR>FNR{for(i=1;i<=NF;i++) {
if($i~/^[0-9]...../){$(i+1)=substr($(i+1),1,length($(i+1))-1)" "a[$i]"\""}printf $i"\t"}print ""}' fileA  fileB

rdcwayx · February 7, 2011, 9:38pm

If your awk supports the gensub function (a GNU awk extension), the code below will save a lot of time.

$ awk   --version  |head -1
GNU Awk 3.1.8

$ awk --re-interval -F, '
NR==FNR{a[$1]=$2;next}
{ key=gensub(/.+([0-9]{13}).+/,"\\1",1)
  if (key in a) $2=$2 " " a[key]
}1' FileA FS="\"" OFS="\"" FileB
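gensub is specific to GNU awk, and gawk 3.x needed --re-interval for the {13} interval expression. A portable sketch for awks without gensub (mawk, for example) can use match(), RSTART and RLENGTH instead, assuming each FileB line contains exactly one 13-digit run and that it is the ISBN. Like the gensub version, it extracts the key once and does a single hash lookup per line rather than looping over all keys. The sample data and the function name tag are hypothetical.

```shell
# Sketch for awks without gensub; sample data is hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' '9780743551526,(Abridged)' '9780743551779,(Unabridged)' > "$dir/FileA"
printf '%s\n' \
  'c3sbCandyland    9780743551526    "CANDYLAND"    "MCBAIN, ED"    2001' \
  'c4saCandyland    9780743518321    "CANDYLAND"    "MCBAIN, ED"    2001' > "$dir/FileB"

tag() {
  awk -F, '
  BEGIN {
    # 13 digits spelled out, so no interval expression is needed
    d = "[0-9]"
    isbn = d d d d d d d d d d d d d
  }
  NR == FNR { a[$1] = $2; next }     # FileA: ISBN -> label
  {
    # One hash lookup per line instead of a loop over every key in a
    if (match($0, isbn)) {
      key = substr($0, RSTART, RLENGTH)
      if (key in a) $2 = $2 " " a[key]   # $2 is the title (FS is a quote)
    }
  }
  1
  ' "$1" FS="\"" OFS="\"" "$2"
}

tag "$dir/FileA" "$dir/FileB"
```

match() returns the leftmost 13-digit run, while the greedy .+ in the gensub regex selects the last one; the two differ only on lines with more than one such run.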