Merge multiple lines in same file with common key using awk

protosd · November 26, 2012, 3:22am

I've been a Unix admin for nearly 30 years and never learned AWK. I've seen several similar posts here, but haven't been able to adapt the answers to my situation. AWK is so damn cryptic!

I have a single file with ~900 lines (CSV list). Each line starts with an ID, but with different stuff after it which I want to merge onto a single line. Sometimes there are just a couple of lines to be merge, sometimes there are 4 or 5. The input file is already sorted.
Here's an example:

2,"URL","website","aaa.com"
2,"abc","text","some text"
2,"Password","password","12345678"
3,"URL","website","10.0.10.75"
3,"uname","text","some text"
3,"Password","password","password"
3,"ulang","text","some text"
4,"URL","website","192.168.2.110"
4,"Login","login","admin"
4,"Password","password","blah-blah"

Output should look like:

2,"URL","website","aaa.com","abc","text","some text",Password","password","12345678"
3,"URL","website","10.0.10.75","uname","text","some text""Password","password","password","ulang","text","some text"

---------- Post updated at 12:22 AM ---------- Previous update was at 12:11 AM ----------

I forgot to mention, each line in the input file only has 4 fields.

elixir_sinari · November 26, 2012, 3:27am

Assuming that the first occurrence of the whole second field is at the second-field:

awk -F, 'NR!=1 && p1!=$1{print prev;prev=""}
{p1=$1;prev=(prev"")?prev FS substr($0,index($0,$2)):$0}
END{if(prev"") print prev}' file

pamu · November 26, 2012, 3:45am

Try

awk -F "," 's != $1 || NR ==1{s=$1;if(p){print p};p=$0;next}
{sub($1,"",$0);p=p""$0;}END{print p}' file

protosd · November 26, 2012, 3:55am

AMAZING!

Thank you, that works!

I wish I could learn AWK, I know it's really powerful. I've coded in many different languages and can usually work out what stuff does, but AWK is so cryptic!

I'm looking at what you posted trying to understand. If you're up giving me a little tutorial, I'd love to understand.

Where does p1 come from?

Is prev a built-in variable?

# While NR (number of records) is not equal to 1 and p1 is not equal to the first field (the ID)
NR!=1 && p1!=$1

# Print the line and append a null so the line will join with the next
{print prev;prev=""}

# p1 gets the value of the ID field
{p1=$1;

# This is a bit cryptic, but it looks like you're assigning/merging the contents
# of previous line with the next one and stripping the first field??
prev=(prev"")?prev FS substr($0,index($0,$2)):$0}

# Also a bit cryptic, but guessing we've reached the EOF and spitting out what's left?
END{if(prev"") print prev}'

---------- Post updated at 12:52 AM ---------- Previous update was at 12:48 AM ----------

---------- Post updated at 12:55 AM ---------- Previous update was at 12:52 AM ----------

That works well also!

Thanks!

The great thing about Unix and AWK is there are so many different ways to do stuff.

The thing about AWK is it's hard to tell what special meaning certain things have and where they come from.

I wish I'd learned it a long time ago.

elixir_sinari · November 26, 2012, 4:10am

p1 is used to store the value of the first field. Since this is done at the end of each line processing, it becomes the previous value of the first field for the current line.

No. In awk , the built-in variables are all upper-case words. prev is used to store the line made up by appending segments of the lines which have common first fields. After printing each made-up line, we need to nullify the variable to make it ready for the next line to be made up.

And the cryptic line does what you thought it does. If prev"" ( prev is not-null) is true, then we must be having at least 1 line in prev so we need to append a , ( FS - field separator, comma in this case, and that's a special/built-in awk variable) and the substring of the line from the second field onwards. Else if prev is null, then it must be the first time since printing the last prev . So, just assign the current line ( $0 ) to prev .

And, I think you've understood rest of the things.

protosd · November 26, 2012, 4:20am

prev seems to magically appear where you say print prev, but where do you assign its contents?

Is it somehow defaulting to $0 ?

elixir_sinari · November 26, 2012, 4:50am

prev=(prev"")?prev FS substr($0,index($0,$2)):$0

The above line does the assignment:

Evaluate (prev"") .

If true, return prev FS substr($0,index($0,$2)) .

If false, return $0 .

And, this returned value is stored back in prev .