webkid
December 2, 2010, 3:28pm
1
I have a file like the one shown below and would like to split it based on its content.
Specifically, each block of lines between "---- XXX Info ----" and the next "---- YYY Info ----" should go to its own file.
When I try the command below, the second file contains everything after the first "---- YYY Info ----" line.
csplit -ks pfm.txt '%XXX Info%' '/^---- YYY Info ----/' {2}
Any suggestions on how to split out only the required data, as described above?
---- XXX Info ----
Buuuu xxx bbb
Kmmmm rrr ssss uuuu
Kwwww zzzz ccc
Roooowwww eeee
Bxxxx jjjj dddd
---- YYY Info ----
Kuuuu eeeee nnnn
Rpppp cccc vvvv cccc
Rhhhhhhyyyy tttt
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- XXX Info ----
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- YYY Info ----
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
hhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- XXX Info ----
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- YYY Info ----
---------- Post updated at 03:28 PM ---------- Previous update was at 03:26 PM ----------
For clarification:
I need output files like:
file 1:
---- XXX Info ----
Buuuu xxx bbb
Kmmmm rrr ssss uuuu
Kwwww zzzz ccc
Roooowwww eeee
Bxxxx jjjj dddd
file 2:
---- XXX Info ----
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
file 3:
---- XXX Info ----
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
ctsgnb
December 2, 2010, 4:07pm
2
awk '/^----/{f="file"(++c)".txt"}{print $0 > f}' input
$ cat in
---- XXX Info ----
Buuuu xxx bbb
Kmmmm rrr ssss uuuu
Kwwww zzzz ccc
Roooowwww eeee
Bxxxx jjjj dddd
---- YYY Info ----
Kuuuu eeeee nnnn
Rpppp cccc vvvv cccc
Rhhhhhhyyyy tttt
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- XXX Info ----
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- YYY Info ----
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
hhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- XXX Info ----
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
$ awk '/^----/{f="file"(++c)".txt"}{print $0 > f}' in
$ ls *.txt
file1.txt file2.txt file3.txt file4.txt file5.txt
$ cat file4.txt
---- YYY Info ----
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
hhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
$
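The one-liner above starts a new file at every "----" header, so the YYY sections get files of their own. The original request asked only for the XXX sections. A minimal sketch of that variant follows, assuming the headers read exactly "---- XXX Info ----" and "---- YYY Info ----". The sample file name sample.txt and the variables out and n are made up for illustration, and close() needs a POSIX-style awk such as nawk.

```shell
# Work in a scratch directory so the demo does not clobber existing files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Shortened sample input modeled on the data in this thread.
printf '%s\n' \
  '---- XXX Info ----' 'Buuuu xxx bbb' \
  '---- YYY Info ----' 'Kuuuu eeeee nnnn' \
  '---- XXX Info ----' 'Kyyyyy iiiii wwww' \
  '---- YYY Info ----' > sample.txt

awk '
  # An XXX header opens the next numbered output file.
  /^---- XXX Info ----/ { if (out) close(out); out = "file" (++n) ".txt" }
  # A YYY header stops writing until the next XXX header.
  /^---- YYY Info ----/ { if (out) close(out); out = "" }
  # Lines are written only while an XXX section is active.
  out != ""             { print > out }
' sample.txt

ls file*.txt    # file1.txt file2.txt
```

Closing each file once its section ends keeps the number of open files low, which matters on large inputs with many sections.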
webkid
December 2, 2010, 5:51pm
3
Thanks for the reply. This works great on small files. However, I am seeing the following problem with a big file.
# awk '/^----/ {print $2}' testy
Port
RG
LU
# awk '/^----/ {f="file"$2".txt"}{print $0 > f}' testy
awk: can't open file
record number 1
Any idea what's going on?
---------- Post updated at 05:51 PM ---------- Previous update was at 05:08 PM ----------
Actually, there were a couple of lines at the head of the file before the first line starting with ---- (as shown below). This was causing the problem.
As a workaround, I removed those lines with csplit before running the code you suggested. Is there a better solution?
Kwwww zzzz ccc
Buuuu xxx bbb
---- XXX Info ----
Buuuu xxx bbb
Kmmmm rrr ssss uuuu
Kwwww zzzz ccc
Roooowwww eeee
Bxxxx jjjj dddd
---- YYY Info ----
Kuuuu eeeee nnnn
Rpppp cccc vvvv cccc
Rhhhhhhyyyy tttt
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- XXX Info ----
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- YYY Info ----
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
hhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- XXX Info ----
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
ctsgnb
December 2, 2010, 6:33pm
4
You can ignore the leading lines until the first ^---- line appears
with this small modification of the code:
awk '/^----/{f="file"(++c)".txt"}c{print$0>f}' input
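The guard works because c stays 0 (false) until the first header is seen, so nothing is printed while f is still empty. Printing to an empty file name is what produced "can't open file" at record 1. If the leading lines should be kept rather than dropped, one possible variation is to give f a starting value. This is only a sketch: the names lead.txt, preamble.txt and part are invented for the example, and close() assumes nawk or another POSIX awk.

```shell
# Work in a scratch directory so the demo does not clobber existing files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample with two stray lines before the first header, as in the post above.
printf '%s\n' \
  'Kwwww zzzz ccc' 'Buuuu xxx bbb' \
  '---- XXX Info ----' 'Buuuu xxx bbb' \
  '---- YYY Info ----' 'Kuuuu eeeee nnnn' > lead.txt

awk '
  # Hypothetical file that collects everything before the first header.
  BEGIN   { f = "preamble.txt" }
  # Each header closes the current file and starts the next numbered one.
  /^----/ { close(f); f = "part" (++c) ".txt" }
  # f is never empty here, so every line has somewhere to go.
          { print > f }
' lead.txt

ls    # lead.txt part1.txt part2.txt preamble.txt
```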
webkid
December 2, 2010, 7:24pm
5
I am getting the following error.
# awk '/^----/{f="file"(++c)".txt"}c{print$0>f}' /tmp/tt
awk: syntax error near line 1
awk: bailing out near line 1
Scott
December 2, 2010, 7:26pm
6
If you are using Solaris, use nawk or /usr/xpg4/bin/awk
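For a script that has to run on both Solaris and other systems, one option is to pick the interpreter at run time. The sketch below assumes only the paths and names already mentioned in this thread, and the variable name AWK is arbitrary. It prefers the XPG4 awk, then nawk, then whatever awk is on the PATH.

```shell
# Choose the first awk implementation that exists on this machine.
AWK=
for candidate in /usr/xpg4/bin/awk nawk awk; do
  if command -v "$candidate" >/dev/null 2>&1; then
    AWK=$candidate
    break
  fi
done

if [ -z "$AWK" ]; then
  echo "no awk found" >&2
  exit 1
fi

# Quick smoke test of the chosen interpreter.
"$AWK" 'BEGIN { print "using awk: ok" }'
```

The split command can then be run as "$AWK" '...' file instead of hard-coding awk or nawk.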
webkid
December 3, 2010, 2:12pm
7
nawk works. However, what should I use if I have to use $2 instead of ++c?
awk '/^----/ {f="file"$2".txt"}?{print $0>f}' /tmp/tt
instead of
awk '/^----/{f="file"(++c)".txt"}c{print$0>f}' /tmp/tt
Thanks.
Scott
December 3, 2010, 2:53pm
8
Hi.
Assuming you still want one output file per "section":
awk ' /^----/ { file = "file" $2(++c) }
c { print > file }
' input.txt
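If the counter is dropped and only $2 is used, the file name variable itself can serve as the guard. In that case every section with the same name ends up in the same file, because awk truncates a file only when it first opens it and appends on later writes during the same run. A sketch under those assumptions, using a made-up sample file tt.txt:

```shell
# Work in a scratch directory so the demo does not clobber existing files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample with one stray leading line and a repeated XXX section.
printf '%s\n' \
  'Kwwww zzzz ccc' \
  '---- XXX Info ----' 'Buuuu xxx bbb' \
  '---- YYY Info ----' 'Kuuuu eeeee nnnn' \
  '---- XXX Info ----' 'Kyyyyy iiiii wwww' > tt.txt

awk '
  # Name the output after the second field of the header line.
  /^----/ { f = "file" $2 ".txt" }
  # f is empty (false) until the first header, so leading lines are skipped.
  f       { print > f }
' tt.txt

ls file*.txt    # fileXXX.txt fileYYY.txt
```

Here fileXXX.txt holds both XXX sections in order. Keeping the counter in the name, as in the reply above, is what gives one file per section instead.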
webkid
December 3, 2010, 7:23pm
9
Great. Thanks for your help.