Two Input Lines Into Single Output Line (CSV)

svermill · July 23, 2011, 4:24pm

Hi all,

My search karate must be weak because I'm about certain something very like this has been asked and answered here many times. I'll give you the exact scenario I've wasted a few hours of my Saturday on: :wall:

I'm trying to read through a very large number (~200) of router and switch configuration .txt files. My end-state goal is the have a spreadsheet that will show each switch name (it's embedded in the .txt file name, so no need to match the "hostname" line in the configs) and list the NTP servers it has configured. Some may have none configured (I still want an output line for these devices), most will have only one server, and some will have two.

So let's say I have a folder containing these three files:

router1.txt
router2.txt
switch1.txt

In this example, router1 has no NTP server. router2 has 'ntp server 1.2.3.4' in its configuration. switch1 has two lines as follows:

ntp server 1.2.3.4 prefer
ntp server 4.5.6.7

I want to create as output a .csv that looks as follows:

router1.txt,
router2.txt,ntp server 1.2.3.4
switch1.txt,ntp server 1.2.3.4 prefer,ntp server 4.5.6.7

I will then do "text to columns" in the spreadsheet, set the delimiter to ",", and then do a global "find and replace for ".txt" to "" (to remove the input file extension and then I basically have the device name in the left-most column). I have figured out how to get either grep or awk to print the input file name. And obviously I know how to match "ntp server" as a string. What I've been failing at so far is getting a single line of output per input file, comma-separated, regardless of the number of individual lines (including zero!) with the matching string found in said input files.

Can anybody help with that? I would be very grateful to you!!

Edit: Each input file contains a "hostname" line and that value is equal to the input file name (obviously less the .txt extension), so if it's difficult to print the input file name even if there isn't a match for "ntp server," then it would be just fine to also match on the hostname, thereby ensuring that I get an output line per input file, even when I don't hit on an ntp server line in the config. Hope that makes sense. Thanks again!

bartus11 · July 23, 2011, 5:18pm

If you don't mind comma at the end of line:

for i in *; do awk 'NR==1{printf FILENAME","}/ntp server/{printf $0","};END{printf "\n"}' $i;done > out.csv

svermill · July 23, 2011, 8:13pm

Thanks bartus11! I think you've gotten me to within a character or two of the solution. Here's what I'm seeing:

router1.txt,ntp server 10.40.40.1 prefer
,ntp server 10.40.40.2
,

Your help is very, very much appreciated!

Edit: The comma at the end doesn't bother me a bit (so long as it's at the end of a single line)!

agama · July 23, 2011, 8:23pm

A slight twist that doesn't need to invoke awk for every file (more efficient), writes a trailing comma only if there are not any ntp records (as per the example), and translates the filename (without the need for an extra sed pass over the file).

ls *.txt | xargs awk '
FILENAME != last {
        fn = FILENAME;
        gsub( ".txt", "", fn );            
        printf( "%s%s%s", n ? "\n" : "", fn, k ? "" : "," );
        last = FILENAME;
        n = 1;     # signal need for newline before fn (prevent lead newline)
        k = 0;     # when set, signals need for comma after fn
}

/ntp server/ {
    printf( ",%s", $0 );
    k = 1;
}

END {
    printf( "%s\n", k ? "" : "," );
}'  >out.csv

Hope this helps, or gives you some more ideas.

svermill · July 23, 2011, 8:41pm

Thanks also be to agama! I must say that some of your code is a tad over my head (I will take the time to understand it, though!), so I'm unable to properly break it down in my own mind. All that I can tell you is that I'm getting out.csv with the name of the very first device in the folder - nothing else follows that. Again, it seems as though this has got to be within a character or two of being a workable solution...

Edit: Oh, I guess I probably just assumed that a trailing comma would follow the filename in the case of zero "ntp server" matches. It's certainly not necessary; the the spreadsheet text-to-columns feature will not care one way or the other and nor would I...

agama · July 23, 2011, 9:42pm

Hummm. For me it's doing what I expect. If you run it this way, without xargs, is anything different? (I'm assuming that you don't have many .txt files to run it this way.) I don't expect there to be any difference, but it never hurts to try.

awk '
FILENAME != last {
        fn = FILENAME;   
        gsub( ".txt", "", fn );      
        printf( "%s%s%s", n ? "\n" : "", fn, k ? "" : "," );
        last = FILENAME;
        n = 1;     # signal need for newline before fn (prevent lead newline)
        k = 0;     # when set, signals need for comma after fn
}

/ntp server/ {
    printf( ",%s", $0 );
    k = 1;
}

END {
    printf( "%s\n", k ? "" : "," );
}'  *.txt >out.csv

I also assume that if you execute ls *txt from the command line it lists all of the files you expect the awk to parse.

I'd guess that this line is tough to understand:

printf( "%s%s%s", n ? "\n" : "", fn, k ? "" : "," );

Each %s is replaced with a string from the parameter list. The parameters are an optional leading comma, the filename (without .txt) and an optional trailing comma. The tertiary operator has the form:

expression ?  true-action : false-action

So for n ? "\n" : "" the string with the newline (\n) is added to the printf string at the first %s when the variable n is non-zero. When n is zero, then a null-string (no comma) is added.

---------- Post updated at 21:42 ---------- Previous update was at 21:38 ----------

It did just occur to me that if you are running on a Sun, you need to use nawk rather than awk. I don't have a Sun box to test, but the symptom sure seems like the FILENAME variable isn't functioning as we'd expect and that does suggest a different version of awk.

svermill · July 23, 2011, 11:04pm

Well I'm feeling very sheepish now because I thought to - and then failed to - mention that this is Mac OS X BSD. I know that sometimes things aren't fully supported in this case, and in retrospect that should've been clearly flagged by the OP (yours truly!). Very sorry 'bout that all!

OK, with that said, I tried your latest proposed code and things are looking very promising. A couple of details that I suspect you'll understand at a glance (example):

router1
router2,,ntp server 10.10.250.1
switch1,ntp server 10.10.250.1
switch2,ntp server 10.10.250.2
,ntp server 10.10.250.1 prefer

The double comma won't be too difficult to deal with in the spreadsheet. However, the last two lines above put me back to the original problem of needing to manually manipulate a lot of stuff (although I originally said ~200 devices, this is something that I'm going to need to repeat often, so the workload would be high).

If it's in any way helpful, I can post sanitized configs of these devices. However, these are all Cisco routers and switches, so you basically have an area of the config where zero, one, or two (or conceivably more but not in this specific case) instances of 'ntp server x.y.z.n' exist in succession...

And thanks once again!

Edit: I don't know whether or not this is of interest, but in the case of "switch2" above, I noticed in the config the following:

ntp server 10.10.250.2
ntp server 10.10.250.1 prefer

So the order is neither by lowest IP nor "prefer" first. Presumably, then, it's by chronological order of entry. Something I've never noticed in many years of working with Cisco configs but then again this is the first time I've tried parsing them in this way...

Further edit: Strangeness abounds. Another detail I just noticed is that router2 above doesn't actually have an 'ntp server' command at all in its config; rather, it has an 'ntp peer' command. Something I hadn't considered at first and need to account for. <<<NOT TRUE!! It actually turns out to be the case that it has both an 'ntp server' and an 'ntp peer' command. Not sure what I screwed up in my search that led to that misunderstanding on my part?? In any event, it is not the case as far as I can tell in reviewing things that there is any kind of misalignment of this sort. If no 'ntp' command of any sort exists, I'm only seeing the filename/hostname, as desired...

Further, further edit: Just noticed that in the relatively few cases where I see a double-comma as in the case of router2 above, it's always preceded by a router1 (no 'ntp server' command in the config). You're obviously highly observant of such details and so probably didn't need me to point this out. However, I just noticed it as a consistent pattern, so I thought I'd once again edit this post (sorry!)...

---------- Post updated at 08:51 PM ---------- Previous update was at 08:14 PM ----------

OK, enough edits to the above have already been published, so a totally new post here...

Again, not sure if such a detail is anywhere remotely significant, but I thought to add that Cisco automatically adds additional NTP commands to the config that no user has ever entered on the keyboard. Typically, you will see:

'ntp clock-period 1234567890'

immediately prior to the 'ntp server' commands (this is not a "stanza," per se; they are all at the main level of the config - totally to the left). This is an internal/proprietary that that Cisco has never bothered to explain. "Just is what it is" type of thing...

---------- Post updated at 09:04 PM ---------- Previous update was at 08:51 PM ----------

Lastly (I promise for tonight!), I should perhaps mention that NTP is the very last section of the config of most (all?) devices and so the .txt file looks something like this:

router1#show run

ntp clock-period 1234567890
ntp server 1.2.3.4 prefer
ntp server 4.5.6.7
end

router1#show blahblah

agama · July 23, 2011, 11:08pm

Ok, I've found the bug that was causing the double comma. The only dummy input file that was without an ntp record was the last one, so I wasn't seeing that I had coded that incorrectly.

Have a go with this:

 awk '
    BEGIN { n = 0; k = 1;      # needed for bug fix
    }

    FILENAME != last {
        fn = FILENAME;
        gsub( ".txt", "", fn );
        printf( "%s%s%s", k ? "" : ",", n ? "\n" : "",  fn );  # bug fix here
        last = FILENAME;
        n = 1;     # signal need for newline before fn (prevent lead newline)
        k = 0;     # when set, signals no comma needed in previous printf
    }

    /ntp server/ || /ntp peer/{
        printf( ",%s", $0 );
        k = 1;
    }

    END {
        printf( "%s\n", k ? "" : "," );
    }'

I'm still perplexed with the newline when there is a second record because I'm not seeing that with my dummied up test data. I'll continue to play with this for a bit, but wanted to see that this should have made the difference with the double commas.

I assumed that the 'ntp peer' record was to be handled the same way as the ntp server record. If not, post how you need it and a sample of the text from the file.

And not to worry about not posting the o/s type. It can help, but in this case I don't think OSX and its tools are to blame.

svermill · July 24, 2011, 12:02am

Me neither! (as if!)

Then it's obviously some important detail that I'm failing to share with you.

You assume correctly!

Glad to hear it!

And I just learned something new about these forums...

I was refreshing the thread each time before I posted an edit to my last post with the idea I would know that you hadn't been back (or at least that you hadn't posted anything new). However, when I reloaded the thread from the forum index, I saw this post from you. Sorry for any confusion that may have introduced!!

---------- Post updated at 10:02 PM ---------- Previous update was at 09:52 PM ----------

OK, what is it about:

awk 'BEGIN (...)

I'm having trouble with??

I guess I should call it a night and start fresh in the morning. I can't thank you enough for what you've done to get me this far...

agama · July 24, 2011, 12:39am

I didn't include the ls command and the pipe. My test script is still using xargs, so I only cut and pasted the awk portion.

I'm calling it a night; not sure when I can peek in tomorrow -- it will probably be early afternoon. You are most welcome for the help. I'm always surprised at what I end up learning as I figure out, or read other's solutions.

svermill · July 24, 2011, 12:59am

Understood. I have spent countless hundreds (likely thousands) of hours over the years doing likewise for others on the network engineering side (I'm a CCIE if that means anything to you), and I've learned as much if not more from that process as anything.

I have tried a bunch of different things but I always get a ">" after the first line of whatever I paste into my shell. Likely my ignorance is start to seriously show if it hasn't already. Sleep will likely help too!

Good night!

bartus11 · July 24, 2011, 5:18am

Do you have Perl on your Mac?

perl -nle 'printf "$ARGV," if $.==1;printf "$_," if /ntp server/;if (eof){$.=0;printf "\n"}' * > out.csv

agama · July 24, 2011, 11:47am

I can explain the > --

The greather than sign is the 'secondary' prompt (also known as PS2) which is issued when a newline is entered before a terminating single quote. So, pasting the awk snipit would cause this.

The easy solution, or in my opinion the easiest, is to edit a file (test_script.ksh or somesuch) with your favorite editor and paste the code example into that. Save the file, and then from the command line execute the script using ksh or bash (or your preferred shell). Like this:

ksh test_script.ksh

The other advantage to this is that you can edit the file and easily tweek things if you want, and you have the template for a script when you have something that works.

You can also use Kshell's (or bash's) command line editing features that allow direct invocation of emacs or vi, but that's more difficult to explain if you aren't already familiar with it.

svermill · July 24, 2011, 12:53pm

Yeah, I definitely needed a good night's sleep. So I stayed up and smoked a cigar instead!

Anyway, with xargs using both bash and ksh, I was only getting the hostname of the very first device in the folder - nothing else (so no commas, no ntp commands, no nothin'). So I dropped xargs and did *.txt in awk. I get an output file and both 'ntp server' and 'ntp peer' commands are present. However, only my to two core devices have a single-line entry for multiple ntp matches; all others have a newline between first and second entries when there are two matches.

Now this may or may not be an interesting detail: I was unaware that my core devices actually have three 'ntp server' commands. And yes, they're strangely all on one single line in the output file. I don't see any real differences between the core configs and any of the other configs??

---------- Post updated at 10:49 AM ---------- Previous update was at 10:45 AM ----------

Hi bartus11,

Yes, I just installed ActivePerl the other day (and I believe that some stripped down/likely outdated version of Perl ships native to Mac OS X). I'm trying to ensure that I actually understand (as much as possible) what I'm doing, though, so I don't want to jump to Perl just yet - my head is too full of new stuff this weekend as it is! But I promise that once I get this other approach sorted out, I'll turn my attention to your example Perl code. It certainly looks nice and streamlined!

Thanks again to you both!

---------- Post updated at 10:53 AM ---------- Previous update was at 10:49 AM ----------

OK, I admit that I just pasted in your Perl script to see what would happen.

Here's an example of the output:

router1.txt,
router2.txt,ntp server 10.10.250.1
,
switch1.txt,ntp server 10.10.250.2
,ntp server 10.10.250.1 prefer
,

Still suffering from newline issues??

bartus11 · July 24, 2011, 1:33pm

Can you post output of:

cat -eT switch1.txt

svermill · July 24, 2011, 2:18pm

It never takes long before there's something I can't do on OS X that could normally be done on a true *nix box. I couldn't do the -T switch but I could do a lower-case -t switch. For whatever it's worth, here's the output from that:

$ cat -eT switch1.txt | awk '/ntp/'
cat: illegal option -- T
usage: cat [-benstuv] [file ...]

$cat -et switch1.txt | awk '/ntp/'
ntp server 10.10.250.2^M$
ntp server 10.10.250.1 prefer^M$

Tell you anything valuable?

bartus11 · July 24, 2011, 2:19pm

Does your Mac have dos2unix utility?

svermill · July 24, 2011, 2:35pm

No, but what do you think about the final two posts in this thread:

Convert end-of-line characters (revisited) - Mac OS X Hints

It appears there're many ways to go about this, so if you could be so kind as to point out what I'm to replace (and what with), I'm sure I can get it done. Heck, maybe I can get it done during the creation of these .txt files (I'm generating them via a TCL script that calls Expect and does a bunch of interactive stuff to give me not only the configs, but a bunch of 'show processor cpu history' and so forth). Evidently that process is fouling up the EOL or something?

bartus11 · July 24, 2011, 2:52pm

Try this:

perl -nle 's/\x0d$//;printf "$ARGV," if $.==1;printf "$_," if /ntp server/;if (eof){$.=0;printf "\n"}' * > out.csv

svermill · July 24, 2011, 3:04pm

Yes, the above does work quite nicely! It gives me an output file in about one second, which isn't bad considering I have roughly 200 .txt files (some of which are quite large) in the directory.

However, I'm even less Perl aware than I am grep/awk aware, so I really have no idea what the above does or how. I have many, many applications for such as script, so I'm hopeful to walk away from all of this with a template that can be adapted/evolved. Not knowing much about Perl at all, I fear I would have to seek help getting anything new out of the above. Can you please help me to understand it, at least on a high level? I won't be the least bit offended if you say "no" - you've already given of your time.

agama,

I suppose this whole EOL thing made perfect sense to you all along. Is there now a tweak to your code that would work? I'm not sure what I have to do in order to get xargs going but I'll shortly find out...

---------- Post updated at 01:04 PM ---------- Previous update was at 01:03 PM ----------

Oh, and can I limit the above to just files in the directory with .txt extensions?