Command to get URL under string "SIT"

Hi All,

I need to get every https URL listed under the SIT environment in my config files. For example:

config environment="SIT"

<URL>https://yahoo.com</URL>

There are 100 .xml files, and I need to search every one of them and get the URLs.

I tried the command below, but it only returns URLs from 20-odd folders; I need them from all 100 files.

find . -name "*.xml" | xargs grep -n "https"


What error messages are being printed by find , xargs , and grep ?

What are the exact names of the *.xml files that are not being searched properly?

Show us an example of the URLs in one of the files that are not being extracted properly by the command you showed us (using CODE tags).

Sorry for the late reply. I'm using the command below:

find . -name "*.xml" | xargs grep -n "SIT"

My config file has entries for DEV, SIT, UAT and PROD, as below:

<configs>
	
	<config environment="DEV">
		<routingURL>https://yahoo.com</routingURL>
	</config>
	<config environment="SIT">
		<routingURL>https://google.com</routingURL>
	</config>
	<config environment="UAT">
		<routingURL>https://yahoo1.com</routingURL>
	</config>
	<config environment="PROD">
		<routingURL>https://yahoo2.com</routingURL>
	</config>
</configs>

The folder I'm searching has 100 XML files, and I need to go through each file and get only the https URLs that sit directly under SIT.

The command I'm using returns every line that has SIT anywhere in it, including URLs that merely contain SIT.

Please suggest.

Thanks,
VJ

You didn't mention what operating system you're using (which always helps when you ask for help in this forum), but if your version of grep includes a -A option (which is not required by the standards), an easy way to do what you seem to want would be:

grep -A1 '"SIT"' *.xml |grep 'https'
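For anyone who wants to see what the two grep stages do, here is a minimal sketch run against a throwaway copy of the sample config posted above. The temporary directory and the file name app.xml are made up for the demo, and it assumes a grep that supports -A (GNU grep does).

```shell
# Throwaway copy of the sample config (directory and file name are hypothetical).
demo=$(mktemp -d)
cat > "$demo/app.xml" <<'EOF'
<configs>
    <config environment="DEV">
        <routingURL>https://yahoo.com</routingURL>
    </config>
    <config environment="SIT">
        <routingURL>https://google.com</routingURL>
    </config>
    <config environment="UAT">
        <routingURL>https://yahoo1.com</routingURL>
    </config>
    <config environment="PROD">
        <routingURL>https://yahoo2.com</routingURL>
    </config>
</configs>
EOF

# Stage 1: print every line containing "SIT" (double quotes included)
#          plus the one line that follows it (-A1).
# Stage 2: keep only the lines that mention https.
sit_urls=$(grep -A1 '"SIT"' "$demo"/*.xml | grep 'https')
printf '%s\n' "$sit_urls"
```

The single quotes around '"SIT"' matter: they keep the double quotes as part of the pattern, so a line that only has SIT somewhere inside a URL is not treated as a match.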

An alternative would be:

awk '/"SIT"/+1 && /https:/' *.xml

As always, if you want to try the awk command on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk .

Thanks Don. The operating system I'm using is Linux.

I can see it's printing the URL under SIT, but I forgot to mention that I need to search under different folders, so I used the command below:

find . -name "*.xml" |xargs grep -A1 "SIT" | grep 'https'

It is actually printing everything, so I need to find each folder that has *.xml files, go into it, and get the URL that is under SIT.
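One possible cause, sketched below under assumptions: in "SIT" the shell removes the double quotes, so grep searches for plain SIT and also matches any line that merely contains those letters. The folder names, file names and the SIT-portal URL are hypothetical, chosen only to show the difference; GNU grep is assumed for -A.

```shell
# Two made-up folders, one with a space in its name.
demo=$(mktemp -d)
mkdir -p "$demo/app one" "$demo/app2"
cat > "$demo/app one/config.xml" <<'EOF'
<configs>
    <config environment="DEV">
        <routingURL>https://SIT-portal.example.com</routingURL>
    </config>
    <config environment="SIT">
        <routingURL>https://google.com</routingURL>
    </config>
</configs>
EOF
cat > "$demo/app2/config.xml" <<'EOF'
<configs>
    <config environment="SIT">
        <routingURL>https://example.org</routingURL>
    </config>
    <config environment="PROD">
        <routingURL>https://yahoo2.com</routingURL>
    </config>
</configs>
EOF

# Pattern SIT without the quotes: the DEV URL containing SIT is picked up too.
loose=$(cd "$demo" && find . -name '*.xml' -exec grep -A1 "SIT" {} + | grep 'https')

# Pattern "SIT" with the quotes kept: only the line after the SIT config tag.
strict=$(cd "$demo" && find . -name '*.xml' -exec grep -A1 '"SIT"' {} + | grep 'https')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

Using -exec ... {} + instead of piping into xargs also avoids xargs splitting file names on whitespace, which may or may not be related to files being skipped.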

@Don Cragun: Please explain awk '/"SIT"/+1 && /https:/' *.xml


Ouch! Thank you for catching the problem.

I mistakenly thought /"SIT"/+1 was selecting the line following a line containing "SIT" and the /https:/ would be true on those lines that contained https: causing that line to be printed.
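A short sketch of why that first attempt prints too much: in awk a bare /"SIT"/ used in an expression is shorthand for $0 ~ /"SIT"/, which yields 1 or 0, so /"SIT"/+1 is always 1 or 2 and never false. The condition therefore reduces to /https:/ alone. The file name app.xml is made up for the demo.

```shell
# Throwaway copy of the sample config (directory and file name are hypothetical).
demo=$(mktemp -d)
cat > "$demo/app.xml" <<'EOF'
<configs>
    <config environment="DEV">
        <routingURL>https://yahoo.com</routingURL>
    </config>
    <config environment="SIT">
        <routingURL>https://google.com</routingURL>
    </config>
    <config environment="UAT">
        <routingURL>https://yahoo1.com</routingURL>
    </config>
    <config environment="PROD">
        <routingURL>https://yahoo2.com</routingURL>
    </config>
</configs>
EOF

# The first attempt, unchanged: the left operand of && is 1 or 2 on every
# line, so every line containing https: is printed, not just the SIT one.
first_try=$(awk '/"SIT"/+1 && /https:/' "$demo/app.xml")
printf '%s\n' "$first_try"
```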

The following seems to correctly do what I was trying to do:

awk 'n && /https:/;n = /"SIT"/{}' *.xml
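For readers who find the one-liner dense, the same idea can be spelled out as below. This is a sketch, not the original command: the variable name prev_was_sit and the file name app.xml are made up, and the FNR == 1 reset is an addition so that a match on the last line of one file cannot carry over into the next file.

```shell
# Throwaway copy of the sample config (directory and file name are hypothetical).
demo=$(mktemp -d)
cat > "$demo/app.xml" <<'EOF'
<configs>
    <config environment="DEV">
        <routingURL>https://yahoo.com</routingURL>
    </config>
    <config environment="SIT">
        <routingURL>https://google.com</routingURL>
    </config>
    <config environment="UAT">
        <routingURL>https://yahoo1.com</routingURL>
    </config>
    <config environment="PROD">
        <routingURL>https://yahoo2.com</routingURL>
    </config>
</configs>
EOF

sit_urls=$(awk '
    # Added for the multi-file case: forget the flag at the start of each file.
    FNR == 1 { prev_was_sit = 0 }

    # Print the current line only if the previous line contained "SIT"
    # and this line contains https:.
    prev_was_sit && /https:/ { print }

    # Record whether this line contains "SIT", for use on the next line.
    { prev_was_sit = ($0 ~ /"SIT"/) }
' "$demo"/*.xml)
printf '%s\n' "$sit_urls"
```

The order of the rules is what makes it work: the flag is tested before it is updated, so on any given line it still describes the previous line.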

Hahaha - I was trying and testing to and fro to find out what you meant and what the trick was...

I don't understand what you are saying here.

With the sample input you showed us, exactly what output are you trying to produce?

If the input you showed us was incomplete or not representative of your actual input, please provide a new example clearly explaining what you are trying to do AND show us the exact output you are trying to produce from that expanded example.