Bob's Notepad

Notes on projects I have done and things I have learned, saved for my reference and for the world to share

Monday, October 22, 2007

Using wget to retrieve all files of a certain type

wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -w5 -i ~/mp3blogs.txt

And here's what this all means:

-r -H -l1 -np: These options tell wget to download recursively. That means it goes to a URL, downloads the page there, then follows every link it finds. The -H tells wget to span hosts, meaning it may follow links that point away from the blog's own domain. The -l1 (a lowercase L followed by the numeral one) limits recursion to one level deep; that is, wget won't follow any links it finds on the linked pages. Together, these options ensure that you don't send wget off to download the entire Web, or at least as much of it as will fit on your hard drive. Instead, it takes each URL from your list of blogs, downloads that page, and then retrieves whatever the page links to. The -np switch stands for "no parent", which tells wget never to follow a link up to a parent directory.
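If the single-letter switches are hard to remember, GNU Wget also accepts long option names for them. Here is a minimal bash sketch that collects the recursion-related options in an array (the name recursion_opts is just for illustration). It only prints them, so it runs without any network access:

```bash
# Long-form GNU Wget equivalents of the recursion switches described above.
recursion_opts=(
  --recursive    # -r : follow the links found on each downloaded page
  --level=1      # -l1: only go one level deep from each starting URL
  --span-hosts   # -H : allow those links to point at other domains
  --no-parent    # -np: never climb up to a parent directory
)

# Print one option per line so the list is easy to inspect.
printf '%s\n' "${recursion_opts[@]}"
```

In a real run, "${recursion_opts[@]}" could be expanded in place of -r -l1 -H -np on the wget command line; wget receives exactly the same options either way.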

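We don't, however, want everything those links point to, just the audio files we haven't seen yet. Including -A.mp3 tells wget to keep only files whose names end with the .mp3 extension. (It still fetches the blog pages themselves so it can read their links, but deletes them afterward because they don't match.) And -N turns on timestamping, which means wget won't download a file with the same name again unless the copy on the server is newer.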
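To make the -N rule concrete, here is a simplified bash model of the decision. It follows the time-stamping rules as described in the GNU Wget manual (download when there is no local copy, when the file sizes differ, or when the server's copy is newer) and is not wget's own code. The function name wget_n_would_fetch and its arguments (modification times in epoch seconds, sizes in bytes) are hypothetical:

```bash
# Simplified model of wget -N. Returns 0 ("fetch") or 1 ("skip").
# Args: local_mtime local_size remote_mtime remote_size
# Pass "" for the local values when no local copy exists yet.
wget_n_would_fetch() {
  local local_mtime=$1 local_size=$2 remote_mtime=$3 remote_size=$4
  if [ -z "$local_mtime" ]; then
    return 0                        # no local copy: fetch
  fi
  if [ "$local_size" -ne "$remote_size" ]; then
    return 0                        # sizes differ: fetch regardless of dates
  fi
  if [ "$remote_mtime" -gt "$local_mtime" ]; then
    return 0                        # server's copy is newer: fetch
  fi
  return 1                          # same size and not newer: skip
}

# Same size, but the server's copy is newer, so this prints "fetch".
if wget_n_would_fetch 1700000000 4096 1700000500 4096; then
  echo "fetch"
else
  echo "skip"
fi
```

This is why re-running the same command later only picks up songs that are new or have changed since the last run.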

To keep things clean, we'll add -nd ("no directories"), which makes wget save everything it finds in one directory rather than mirroring the directory structure of the linked sites. And -erobots=off tells wget to ignore the standard robots.txt files; the -e switch runs a command as if it were in wget's .wgetrc startup file, and robots=off is that command. Normally, this would be a terrible idea, since we'd want to honor the wishes of the site owner. However, since we're only grabbing one file per site, we can safely skip these and keep our directory much cleaner. Also, in the spirit of good net citizenship, we'll add -w5 to wait 5 seconds between requests so as not to pound the poor blogs. Finally, -t1 sets the number of tries to one, so wget gives up on a broken link after a single attempt instead of retrying it (by default it tries up to 20 times).
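The same settings can also live in a wgetrc startup file instead of on the command line. Here is a minimal bash sketch, assuming GNU Wget, that writes a throwaway file to a temporary directory (the location is arbitrary). The download line is commented out because it needs the network; it shows how the WGETRC environment variable would point wget at that file. In wgetrc syntax, dirstruct = off corresponds to -nd:

```bash
# Write a throwaway wgetrc holding the options discussed above.
rc_dir=$(mktemp -d)
rc_file="$rc_dir/wgetrc"

cat > "$rc_file" <<'EOF'
# Same effect as -nd, -e robots=off and -w5 on the command line
dirstruct = off
robots = off
wait = 5
EOF

echo "wrote $rc_file"

# To use it (requires network access):
# WGETRC="$rc_file" wget -r -l1 -H -t1 -N -np -A.mp3 -i ~/mp3blogs.txt
```

Keeping these settings in a file is mostly a matter of taste. The command line version in this post is easier to copy and paste, while a wgetrc file keeps a long command shorter.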

Finally, -i ~/mp3blogs.txt is a little shortcut. Typically, I'd just put a single URL on the wget command line and start downloading. But since I wanted to visit several mp3 blogs, I listed their addresses in a text file (one per line) and told wget to read its starting URLs from that file.
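Here is one way to set up the input file for -i. This minimal bash sketch uses placeholder blog addresses on example.com. It writes to a temporary file rather than ~/mp3blogs.txt so it won't overwrite an existing list, and the final wget call is commented out because it needs network access:

```bash
# Create an input list for -i: one URL per line.
# The example.com addresses are placeholders for real blog URLs.
list=$(mktemp)
cat > "$list" <<'EOF'
http://blog-one.example.com/
http://blog-two.example.com/
http://blog-three.example.com/
EOF

# Sanity check: print any non-blank line that is not an http(s) URL.
if grep -vE '^(https?://|$)' "$list"; then
  echo "warning: $list contains lines that are not URLs" >&2
fi

# Number of starting URLs in the list.
wc -l < "$list"

# Then (requires network access):
# wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -w5 -i "$list"
```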


Reference Link


Anonymous said...

Bless you!

I really didn't want to right-click a thousand times.

I'm a lazy man, and you've helped me to be efficient at it : )


21/8/08 9:41 PM  
