Bob's Notepad

Notes on projects I have done and things I have learned saved for my reference and for the world to share

Monday, October 22, 2007

Using WGET to retrieve all files of a certain type

wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -w5 -i ~/mp3blogs.txt

And here's what this all means:

-r -H -l1 -np These options tell wget to download recursively. That means it goes to a URL, downloads the page there, then follows every link it finds. The -H tells the app to span domains, meaning it should follow links that point away from the blog. And the -l1 (a lowercase L with a numeral one) means to only go one level deep; that is, don't follow links on the linked site. In other words, these options work together to ensure that you don't send wget off to download the entire Web -- or at least as much as will fit on your hard drive. Rather, it will take each link from your list of blogs and download it. The -np switch stands for "no parent", which instructs wget to never follow a link up to a parent directory.

We don't, however, want all the links -- just those that point to audio files we haven't yet seen. Including -A.mp3 tells wget to only download files that end with the .mp3 extension. And -N turns on timestamping, which means wget won't download something with the same name unless it's newer.

To keep things clean, we'll add -nd, which makes the app save everything it finds in one directory, rather than mirroring the directory structure of linked sites. The -t1 switch limits wget to one try per file, so a dead link is skipped instead of retried. And -erobots=off tells wget to ignore the standard robots.txt files. Normally, this would be a terrible idea, since we'd want to honor the wishes of the site owner. However, since we're only grabbing one file per site, we can safely skip these and keep our directory much cleaner. Also, along the lines of good net citizenship, we'll add -w5 to wait 5 seconds between requests so as not to pound the poor blogs.

Finally, -i ~/mp3blogs.txt is a little shortcut. Typically, I'd just add a URL to the command line with wget and start the downloading. But since I wanted to visit multiple mp3 blogs, I listed their addresses in a text file (one per line) and told wget to use that as the input.
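For reference, here is a rough sketch of the same command with the switches spelled out in long form, which is easier to read back months later. The variable name WGET_OPTS and the function name fetch_mp3s are just names I made up for this example; the options themselves are the long spellings of the short switches described above.

```shell
#!/bin/sh
# Long-form equivalents of the short switches discussed above.
# WGET_OPTS and fetch_mp3s are hypothetical names used only for this sketch.
WGET_OPTS="--recursive --level=1 --span-hosts --no-parent \
--tries=1 --no-directories --timestamping \
--accept=.mp3 -e robots=off --wait=5"

fetch_mp3s() {
    # $1 is a text file with one blog URL per line.
    # $WGET_OPTS is left unquoted on purpose so it splits into separate options.
    wget $WGET_OPTS --input-file="$1"
}

# Example (this will hit the network):
#   fetch_mp3s ~/mp3blogs.txt
```

Nothing is downloaded until fetch_mp3s is actually called, so the file can be sourced from a shell profile safely.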


Reference Link


Sunday, September 30, 2007

Google Repositories (apt-get)

Add this line to /etc/apt/sources.list

  • deb http://dl.google.com/linux/deb/ stable non-free


Import the key by running this

  • wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -


Then of course run "apt-get update" and you're ready to instantly install Google's apps using apt-get.
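If you set this up on more than one box, the steps can be wrapped in a small script. This is only a sketch: the helper name add_repo_line is something I made up, and it assumes everything is run as root. The helper appends the repository line only if it isn't already in the file, so running it twice does no harm.

```shell
#!/bin/sh
REPO_LINE='deb http://dl.google.com/linux/deb/ stable non-free'

# Append the repository line to the given sources file unless it is already there.
# add_repo_line is a hypothetical helper name for this sketch.
add_repo_line() {
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF "$REPO_LINE" "$1" 2>/dev/null || printf '%s\n' "$REPO_LINE" >> "$1"
}

# As root, the full sequence would be:
#   add_repo_line /etc/apt/sources.list
#   wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
#   apt-get update
```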


Reference Link


Tuesday, September 25, 2007

System Rescue CDs

I just wanted to put a little note here because I was reminded tonight of two very good rescue CDs that I have used in the past and highly recommend to anyone who has a broken system (Linux, Windows, and I think even Mac).

The first and primary CD that I carry in my arsenal when I go on jobs is SystemRescueCD. This Linux distro is based on Gentoo -- but fear not, as no knowledge of Gentoo is required. The CD boots and detects just about any hardware you can throw at it and has a lot of VERY useful tools for recovering data and fixing stuff. The kernel has NTFS support built into it and the distro includes ntfs-3g. There are also tools that rival (and maybe beat) PartitionMagic. It's also extremely handy for booting a system in a data center and letting a customer access the machine remotely (great for remote hands type services), as it includes setup scripts for networking and for starting sshd.... so you could walk even the most novice (ok, not all, I'll admit) techs through it. They also now have a PPC distro which theoretically will let you recover even Mac OS systems -- but I haven't had the opportunity to try this yet.

You definitely don't want to be without this CD if you do any kind of computer services. You'd be amazed at how handy it is.



The second CD follows a similar idea but it's based on Debian with an Ubuntu kernel. The name of this one is Kanotix. The advantage this CD has over SysRescueCD is that it allows you to apt-get applications on the fly. It also has a slightly different hardware driver selection but you'll probably find it has the majority of what you need. I'm definitely a Synaptic fan but this is still only my second choice for recovering systems because I think SysRescueCD definitely has the streamlining down and makes quick tasks remain just that: quick tasks.

This is still something that you want to keep in your arsenal. There have been a few situations where SysRescueCD didn't cut it for me and I pulled out Kanotix and it worked fine. It's also handy if you have some advanced stuff that you need to work on since you can easily apt-get utilities that you may need.



Reference Link


Friday, June 15, 2007

Unix Find Tutorial

I found a great reference on using the find command on UNIX/Linux systems. It explains the -mtime, -ctime, and related tests in detail.

In case the site later is taken off-line, I have archived the page in PDF form: Download PDF
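As a quick reminder of how -mtime behaves, here is a small throwaway demo (my own sketch, not taken from the tutorial). It makes a temp directory with one fresh file and one file back-dated to January 2007, then asks find for each. The file names are made up for the example.

```shell
#!/bin/sh
# Scratch directory with one fresh file and one back-dated file.
demo=$(mktemp -d)
touch "$demo/new.log"
touch -t 200701010000 "$demo/old.log"   # set mtime to 1 Jan 2007, 00:00

# -mtime -7  : modified less than 7 days ago
# -mtime +30 : modified more than 30 days ago
recent=$(find "$demo" -type f -mtime -7)
stale=$(find "$demo" -type f -mtime +30)

echo "modified in the last 7 days: $recent"
echo "not modified in 30+ days:    $stale"
```

The -ctime test takes the same +N / -N arguments but looks at the inode change time (permissions, ownership, renames) instead of the content modification time.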


Reference Link


Sunday, March 11, 2007

Time Zone Updates

http://tf.nist.gov/general/dst.htm

The all-knowing official-type people have made the world a better place by making us start DST three weeks early. Apparently none of these people realize that:

1> This DOES NOT change the amount of daylight we have, so their energy saving reason is moot

2> Computers have been programmed a certain way for years.... We should have anticipated the whole Y2K thing because it's a numerical science... we COULDN'T have anticipated whack-jobs changing the way our world runs.

At any rate, almost all of my servers and workstations updated flawlessly, including Windows, Mac OS X, and Linux machines. A few, however, didn't -- mostly because they weren't running any type of automatic updates.

For my MythTV system (Debian based -- should work the same on other Debian/Ubuntu systems) I just downloaded a new tzdata deb file and installed it. Get the file from http://packages.debian.org/testing/libs/tzdata and then install it with "dpkg -i tzdata*"

I administer a RHEL4 machine (one of the last remaining -- someday it will become Ubuntu, I swear) which did not do the update. Don't bother going to Red Hat's site... they step you through God knows what to accomplish something that really is not that hard.... oh, and God forbid their help page give you a link to the file -- you only get ease of use if you give them all your money. Anyway, get this file from rpmfind: ftp://rpmfind.net/linux/fedora/core/updates/6/i386/tzdata-2007c-1.fc6.noarch.rpm (yes, I am aware it's a Fedora Core rpm but it works quite well and you don't have to give Red Hat all your money). Once you have that file, run "rpm -i tzdata*" (if rpm complains that an older tzdata is already installed, "rpm -U tzdata*" upgrades it in place) and once that completes, run "system-config-date" ... just reselect your timezone and exit and you're all set.

I don't see any reason why the above RHEL instructions won't work on other Fedora, RHEL, or CentOS systems -- but I can't confirm.
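One way to sanity-check a Linux box after the update is to ask date what zone abbreviation applies on March 12, 2007, the day after the new DST start. This is a sketch that assumes GNU date (for the -d option) and uses America/New_York as the example zone; dst_abbrev is just a name I picked. With the new rules it should say EDT; with the old rules (DST starting in April) it would say EST.

```shell
#!/bin/sh
# Print the zone abbreviation in effect at noon UTC on 12 March 2007.
# Assumes GNU date; dst_abbrev is a hypothetical helper name.
dst_abbrev() {
    TZ="${1:-America/New_York}" date -d '2007-03-12 12:00 UTC' +%Z
}

case "$(dst_abbrev)" in
    EDT) echo "tzdata has the 2007 rules" ;;
    *)   echo "tzdata looks stale (or the zone is missing)" ;;
esac
```

Pass a different zone name as the first argument to check another region, for example dst_abbrev America/Los_Angeles, and compare against the daylight abbreviation you expect there.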

Also, if you have a FreeBSD system, check this out: http://www.freebsd.org/cgi/cvsweb.cgi/ports/misc/zoneinfo/. I don't have any FreeBSD boxen but I found this link and figured I'd post it as well.

Mandrake/Mandriva users should be able to just do an update through urpmi ... but again, I don't have any of them boxens so I can't confirm :)

Windows: http://windowsupdate.microsoft.com

Mac OS X:

  • Go to the Apple menu
  • Select Software Update
  • Look for the "Daylight Saving Time" update
  • Install it



Now I'm going to go enjoy my not-extra-hour-of-sleep.


Reference Link