Phil Factor (10/8/2012)
I started out with a list of every word in the English language. There are several of these around. You probably won't find these used if you have a policy in place, but if you do the usual @ and 0 substitutions as well, then a lot crawls out. I add words from books on the Gutenberg project. Capitals should be random in a good password, but they usually aren't, so I simply double the list, adding a copy of each word with a capital for the first letter. Some SQL code that generates some permutations (1337 speak translations included; other rules are cheated around, since this is designed to test plaintext passwords in advance for how bad they are).
Note that this SQL can be used to generate at least some options for PWDCompare to use, as a primitive, hardcoded form of the rules files that John the Ripper, Hashcat, PasswordPro, and other professional CPU and/or GPU hash cracking software use. That way you keep your actual list small while still cracking large numbers of passwords.
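To make the idea concrete, here is a rough Python sketch (not the SQL referred to above) of the two simplest rules mentioned: doubling the list with a capitalised first letter, and partial or full 1337-style substitutions. The substitution table and the function names are my own illustrative assumptions; real rule files cover many more dialects.

```python
from itertools import product

# Hypothetical substitution table, for illustration only.
LEET = {"a": "@", "o": "0", "e": "3", "i": "1", "s": "5"}


def capital_variants(word):
    """Return the word as given plus a copy with the first letter capitalised."""
    capped = word[:1].upper() + word[1:]
    return [word] if capped == word else [word, capped]


def leet_variants(word, table=LEET):
    """Yield every partial and full substitution of the word.

    Each substitutable character doubles the output, so long words
    get big fast.
    """
    choices = []
    for ch in word:
        sub = table.get(ch.lower())
        choices.append((ch, sub) if sub else (ch,))
    for combo in product(*choices):
        yield "".join(combo)


def candidates(words):
    """Apply both rules to a small base list, skipping duplicates."""
    seen = set()
    for word in words:
        for cased in capital_variants(word):
            for variant in leet_variants(cased):
                if variant not in seen:
                    seen.add(variant)
                    yield variant


if __name__ == "__main__":
    for candidate in candidates(["password"]):
        print(candidate)
```

With this table, the single word "password" has four substitutable characters, so it expands to 16 variants per capitalisation, which is the sense in which the base list stays small while the candidate list grows.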
This approach, of course, quickly grows time-consuming on a CPU, which is why you switch over to a GPU (even a $50 or $100 card, never mind a set of eight $400 cards) and increase speeds by many orders of magnitude. At that point you need longer wordlists and more and more rules to get a weekend run to actually take the whole weekend on real hardware, or an overnight run to take all night.
I have large multi-gigabyte wordlists and small wordlists. Suggested starting points, if you don't just want to use the .rule files from hashcat or similar:
1) Add all numbers from 1 to 9999 to the end of each password - 4 digits gets years automatically.
2) Add all dates from the past 300 to next 100 years in the most common formats with various separators
3) Full 1337 speak translations in various dialects
3a) Partial 1337 speak translations - permutation based, so it does get big fast on long words.
4) Add fully random nonsense to the beginning and/or end.
5) Combination passwords from smaller dictionaries, e.g. envelopingadvertisers and its closest derivatives (Envelopingadvertisers, envelopingAdvertisers, EnvelopingAdvertisers, and so on), joined with no space, a space, a comma, or a dash. If you want to get scientific about it, brute-force all separators of length 0-2. For a two-word list, you could easily also try random 3-character separators.
5a) Triple and quadruple combination passwords from small and very small dictionaries. For instance, "correct horse battery staple" uses only words of 7 characters or fewer. If I take the words of 7 characters or fewer from a normal British wordlist, I see about 19,000 words. 19000^4 (four words, only one way of separating) comes out to about two months at current cracking speeds on a single top-grade consumer cracking machine. 19000^3 (three-word passwords) is, of course, 19000 times smaller and takes only about five minutes to check, so we can easily apply a set of 64 rules and still check it in about five hours, or use cheaper hardware and check the 64-rule list in only a few days (say, over a weekend). Note that if we filter for words of 6 characters or fewer, only 12000 words are left, many of them not commonly used.
6) Combinations of the above.
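A rough Python sketch of items 1 and 5, plus the back-of-envelope arithmetic from 5a, might look like the following. The function names are made up for illustration, and the default rate of 2.5e10 guesses per second is an assumption I back-derived from the "two months" and "five minutes" figures above, not a measured benchmark.

```python
from itertools import product


def with_numeric_suffixes(word, upper=9999):
    """Item 1: yield the word followed by every number from 1 to upper."""
    for n in range(1, upper + 1):
        yield f"{word}{n}"


def first_letter_cases(word):
    """Return the word as given plus a copy with the first letter capitalised."""
    capped = word[:1].upper() + word[1:]
    return [word] if capped == word else [word, capped]


def two_word_combos(words, separators=("", " ", ",", "-")):
    """Item 5: join every ordered pair of words (including a word with itself)
    using each capitalisation and each separator."""
    for a, b in product(words, repeat=2):
        for ca in first_letter_cases(a):
            for cb in first_letter_cases(b):
                for sep in separators:
                    yield f"{ca}{sep}{cb}"


def run_time_seconds(wordlist_size, words_per_password, rules=1,
                     guesses_per_second=2.5e10):
    """Item 5a: keyspace divided by an ASSUMED guess rate (see the note above)."""
    keyspace = wordlist_size ** words_per_password * rules
    return keyspace / guesses_per_second


if __name__ == "__main__":
    combos = list(two_word_combos(["enveloping", "advertisers"]))
    print(len(combos), "two-word candidates")
    print(run_time_seconds(19000, 4) / 86400, "days for four words")
    print(run_time_seconds(19000, 3) / 60, "minutes for three words")
    print(run_time_seconds(19000, 3, rules=64) / 3600, "hours with 64 rules")
```

At that assumed rate the estimates come out at roughly 60 days, a little under five minutes, and about five hours respectively, in line with the figures in 5a; plug in your own hardware's rate to see how far off you are.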
Note that pegging the CPU and/or GPU of a desktop, [gaming] laptop, or dedicated PW audit server at 100% for days or weeks on end is nowhere near as problematic as doing so to a production SQL Server.
P.S. Phil, note that utilities to "let you back in" generally operate by finding the location of the password hash and related information on disk and changing whatever was there to a known or newly generated value; they don't actually figure out the old plaintext password.