By John C. Rucker (Page 6 of 10)
With Squid out of the way, we can finally get on to the filter, DansGuardian. The amount of control that DansGuardian gives you in filtering content is truly amazing. If you want to explore all the possibilities, read through every file in the /etc/dansguardian folder to see what you can block. And it's a lot: you can block specific file extensions and MIME types to prevent music and movie downloads, Flash and Java applets, games, chat, and zip files; you can censor swear words; and much more. The comments in the files in that folder and its subfolders provide all the instructions you should need to customize things. To customize things even further, read the DansGuardian documentation.
There are a number of small changes we need to make to get DansGuardian working in a way that suits a public library. The remaining steps on this page walk you through them.
If you're receiving E-Rate funding discounts, CIPA requires you to prevent access to pornographic and obscene content, and that's all we should do for our adult patrons. For our children's computers, we must also prevent access to materials that are "harmful to minors". There are two ways to approach this: have one filter level for both adults and minors that is a little more restrictive than required, or have two separate filtering levels customized to each age group. In the interest of simplicity, our example here will just use one level. (I plan on revising my more advanced command-line-only tutorial in the near future, and I will cover multiple filtering levels there. In the meantime, you can still view my older tutorial using Ubuntu Dapper.)
CIPA is primarily concerned with blocking access to forbidden visual materials. This is problematic, since the ability of a computer to determine what constitutes nudity is still in the experimental stages, and determining whether a nude image is pornographic is harder still. DansGuardian relies on a variety of methods to decide whether a page should be blocked: blacklists, pattern matching of blocked words in web site addresses, metadata provided by web sites, and especially term weighting. You'll notice that none of these methods actually deals with images per se. Rather, the reasoning is that if a page contains enough dirty words to reach a certain threshold, odds are it also contains pornographic visual material, so the page is blocked. A page with no pictures at all will therefore still be blocked if its dirty-phrase score reaches the threshold. Conversely, a page with no words, just pornographic images, may get past your filter. To address both of these concerns, you can tweak the way DansGuardian filters pages.
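To make the threshold idea concrete, here is a minimal Python sketch of phrase weighting. It is a simplified model, not DansGuardian's actual code: the phrases and weights are hypothetical placeholders, every occurrence of a phrase is assumed to count toward the total, negative weights stand in for "good" phrases, and a page is assumed to be blocked once its score rises above the limit.

```python
import re

# Hypothetical phrases and weights, for illustration only.
HYPOTHETICAL_WEIGHTS = {
    "xxx": 50,        # "dirty" phrases: positive weights raise the score
    "explicit": 30,
    "medical": -40,   # a "good" phrase: a negative weight lowers the score
}


def page_score(text: str, weights: dict[str, int]) -> int:
    """Sum weight * occurrence count for every listed phrase found in text."""
    lowered = text.lower()
    score = 0
    for phrase, weight in weights.items():
        hits = len(re.findall(re.escape(phrase), lowered))
        score += weight * hits
    return score


def is_blocked(text: str, weights: dict[str, int], limit: int) -> bool:
    """Assumption: the page is blocked when its score is above the limit."""
    return page_score(text, weights) > limit
```

With these made-up weights, a page containing "xxx xxx explicit" scores 130, so it would be blocked at a limit of 50 but allowed at 250. That gap is why the naughtynesslimit setting discussed later on this page has such a large effect on over-blocking.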
We'll go through the essential DansGuardian configuration files one by one now, but I encourage you to look at all of them when you can. We'll start with the master configuration files. Open /etc/dansguardian/dansguardian.conf. On the third line, you'll see UNCONFIGURED - Please remove this line after configuration. So, please remove that line.
Next, find loglevel = 2 and change the "2" to a "1". This way, only blocked requests are logged, which lets us verify that the filter is working while preserving the privacy of patrons who are not being blocked.
We'll change just one line in /etc/dansguardian/dansguardianf1.conf. Find naughtynesslimit = 50. In our experience, a limit of 50 blocks far too much for a library setting. For our public wireless network, we have this set to 250. You should experiment to see what works best for you. Save and close the file.
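If you manage more than one filter computer, or just want these edits to be repeatable, they can be scripted. The following is a minimal Python sketch that works on the text of each file. The function names are hypothetical, and it assumes each setting appears as an uncommented key = value line, as described above.

```python
import re


def set_option(text: str, key: str, value: str) -> str:
    """Rewrite an uncommented 'key = value' line; raise KeyError if absent."""
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*).*$", re.MULTILINE
    )
    # Keep everything up to and including '=', replace only the value.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, text)
    if count == 0:
        raise KeyError(f"{key} not found")
    return new_text


def configure_main(text: str) -> str:
    """dansguardian.conf: drop the UNCONFIGURED line, log only blocked requests."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not line.startswith("UNCONFIGURED")
    ]
    return set_option("".join(kept), "loglevel", "1")


def configure_group(text: str, limit: int = 250) -> str:
    """dansguardianf1.conf: raise naughtynesslimit (250 is the author's value)."""
    return set_option(text, "naughtynesslimit", str(limit))
```

To apply it, read each file as root, pass its contents through the matching function, and write the result back, keeping a backup of the original first.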
Open /etc/dansguardian/lists/bannedextensionlist. Comment out (that is, put a "#" in front of) any file extension that you want to allow through the filter. Since we are not locking things down as tightly as a corporate environment might, you'll probably want to comment out every line here. If you think you'll never want to block specific file types, you could save time by simply deleting the contents of the file. Save and close the file.
/etc/dansguardian/lists/bannedmimetypelist is similar; you'll probably want to comment out, or delete, all the lines. Save and close the file. Now open /etc/dansguardian/lists/bannedregexpurllist. For a library, the directives in this file tend to over-block, so comment out or delete all lines. Save and close the file.
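Commenting out every line of these three files by hand is tedious. Here is a minimal Python sketch that does it to a file's text, assuming (as described above) that a leading "#" marks a comment. The function name is hypothetical.

```python
def comment_out_all(text: str) -> str:
    """Prefix '#' to every non-blank line that is not already a comment."""
    out = []
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            out.append("#" + line)  # disable this entry
        else:
            out.append(line)  # blank lines and comments stay as they are
    return "".join(out)
```

Running it twice is harmless, since lines that already start with "#" are left alone.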
/etc/dansguardian/lists/bannedsitelist is one of the more important files. This is where you can force an entire web site to be blocked. You can also use this file to turn your filter into a whitelist, so that every site you don't explicitly allow is blocked. Finally, you also have the option to use some built-in blacklists. You likely won't need to edit this file now, but later you might. The comments in the file explain how to enable the various options. Save and close the file if you made any changes.
/etc/dansguardian/lists/bannedurllist works in a similar fashion, but for individual pages within a larger web site, leaving the rest of the site alone. Like the last file, you probably don't need to edit it now.
The files whose names start with exception (like exceptionsitelist) work like the banned* files above, except that they list things you don't want blocked. Edit them as you see fit. At Branch District Library, for example, we added all the major web mail providers to exceptionsitelist so that email would never get blocked.
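The relationship between the site lists, the URL lists, and the exception lists can be summarized in a small Python model. This is a simplified sketch under stated assumptions, not DansGuardian's matching code: a site entry is assumed to cover its subdomains too, a URL entry is treated as a plain prefix of host plus path (with a leading "www." dropped), and exception entries are assumed to take precedence over banned ones, in line with the web mail example above.

```python
from urllib.parse import urlsplit


def host_matches(host: str, domain: str) -> bool:
    """True if host is the listed domain or one of its subdomains."""
    host, domain = host.lower().rstrip("."), domain.lower()
    return host == domain or host.endswith("." + domain)


def url_key(url: str) -> str:
    """Host plus path, with the scheme and a leading 'www.' removed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path


def decide(url: str, banned_sites: list[str], banned_urls: list[str],
           exception_sites: list[str], exception_urls: list[str]) -> str:
    host = (urlsplit(url).hostname or "").lower()
    key = url_key(url)
    # Assumption: exception lists are checked first and win over bans.
    if (any(host_matches(host, d) for d in exception_sites)
            or any(key.startswith(u) for u in exception_urls)):
        return "allow"
    if (any(host_matches(host, d) for d in banned_sites)
            or any(key.startswith(u) for u in banned_urls)):
        return "block"
    return "filter"  # no list applies: the page goes on to term weighting
```

A result of "filter" means neither kind of list decided the outcome, so the page is still scored by term weighting.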
/etc/dansguardian/lists/weightedphraselist contains pointers to the various categories of phrase lists that DansGuardian uses in evaluating web pages. A description at the top of the file explains how the term weighting works. Edit the file as you see fit, commenting out lines for things you don't want to be blocked. For our example, we'll keep only the weighted phrase lists for good phrases and pornography, and comment out the rest. Save and close the file.
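Commenting out all but two categories can also be scripted. This Python sketch assumes each category is pulled in by a line beginning with .Include< followed by a path containing the category's folder name (for example, a path ending in /pornography/weighted), and that the good-phrase folder is named goodphrases. Check both assumptions against your own copy of weightedphraselist before relying on it; the function name and default categories are illustrative.

```python
def keep_only(text: str,
              keep: tuple[str, ...] = ("goodphrases", "pornography")) -> str:
    """Comment out every .Include< line whose path names none of `keep`."""
    out = []
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        is_include = stripped.startswith(".Include<")
        wanted = any(f"/{name}/" in stripped for name in keep)
        if is_include and not wanted:
            out.append("#" + line)  # disable this phrase-list category
        else:
            out.append(line)  # kept categories, comments, and other lines
    return "".join(out)
```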
Notice that the languages DansGuardian checks for pornography are mostly European languages. We have had to manually block Arabic- and Russian-language porn pages that made it through the filter.
The last file for you to edit is a couple of folders further down. Open /etc/dansguardian/languages/ukenglish/template.html. This is the default page that patrons will see when they try to view a blocked page. If you know HTML, edit this file to your heart's content. Otherwise, all you need to do is find the line that reads "YOUR ORG NAME" and put in your organization's name. Then save and close the file.
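If you'd rather script this edit, a short Python sketch follows. It assumes the placeholder text is exactly "YOUR ORG NAME", as described above, and it HTML-escapes the name so that a character such as "&" doesn't break the page. The function name is hypothetical.

```python
import html

PLACEHOLDER = "YOUR ORG NAME"


def fill_org_name(template: str, org_name: str) -> str:
    """Replace the placeholder with the HTML-escaped organization name."""
    if PLACEHOLDER not in template:
        raise ValueError("placeholder not found; was the template already edited?")
    return template.replace(PLACEHOLDER, html.escape(org_name))
```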
Now's the moment of truth. By this point, we've installed Ubuntu and configured all the required parts. All that's left is turning these parts on and testing. We'll use the launcher icons we created earlier to do this. Double-click "Restart DHCP Server" and enter your password. (You'll probably only need to enter your password for the first one you click; the computer remembers that you authenticated for a few minutes.) Any error messages? If so, go back and check your work. When the service has finished restarting, you'll see either "done" or "[OK]", at which point you can close the window. If there were no errors, repeat that for the other three launchers. Once all the parts are up, we can test things.
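The launchers each run a service's restart command. For reference, here is a Python sketch of the same sequence that stops at the first failure, as the instructions above suggest. The init-script names and the /etc/init.d paths are assumptions about an Ubuntu system of this vintage; check them against the commands your launchers actually run. The runner parameter exists so the logic can be tried without touching real services.

```python
import subprocess
from typing import Callable, Sequence

# Assumed init-script names, in the order used on this page.
SERVICES = ["dhcp3-server", "shorewall", "squid", "dansguardian"]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def restart_all(services: Sequence[str],
                runner: Callable[[Sequence[str]], int] = run_command) -> list[str]:
    """Restart each service in order, stopping at the first failure.

    Returns the names of the services that restarted successfully.
    """
    done = []
    for name in services:
        code = runner(["sudo", f"/etc/init.d/{name}", "restart"])
        if code != 0:
            print(f"{name}: restart failed (exit {code}); check your work")
            break
        done.append(name)
    return done
```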
Fire up a laptop on your wireless network and start your favorite web browser. Try to go to any old page, like google.com or yahoo.com. Did it work? Great! Now try to go to www.badboys.com, a site blocked in the bannedsitelist file. Do you see the "Blocked" banner? Good.
Sometimes a web site may get blocked when it shouldn't have been; other times something that should have been blocked might slip through. This is unavoidable and happens with commercial filters, too, even if they don't want to admit it. When this happens, you can dive into the inner workings of DansGuardian to tweak how the term weighting is applied, or you can simply block or allow a particular web page or even an entire domain explicitly. You do this by editing one of four files in the /etc/dansguardian/lists/ folder:
bannedsitelist
bannedurllist
exceptionsitelist
exceptionurllist
As mentioned earlier, the *sitelist files cover blocking or allowing whole web sites, and the *urllist files cover individual pages within a site. For example, if you wanted to block ThisIsAReallyBadSite, you would add "thisisareallybadsite.com" to the list of sites in bannedsitelist. And, as always, save and close your file.
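Adding an entry is easy to get slightly wrong, for instance by pasting a full URL with http:// or www. in front, or by adding a duplicate line. Here is a minimal Python sketch that guards against both, assuming entries are bare domain names like the thisisareallybadsite.com example above. Stripping a leading www. is an assumption too, so check the comments in the list file. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def normalize_site(entry: str) -> str:
    """Reduce input like 'http://www.Example.com/page' to 'example.com'."""
    entry = entry.strip().lower()
    if "://" not in entry:
        entry = "http://" + entry  # let urlsplit find the host
    host = urlsplit(entry).hostname or ""
    return host[4:] if host.startswith("www.") else host


def add_site(list_text: str, entry: str) -> str:
    """Append the normalized site to the list text unless it is already there."""
    site = normalize_site(entry)
    existing = {line.strip().lower() for line in list_text.splitlines()}
    if site in existing:
        return list_text
    if list_text and not list_text.endswith("\n"):
        list_text += "\n"
    return list_text + site + "\n"
```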
To see what DansGuardian is blocking, you should periodically check the log files. On the menu bar at the top of the screen, click System -> Administration -> System Log. In the System Log Viewer, click the File menu and choose Open. The Open dialog box will pop up, already in the /var/log folder. Scroll up and you'll see the dansguardian folder. Double-click that folder and you'll see access.log. Open that file to read the log of blocked pages. If you see pages there that shouldn't be blocked, add them to your exception* lists.
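In a long log, a quick tally of the most frequently blocked URLs makes candidates for the exception lists easier to spot. This Python sketch assumes each log line contains the requested URL as a whitespace-separated token starting with http:// or https://, and that (with loglevel = 1, set earlier) only blocked requests appear in the log. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def blocked_urls(log_lines: Iterable[str]) -> Counter:
    """Count how often each URL appears in the given log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        for token in line.split():
            if token.startswith(("http://", "https://")):
                counts[token] += 1
                break  # assume one requested URL per line
    return counts
```

For example, open /var/log/dansguardian/access.log (reading it may require root), pass the file object to blocked_urls, and print the result of most_common(20) on the returned counter.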
Whenever you change the settings of any of the four parts we've just configured (DHCP Server, Shorewall, Squid, or DansGuardian), you must restart the program in question. Use the launcher icons we created to do this quickly.
Aside from running the security updates, editing the banned* and exception* lists is likely the only administration you'll ever need to do with your filter. And even that probably won't need to be done very often. Here at the Branch District Library we have gone months without needing to tweak the banned* or exception* lists.
One last thing before we go on: let's restart the filter computer and make sure everything runs automatically without errors when it comes back up. If so, great! Now, on to the next step.