Don’t call me DOM

18 September 2006

Offline Web cache with Squid

One of the nice properties of HTTP as a protocol is that it includes a very well-thought-out caching model, which allows proxies and caches to serve as intermediaries between the authoritative server and the end-user client.

While this is often used either in big proxy setups, where one proxy caches results for a large set of clients, or at a micro level, where a given user agent keeps some set of resources in its private cache, I’ve found it much more useful to set up a full cache on my laptop that allows for transparent offline browsing.

What is more irritating than being offline and not having access to all the pages you visit so frequently? Why should I need to be connected to browse the W3C specifications, which, once they have reached Recommendation status, pretty much don’t change at all?

There are several software packages that can be used as a proxy cache, but I have found them to be faulty more often than not in the way they interact with HTTP. I’ve settled on Squid for the past few months with really good results; by default, though, Squid is configured to handle the big proxy-cache use case rather than my offline browsing one. So here is my recipe to make Squid a more appropriate choice for this use case:

  1. you probably want a long time to live for cached objects, so that they don’t get removed from your cache too quickly: to set the maximum time to live to one year, set the last parameter of refresh_pattern, which is expressed in minutes, to 525600; my own setting is: refresh_pattern . 0 40% 525600 (not a good idea, apparently)
  2. you also need to say how big you want your cache to grow; I’ve set it to 500 MB with cache_dir ufs /var/spool/squid 500 16 256 (I would have set a higher bar if my disk weren’t getting so full already)
  3. Squid has a special offline mode for use as an offline cache, which is switched on and off with the squidclient tool: /usr/bin/squidclient mgr:offline_toggle
  4. by default, that command needs a password; to make it available to anyone on localhost without a password, add the line cachemgr_passwd none offline_toggle to the configuration file
  5. to have Squid enter offline mode automatically when you lose your network connection, and leave it when the connection comes back, I’ve added the following script to my /etc/network/if-up.d/ directory (make sure it is executable):
    #!/bin/sh
    # Quit if we're called for the loopback
    if [ "$IFACE" = lo ]; then
            exit 0
    fi
    
    # make sure the state is OFF: toggle once and, if Squid does not
    # report OFF (we just switched it ON), toggle a second time
    if ! /usr/bin/squidclient mgr:offline_toggle | grep -q OFF; then
            /usr/bin/squidclient mgr:offline_toggle
    fi

    and similarly in /etc/network/if-down.d/:

    #!/bin/sh
    # Quit if we're called for the loopback
    if [ "$IFACE" = lo ]; then
            exit 0
    fi
    
    # make sure the state is ON: toggle once and, if Squid does not
    # report ON (we just switched it OFF), toggle a second time
    if ! /usr/bin/squidclient mgr:offline_toggle | grep -q ON; then
            /usr/bin/squidclient mgr:offline_toggle
    fi
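
The two hook scripts above repeat the same toggle-and-check logic, so it could be factored into one helper. The following is only a sketch under stated assumptions: the names set_offline_mode and TOGGLE_CMD are hypothetical (they are not part of Squid), and the match relies on the cache manager reply containing a line such as "offline_mode is now ON", which is worth checking against your own Squid version.

```shell
# Hypothetical override hook: by default, run the same squidclient command
# as the hook scripts above.
TOGGLE_CMD=${TOGGLE_CMD:-"/usr/bin/squidclient mgr:offline_toggle"}

# Usage: set_offline_mode ON|OFF
# Returns 0 once the reply reports the wanted state, 1 otherwise.
set_offline_mode() {
    wanted=$1
    # A toggle has only two states, so two attempts are always enough.
    for attempt in 1 2; do
        # Assumption: the reply contains "offline_mode is now ON" or
        # "offline_mode is now OFF".
        if $TOGGLE_CMD 2>/dev/null | grep -q "offline_mode is now $wanted"; then
            return 0
        fi
    done
    # Squid unreachable, or the reply has an unexpected format.
    return 1
}
```

With this in place, the if-up.d script would call set_offline_mode OFF and the if-down.d script set_offline_mode ON. Matching the full phrase rather than a bare ON or OFF avoids accidental matches elsewhere in the reply.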

Et voilà! No more messing with offline and online modes in browsers, no more arcane commands to launch, everything just happens automatically.
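
As a recap, the configuration changes from steps 1, 2 and 4 above might sit together in squid.conf roughly as follows. This is a sketch rather than a complete configuration; the refresh_pattern line is left commented out because the post itself flags it as not a good idea.

```
# Step 1: the long-lived setting from the post; the last field is the
# maximum age in minutes (525600 = 365 * 24 * 60, i.e. one year).
# refresh_pattern . 0 40% 525600

# Step 2: 500 MB on-disk cache under /var/spool/squid, with 16 first-level
# and 256 second-level directories
cache_dir ufs /var/spool/squid 500 16 256

# Step 4: no password for the offline_toggle cache manager action
cachemgr_passwd none offline_toggle
```

Squid needs to reload its configuration before these changes take effect.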

7 Responses to “Offline Web cache with Squid”

  1. Mark Nottingham Says:

    I don’t *think* you need the refresh_pattern with the high max in there; offline mode basically allows Squid to serve stale objects without going forward, so it doesn’t matter how fresh they are. Objects that become stale aren’t removed from the cache immediately; it’s only when the cache becomes full that things are evicted (and even then, freshness doesn’t come into play, IIRC; it’s a pure LRU replacement algorithm).

    Also, such an extreme value might cause Squid to be too aggressive when it’s online, for responses without explicit caching information.

    Cheers,

  2. dom Says:

    Thanks for the feedback, I’ve reduced my configuration accordingly!

  3. Prasanna Says:

    Hi,
    I use a dial-up connection and need to use Squid in such a way that, when I am not connected to the Internet,
    it quickly serves the pages that it obtained while it was online…
    With Squid’s default configuration, it took 5 minutes for Squid to serve such a page from the cache.
    After reading the post above and making the changes, I now get a DNS server timeout.
    I don’t use any DNS server on my SUSE Linux box, and named is off.
    Why does Squid give an error? It should serve the page from the cache when it finds that no connection to the Internet
    is available.
    Also, how do I find out the details of a page that it fetched when it was online, such as
    web server headers, X-Powered, IP address…?
    I tried using the Cache Manager, but it gives details which are only useful to a developer, not to a common person.
    Please reply to my post.

  4. nissirilla Says:

    I have a problem: how do I set up offline browsing via Squid 2.6?
    Can you give me the Squid command line for it?

    Thanks in advance,
    Nissirilla

  5. Jacob Says:

    There is an error in this post which prevents Chrome and Safari from rendering it properly. Try deleting that p tag on line 39.

  6. Dom Says:

    Thanks Jacob, this is now fixed

  7. bayu Says:

    Hi, thanks for the tutorial.

    I’m interested in the script you made to “auto offline” when disconnected from the network,
    but I use Squid on Windows. Any information on how to use “auto offline” on Windows?

    Thanks

Dominique Hazaël-Massieux (dom@w3.org) is part of the World Wide Web Consortium (W3C) Staff; his interests cover a number of Web technologies, as well as the usage of open source software in a distributed work environment.