The Zope 2 VirtualHostMonster - how it works internally.

About the Zope VirtualHostMonster

We have a customer from the automotive industry who needs to host 800+ Plone Lineage-based sites. Together with redirects and landing pages, this is not very maintainable in an Apache or NGINX configuration file. Even if the configuration could be generated, the roll-out would be very complex, and every change would require a configuration reload. It would be much better to direct each domain to Zope and let Zope decide.

Zope already has a mechanism to map incoming URLs to paths in Zope: the VirtualHostMonster (VHM for short). It rewrites incoming URLs either by means of a special path pattern or by its own rules, persisted in the ZODB. With such rules it is possible to map domain names to paths. The rules have to be entered in a ZMI form, and maintaining 800 sites using the old, clumsy ZMI interface is no fun.

The problem it solves is mapping an incoming URL like https://www.foobar.com/cms/news to an object in the ZODB that lives under a different path, like http://10.11.12.13:8080/Plone/en/news. Because a web server, cache and/or load balancer sits in front, the protocol, domain and port differ and need to be rewritten in generated absolute URLs.

VHM is one of the mature workhorses in Zope that I had never touched or looked at, simply because there was no need. So I took the opportunity to look at how the current implementation works and how to use Zope's features to make this simpler.

As already mentioned, VHM offers two built-in ways to do so:

  1. rewrite the incoming address in the front end (Apache/NGINX/...) using a specific pattern in the path
  2. configure a mapping with rules in Zope.

In practice the second way is rarely used. In theory both could be combined, but avoid it: danger of head explosions.
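
To make the first way concrete: the front end rewrites each request to a backend URL that carries the public protocol, host and port, plus the ZODB root, inside the path, using VHM's VirtualHostBase / VirtualHostRoot / _vh_ markers. The following is only a sketch of how such a backend URL could be assembled. The function name vhm_backend_url and its parameters are made up for illustration; in a real setup this rewrite is done by the Apache or NGINX rewrite rules, not by Python.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def vhm_backend_url(public_url, backend, zodb_root, public_prefix=""):
    """Build a backend URL in VHM's special path pattern (illustrative helper).

    public_url    -- URL as the visitor sees it
    backend       -- internal address of Zope (assumed value in the example)
    zodb_root     -- ZODB path the public site root maps to, e.g. /Plone/en
    public_prefix -- path visible outside that does not exist in the ZODB
    """
    parts = urlsplit(public_url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    prefix_names = [name for name in public_prefix.split("/") if name]
    path_names = [name for name in parts.path.split("/") if name]
    if path_names[:len(prefix_names)] != prefix_names:
        raise ValueError("public URL does not start with the public prefix")
    rest = path_names[len(prefix_names):]

    segments = (
        ["VirtualHostBase", parts.scheme, f"{parts.hostname}:{port}"]
        + [name for name in zodb_root.split("/") if name]
        + ["VirtualHostRoot"]
        # each public-only segment is marked with a _vh_ prefix
        + [f"_vh_{name}" for name in prefix_names]
        + rest
    )
    return backend.rstrip("/") + "/" + "/".join(segments)


print(vhm_backend_url(
    "https://www.foobar.com/cms/news",
    "http://10.11.12.13:8080",
    "/Plone/en",
    public_prefix="cms",
))
```

For the example URLs used in this post, the result is the internal address followed by /VirtualHostBase/https/www.foobar.com:443/Plone/en/VirtualHostRoot/_vh_cms/news.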

So how is this achieved? Technically, two classes/objects are mainly involved:

  1. ZPublisher.HTTPRequest.HTTPRequest: the HTTPRequest provides some methods that influence the absolute_url() output later. Used here are:
    request.setServerURL(protocol, host, [port])

    ensures that a site is delivered as https://www.foobar.com/ and not under the internal IP address and port.

    request.setVirtualRoot(path)

    sets the Zope path to the root (Plone site, navigation root, ...) so that parts of the path are stripped when absolute_url() is generated, e.g. /Plone/en/news becomes /news.

  2. Products.SiteAccess.VirtualHostMonster: a VirtualHostMonster instance is added as a persistent object in the ZODB, as a child of the root Application object, when Zope is first initialized.

    It either detects the configuration from the HTTPRequest instance, using the TraversalRequestNameStack that was already extracted from the path, or reads it from a persistent mapping.

    The VirtualHostMonster object is registered as a before-traversal hook (__before_traverse__) with a priority of 25 (default: 99). The registration code lives in ZPublisher.BeforeTraverse. The hook is therefore called before Zope's traversal mechanism kicks in, so the path to be traversed can be modified as needed.

    Before traversal, VHM is called with the signature def __call__(self, client, request, response=None). The VHM uses only the request object.

    In short, it extracts the path to traverse to and the URL under which that path is accessed from the outside, considering

    - a prefix path which does not exist in the ZODB but is visible outside,
    - a path to traverse to that is not part of the publicly visible path,
    - a prefix path to traverse to that is part of the publicly visible path,
    - a postfix path that is part of the publicly visible path to be traversed to.

    It also detects the outside public protocol and domain. Additionally, it takes port mappings (e.g. 80 -> 8080) into account.

    Doing all this while covering a wide range of use cases and flavors makes it a real monster.

    After extracting all this information, it actually does the following:

    1. calls request.setServerURL which sets protocol, domain, port
    2. calls request.setVirtualRoot in order to deliver the right path
    3. modifies request['TraversalRequestNameStack'] to point to the right target.

    Which is, in fact, stunningly simple.
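
The three steps can be modelled in a few lines of plain Python. This is a toy, not the code from Products.SiteAccess: ToyRequest, toy_vhm and all their method names are made up for illustration, the internal address is the example one from above, and the sketch assumes a well-formed incoming path in VHM's VirtualHostBase / VirtualHostRoot / _vh_ pattern. The real VHM handles many more cases.

```python
DEFAULT_PORTS = {"http": "80", "https": "443"}


class ToyRequest:
    """Hypothetical stand-in for the request; not ZPublisher.HTTPRequest."""

    def __init__(self, path):
        # Names still to be traversed, kept as a stack: the next name is last.
        self.stack = [name for name in reversed(path.split("/")) if name]
        self.server_url = "http://10.11.12.13:8080"
        self.hidden_root = []    # ZODB segments hidden from public URLs
        self.public_prefix = []  # segments visible outside, absent in the ZODB

    def set_server_url(self, protocol, host, port=None):
        # Toy behaviour: leave out the port if it is the protocol's default.
        if port is None or DEFAULT_PORTS.get(protocol) == port:
            self.server_url = f"{protocol}://{host}"
        else:
            self.server_url = f"{protocol}://{host}:{port}"

    def set_virtual_root(self, hidden_root, public_prefix):
        self.hidden_root = list(hidden_root)
        self.public_prefix = list(public_prefix)

    def absolute_url(self, physical_path):
        """Public URL for an object at a ZODB path such as /Plone/en/news."""
        names = [name for name in physical_path.split("/") if name]
        if names[:len(self.hidden_root)] == self.hidden_root:
            names = names[len(self.hidden_root):]
        return "/".join([self.server_url] + self.public_prefix + names)


def toy_vhm(request):
    """Simplified version of the three steps for one well-formed pattern."""
    stack = request.stack
    if not stack or stack[-1] != "VirtualHostBase":
        return  # no virtual hosting information in the path
    stack.pop()
    protocol = stack.pop()
    host, _, port = stack.pop().partition(":")
    request.set_server_url(protocol, host, port or None)  # step 1

    hidden_root = []
    while stack and stack[-1] != "VirtualHostRoot":
        hidden_root.append(stack.pop())
    if stack:
        stack.pop()  # drop the VirtualHostRoot marker itself
    public_prefix = []
    while stack and stack[-1].startswith("_vh_"):
        public_prefix.append(stack.pop()[len("_vh_"):])
    request.set_virtual_root(hidden_root, public_prefix)  # step 2

    # Step 3: traverse the hidden root first, then the remaining public path.
    stack.extend(reversed(hidden_root))


request = ToyRequest(
    "/VirtualHostBase/https/www.foobar.com:443/Plone/en/VirtualHostRoot/_vh_cms/news"
)
toy_vhm(request)
print(list(reversed(request.stack)))           # names in traversal order
print(request.absolute_url("/Plone/en/news"))  # public URL of the object
```

After the hook has run, the toy request traverses Plone, en, news, while the generated URL for that object is https://www.foobar.com/cms/news, which is the mapping described at the beginning.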