Overview
Goals and Objectives
This specification covers the protocol library – code-named “zanshin” – that Chandler will use in 0.6 to share collections over WebDAV and CalDAV.
- Grant Baillie
- Spec owner and contributor
- Lisa Dusseault
- Spec contributor
Background
Existing protocol libraries (both for HTTP and other internet protocols) typically have a model where applications instantiate a Connection object to a given server, open it, and then use it to send commands, wait for the server’s responses, and process them. In the case of Chandler sharing in particular, this model is less than ideal for several reasons:
- It’s inaccurate: Most HTTP “conversations” involve a single connection per request and response.
- It’s not useful: The underlying model of HTTP is that there are resources that reside on a server. So, a better abstraction would involve some kind of Resource object that could be queried for properties.
- HTTP and WebDAV performance needs intelligent caching: The API you get by making the obvious 1:1 mapping of request types to Connection methods often leads its applications to make unneeded requests, with the resulting extra load on network bandwidth and client and server CPU.
- It's synchronous: For GUI applications like Chandler, having the main thread block on network I/O leads to frustrating unresponsiveness during network outages or slowdowns. Typically, GUI applications adopting Connection-oriented APIs work around this by using background threads, but the need for locking often leads to code complexity.
- It's hard to extend: HTTP features like pipelining can be hard to implement if one is tied to a synchronous request/response API.
Since sharing of calendars and collections is a key part of the Chandler story, having a protocol library that addresses these concerns is important, both for 0.6 and beyond.
Definitions
Requirements
High-Level Features
Our base feature set is driven by the needs of Chandler sharing:
| Requirement | Chandler Feature |
|---|---|
| Check that a given URI specifies a WebDAV collection, and deal with the various failures | “Test this account” button in the Accounts dialog |
| Download a .ics file from an HTTP server | “Subscribe to collection” menu item |
| Download a collection from a WebDAV server. | |
| Download a calendar from a CalDAV server. | |
| Upload a collection and subresources to a WebDAV server | “Share collection” menu item |
| Upload a calendar to a CalDAV server | |
| Delete a WebDAV/CalDAV collection | “Manage collection” menu item |
| Synchronize a client-side cache of a WebDAV collection with a server, utilizing ETags if available | “Sync collection” menu item |
| Synchronize a calendar with a CalDAV server, utilizing ETags if available | |
| Allow custom SSL certificates for HTTPS connections | Certificate store |
| Create a ticket for a WebDAV collection | Ticket support |
| Download a collection using a ticket URL. | |
| Fetch and set a WebDAV ACL for a resource. | ACL support. |
Interoperability
For 0.6, Chandler will interoperate with OSAF’s sharing server, Cosmo.
In addition, we will support the following WebDAV servers (test accounts for these were included in Chandler in 0.5):
- Xythos WebFile Server
- Venue Communications, Inc.
One point to note is that the third test WebDAV account in 0.5 Chandler
pointed to a server running mod_dav in Apache. Since this server
doesn’t support strong ETags and WebDAV ACLs, supporting it is
only a “nice to have” requirement for 0.6. Note: there is
an IT ticket to replace the pilikia mod_dav server with Slide.
What about Slide? It has been used in development, seems to work and has strong ETags and ACLs.
Low-Level Features
For an explanation of these features, see “Protocol Considerations” below.
| Feature | Priority for 0.6 |
|---|---|
| Strong ETags | Required |
| Simulating strong ETags on servers that don’t support them | Not Required |
| Pipelining | Nice to have |
| Basic caching, including Property “piggybacking” | Required |
| Uniquing caches | Nice to have |
| Cache expiration | Not required |
Documentation
All public API (classes, methods, instance variables, arguments, constants) will be documented via Python docstring.
License and Source Code
The library will be distributed under the M.I.T. license. The source code can be found in OSAF’s Subversion Repository.
High-level Decisions
The main high-level decision is to address synchronicity issues by adopting the Twisted networking framework, rather than by using threads. In favour of this decision are:
- Twisted’s architecture makes for a clean separation between protocol (i.e. interpreting streams of bytes, and generating responses) and transport (how the bytes are read and written).
- The Chandler Email Service has adopted Twisted (and in fact contributes code back to the Twisted source base). Having a Twisted-based solution for WebDAV and/or HTTP makes the Services APIs more consistent.
- In the course of implementing the Email Service, Brian Kirsch has written excellent utility code that can be leveraged to ease Twisted adoption.
- Cancelling a network operation, something users may well want to do in the case of network instability, is tricky in a threaded implementation. Either the thread has to check a global variable periodically, or else the GUI thread has to shut down the socket (paying careful attention to potential race conditions).
Arguments against the use of Twisted are:
- Twisted doesn’t have a good HTTP client implementation (in particular HTTP/1.1).
- The application code becomes somewhat more complex: Typically, one has to move from writing network-related code in a sequential style to a procedural one.
While not wanting to downplay the risks of these downsides, it’s worth noting that the situation is very similar to that prior to adopting Twisted for the Chandler Email Service. There, OSAF has made a positive contribution from which both Chandler and the Open Source community have benefited.
Code Design
Chandler Integration Issues
There are some Chandler-specific implementation requirements that lie beyond the scope of a general protocol library. These will be implemented in Chandler, mostly by subclassing objects from zanshin:
- The ability to interact with the repository and its views. In general, this will be taken care of by leveraging view management code in Chandler implemented by Brian Kirsch for the Email Service.
- The ability to interact correctly with Chandler’s custom certificate store, which allows users to accept or reject invalid SSL certificates (the most common cases are certificates from unknown authorities, or self-signed certificates).
- The current sharing code in Chandler has been written in a synchronous,
sequential fashion. As mentioned earlier, this requires some code refactoring.
To enable faster adoption by Chandler, zanshin will provide an API,
blockUntil(deferred-call), that blocks the UI thread until a Twisted deferred fires (or fails).
Protocol Considerations
Abstraction of Interoperability Issues
• Hiding HTTP Features
Different HTTP servers may or may not implement certain features, and for the most part applications don’t need to concern themselves with the details. Examples are:
- Connection keep-alive: The client should completely abstract the concept of KeepAlive. If the client attempts to keep a connection alive, then whether the server can do so, cannot do so, or fails to keep the connection alive at some point, none of this should ever reach the attention of the Application (unless the connection cannot be restored, in which case an error will be returned).
- Chunking and Transfer-encoding: The client should abstract this and do the sensible thing.
It's worth noting that features like this could be interesting to applications. For example, when an application can use more than one WebDAV server, knowing which features are supported by each might help to make a choice.
• ETag support
The use of strong ETags on servers that support them is required for 0.6.
Another possible goal is to abstract away the support, or lack thereof, for strong ETags. For example, it might be possible to try to use the Last-Modified header instead, although this is known
to be not completely reliable.
Are there alternatives?
Even if this is possible, there are still implementation dependencies around ETags. For example:
- If the server supports weak ETags, the client should know that and ask ASAP for a strong ETag. A weak ETag is useless in collaboration scenarios.
- If the server does return a strong ETag in response to PUT requests, that's great and the client should cache that. If not, the client should do a HEAD or PROPFIND to ask for the ETag so that it can be cached anyway without the application having to worry about it.
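The two rules above can be sketched in a few lines. This is only an illustration, not zanshin's implementation: `is_strong_etag`, `etag_after_put` and the `fetch_etag` callable are hypothetical names, and the only protocol fact relied on is that HTTP marks weak validators with a `W/` prefix.

```python
def is_strong_etag(etag):
    """Return True if the ETag header value is a strong validator."""
    if not etag:
        return False
    etag = etag.strip()
    # HTTP marks weak validators with a case-sensitive W/ prefix.
    if etag.startswith("W/"):
        return False
    # A strong ETag is a plain quoted string.
    return len(etag) >= 2 and etag.startswith('"') and etag.endswith('"')


def etag_after_put(put_headers, fetch_etag):
    """Pick the ETag to cache after a PUT, or None if none is usable.

    put_headers: dict mapping lower-cased response header names to values.
    fetch_etag: hypothetical zero-argument callable that performs a
        follow-up HEAD or PROPFIND and returns the ETag string (or None).
    """
    etag = put_headers.get("etag")
    if not is_strong_etag(etag):
        # The PUT response carried no usable ETag, so ask the server.
        etag = fetch_etag()
    return etag if is_strong_etag(etag) else None
```

With this shape, the application never sees whether the ETag came from the PUT response or from the follow-up request.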
Caching
If a client library has a good representation of a resource, it becomes pretty easy to cache information so that the client doesn't have to make so many round-trips on behalf of the application. An application can ask first “Does this resource support WebDAV?” then ask “Does this resource support locking?” and the client can cache the answers to those two questions since they are both answered in the same OPTIONS response.
Some information can be deduced from other information already in the cache. For example, if a resource's parent supports WebDAV, then that resource MUST also support WebDAV. The client can either propagate that information as the cache is filled in or calculate it dynamically, and avoid a round trip.
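As a rough sketch of the OPTIONS case, the class below answers both questions from one cached response. The class name, its methods and the `send_options` callable are assumptions made for illustration and are not part of zanshin. It relies on the WebDAV `DAV` response header, where compliance class 1 indicates basic WebDAV support and class 2 indicates locking.

```python
class FeatureCache:
    """Caches per-path features learned from a single OPTIONS response."""

    def __init__(self, send_options):
        # send_options is a hypothetical callable: path -> dict of
        # lower-cased response headers. It stands in for a real request.
        self._send_options = send_options
        self._features = {}

    def _lookup(self, path):
        if path not in self._features:
            headers = self._send_options(path)
            # The DAV header lists compliance classes, e.g. "1, 2".
            raw = headers.get("dav", "")
            classes = {c.strip() for c in raw.split(",") if c.strip()}
            self._features[path] = {
                "webdav": "1" in classes,
                "locking": "2" in classes,
            }
        return self._features[path]

    def supports_webdav(self, path):
        return self._lookup(path)["webdav"]

    def supports_locking(self, path):
        return self._lookup(path)["locking"]
```

Asking about locking after asking about WebDAV costs no second round trip, because both answers were stored when the first OPTIONS response arrived.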
The following are strategies for caching intelligently. Only the first is required for 0.6:
- Whenever the client does a PROPFIND, it will consistently ask for the most common properties needed in the cache, so that the cache can be updated via piggybacking on any property request.
- It's possible that different parts of the application might be interested in resources in a given account on a server. In this case, it would be good if cached information were shared between these different parts. Otherwise, the application might well end up with multiple caches, thereby wasting memory and network bandwidth.
- Eventually, we will implement timing out the cache, so that stale information is discarded. The timeout might be short for some types of information (a resource's lock state or ETag, for example) and long for other information (whether a resource supports WebDAV, for example) or consistent for simplicity's sake.
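The piggybacking strategy can be illustrated by how a PROPFIND body might be built. This is a sketch under stated assumptions: `build_propfind_body` is a hypothetical helper, and the `COMMON_PROPS` list is a guess at what "most common properties" could mean, since this specification does not enumerate them.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"

# Assumed set of commonly needed properties; the real list is not given here.
COMMON_PROPS = ["getetag", "resourcetype", "getlastmodified"]


def build_propfind_body(requested):
    """Return a PROPFIND XML body asking for `requested` plus COMMON_PROPS."""
    names = list(requested)
    for name in COMMON_PROPS:
        if name not in names:
            names.append(name)  # piggyback without duplicating

    ET.register_namespace("D", DAV_NS)
    propfind = ET.Element("{%s}propfind" % DAV_NS)
    prop = ET.SubElement(propfind, "{%s}prop" % DAV_NS)
    for name in names:
        ET.SubElement(prop, "{%s}%s" % (DAV_NS, name))
    return ET.tostring(propfind, encoding="unicode")
```

Whatever the caller asked for, the response also refreshes the cached ETag, resource type and modification date at no extra round-trip cost.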
Pipelining
For the most part, in an asynchronous framework like Twisted, HTTP pipelining
shouldn’t be too difficult to implement: The Deferred
paradigm already deals with the fact that a response to any given request
could come in at any point. It is unclear, however, as to how much of a
performance win there is to be had by pipelining on the client, even in cases
where pipelining would theoretically be effective (like uploading a large
number of files). As a result, pipelining support remains a stretch goal for
0.6.
Note: In the case of Mozilla, enabling pipelining improves page loading times by about 7% on LAN. It is not enabled by default because many servers do not support it. Chandler's use cases are different from a web browser, though, and Chandler should be able to benefit much more from pipelining.
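The bookkeeping that pipelining needs on the client is small, since HTTP/1.1 requires a server to return pipelined responses in the order the requests were sent. The sketch below uses plain callbacks as a stand-in for Twisted Deferreds; `PipelinedChannel` and the `write` callable are hypothetical names, not zanshin classes.

```python
from collections import deque


class PipelinedChannel:
    """Matches in-order responses to the requests that produced them."""

    def __init__(self, write):
        # write is a hypothetical callable that puts one request on the wire.
        self._write = write
        self._pending = deque()

    def send(self, request, callback):
        """Send immediately, without waiting for earlier responses."""
        self._pending.append(callback)
        self._write(request)

    def response_received(self, response):
        """Deliver a response to the oldest outstanding request (FIFO)."""
        callback = self._pending.popleft()
        callback(response)
```

A real implementation would also have to re-send outstanding requests if the server closes the connection partway through the pipeline, which this sketch does not attempt.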
Low-level interactions
No matter how good a protocol implementation library is, at some point it needs to be extended. For example, the application could need to do something complex like create entirely new methods, headers or bodies, when a major extension to HTTP is used. Alternatively, the application might only need to make minor tweaks – e.g. the ability to add a certain header to certain otherwise-standard requests. The client library must not preclude this, and should not make it too difficult.
• Optional WebDAV Features
Locking
While locking of WebDAV resources is required for reliability of updates in multi-user shared environments, it is consistent with the goals of Chandler 0.6 to delay the implementation of locking till 0.7.
Access Control (ACL)
For future WebDAV ACL support, an API will be supplied to retrieve, examine and set the ACL for a given Resource.
Module Outline
WebDAV module
At the heart of the module lies the ServerHandle class. This is responsible for:
- Determining properties of a given HTTP server (via an OPTIONS * request).
- Maintaining a cache of Resource objects.
- Queuing HTTP requests, most of which are made by individual Resources.
[grant] The name “ServerHandle” could be improved
Resource objects are specified by URL (or path), and represent a resource on the server. These allow applications to:
- Check whether a resource exists on the server.
- Check whether a resource supports specific features (like ACLs or LOCKing).
- Query collection resources for their children.
- Create children of collection resources.
- GET or PUT resource content.
- Set or get WebDAV properties.
HTTP Module
This module provides a basic HTTP/1.1 client in Twisted, including Response and Request classes, as well as the required implementation of the Twisted Factory and Client interfaces.
ACL Module
This module provides classes to represent entities in the WebDAV ACL model. In addition, it implements support for converting ACLs to and from XML.
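To make the XML conversion concrete, here is a minimal round-trip sketch. It is not the module's actual API: `acl_to_xml` and `acl_from_xml` are hypothetical names, an ACL is simplified to a list of `(principal_href, privileges)` tuples, and only grant entries with `href` principals are handled. The element names follow the WebDAV ACL XML vocabulary in the `DAV:` namespace.

```python
import xml.etree.ElementTree as ET

DAV = "{DAV:}"


def acl_to_xml(aces):
    """Serialize [(principal_href, [privilege, ...]), ...] to a DAV:acl element."""
    ET.register_namespace("D", "DAV:")
    acl = ET.Element(DAV + "acl")
    for href, privileges in aces:
        ace = ET.SubElement(acl, DAV + "ace")
        principal = ET.SubElement(ace, DAV + "principal")
        ET.SubElement(principal, DAV + "href").text = href
        grant = ET.SubElement(ace, DAV + "grant")
        for name in privileges:
            # Each privilege is wrapped in its own DAV:privilege element.
            privilege = ET.SubElement(grant, DAV + "privilege")
            ET.SubElement(privilege, DAV + name)
    return ET.tostring(acl, encoding="unicode")


def acl_from_xml(text):
    """Parse a DAV:acl element back into the same list-of-tuples form."""
    aces = []
    for ace in ET.fromstring(text).findall(DAV + "ace"):
        href = ace.findtext(DAV + "principal/" + DAV + "href")
        privileges = [
            child.tag[len(DAV):]  # strip the namespace part of the tag
            for privilege in ace.findall(DAV + "grant/" + DAV + "privilege")
            for child in privilege
        ]
        aces.append((href, privileges))
    return aces
```

A fuller model would also need deny entries, the special principals (such as all or authenticated users), and protected or inherited entries.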
Sample Code
Since zanshin is a Twisted-based API, most of zanshin's methods
return Twisted Deferred objects. To avoid having a bunch
of callbacks/errbacks in this document's example code, we're going to
make use of a utility function in zanshin that waits for Deferred
objects to return.
>>> from zanshin.util import blockUntil
To start with, we need a server to test against. For testing purposes, zanshin comes preconfigured with a test WebDAV server, so let's start that up:
>>> import zanshin.webdav_server as server
>>> from twisted.internet import reactor
>>> listenPort = blockUntil(reactor.listenTCP, 8081, server.getTestSite())
The ServerHandle is what replaces the typical misleading terminology of "connection" in protocol libraries. Normally a protocol library does work directly with connections, but in HTTP connections may be dropped at the whim of the server, so instead we work with server handles.
>>> from zanshin.webdav import ServerHandle
A ServerHandle object is instantiated with
the host and port of an HTTP server. (The constructor has other
parameterized arguments to specify username and password, and whether
to enable TLS, but these don't apply for our simple server):
>>> serverHandle = ServerHandle(host="localhost", port=8081)
There's not a lot you want to do with a raw
ServerHandle. Without a specific resource to query, the
only recommended action is to ping the server to see if it's there and
supports HTTP.
>>> blockUntil(serverHandle.ping)
True
>>> bogusServer = ServerHandle("bogushost")
>>> blockUntil(bogusServer.ping)
Traceback (most recent call last):
    ...
ConnectionError: DNS lookup failed: address 'bogushost' not found: (7, 'No address associated with nodename').
Resources
Besides being able to send and process HTTP requests, a
ServerHandle also maintains a cache of
Resource objects. The getResource method
enables you to get the resource for a given URI (or path).
>>> root = serverHandle.getResource("/")
>>> root.path
'/'
Note that getResource doesn't talk to the server, it
just returns a local object. However, a Resource can be
queried as to what features it supports:
>>> blockUntil(root.supportsWebDAV)
False
In the default configuration, our test WebDAV server (like some
others) does not support WebDAV functionality on the root folder.
However, it does have a more interesting Resource:
>>> resource = serverHandle.getResource("/folder/")
>>> blockUntil(resource.supportsWebDAV)
True
We can do a simple existence check:
>>> blockUntil(resource.exists)
True
>>> blockUntil(serverHandle.getResource("/not-here").exists)
False
WebDAV resources have other properties we can ask about:
>>> blockUntil(resource.supportsAcl)
False
>>> blockUntil(resource.supportsLocking)
False
WebDAV also adds the concept of a *collection*, which is like a
directory in a filesystem (rather than a simple file). We can ask a
given Resource if it's a collection:
>>> blockUntil(resource.isCollection)
True
and we can query a collection resource for its children:
>>> blockUntil(resource.getAllChildren, includeParent=False)
[]
In the case where we have write access to the server, we can go ahead and make a child collection:
>>> child = blockUntil(resource.createCollection, "cake")
>>> blockUntil(resource.getAllChildren, includeParent=False)
[<Resource at 0x... (/folder/cake/)>]
Of course, you can also create ordinary files:
>>> f = blockUntil(child.createFile, "file", "Use this to escape!")
>>> f.path
'/folder/cake/file'
>>> blockUntil(f.get).body
'Use this to escape!'
Collections or files can also be removed from the server:
>>> blockUntil(child.delete)
<zanshin.http.Response object at 0x...>
>>> blockUntil(child.exists)
False
>>> blockUntil(f.exists)
False
Note that deleting a non-empty collection deletes all its subresources implicitly. (WebDAV has special ways to report the errors when some subresources can't be deleted for some reason).
Raw Method Requests
It's possible to issue raw HTTP requests via
ServerHandle's addRequest method. This
returns a Deferred that will fire once the
Request has been processed.
>>> from zanshin.http import Request
>>> request = Request('GET', '/not-here', {}, None)
>>> blockUntil(serverHandle.addRequest, request).status
404
>>> request = Request('GET', '/aFile', {}, None)
>>> blockUntil(serverHandle.addRequest, request).body
'Hello, world!\n'
Special Considerations
QA / Test
Many of zanshin’s unit tests have been implemented to run standalone, i.e. without requiring external servers to talk to. This is partly to make sure that tests can be run frequently during the development process, but also so that they can be run automatically (for example, by a Tinderbox) without experiencing intermittent failures.
However, this runs counter to the goal of having interoperability with existing servers. Consequently, it makes sense for some WebDAV tests to be configurable to connect to an external server.
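One way such configuration could work is an environment variable that opts a test class in to talking to a real server. This is a sketch of the idea only: the variable name `ZANSHIN_TEST_SERVER`, the `external_server` helper and the test class are all hypothetical.

```python
import os
import unittest


def external_server():
    """Return (host, port) from ZANSHIN_TEST_SERVER, or None if unset.

    ZANSHIN_TEST_SERVER is a hypothetical variable, e.g. "dav.example.com:8080".
    """
    value = os.environ.get("ZANSHIN_TEST_SERVER")
    if not value:
        return None
    host, _, port = value.partition(":")
    return host, int(port or 80)  # default to the standard HTTP port


@unittest.skipUnless(external_server(), "no external WebDAV server configured")
class ExternalServerTests(unittest.TestCase):
    def test_server_is_configured(self):
        # A real interoperability test would connect to this host and port.
        host, port = external_server()
        self.assertTrue(host)
        self.assertGreater(port, 0)
```

Automated runs that leave the variable unset skip these tests, so the standalone suite keeps its protection against intermittent network failures.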
API / Developer Platform
If relevant, how the feature will be made accessible to coders?
Security
TBD
Internationalization / Localization
TBD (awaiting spec)
Build / Install
The library installs via the standard Python distutils mechanism. It requires at least Twisted version 2.0: Compatibility testing will be required as Twisted updates become available.