A Brief History of ∆Q

Bristol University

Dr Neil Davies’ interest in system performance began within the first few weeks of his first job in 1980. He picked up the CCITT's X.25 ‘yellow book’ and noticed that the section on ‘Quality of Service’ was marked ‘for further study’. He thought ‘this can't be that hard’, then realised that it was not as simple as it first appeared! This line of study continued with his Ph.D. [1], and through his career at the University of Bristol, which also encompassed issues of safety-critical design. He identified a crucial intersection between issues of performance and safety in the question of how systems behave as they go into overload, i.e. when some resource is exhausted. Work with a Ph.D. student, Pauline Cobley, led to the realisation of how to model the ‘load that cannot be carried’ as intangible probability mass: ∆Q was born!

As time went on, Neil increasingly found that his interests were not shared by other members of the Department of Computer Science, and so he was pleased when the opportunity came in 1996 to take a sabbatical at PACT.

SRF/PACT

The Partnership in Advanced Computing Technology was a research institute run jointly by the University of Bristol and SGS-Thomson Microelectronics, with the aim of participating in collaborative research projects. Neil Davies was the Principal Investigator, on sabbatical from the University, and Peter Thompson joined as Senior Research Fellow from SGS-Thomson Microelectronics. Amongst the projects that PACT participated in were:

AT’MAX

AT’MAX was a collaborative R&D project funded by the European Union under its Framework programme. It ran from 1997 to 2000, and was based around an ATM access device that aimed to use ATM QoS features to deliver a variety of services multiplexed over a single 155 Mbit/s ATM connection. At its heart was a cell scheduling chip that embodied a wide range of scheduling algorithms that consequently had many (c. 2000) configurable parameters.

PACT’s role in the project was to find ways to map service requirements onto the available parameters. To this end, PACT employed Dr Judy Holyer, an applied mathematician, who worked with Neil Davies and Peter Thompson to produce a mathematical formulation of the problem, which led to several published papers [2,3]. It also led to the conclusion that the extant scheduling algorithms were inadequate, since they all failed to take account of the fundamental degrees of freedom in the problem. In short, of all the parameters that could be set in the scheduling chip, a crucial one was missing!

SWIFT

A further project funded under the EU Framework programme was SWIFT (1998 - 2000), which targeted construction of highly scalable Ethernet switches offering QoS.

The PACT team applied the insights gained in AT’MAX to develop a practical packet scheduling technique that was called ‘Guarantee of Service’ (GoS) to distinguish it from the host of ‘quality of service’ mechanisms. This was tested in simulation and found to be highly effective, and a patent was obtained.

The team was expanded to include two more people: Jeremy Bradley, a Ph.D. student at the University of Bristol who was researching performance modelling using stochastic process algebras [4] under Neil Davies’ supervision; and Dr Graeme Smith, who had previously worked at GEC Hirst Research Centre and later at HP.

Degree2 Innovations

A decision was taken to close down PACT, and in late 1999 four of the researchers on the SWIFT project (Neil Davies, Judy Holyer, Graeme Smith and Peter Thompson) formed a start-up called Degree2 Innovations to exploit the GoS IP. Degree2 received equity funding in 2000 from another company called U4EA, backed by the Irish financier Dermot Desmond. U4EA pushed for the IP to be productised, although the Degree2 founders felt that more R&D was needed. Despite their misgivings, a product device was created, called a ‘FlowFusion’, to manage the scheduling of IP packets onto a T1 connection, so that VoIP, for example, could be safely mixed with other traffic without any restriction on utilisation. The VoIP market was very immature at this point, however, and most users avoided the need for multiplexing by installing a separate data connection, so FlowFusion sales were disappointing. The product was enhanced to support the data rate of a T3 connection and called a ‘FlowFusion Pro’. This was launched at a trade fair in Atlanta, GA, on September 11th 2001; needless to say, this was not an auspicious day to attract potential customers!

Some interest was generated, however, including from Boeing, Inc., who were the Prime Contractor for an ambitious US DoD programme called ‘Future Combat Systems’ that involved issues of QoS over ad-hoc wireless networking. When Neil Davies, disagreeing with the narrow product focus, parted company from Degree2, he took this customer relationship into a new venture. Joining him were Fred Hammond, who had been working for U4EA on business development in the US, and Dave Reeve, a Ph.D. student whom Neil supervised, and who did part of the work on his thesis [5] at the Degree2 office.

Judy Holyer and Graeme Smith also left the company, but Peter Thompson continued in a series of roles until he too left at the end of 2011. The company, by then renamed 'GoS Networks', closed down in 2015.

Predictable Network Solutions

Neil Davies and Fred Hammond founded Predictable Network Solutions (PNSol) in 2003, with Dave Reeve as an employee. Peter Thompson joined in 2012.

PNSol's various projects include:

Contention Manager

As part of his Ph.D. work, Dave Reeve had written a performance simulator that was particularly efficient due to the way it modelled time. This idea was used to build a completely new implementation of the GoS algorithm, called a Contention Manager (now renamed a QTA-Multiplexor), which can schedule packets in constant time. It also has a dynamic internal architecture allowing it to be reconfigured while in use, while still delivering performance assurances.

Future Combat Systems

Future Combat Systems (FCS) was a $300Bn programme of the US Department of Defense with the aim of ‘modernising’ the US Army, and in particular its battlefield communications. The idea was to create wireless ad-hoc networks in the field (called the Joint Tactical Radio System) and use them to carry IP traffic serving a variety of purposes, ranging from ultra-urgent ‘action messages’ (for example preventing ‘friendly fire’ incidents) to voice communications, video feeds, uploading of geographic datasets, etc.

Unfortunately, the ‘best effort’ approach of standard IP gave no comfort to the military customers of the project that the most important messages would be delivered in a timely way, particularly when the network was stressed due to dynamic reconfiguration and/or interference by adversaries. What methodology could be used to calculate how such a network would perform and thereby establish its operational parameters? How could critical services be preserved on an overloaded network?

Boeing were unable to find any answer to these questions in the USA, so they invited PNSol to become a Tier-1 subcontractor to the project - the only one with a turnover of less than $1Bn! Special clearances were obtained to enable non-US PNSol staff to join the c. 6000 engineers on the project.

PNSol developed a ‘language’ for capturing applied load and performance requirements called ‘Quantitative Timeliness Agreements’ (QTAs). This was crucial, since it had become clear that every application on the system was designed on the assumption that the network was ‘theirs’, with no concept of the ‘opportunity cost’. Dealing with this required a calculus for reasoning about the performance of distributed computer systems, even in overload. This came to be called the ‘∆Q Framework’.
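
As a rough illustration of these ideas (not PNSol's actual QTA notation, which is not described here), a ∆Q can be sketched as an improper distribution over delay whose missing probability mass represents loss, and a timeliness requirement as a set of (deadline, minimum probability) points. All names, formats and numbers below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: delay in ms -> probability of that delay.
# The probabilities may sum to less than 1; the shortfall is the
# "intangible mass", i.e. the fraction of outcomes that never arrive.
DeltaQ = Dict[int, float]

# Hypothetical requirement format: (deadline in ms, minimum probability
# of delivery within that deadline).
Requirement = List[Tuple[int, float]]


def intangible_mass(dq: DeltaQ) -> float:
    """Probability that the outcome never occurs (e.g. the packet is lost)."""
    return 1.0 - sum(dq.values())


def prob_within(dq: DeltaQ, deadline_ms: int) -> float:
    """Value of the (improper) CDF at the given deadline."""
    return sum(p for delay, p in dq.items() if delay <= deadline_ms)


def meets(dq: DeltaQ, requirement: Requirement) -> bool:
    """True if the delivered ∆Q satisfies every point of the requirement."""
    return all(prob_within(dq, d) >= p for d, p in requirement)


# Made-up measurement: 95% of packets arrive, 5% never do.
measured: DeltaQ = {10: 0.50, 20: 0.30, 50: 0.15}
# Made-up requirement: 75% within 20 ms and 90% within 50 ms.
requirement: Requirement = [(20, 0.75), (50, 0.90)]

print(round(intangible_mass(measured), 2))
print(meets(measured, requirement))
```

Treating loss as missing mass rather than as a separate metric means delay and loss are checked against the same requirement in one comparison, which is what makes reasoning in overload possible in this sketch.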

Using these tools, PNSol were able to demonstrate that the designated bearer system, using wireless point-to-point and ad-hoc connections, was unable to meet all the operational requirements of FCS. After another year, FCS was cancelled, having used $30Bn of its $300Bn budget, although by this time PNSol were no longer part of the project, despite having helped save the US taxpayer $270Bn!

CERN

Neil Davies and Peter Thompson had a long history of collaboration with one of the teams inside CERN, the European particle physics laboratory, which had been one of the partners in the SWIFT project (and its precursor, ARCHES). Neil resolved a perplexing performance problem on a long-distance fibre link from CERN to a partner institution, and went on to mentor a young CERN researcher in applying the ∆Q Framework to another performance problem inside the ATLAS experiment (which went on to confirm the existence of the Higgs boson) [6]. This engagement helped to refine the process for measuring ∆Q along network paths that is now being standardised by the Broadband Forum.

Video for sign language

In 2005/6, PNSol developed a system to enable a small prototype ISP (serving a handful of end-users) to deliver multiple services over basic-rate (256 kbit/s to 512 kbit/s) ADSL broadband. The lead application, developed with the Deaf Studies Trust and SignWales (who were also working with British Telecom), was video conferencing suitable for carrying British Sign Language communication. The deployment included:

  • the Contention Manager;
  • an LDAP-based configuration-managed system allowing a bespoke service mix per end-user;
  • a custom CPE (including a minimal CM);
  • and SIP signalling and data-management agents including loss-concealment for the video stream.

The system met all its technical objectives, and attracted interest from British Telecom. However, the public funding for the primary use-case was cancelled due to political issues. PNSol was left with leading-edge expertise in the architecture and performance of broadband networks, which has subsequently been combined with large-scale measurements of delivered ∆Q to service a variety of operator clients. This knowledge was also fed into a report for the UK telecoms regulator, Ofcom, on the issue of detecting traffic management [7].

Cardano

PNSol became a contractor to IOHK to work on the Cardano project in 2017. Cardano is a third-generation blockchain based on a proof-of-stake consensus algorithm called Ouroboros. Proof-of-work systems such as Bitcoin consume vast computational resources in a ‘race’ to mine blocks; proof-of-stake systems such as Cardano replace this with a system of ‘taking turns’ based on the stake distribution. This enables blocks to be produced more frequently, improving throughput and settling-time parameters, but with the risk of creating accidental ‘forks’ when a previous block fails to reach the next block producer in time. This introduces a strong timeliness constraint on the ‘diffusion’ of new blocks to all nodes, effectively making Cardano a globally-distributed real-time system. PNSol thus saw the need to apply the ∆Q Framework to ensuring the timeliness of block diffusion. This proceeded in three main directions:

  • Formalisation of the ∆Q algebra, originally within the Ψ-calculus (a formalisation of process algebra), and later in the context of Outcome Diagrams [12];
  • Application of ideas from process algebras and stochastic probes to instrument the code-base to support both profiling/benchmarking and in-life management of the production system;
  • Embedding of point-to-point ∆Q measurement into the operational decisions of the autonomous data-diffusion function, performing dynamic arbitrage between different routes to deliver the best application-level outcomes.
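
To make the timeliness constraint concrete, here is a minimal sketch, assuming discrete per-hop delay distributions with made-up numbers. Sequential composition of two hops is taken to be the convolution of their ∆Qs (so losses compound), and a sender compares candidate routes by the probability that a block arrives before a deadline. The function names and figures are illustrative assumptions and are not taken from the Cardano code-base.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical representation: delay in ms -> probability.
# The total may be less than 1; the shortfall is the chance of loss.
DeltaQ = Dict[int, float]


def seq(a: DeltaQ, b: DeltaQ) -> DeltaQ:
    """Sequential composition: convolve the two delay distributions."""
    out: Dict[int, float] = defaultdict(float)
    for delay_a, p_a in a.items():
        for delay_b, p_b in b.items():
            out[delay_a + delay_b] += p_a * p_b
    return dict(out)


def prob_within(dq: DeltaQ, deadline_ms: int) -> float:
    """Probability that the outcome completes by the deadline."""
    return sum(p for delay, p in dq.items() if delay <= deadline_ms)


def best_route(routes: Dict[str, DeltaQ], deadline_ms: int) -> str:
    """Pick the route most likely to deliver within the deadline."""
    return max(routes, key=lambda name: prob_within(routes[name], deadline_ms))


# Made-up measurements for illustration only.
hop_a: DeltaQ = {100: 0.9, 300: 0.1}
hop_b: DeltaQ = {100: 0.8, 400: 0.15}      # 5% of blocks are lost on this hop
routes = {
    "relay": seq(hop_a, hop_b),            # two hops in sequence
    "direct": {250: 0.6, 450: 0.4},        # one slower but lossless hop
}

for deadline in (400, 500):
    print(deadline, best_route(routes, deadline))
```

With these numbers the preferred route changes with the deadline, which is the kind of trade-off that route selection based on measured ∆Q has to resolve.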

Broadband Forum

PNSol joined the Broadband Forum in 2018 at the suggestion of Vodafone (who have been a PNSol client for several years), and together they kicked off the Quality Experience Delivered (QED) initiative. This aims to provide the industry with a performance focus beyond peak download speed, and has produced a Broadband Forum Technical Report [9] and Marketing Reports [8,10,11]. Future work is intended to produce further external marketing documents and Technical Reports standardising aspects of the measurement process.

References

  1. Davies, N., The Performance and Scalability of Parallel Systems, University of Bristol, UK, 1994
  2. Davies, N., Holyer, J., and Thompson, P., An Operational Model to Control Loss and Delay of Traffic at a Network Switch, The Management and Design of ATM Networks pp 20/1-20/14, 1999
  3. Davies, N., Holyer, J., and Thompson, P., End-to-end management of mixed applications across networks, IEEE Workshop on Internet Applications pp 12-19, 1999
  4. Bradley, J.T., Towards Reliable Modelling with Stochastic Process Algebras, PhD Thesis, Department of Computer Science, University of Bristol, UK, October, 1999
  5. Reeve, D. C., A New Blueprint for Network QoS, PhD Thesis, Computing Laboratory, University of Kent, Canterbury, Kent, UK, August 2003
  6. Leahu, L., Analysis and predictive modeling of the performance of the ATLAS TDAQ network, PhD Thesis, Bucharest, Tech. U., January 2013
  7. Assessment of traffic management detection methods and tools, Prepared for Ofcom under MC 316, August 2015
  8. MR-452.1 Motivation for Quality Verified Broadband Services, Published by the Broadband Forum, October 2019
  9. TR-452.1 Quality Attenuation Measurement Architecture and Requirements, Published by the Broadband Forum, September 2020
  10. MR-452.4 QED Uses in Lab Evaluation and Network Design, Published by the Broadband Forum, February 2021
  11. MR-452.2 Use of ΔQ to manage customer SLA, Published by the Broadband Forum, July 2021
  12. Mind Your Outcomes: Quality-Centric Systems Development, Published by Computers, March 2022