IVOA Interop May 2017 - VOEvent

Follow up notes from the VOEvent session at the May 2017 IVOA Interop meeting in Shanghai.

Kafka at LSST

Informative presentation by Maria Patterson about plans to use Apache Kafka in combination with Apache Avro as the main delivery system within LSST.

As yet we don't have any direct experience with these. We plan to develop some experimental systems in the next few weeks to learn more about them.

It looks likely that we will use Kafka to connect our 'broker' to the main LSST fire-hose. Whether we would use Kafka to deliver the event stream to our downstream clients is up for discussion.
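To get a feel for what this involves, here is a minimal sketch of a Python client reading Avro-encoded alert packets from a Kafka topic, using the confluent_kafka and fastavro libraries. The broker address, topic name, consumer group and alert field names are invented for illustration; the real LSST configuration and packet schema may well differ.

    # Minimal sketch: consume Avro-encoded alert packets from a Kafka topic.
    import io

    import fastavro
    from confluent_kafka import Consumer

    consumer = Consumer({
        'bootstrap.servers': 'kafka.example.org:9092',   # hypothetical broker address
        'group.id': 'lsst-uk-broker-test',               # hypothetical consumer group
        'auto.offset.reset': 'earliest',
    })
    consumer.subscribe(['lsst-alerts'])                  # hypothetical topic name

    while True:
        message = consumer.poll(timeout=1.0)
        if message is None or message.error():
            continue
        # Assume each Kafka message body is a self-contained Avro container file.
        for alert in fastavro.reader(io.BytesIO(message.value())):
            print(alert.get('alertId'))                  # 'alertId' is a guessed field name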

What does "live stream" mean ?

A very interesting discussion, started by a question about the use case for a replay facility, particularly in relation to the replay feature provided by Kafka and what LSST may be able to provide.

The discussion evolved into looking at the differences between synchronous and asynchronous transactions, real-time alerts, and queries to archives.

In particular, Hendrik Heinl and Kai Polsterer felt there was a significant difference between live and historical events, and that requesting a replay of past events was conceptually different from receiving a stream of live events, and so should be modelled as a separate operation with a separate protocol.

From a technical perspective I suggested that the 'liveness' of the events in a stream is less relevant than the fact that they are a stream. Receiving a stream of events from yesterday is similar to receiving a stream of live events as they are generated. The technical aspects, the server sending a stream of events and the client processing them as they arrive, are the same. The ability to request a stream of events starting from 2 hours ago, where the response starts by sending the past events until it catches up to and then continues with the live stream, blurs the distinction between historical and live events.
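As an illustration of how thin that distinction becomes in practice, here is a minimal sketch of the "start two hours ago and catch up" idea using the Kafka client's timestamp-to-offset lookup. The broker address, topic name and single-partition assumption are all hypothetical.

    import time

    from confluent_kafka import Consumer, TopicPartition

    consumer = Consumer({
        'bootstrap.servers': 'kafka.example.org:9092',   # hypothetical broker address
        'group.id': 'lsst-uk-replay-test',               # hypothetical consumer group
    })

    start_ms = int((time.time() - 2 * 60 * 60) * 1000)   # two hours ago, in milliseconds

    # Ask the broker which offset corresponds to that timestamp (partition 0 only,
    # purely for illustration) and start consuming from there.
    lookup = consumer.offsets_for_times([TopicPartition('lsst-alerts', 0, start_ms)])
    consumer.assign(lookup)

    # The same loop handles the historical events and then, once it has caught up,
    # the live ones; the client cannot tell the difference.
    while True:
        message = consumer.poll(timeout=1.0)
        if message is None or message.error():
            continue
        print(message.timestamp(), len(message.value())) # stand-in for real processing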

No doubt we will continue the discussion ...

What does "broker" mean ?

Discussions between scientists and engineers during and after the session revealed that there is confusion about what the term "broker" means.

The same term is being used to refer to two distinct things.

  1. The network transport layer component that distributes messages to subscribers.
  2. The science use case level components that generate, aggregate, filter, process and annotate alerts.

I have raised this concern with Maria and she is going to see what the people at the Tucson meeting think.

What does "mini-broker" mean ?

In my own mind there is still some confusion about how much LSST will provide to end users via what are currently referred to as the "mini-brokers".

Following the workshop at Edinburgh in April :

  1. LSST have said they will provide the full fire-hose stream to a limited set of downstream "brokers", who will then be responsible for providing topic-specific streams to the general science community.
  2. LSST are also describing "mini-brokers", which are filters that will run inside the LSST system providing custom streams direct to end users.

To some extent we are still learning what these "mini-brokers" will actually do.

Slack message from David Young attending the meeting in Tucson:

"A little more info in the LSST 'mini'-brokers mentioned at the Edinburgh meeting last month. Eric Bellm clarified in his talk here in Tucson that the 'mini-brokers will only work on the information available in the alert packet', i.e. no catalogue crossmatching filtering etc."

https://lsstuk.slack.com/archives/C5G254SJF/p1495471908343393

The general idea seems to be that they will provide sufficient filtering to reduce the high bandwidth fire-hose to a specialised low bandwidth stream for each subscriber.
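To make that limitation concrete, here is a sketch in Python of the kind of packet-level filter a "mini-broker" might run: it looks only at fields inside the alert packet itself, with no catalogue cross-matching, and passes on a small fraction of the fire-hose. The field names and thresholds are invented for illustration.

    def packet_filter(alert):
        """Return True if this alert should be forwarded to the subscriber."""
        source = alert.get('diaSource', {})                  # hypothetical packet layout
        bright_enough = source.get('psFlux', 0.0) > 1.0e-5   # hypothetical flux cut
        well_detected = source.get('snr', 0.0) > 10.0        # hypothetical S/N cut
        return bright_enough and well_detected

    def mini_broker(alert_stream):
        """Yield only the alerts that pass the packet-level filter."""
        for alert in alert_stream:
            if packet_filter(alert):
                yield alert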

I still have a number of questions concerning "mini-brokers" :

  1. What science use cases do the "mini-brokers" meet ?
  2. Who will be allowed to register and subscribe to a "mini-broker" ?
  3. How much bandwidth will a "mini-broker" be allowed to use ?
  4. If the "mini-brokers" are only able to offer simple filtering, then their output will still be a relatively high bandwidth fire-hose.

  5. How much compute resource will a "mini-broker" be allowed to use ?
  6. If the "mini-brokers" are able to do more complex filtering, then they will require a relatively high level of compute resources to complete the processing in time.

  7. How will we allocate and/or measure these ?

I have a suspicion that we are designing "brokers" and "mini-brokers" because they seem like good things to have, rather than to meet specific science use cases.

My own hunch is that the majority of science use cases will sit in the gap between the full fire-hose stream and the simple filtering offered by the "mini-brokers".

That gap is what LSST are punting to the "community brokers".


If you require this document in an alternative format, please contact the LSST:UK Project Managers lusc_pm@mlist.is.ed.ac.uk or phone +44 131 651 3577