
Attendees: Andy L, Roy W, Stephen S, Bob M, Meg S, Ken S, Matt N, Dave Y, Britton S, Mark H, Stelios V, Michael F, Ally H, Dave R, Gareth F, Terry S, Cam Allen (Zooniverse), Nigel Hambley

Apologies:

Notes from discussion

...

Stephen S, LSST public and private; galaxies, stars and rocks

  • Bob asked where the 12-month history of an object originated.

    • Stephen confirmed that it is included with the first detection of an object.

  • Dave Y asked whether it would be worthwhile augmenting Alert Sim with a light Kafka stream for the forced sources, to save other users having to query the Prompt Products database.

    • Stephen S believed that, for this to be attractive to other groups, we would need to do it very quickly.

  • Transient alert stream should focus on fluxes rather than magnitudes for detected sources.

  • For visits in the Galactic plane, the alert stream is likely to be dominated by stellar sources; we need to work out how to deal with this.

    • For example, this suggests we would need a data rate of 200 Mb/s to the UK DAC.

  • There is potential to significantly reduce the database size if we store only DIAObjects in the database and put the sources into blob storage.

  • Meg S noted that the solar system population is small: would it be a significant problem to include solar system objects in the database and remove only the stellar objects?

  • Focus of Lasair is on extra-galactic transients.

  • Andy L proposed to engage Science Working Group members to determine what is/isn’t useful for handling stellar objects, to ensure we capture stellar transients and outbursts.

  • Meg S asked if we had a summary of what the other Community Brokers are planning to do.

    • Stephen does not believe this summary exists, but thinks it would be worthwhile to talk to others.

    • Ken will investigate what other community brokers plan to address in terms of science area.

  • Dave Y suggested that those interested in particular objects could use a watchlist to ensure the data is available in the fast database (a minimal sketch follows this list).

  • Dave M noted that having multiple databases is a common strategy for handling high-rate data flows, as demonstrated by social media providers.
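
A minimal sketch of the watchlist idea above, assuming each alert carries an object identifier plus ra/dec in degrees. The watchlist positions, the 2-arcsecond match radius, and the alert field names are illustrative assumptions, not Lasair’s actual schema.

    from astropy.coordinates import SkyCoord
    import astropy.units as u

    # Hypothetical watchlist: sky positions (degrees) a user cares about.
    watchlist = SkyCoord(ra=[150.1, 210.8], dec=[2.2, -11.3], unit="deg")

    def on_watchlist(alert, radius=2.0 * u.arcsec):
        """Return True if the alert position matches a watchlist entry.

        `alert` is assumed to be a dict with 'ra' and 'decl' in degrees.
        """
        pos = SkyCoord(ra=alert["ra"], dec=alert["decl"], unit="deg")
        idx, sep, _ = pos.match_to_catalog_sky(watchlist)
        return sep < radius

    # Matching alerts could then be copied into the fast database so the
    # full light curve is immediately available to the interested user.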

Ken Smith, HDD vs. SSD

  • Prompted by different experiences of using SSDs (instead of HDDs) to ingest data

  • Stephen asked what was meant by ingestion.

    • Ken noted it means reading data from a CSV file and inserting it into a database, in like-for-like tests.

  • A difference between ZTF and LSST is that, for LSST, we do not need to associate sources with objects.

    • Both SSD and HDD were able to ingest 5k+ rows per second, equivalent to roughly 0.5 billion rows per day (an illustrative timing harness follows).
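
As a rough illustration of the like-for-like test Ken described, the harness below times CSV-to-database ingestion in batches; sqlite3 is used only as a stand-in for the actual database, and the table layout and batch size are assumptions. Running it once against an HDD-backed path and once against an SSD-backed path gives the comparison in rows per second.

    import csv
    import sqlite3
    import time

    def ingest(csv_path, db_path, batch=10000):
        """Time ingestion of a CSV file into a database, in rows/second."""
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE IF NOT EXISTS src (objectId TEXT, mjd REAL, flux REAL)")
        t0, n, rows = time.time(), 0, []
        with open(csv_path, newline="") as f:
            for row in csv.reader(f):
                rows.append(row)
                if len(rows) >= batch:
                    con.executemany("INSERT INTO src VALUES (?, ?, ?)", rows)
                    n += len(rows)
                    rows.clear()
            if rows:
                con.executemany("INSERT INTO src VALUES (?, ?, ?)", rows)
                n += len(rows)
        con.commit()
        con.close()
        dt = time.time() - t0
        print(f"{n} rows in {dt:.1f}s = {n / dt:.0f} rows/s")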

Roy Williams, Handling Lightcurves

  • Meg asked if the strategy is based on prioritising the items that need to be handled urgently.

  • Stephen S asked whether the user is capable of dealing with the stream.

    • Stephen was worried the user can’t get what they want from the Kafka stream (nor from an email alert).

    • Roy noted the stream helps people identify an object ID and then look up more details in the database.

    • The alert is fairly rich for LSST, so monitoring the stream could significantly reduce load on the database.

    • Alert is the first indication, for a user, that there is something interesting.

  • Dave Y reminded people of the blog posts and tutorials that help users consume a Kafka stream. For example, we could show how to filter and convert the Kafka stream and let users build on that basic template (see the sketch after this list).

    • Meg noted that this is the approach that Zooniverse use.

  • George noted that typically a filter could produce hundreds or thousands of hits per night.

  • Gareth noted the intent to go with a modular approach to defining user workflows.

  • Gareth was concerned that Stephen’s presentation on forced photometry might prompt a revision to the model.

    • Stephen noted that, in the first 24 hours, there would potentially be no forced photometry; after 24 hours, people would want this information.
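
A sketch of the filter-the-stream tutorial idea Dave Y raised, assuming a JSON-serialised alert stream; the broker address, topic name, group id, and the flux field/threshold are illustrative, not Lasair’s actual configuration (confluent_kafka is one common Python client).

    import json
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "kafka.example.org:9092",  # illustrative broker
        "group.id": "my-filter",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["alerts"])  # illustrative topic name

    try:
        while True:
            msg = consumer.poll(timeout=1.0)
            if msg is None or msg.error():
                continue
            alert = json.loads(msg.value())
            # Example filter: keep only bright detections; users would swap
            # in their own cuts, then query the database by object ID.
            if alert.get("flux", 0.0) > 1000.0:
                print(alert["objectId"], alert["flux"])
    finally:
        consumer.close()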

Roy Williams, Parallel Ingestion and Workflow

  • Dave M asked whether we should use Kafka for data transfers, as this is what it is designed to do (a consumer-group sketch follows this list).

    • Gareth agreed that is what we should do for new components.

    • Cam noted a potential risk in over-using Kafka consumers; using sinks/Lambda functions instead would eliminate the bookkeeping and resilience logic needed in Kafka consumers.

    • Cam also noted Pulsar as a competing technology to Kafka.

    • Dave Y asked if OpenStack includes components equivalent to AWS Lambda.

  • George B was concerned the database is a bottleneck.

    • Roy concurred and noted we are looking at strategies to minimise the traffic into the databases.

    • Dave M noted that Cassandra could be parallelised.

  • Andy L noted Edinburgh Kafka meet-up, which discussed event-driven architectures.

    • Roy felt we didn’t want the full functionality of Kafka: we just wanted a data pipeline.

  • Dave M noted that a modular approach would help to accommodate necessary changes, such as for forced photometry.
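
One way the Kafka-for-transfers and parallel-database points combine is a consumer group: consumers sharing a group.id automatically split a topic’s partitions between them. Everything here (broker, topic, worker count, and the write_batch stand-in for a parallel database insert) is an illustrative assumption.

    import json
    from multiprocessing import Process
    from confluent_kafka import Consumer

    def write_batch(rows):
        # Stand-in for the real write, e.g. a parallelised Cassandra insert.
        print(f"wrote {len(rows)} rows")

    def worker():
        c = Consumer({
            "bootstrap.servers": "kafka.example.org:9092",
            "group.id": "lasair-ingest",  # shared group => partitions are split
            "auto.offset.reset": "earliest",
        })
        c.subscribe(["alerts"])
        batch = []
        try:
            while True:
                msg = c.poll(timeout=1.0)
                if msg is None or msg.error():
                    continue
                batch.append(json.loads(msg.value()))
                if len(batch) >= 1000:
                    write_batch(batch)
                    batch.clear()
        finally:
            c.close()

    if __name__ == "__main__":
        # Run as many workers as the topic has partitions; Kafka assigns
        # each partition to exactly one consumer in the group.
        for _ in range(4):
            Process(target=worker).start()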

Gareth Williams, SQL Queries on Data Streams