...
Attendees: Andy L, Roy W, Stephen S, Bob M, Meg S, Ken S, Matt N, Dave Y, Britton S, Mark H, Stelios V, Michael F, Ally H, Dave R, Gareth F, Terry S, Cam Allen (Zooniverse), Nigel Hambley
Apologies:
Notes from discussion
...
Stephen S, LSST public and private; galaxies, stars and rocks
Bob asked where the 12-month history of an object originated.
Stephen confirmed it is included in the first detection of an object.
Dave Y asked whether it would be worthwhile augmenting Alert Sim with a light Kafka stream for the forced sources, to save other users having to query the Prompt Products database.
Stephen S believed that, for this to be attractive to other groups, we would need to do it very quickly.
Transient alert stream should focus on fluxes rather than magnitudes for detected sources.
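One motivation for preferring fluxes is that difference-image fluxes can legitimately be negative (the source is fainter than in the template), in which case a magnitude is undefined. A minimal sketch of the conversion, assuming fluxes in janskys on the AB system (the function name and zeropoint handling are illustrative, not from the minutes):

```python
import math

AB_ZEROPOINT_JY = 3631.0  # AB-system reference flux in janskys


def flux_to_ab_mag(flux_jy):
    """Convert a flux in Jy to an AB magnitude.

    Returns None for non-positive fluxes, where a magnitude is
    undefined -- one reason to keep fluxes as the primary quantity
    in the alert stream.
    """
    if flux_jy <= 0:
        return None
    return -2.5 * math.log10(flux_jy / AB_ZEROPOINT_JY)
```

A flux equal to the zeropoint gives magnitude 0; a tenth of it gives magnitude 2.5.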
For visits in the plane, alert stream is likely to be dominated by stellar sources. Need to work out how to deal with this.
For example, this suggests we need a data rate of 200 Mb/s to the UK DAC.
Potential to significantly reduce database size if we only store DIAObjects in database and put sources into blob storage.
Meg S noted that the solar-system population is small: would it be a significant problem to include solar system objects in the database and remove only the stellar objects?
Focus of Lasair is on extra-galactic transients.
Andy L proposed to engage Science Working Group members to determine what is/isn't useful for handling stellar objects, to ensure we capture stellar transients and outbursts.
Meg S asked if we had a summary of what other Community Brokers are planning to do.
Stephen did not believe this summary exists, but felt it would be worthwhile to talk to others.
Ken will investigate what other community brokers plan to address in terms of science area.
Dave Y suggested that those interested in particular objects could use a watchlist to ensure the data was available in the fast database.
Dave M noted having multiple databases is a common strategy for handling high-rate data flows, as has been demonstrated by Social Media providers.
Ken Smith, HDD vs. SSD
Prompted by different experiences of using SSDs (instead of HDDs) to ingest data.
Stephen asked what was meant by ingestion.
Ken explained this meant reading data from a CSV file and inserting it into a database; the tests were like-for-like.
A difference between ZTF and LSST is that we do not need to associate sources to objects for LSST.
Both SSD and HDD were able to ingest 5k+ rows per second, equivalent to roughly 0.5 billion rows per day.
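The quoted daily total follows directly from the per-second rate; a quick sanity check:

```python
# Sanity-check the quoted ingestion rate: 5,000 rows/s sustained
# over a full day (the "5k+" in the tests means this is a floor).
rows_per_second = 5_000
seconds_per_day = 86_400
rows_per_day = rows_per_second * seconds_per_day
print(rows_per_day)  # 432,000,000 -- i.e. roughly 0.5 billion rows/day
```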
Roy Williams, Handling Lightcurves
Meg asked if the strategy is based around prioritising objects that need to be handled urgently.
Stephen S asked whether users are capable of dealing with the stream.
Stephen was worried that users can't get what they want from the Kafka stream (nor from an email alert).
Roy noted stream helps people identify object id and then look up more details in the database.
The alert is fairly rich for LSST, so monitoring the stream could significantly reduce load on the database.
The alert is the first indication, for a user, that there is something interesting.
Dave Y reminded people of the blog posts and tutorials for helping people use a Kafka stream. For example, we could show people how to filter and convert the Kafka stream and let them build on a basic template.
Meg noted that this is the approach that Zooniverse use.
George noted that typically a filter could produce hundreds or thousands of hits per night.
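The kind of filter template Dave Y describes could be sketched as a plain predicate applied to each alert, kept separate from the Kafka plumbing so users only edit the filter. The field names (`objectId`, `magpsf`), topic, and broker address below are placeholders, not from the minutes; check the actual alert schema before relying on them:

```python
def passes_filter(alert, max_magnitude=19.0):
    """Keep only alerts brighter than max_magnitude.

    The 'magpsf' field name is illustrative -- substitute the real
    field from the alert schema.
    """
    return alert.get("magpsf") is not None and alert["magpsf"] < max_magnitude


# Illustrative use against a Kafka consumer (e.g. confluent_kafka);
# broker address and topic name are placeholders:
#
#   import json
#   from confluent_kafka import Consumer
#
#   consumer = Consumer({"bootstrap.servers": "broker:9092",
#                        "group.id": "my-filter"})
#   consumer.subscribe(["alerts"])
#   while True:
#       msg = consumer.poll(1.0)
#       if msg is None or msg.error():
#           continue
#       alert = json.loads(msg.value())
#       if passes_filter(alert):
#           print(alert["objectId"], alert["magpsf"])
```

A template like this lets a user change one function while the consumer loop stays fixed, which matches the "filter producing hundreds or thousands of hits per night" usage George describes.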
Gareth noted the intent to go with a modular approach to defining user workflows.
Gareth was concerned that Stephen's presentation on forced photometry may prompt a revision to the model.
Stephen noted that, in the first 24 hours, there would potentially be no forced photometry. After 24 hours, people would want this information.
Roy Williams, Parallel Ingestion and Workflow
Dave M asked whether we use Kafka for data transfers, as this is what it is designed to do.
Gareth agreed that, for new components, that is what we should do.
Cam noted a potential risk in over-using Kafka consumers; instead, sinks/Lambda functions could be used to eliminate the bookkeeping and resilience burden of Kafka consumers.
Cam also noted Pulsar as a competing technology to Kafka.
Dave Y asked if OpenStack includes components equivalent to AWS Lambda.
George B was concerned that the database is a bottleneck.
Roy concurred and noted that we are looking at strategies to minimise traffic into the databases.
Dave M noted that Cassandra could be parallelised.
Andy L noted Edinburgh Kafka meet-up, which discussed event-driven architectures.
Roy felt we didn’t want the full functionality of Kafka: we just wanted a data pipeline.
Dave M noted that a modular approach would help accommodate necessary changes, such as those for forced photometry.