Lasair Tech Review: Andy's version of the key questions
This is mostly Roy’s key questions, with some additions and a little rephrasing. Ideally we should turn each of these into a decision, an activity in Cycle 2, or something we defer to Cycle 3 or later.
Testing
Should we make our own simulated alert stream? Not for now.
Sherlock
Should we incorporate classification by RAPID or similar?
No, but make sure the infrastructure can handle additional classifiers.
Can we get a degree of confidence in the classification? Not for now.
Should we make it simpler to add new catalogues to Sherlock?
No, we should not design Sherlock 3.0 for users to add catalogues. It is not trivial; instead, catalogues can be proposed to us for inclusion.
Science Driver Issues
Should we engage the Science Working Group or others in prioritising functionality?
E.g., should we ignore Solar System objects in Lasair? (size..)
Claim: Solar System objects could be low effort, high return. Is nobody else doing it?
Is there a science case for stellar transients in Lasair? (size..)
Survey the community: which science relies on alerts, and which on data releases?
Stephen will do the survey.
Should we have separate DBs for different science?
Understand how to support multiple databases/schemas/websites.
What science areas are other Community Brokers planning to address?
Dave Y and Stephen volunteered to try to find out.
Science Platform
Use Firefly as the GUI, or just borrow its ideas (ADQL, form-based queries)?
Not now, but keep it in mind.
Should we incorporate Nublado into Lasair?
Yes, when it's ready. It differs only slightly from JupyterHub. It will provide LSST images.
Should we set up a TAP service for Lasair? We could put TOPCAT in front of it.
Yes, as a spin-off from the LSP (Stelios).
Queries
Do we restrict queries, and if so how? A VizieR-type form? Parse the SQL, as WSA and VSA do?
Yes: a form as the default, with freeform SQL for advanced users subject to a review/optimise process.
Must also consider API/Jupyter queries.
Review and propose the new system (Gareth and Andy).
3 layers: Form + SQL + Jupyter
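A minimal sketch of the "Form" layer, assuming a form that lets users pick columns and simple comparisons: the form compiles to parameterised SQL checked against a whitelist of columns and operators. Under this scheme only freeform SQL needs the review/optimise step. The table name objects, the column names and the function form_to_sql are hypothetical illustrations, not the Lasair schema or code.

```python
import sqlite3

# Hypothetical whitelist: these column names are illustrative, not Lasair's schema.
ALLOWED_COLUMNS = {"objectId", "ra", "decl", "ncand", "latest_mag"}
ALLOWED_OPS = {"<", "<=", ">", ">=", "="}


def form_to_sql(select, conditions, limit=1000):
    """Compile form fields into (sql, params) for a hypothetical 'objects' table.

    select:     list of column names to return
    conditions: list of (column, operator, value) tuples, ANDed together
    Only whitelisted columns/operators are accepted; values are bound as
    parameters and never spliced into the SQL string.
    """
    if not select:
        raise ValueError("no columns selected")
    for col in select:
        if col not in ALLOWED_COLUMNS:
            raise ValueError(f"column not allowed: {col}")
    clauses, params = [], []
    for col, op, value in conditions:
        if col not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
            raise ValueError(f"condition not allowed: {col} {op}")
        clauses.append(f"{col} {op} ?")
        params.append(value)
    sql = "SELECT " + ", ".join(select) + " FROM objects"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " LIMIT ?"
    params.append(int(limit))
    return sql, params


if __name__ == "__main__":
    # Usage against a throwaway in-memory table.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE objects (objectId TEXT, ra REAL, decl REAL, "
               "ncand INTEGER, latest_mag REAL)")
    db.executemany("INSERT INTO objects VALUES (?, ?, ?, ?, ?)",
                   [("A", 10.0, -5.0, 4, 18.2), ("B", 20.0, 3.0, 1, 19.5)])
    sql, params = form_to_sql(["objectId"], [("ncand", ">=", 3)], limit=10)
    print(db.execute(sql, params).fetchall())
```

Because the form can only emit whitelisted, bounded queries, it can run without review. The SQL and Jupyter layers would then carry the review/optimise burden.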
Do we offer users filtering on the Kafka stream, e.g. by building queries in KSQL?
If so, do we also keep the close connection between static and streaming queries?
Yes, if possible.
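One way to keep static and streaming queries closely connected is to define each filter once and apply the same definition to both the stored table and the live stream. This is a minimal sketch assuming records are plain dicts. The attribute names ncand and mag are hypothetical. The generator stands in for a Kafka consumer; this is not KSQL.

```python
def make_filter(min_ncand, max_mag):
    """Build one predicate that serves both the static and the streaming query.

    'ncand' and 'mag' are hypothetical attribute names used for illustration.
    """
    def predicate(record):
        return record["ncand"] >= min_ncand and record["mag"] <= max_mag
    return predicate


def run_static(predicate, table):
    """Static query: scan stored records (stand-in for the relational DB)."""
    return [rec for rec in table if predicate(rec)]


def run_streaming(predicate, stream):
    """Streaming query: pass matching alerts through as they arrive
    (stand-in for a Kafka consumer feeding a filtered output topic)."""
    for alert in stream:
        if predicate(alert):
            yield alert


if __name__ == "__main__":
    records = [{"id": "A", "ncand": 5, "mag": 18.0},
               {"id": "B", "ncand": 1, "mag": 17.0}]
    f = make_filter(min_ncand=3, max_mag=19.0)
    print([r["id"] for r in run_static(f, records)])
    print([r["id"] for r in run_streaming(f, iter(records))])
```

With a single definition, a user's saved query cannot drift between its static and streaming forms. A real implementation would instead translate one query representation into both SQL and the stream processor's language.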
Relational database
Are light curves in there? Is forced phot in there? Should Sources be kept in blob storage, leaving a lean relational database for Objects?
Ken will build a “blob store” with NoSQL and compare it with CephFS.
We need to build a data-mining API to this store, with a query mechanism.
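The split under discussion can be sketched as follows: a lean relational Objects table holds one summary row per object, and a separate key-value store holds each object's full light curve. This is a minimal sketch under stated assumptions. A dict stands in for the NoSQL or CephFS store. The column ncand and all function names are hypothetical.

```python
import json
import sqlite3


class BlobStore:
    """Stand-in for a NoSQL / CephFS blob store: objectId -> light curve.

    A dict is used purely for illustration; the point is the access
    pattern (fetch a whole light curve by key), not the storage engine.
    """

    def __init__(self):
        self._blobs = {}

    def put(self, object_id, sources):
        self._blobs[object_id] = json.dumps(sources).encode("utf-8")

    def get(self, object_id):
        return json.loads(self._blobs[object_id].decode("utf-8"))


def make_objects_db():
    """Lean relational table: one summary row per object, no per-source rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE objects (objectId TEXT PRIMARY KEY, "
               "ra REAL, decl REAL, ncand INTEGER)")
    return db


def ingest(db, store, object_id, ra, decl, sources):
    """Write the summary row to the RDBMS and the full light curve to the blob store."""
    db.execute("INSERT INTO objects VALUES (?, ?, ?, ?)",
               (object_id, ra, decl, len(sources)))
    store.put(object_id, sources)


def light_curves_where(db, store, min_ncand):
    """Data-mining query: select objects relationally, then fetch their blobs."""
    rows = db.execute("SELECT objectId FROM objects WHERE ncand >= ? "
                      "ORDER BY objectId", (min_ncand,))
    return {oid: store.get(oid) for (oid,) in rows}
```

The relational side stays small enough to index and query quickly. The blob store only ever needs lookup by key, which is the access pattern that NoSQL stores and CephFS both serve.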
Is there just one RDBMS or several? (see above)
Can we make a decision now on Cassandra or another NoSQL database?
Not for the relational database.
Survey the science group for the light-curve features they need.
Build a set of representative queries that need to be fast, with a test rig.
Hardware
Do we plan to use SSDs or not?
For the relational DB, not for the blob store.
Can we get OpenStack nodes with SSDs? How many, and at what cost? Or even spinning disk?
Tell Mark, George about implementation plans (Gareth)
IRIS/Cambridge are doing this for SKA.
Watchlists and user data
Measure scalability of watchlists (Roy)
Subsets of the Million Quasar catalog from VizieR, with source radius = 2 arcsec, matched against 2 million objects (instructions):
50 sources: upload 0.3 s, crossmatch 1 s
1,000 sources: upload 2 s, crossmatch 4 s
10,000 sources: upload 19 s, crossmatch 47 s
100,000 sources: upload 270 s, crossmatch 530 s
1,000,000 sources: the browser cannot paste the list
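For context on what the crossmatch step involves, here is a minimal sketch of a declination-zone crossmatch with the 2 arcsec radius used above. The sketch makes no claim about how Lasair implements watchlists. Watchlist sources are bucketed into declination zones one radius tall. Any match must then lie in the object's own zone or an adjacent one. Each candidate pair is then checked with the haversine great-circle distance, which also handles RA wrap-around at 0/360 degrees. The function names are hypothetical.

```python
import math
from collections import defaultdict

ARCSEC = 1.0 / 3600.0  # one arcsecond in degrees


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def crossmatch(watchlist, objects, radius_arcsec=2.0):
    """Return (watch_id, object_id) pairs separated by <= radius_arcsec.

    watchlist, objects: iterables of (id, ra_deg, dec_deg).
    Zones are one radius tall, so |delta dec| <= radius means the zone
    indices differ by at most one; only three zones are searched per object.
    """
    radius = radius_arcsec * ARCSEC
    zones = defaultdict(list)
    for wid, ra, dec in watchlist:
        zones[math.floor(dec / radius)].append((wid, ra, dec))
    matches = []
    for oid, ra, dec in objects:
        z = math.floor(dec / radius)
        for zz in (z - 1, z, z + 1):
            for wid, wra, wdec in zones.get(zz, ()):
                if angular_sep_deg(ra, dec, wra, wdec) <= radius:
                    matches.append((wid, oid))
    return matches
```

Zoning keeps the cost roughly proportional to the number of objects times the sources per zone rather than to the full product of the two lists. A production version would also sort each zone by RA and scan only an RA window.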
“Kafka Inside” or not
Should we base the architecture around Kafka, or around HTTP and scp?
Should we make a test version with Kafka inside?
Yes. Two versions of the Sherlock server: DaveM will do the Kafka version, Roy the HTTP version.
DaveM, Gareth and Roy will hold a technical meeting.
Service resilience
What are the changes we should make to improve resilience of Lasair?
LSST targets for rebuilding from backup
Resilience plan will be in Community Broker proposal
Database replication and hot spare
Easy re-deployment by containers and kubernetes
If you require this document in an alternative format, please contact the LSST:UK Project Managers lusc_pm@mlist.is.ed.ac.uk or phone +44 131 651 3577