Lasair Tech Review: Andy's version of key Qs

This is mostly Roy’s key questions, with some additional things and a little re-phrasing. Ideally we should turn each of these into one of: a decision, an activity in Cycle 2, or something we punt downstream to Cycle 3 or later.

Testing
- Should we make our own simulated alert stream? No for now
Sherlock
- Should we incorporate classification by RAPID or similar?
  - No but make sure infrastructure can handle additional classifiers
- Can we get degree of confidence in classification? No for now
- Should we make it simpler to add new catalogues to Sherlock?
  - No, we should not design Sherlock 3.0 for users to add catalogues. It is not trivial; catalogues can be proposed to us for inclusion.
Science Driver Issues
- Should we engage Science Working Group or others in prioritisation of functionality?
  - E.g., should we ignore solar system objects in Lasair? (size..)
    - Claim: SS could be low effort, high return. Nobody else doing it?
- Is there a science case for stellar transients in Lasair? (size..)
  - Survey the community: which science needs alerts and which needs data releases?
  - Stephen will do survey
- Should we have separate DBs for different science?
  - Understand how to have multiple databases/schema/website
- What science areas are other Community Brokers planning to address?
  - Dave Y and Stephen volunteered to try to find out.
Science Platform
- Use Firefly as GUI? Or just embody ideas? ADQL, form-based query
  - Not now but keep it in mind
- Should we incorporate Nublado into Lasair?
  - Yes when it's ready. Small difference from JupyterHub. Will provide LSST images.
- Should we set up a TAP service for Lasair? Could put Topcat in front.
  - Yes as spinoff from LSP (Stelios)
Queries
- Do we restrict queries, and if so how? Vizier-type form? Parse SQL like WSA, VSA?
  - Yes: Form as default, freeform for advanced users with review/optimise process
  - Must also consider API/Jupyter queries
  - Review and propose the new system (Gareth and Andy)
  - 3 layers: Form + SQL + Jupyter
- Do we offer filtering on the Kafka stream to users? Build queries in KSQL?
- If so, do we also keep the close connection between static and streaming queries?
  - Yes if possible
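
One way to keep static and streaming queries connected is to define each filter once and render it two ways: as parameterised SQL for the database, and as a predicate applied to each alert as it arrives. The following is a minimal sketch of that idea, not a proposed design. The column names (`magrmin`, `ncand`), the `objects` table, and the `(column, operator, value)` filter format are all hypothetical, and an in-memory SQLite database and a plain Python list stand in for the real database and the Kafka stream.

```python
import sqlite3

# Hypothetical whitelist of queryable columns; a form would only offer these.
ALLOWED_COLUMNS = {"magrmin", "ncand"}

# Operators a form could offer, with the equivalent Python comparison.
OPERATORS = {
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
    "=": lambda a, b: a == b,
}


def validate(conditions):
    """Reject anything outside the whitelist before it reaches SQL or the stream."""
    for column, op, _ in conditions:
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
        if op not in OPERATORS:
            raise ValueError(f"unsupported operator: {op}")


def to_sql(conditions):
    """Render the filter as a parameterised query for the static database."""
    validate(conditions)
    where = " AND ".join(f"{c} {op} ?" for c, op, _ in conditions) or "1 = 1"
    params = [value for _, _, value in conditions]
    return f"SELECT objectId FROM objects WHERE {where} ORDER BY objectId", params


def matches(alert, conditions):
    """Apply the same filter to a single alert taken from the stream."""
    validate(conditions)
    return all(OPERATORS[op](alert[c], value) for c, op, value in conditions)


# Made-up alerts, used both as database rows and as a stand-in stream.
alerts = [
    {"objectId": "A", "magrmin": 17.5, "ncand": 12},
    {"objectId": "B", "magrmin": 19.2, "ncand": 30},
    {"objectId": "C", "magrmin": 16.1, "ncand": 3},
]
conditions = [("magrmin", "<", 18.0), ("ncand", ">=", 5)]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE objects (objectId TEXT, magrmin REAL, ncand INTEGER)")
db.executemany("INSERT INTO objects VALUES (:objectId, :magrmin, :ncand)", alerts)

sql, params = to_sql(conditions)
static_hits = [row[0] for row in db.execute(sql, params)]
stream_hits = [a["objectId"] for a in alerts if matches(a, conditions)]
print(static_hits, stream_hits)
```

Because both renderings come from the same validated filter, a form-built query stays restricted by construction. Freeform SQL from advanced users would not fit this pattern and would still need the review/optimise step.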
Relational database
- Are light curves in there? Is forced phot in there? Should Sources be kept in blob storage, leaving a lean relational database for Objects?
  - Ken to build a “blob store” with NoSQL and compare with CephFS
  - Need to build a data-mining API to this store, with query mechanism
- Is there just one RDBMS or several? (see above)
- Can we make a decision now on Cassandra or other NoSQL?
  - Not for relational database
- Survey science group for features of light curves
- Build a set of representative queries that need to be fast, with a test rig
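
The split between a lean relational database for Objects and a blob store for Sources can be sketched as below. This is only an illustration under stated assumptions: the `BlobStore` class is a hypothetical in-memory stand-in for whichever NoSQL or CephFS store is chosen, the `objects` table and its columns (`ncand`, `magmin`) are invented for the example, and SQLite stands in for the real RDBMS.

```python
import json
import sqlite3


class BlobStore:
    """Hypothetical stand-in for the blob store: one JSON blob per object."""

    def __init__(self):
        self._blobs = {}

    def append(self, object_id, source):
        # Read, extend and rewrite the whole light curve for this object.
        sources = json.loads(self._blobs.get(object_id, "[]"))
        sources.append(source)
        self._blobs[object_id] = json.dumps(sources)

    def light_curve(self, object_id):
        return json.loads(self._blobs.get(object_id, "[]"))


# Lean relational side: one summary row per object, no per-source rows.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE objects (objectId TEXT PRIMARY KEY, ncand INTEGER, magmin REAL)"
)
blobs = BlobStore()


def ingest(object_id, mjd, mag):
    """Store the source in the blob store, then refresh the object summary."""
    blobs.append(object_id, {"mjd": mjd, "mag": mag})
    curve = blobs.light_curve(object_id)
    db.execute(
        "INSERT OR REPLACE INTO objects VALUES (?, ?, ?)",
        (object_id, len(curve), min(point["mag"] for point in curve)),
    )


# Made-up detections: (objectId, MJD, magnitude).
for detection in [("A", 59000.1, 18.4), ("A", 59001.1, 17.9), ("B", 59000.2, 19.6)]:
    ingest(*detection)

# Select on the lean table first, then fetch light curves only for the hits.
bright = [row[0] for row in db.execute("SELECT objectId FROM objects WHERE magmin < 18.0")]
curves = {object_id: blobs.light_curve(object_id) for object_id in bright}
print(bright, curves)
```

In this arrangement the relational query never touches per-source data, which is the point of keeping the Objects table lean. The cost is that any selection on light-curve shape needs either precomputed features in the Objects table or the separate data-mining API to the store.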
Hardware
- Do we plan to use SSDs or not?
  - For relational db, not for blob store
- Can we get OpenStack nodes with SSD? How many, how much? Or even spinning disk
  - Tell Mark, George about implementation plans (Gareth)
  - IRIS/Cambridge doing this for SKA
Watchlists and user data
- Measure scalability of watchlists (Roy)
- Subsets of the Million Quasar catalog from Vizier, with source radius = 2 arcsec, matched against 2 million objects (instructions)
  - 50 sources: upload 0.3 sec, crossmatch 1 sec
  - 1000 sources: upload 2 sec, crossmatch 4 sec
  - 10000 sources: upload 19 sec, crossmatch 47 sec
  - 100000 sources: upload 270 sec, crossmatch 530 sec
  - 1000000 sources: browser cannot paste it
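
To read the scaling off the watchlist timings above, it may help to look at cost per source and at the local slope on a log-log plot. The sketch below does only that arithmetic. The numbers are copied from the measurements above; the function names are made up for illustration.

```python
import math

# Timings copied from the measurements above: (sources, upload s, crossmatch s).
TIMINGS = [
    (50, 0.3, 1.0),
    (1000, 2.0, 4.0),
    (10000, 19.0, 47.0),
    (100000, 270.0, 530.0),
]


def per_source_ms(n, seconds):
    """Average cost per watchlist source, in milliseconds."""
    return 1000.0 * seconds / n


def local_exponent(n1, t1, n2, t2):
    """Slope of log(time) against log(sources) between two measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


for n, upload, crossmatch in TIMINGS:
    print(
        f"{n:>7} sources: upload {per_source_ms(n, upload):.2f} ms/source, "
        f"crossmatch {per_source_ms(n, crossmatch):.2f} ms/source"
    )

# Compare each measurement with the next larger one.
for (n1, u1, c1), (n2, u2, c2) in zip(TIMINGS, TIMINGS[1:]):
    print(
        f"{n1} -> {n2}: upload exponent {local_exponent(n1, u1, n2, u2):.2f}, "
        f"crossmatch exponent {local_exponent(n1, c1, n2, c2):.2f}"
    )
```

An exponent near 1 means time grows roughly in proportion to watchlist size; values clearly above 1 would suggest the approach degrades for large lists. With only four measurements this is a rough indication, not a fit.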
“Kafka Inside” or not
- Should we base architecture around Kafka vs http, scp?
- Should we make a test version with Kafka inside?
  - Yes. Two versions of Sherlock server. DaveM will do Kafka version. Roy will do http version.
  - DaveM, Gareth, Roy will meet for technical meeting
Service resilience
- What are the changes we should make to improve resilience of Lasair?
  - LSST Targets for rebuild from backup
  - Resilience plan will be in Community Broker proposal
    - Database replication and hot spare
    - Easy re-deployment by containers and Kubernetes