Lasair Tech Review: Andy's version of key Qs

This is mostly Roy’s key questions, with some additions and a little rephrasing. Ideally we should turn each of these into a decision, an activity in Cycle 2, or something we defer to Cycle 3 or later.

  • Testing

    • Should we make our own simulated alert stream? No for now

  • Sherlock

    • Should we incorporate classification by RAPID or similar?

      • No but make sure infrastructure can handle additional classifiers

    • Can we get degree of confidence in classification? No for now

    • Should we make it simpler to add new catalogues to Sherlock?

      • No, we should not design Sherlock 3.0 for users to add catalogues. It is not trivial; catalogues can be proposed to us for inclusion.

  • Science Driver Issues

    • Should we engage Science Working Group or others in prioritisation of functionality?

      • E.g., should we ignore solar system objects in Lasair? (size..)

        • Claim: solar system (SS) could be low effort, high return. Nobody else doing it?

    • Is there a science case for stellar transients in Lasair? (size..)

      • Survey the community: which science needs alerts and which needs data releases?

      • Stephen will do survey

    • Should we have separate DBs for different science?

      • Understand how to have multiple databases/schema/website

    • What science areas are other Community Brokers planning to address?

      • Dave Y and Stephen volunteered to try to find out.

  • Science Platform

    • Use Firefly as GUI? Or just embody ideas? ADQL, form-based query

      • Not now but keep it in mind

    • Should we incorporate Nublado into Lasair?

      • Yes when it's ready. Small difference from JupyterHub. Will provide LSST images.

    • Should we set up a TAP service for Lasair? Could put TOPCAT in front.

      • Yes as spinoff from LSP (Stelios)

  • Queries

    • Do we restrict queries, and if so how? VizieR-type form? Parse SQL like WSA, VSA?

      • Yes: Form as default, freeform for advanced users with review/optimise process

      • Must also consider API/Jupyter queries

      • Review and propose the new system (Gareth and Andy)

      • 3 layers: Form + SQL + Jupyter

    • Do we offer filtering on the Kafka stream to users? Build queries in KSQL?

    • If so, do we also keep the close connection between static and streaming queries?

      • Yes if possible
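
One way to keep the form layer, the SQL layer and the streaming filter connected is to hold each user filter as data and compile it twice: once into a parameterised SQL query, once into a predicate applied to alerts from the stream. The sketch below is only an illustration of that idea. The table name `objects`, the columns `mag`, `ncand` and `classification`, and the object IDs are all hypothetical, and a plain Python list stands in for the Kafka stream.

```python
import operator
import sqlite3

# Hypothetical whitelist of queryable columns; real Lasair column names may differ.
ALLOWED_COLUMNS = {"mag", "ncand", "classification"}

# Operators offered on the form, mapped to (SQL text, Python function).
OPERATORS = {
    "<": ("<", operator.lt),
    ">": (">", operator.gt),
    "=": ("=", operator.eq),
}


def validate(filters):
    """Reject any column or operator that the form does not offer."""
    for column, op, _value in filters:
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"column not allowed: {column}")
        if op not in OPERATORS:
            raise ValueError(f"operator not allowed: {op}")


def to_sql(filters, table="objects"):
    """Compile the filters into a parameterised SELECT for the static database."""
    validate(filters)
    clauses = [f"{column} {OPERATORS[op][0]} ?" for column, op, _ in filters]
    params = [value for _, _, value in filters]
    where = " AND ".join(clauses) if clauses else "1=1"
    return f"SELECT objectId FROM {table} WHERE {where}", params


def to_predicate(filters):
    """Compile the same filters into a function that tests one alert dict."""
    validate(filters)

    def predicate(alert):
        return all(OPERATORS[op][1](alert[column], value)
                   for column, op, value in filters)

    return predicate


# Made-up rows, used for both the static table and the stand-in stream.
fields = ("objectId", "mag", "ncand", "classification")
rows = [("obj1", 17.2, 5, "SN"), ("obj2", 19.1, 3, "SN"), ("obj3", 16.0, 9, "AGN")]
filters = [("mag", "<", 18.0), ("classification", "=", "SN")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects "
             "(objectId TEXT, mag REAL, ncand INTEGER, classification TEXT)")
conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", rows)

sql, params = to_sql(filters)
static_hits = [row[0] for row in conn.execute(sql, params)]

keep = to_predicate(filters)
stream = [dict(zip(fields, row)) for row in rows]
stream_hits = [alert["objectId"] for alert in stream if keep(alert)]

print(static_hits, stream_hits)
```

Because column names and operators come from a whitelist and values are bound as parameters, the form layer never passes raw user text into SQL. Freeform SQL for advanced users would still need the separate review/optimise step noted above.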

  • Relational database

    • Are light curves in there? Is forced phot in there? Should Sources be kept in blob storage, leaving a lean relational database for Objects?

      • Ken to build “blob store” with NoSQL and compare with CephFS

      • Need to build a data-mining API to this store, with query mechanism

    • Is there just one RDBMS or several? (see above)

    • Can we make a decision now on Cassandra or other NoSQL?

      • Not for relational database

    • Survey science group for features of light curves

    • Build set of representative queries that need to be fast, with test rig
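
A test rig for representative queries could start as small as the sketch below: a dictionary of named queries, each run several times, with the median wall-clock time reported. Everything here is an assumption for illustration. SQLite stands in for whichever RDBMS is chosen, and the `objects` schema, the synthetic rows and the two queries are invented rather than taken from Lasair.

```python
import sqlite3
import statistics
import time


def time_queries(conn, queries, repeats=5):
    """Return {name: median seconds} for each named (sql, params) query."""
    results = {}
    for name, (sql, params) in queries.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            conn.execute(sql, params).fetchall()  # fetch all rows so the work is done
            samples.append(time.perf_counter() - start)
        results[name] = statistics.median(samples)
    return results


# Synthetic stand-in data: 20000 objects with made-up positions and magnitudes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects "
             "(objectId INTEGER PRIMARY KEY, ra REAL, decl REAL, mag REAL)")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?)",
    [(i, (i * 0.37) % 360.0, (i * 0.11) % 180.0 - 90.0, 15.0 + (i % 70) / 10.0)
     for i in range(20000)],
)
conn.execute("CREATE INDEX idx_decl ON objects (decl)")

# Hypothetical representative queries; the real set would come from the science survey.
queries = {
    "bright objects": ("SELECT COUNT(*) FROM objects WHERE mag < ?", (16.0,)),
    "declination strip": (
        "SELECT objectId FROM objects WHERE decl BETWEEN ? AND ?", (-1.0, 1.0)),
}

timings = time_queries(conn, queries)
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```

Keeping the queries as data means the same set can be rerun unchanged against each candidate database or hardware configuration.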

  • Hardware

    • Do we plan to use SSDs or not?

      • For relational db, not for blob store

    • Can we get OpenStack nodes with SSD? How many, how much? Or even spinning disk

      • Tell Mark, George about implementation plans (Gareth)

      • IRIS/Cambridge doing this for SKA

  • Watchlists and user data

    • Measure scalability of watchlists (Roy)

    • Subsets of the Million Quasar catalog from VizieR, with source radius = 2 arcsec, matched against 2 million objects (instructions)

      Sources    Upload (s)   Crossmatch (s)
      50         0.3          1
      1000       2            4
      10000      19           47
      100000     270          530
      1000000    (browser cannot paste the list)
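
To read the scaling off these timings, it may help to compute the cost per source and the log-log slope between two points. A slope near 1 means time grows roughly linearly with watchlist size. The sketch below uses only the four measured rows above; the choice of 1000 and 100000 sources as the two endpoints is an assumption made here to avoid the fixed overhead that dominates the 50-source case.

```python
import math

# (sources, upload seconds, crossmatch seconds) from the measurements above.
measurements = [
    (50, 0.3, 1),
    (1000, 2, 4),
    (10000, 19, 47),
    (100000, 270, 530),
]


def loglog_slope(x1, y1, x2, y2):
    """Exponent k in y ~ x**k implied by two (x, y) points."""
    return math.log(y2 / y1) / math.log(x2 / x1)


# Per-source cost in milliseconds for each watchlist size.
for n, upload, crossmatch in measurements:
    print(f"{n:>7} sources: {1000 * upload / n:6.2f} ms/source upload, "
          f"{1000 * crossmatch / n:6.2f} ms/source crossmatch")

# Slopes between the 1000-source and 100000-source measurements.
(n_lo, up_lo, cm_lo), (n_hi, up_hi, cm_hi) = measurements[1], measurements[-1]
upload_slope = loglog_slope(n_lo, up_lo, n_hi, up_hi)
crossmatch_slope = loglog_slope(n_lo, cm_lo, n_hi, cm_hi)

print(f"upload slope: {upload_slope:.2f}")
print(f"crossmatch slope: {crossmatch_slope:.2f}")
```

Both slopes come out just above 1 over this range, so upload and crossmatch both look close to linear in the number of sources. Whether that holds at a million sources is untested, since that list could not be pasted into the browser.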

  • “Kafka Inside” or not

    • Should we base architecture around Kafka vs http, scp?

    • Should we make a test version with Kafka inside?

      • Yes. Two versions of Sherlock server. DaveM will do Kafka version. Roy will do http version.

      • DaveM, Gareth, Roy will meet for technical meeting

  • Service resilience

    • What are the changes we should make to improve resilience of Lasair?

      • LSST Targets for rebuild from backup

      • Resilience plan will be in Community Broker proposal

        • Database replication and hot spare

        • Easy re-deployment by containers and Kubernetes



 

 

If you require this document in an alternative format, please contact the LSST:UK Project Managers lusc_pm@mlist.is.ed.ac.uk or phone +44 131 651 3577