Lasair Cycle-4 mid-Project Review: notes and actions from (08/SEP/21)

Time and Venue: Wednesday 8th September 2021, via Zoom

Attendees: Andy Lawrence, Roy Williams, Terry Sloan, Cosimo Inserra, Stephen Smartt, Sara Casewell, Dave Young, Michael Fulton, Julien Peloton, Ken Smith, Jakob Nordin, Gareth Francis, Matt Nicholl, Eric C Bellm, Stelios Voutsinas, Meg Schwamb, George Beckett, Nic Wolf

Apologies:

Meeting public page

Notes from discussion

Session-1: Overviews

Welcomes (AndyL)

Please use the Zoom Chat discussion. This will be saved at the end of meeting.
Code of Conduct reminder
- Meeting contacts are AndyL and SaraC

LSST:UK context (Stephen Smartt)

Hope to get funded in Phase C but this will not count towards in-kind. It will however support other UK work that does count towards in-kind.
Spectra information inclusion is increasingly important. UK in a strong position.
The spectrograph of particular note to Lasair is 4MOST, the spectrograph for the VISTA telescope. TiDES is the part of the 4MOST survey focused on transients.

Lasair long term plan (Andy Lawrence)

Today’s meeting primarily concerned with the revised 4 year plan for the remainder of phase B.
Issues from today will be noted and fed into the plan.
Overall LSST:UK phases have not changed relative to the Rubin schedule.
Current Cycle 4 of the Lasair plan is taking stock prior to build.
Key change - original plan was ZTF prototype would be frozen at V2 however further ZTF development has continued.
Overall Lasair architecture now believed to be set.
A number of key technology changes including assuming that users will be using Google Colab.

1325: Discussion

Cosimo Q - for Stephen regarding spectroscopic addition to Lasair, confused about the various projects. 4MOST TiDES does not do targeted opportunities, so spectra might come 20 to 100 days after the LSST discovery. Can understand how SOXS can contribute but not so much 4MOST. So how much can these spectra actually contribute? What spectral information will be available and in what format?
- StephenS A - correct that 4MOST pointing cannot be controlled but it is likely to follow the LSST footprints. It is possible to control the fibre pointing within the fields of 4MOST. Anything brighter than magnitude 22 in the field of 4MOST will have a fibre put on it. Spectra of 20K to 30K transients plus 30K host galaxies are expected but cannot control exactly when the spectra will be collected. The challenge is to get objects out of the alert stream and onto the 4MOST schedule. If a fibre can be put on them, then when an object is observed by 4MOST, LSST:UK WP3.3 will recognise that the fibre is in place, make estimates, and the information will then go back to Lasair as an annotation. All of the data will be public in ESO archives. We will get information back asap into the broker.
Cosimo Q - do you plan the same thing for SOXS?
- Stephen S A - yes, there will be some link from Lasair.
Jakob Q - How well defined is the UK programme? Have the spectroscopic projects discussed what they require in terms of triggering from LSST? To what degree has the LSST:UK community specified their LSST plans, such that these can be used as Lasair requirements?
- Stephen S A - yes, these discussions are happening at the conceptual level. The triggers are objects classified as supernovae by Sherlock (not an AGN, not a variable star…). Essentially, if bright enough for 4MOST to take a spectrum then put a fibre on it.
Jakob Q - sounds like UK will only do supernovae?
- Stephen A - actually mean extragalactic transients
Question from JulienP - Funding context was highlighted, but the focus was on PI/developers. What about the hardware funding status and plan?
- Stephen S answer - This will be discussed in afternoon. Top level funding is secured. Lasair is a small part of the UK computing. LSSTUK is a small part of UK IRIS. So overall Lasair are not the dominant partner and hence we believe Lasair is safe. Andy agrees and explained that the plan for UK expected requirements is being put in to UKRI. This is for the 10 years operation and includes Lasair.

Session-2: ZTF prototype experience (chair: Stephen Smartt)
Lasair-ZTF and how it works (Roy Williams)

query, filter and stream are all the same in Lasair
Sherlock - cross-matches location of transient with other catalogs
Julien Q to Roy - screen annotation - what is the feedback on this? What can you say about user-based annotations?
- Roy answer - We have not built the system yet. Michael will talk later about one of the classifications. We would like to share classifications from other brokers eg FINK, if acceptable.
Sara Q - stream annotation - are there plans to create a collaboration so people are not competing on the same follow-ups? Will the annotations become public on a “follow-up” list or similar?
- Roy answer - Marshall refers to a group of people able to submit opinions on transients and make decisions.
- Andy answer - Marshall building is beyond our scope to solve this.
Jakob - annotations are hard. We decided instead to be flexible so people can do what they want but they then have the responsibility to maintain and make useful. On 2nd slide, no arrow to/from Rubin Science platform. Have you thought about how this relates to the DB contents. How will eg. updated photometry be incorporated?
- Andy answer- grimly aware need to think about this but not yet solved.
Jakob Q to Eric - how frequently can we expect updates and how seriously to take these?
- Eric answered - let's discuss during the talk

Science experience with ZTF (Matt Nicholl)

Experiments with Lasair-ZTF
Use cases
- Nuclear transient research currently migrating to Google Colab notebook
  - Difficulties are: filtering on colour, streams clogged with old objects, not having full light curve history (this is the biggest problem)
  - Intend doing citizen science with light curves. Documentation to do this was helpful
- Lensed transients
Email alerts could be more human readable
A thumbnail for each object on streaming page would be helpful
Testing spectra in advance would be good.
Email alerts and web interface are what
Roy stated that the light curves are coming so the major problem identified will be solved. Like/dislike for an object is the tail of tiger and could mean Lasair in effect building marshalls. As yet do not have the facility to push data into notebooks but will let you know when this is ready.
Eric Q to Matt - Regarding the boundaries with marshalls, how do you keep track of candidates?
- Matt answer - luckily the numbers are not big so not a problem yet. Have been using a note in Google Colab. Also within research group have started tentative steps to building our own marshall.
Cosimo Q to Roy - Agree color curve would be fantastic. What about using Gaussian processes to interpolate the light curves in order to produce colours (with uncertainties)?
- Roy - happy to look at Gaussian code and to include this in the Lasair pipeline.
- Cosimo will forward the code to Roy.
Jakob Q to Roy - you are all using Google Colab but we have encountered Python limitations
- Roy answer - Ken will talk about this later.
- Ken - do not know how Google Colab behaves when it requires C binaries other than NumPy. Not yet installed own C code. We should explore this.
- Gareth answer - aware that Nublado will be running on RSP next door so if Google Colab not suitable then can consider integration with the RSP.
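
Illustrative sketch of Cosimo's suggestion above: interpolate each band with a Gaussian process and difference the two bands to get a colour curve with uncertainties. This is not Cosimo's code and not part of the Lasair pipeline. The squared-exponential kernel, the fixed `amplitude` and `length_scale` values and all function names are assumptions made for the example; a real implementation would fit the kernel hyperparameters.

```python
import numpy as np

def rbf_kernel(t1, t2, amplitude, length_scale):
    """Squared-exponential covariance between two sets of times (days)."""
    dt = t1[:, None] - t2[None, :]
    return amplitude**2 * np.exp(-0.5 * (dt / length_scale) ** 2)

def gp_predict(t_obs, mag_obs, mag_err, t_new, amplitude=1.0, length_scale=10.0):
    """Posterior mean and standard deviation of a GP fitted to one band.

    The GP models the residuals about the mean observed magnitude;
    hyperparameters are fixed (assumed), not optimised.
    """
    t_obs, mag_obs, mag_err, t_new = (
        np.asarray(a, dtype=float) for a in (t_obs, mag_obs, mag_err, t_new)
    )
    mean_mag = mag_obs.mean()
    K = rbf_kernel(t_obs, t_obs, amplitude, length_scale) + np.diag(mag_err**2)
    K_s = rbf_kernel(t_new, t_obs, amplitude, length_scale)
    mu = mean_mag + K_s @ np.linalg.solve(K, mag_obs - mean_mag)
    # Diagonal of the posterior covariance: k(t*, t*) - K_s K^-1 K_s^T
    var = amplitude**2 - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def colour_curve(band_a, band_b, t_new):
    """Colour (band_a - band_b) and its uncertainty on a common time grid.

    Each band is a tuple (times, magnitudes, magnitude errors).
    The two bands are treated as independent, so errors add in quadrature.
    """
    mu_a, sd_a = gp_predict(*band_a, t_new)
    mu_b, sd_b = gp_predict(*band_b, t_new)
    return mu_a - mu_b, np.hypot(sd_a, sd_b)
```

The returned uncertainty grows away from the observed epochs, which gives a natural way to discard colours interpolated across long gaps.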

Differences between ZTF and LSST (Eric Bellm)

ZTF is on telescope that is almost 75 years old
Slide 3 shows physical differences with Rubin. Only advantage for ZTF is in field of view. Total number of exposures per night will be similar on both telescopes.
Major difference is ZTF only processes the data once i.e. as live data comes off the telescope.
Q Julien on chat - Cutouts to be transmitted for LSST are template and difference. Why choosing template over science? The science one is probably what I look first to judge quality.
- A - The reason is lost to pre-history. Aim was to get 2 out of 3. Aim to go to 3 cut-outs.
Question: (Roy W asked) Can Lasair get away with using the LSST cutout service instead of saving them in our own database? How many can we fetch per day?
- Answer: Do not think Lasair should use US cutout service. Capacity limits are not clear at moment, but cut-outs won’t become available until after the underlying images, which means a 3-day delay (in line with Government requirements).
Q Jakob - Regarding upper limits on forced photometry, will there always be 12 month limits on alerts with varying limits for forced photometry?
- A: Not built rigorously in pipelines, so remains conceptual. Should have history of every time we observe a region, and provide upper limit on noise estimate where we don’t have forced photometry. This can change from night to night.
- A: Details of workload management for PPDB are to be confirmed, though it is a concern. Several ways to think about this. Outcome of Broker Workshop was action on project to offer a database export of PPDB (world-readable) for those who require significant information from DB.
Davy asked to clarify that forced photometry in alerts is one epoch behind the actual alert.
- Eric confirmed this was case. Getting triggering DSRs, but will. Eric will make a note and hope anomaly can be achieved without significant effort.

Session-3: Architecture and Technology (chair: Andy Lawrence)
Lasair-LSST Architecture (Roy Williams)

Roy will also explain why the architecture is what it is.
Platform maintains ~1-week cache of alerts from ZTF stream
Cassandra and Galera are both scalable. Aim to be able to dynamically scale to deal with changing load
New feature - external annotator - can receive a stream from Lasair and then query DB to draw conclusions. This might be a classifier – e.g., Zooniverse.
Lasair user interface is SQL-based (specifically, simple SELECT), but supports both on-demand and streaming queries
Supports community contributions (queries, classifications, etc.)
Architecture based on scale-up of low-power VMs

Kafka processing (Gareth Francis/Roy Williams)

Kafka acts as the bus to enable the updates on the Lasair pipeline.

Hardware implementation (Roy Williams)

Julien P asked “Do you make the list of queries public such as others can take inspiration or just play it as is for their own work? Is this something the user that makes the query can choose (public/ private)?”
- A: Each query has a tick box for making public/ private, as you wish. Public queries can be copied and modified.
Julien P also asked “Is Sherlock doing batch or live processing of the incoming stream?”
- A: Sherlock is doing batch processing, as is more efficient, though effectively same outcome as batches are very quick.
- Julien noted batch processing may be less similar for LSST, due to data rates. May wish to prioritise processing based on science case. Has Lasair team thought about this?
- Dave Y noted that Sherlock consumes whatever it is given. Code is multi-processed and scalable, so believe will be quick enough.
- Gareth noted currently running batches of up to 800 alerts (if there are enough). This has been seen to be a reasonable compromise, taking 3--4 minutes according to content.
Andy L asked what was delay in seeing output from an alert
- Roy noted around 30 minutes, based on recent sampling
- Andy L noted that Rubin aspiration is 60 seconds.
- Roy noted some of 30 mins might be within ZTF infra, rather than Lasair. Roy does not see significant use cases for 60-second availability.
- Eric agrees not a lot of science cases need 60-second turn-around. Has been proposed as de-scope opportunity, but has seen pushback from Rubin Directorate for this. Rubin is still aiming to achieve 60 second latency. Once science cases mature, scope to evolve functionality to meet requirements.
- Andy noted that, if people need 1-min or 5-min latency, then hopefully will be achievable somewhere in Rubin landscape – not necessarily in Lasair.
- Jakob N noted that idea to have everything evolve around queries and streams is very nice and supports science reproducibility. Was intrigued to see Kafka used as a bus, as not something considered by Jakob’s group, to date. Jakob using combination of SQL and No-SQL databases, but for different things. SQL for cut-outs, no-SQL for objects and annotations. Previously proposed to have workshop focusing on database choices. Roy confirmed that this would be interesting. Ken noted that Pit? Google trying to organise workshop on this in late October/ early November (Ken is on Organising Committee). Stephen proposes dedicated session on databases at workshop and encouraged Ken to press for this.
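
Illustrative sketch of the batching Gareth described above, where alerts are handed to Sherlock in batches of up to 800 when enough are available. The function name and the use of a plain Python iterable in place of the Kafka stream are assumptions for the example; this is not the Lasair ingest code.

```python
from itertools import islice

def batches(alerts, max_size=800):
    """Yield lists of at most max_size alerts from any iterable.

    The final batch is smaller when fewer alerts remain, matching
    "up to 800 alerts (if there are enough)".
    """
    iterator = iter(alerts)
    while True:
        batch = list(islice(iterator, max_size))
        if not batch:
            return
        yield batch

# Example: 2000 queued alerts become three batches.
sizes = [len(batch) for batch in batches(range(2000))]
```

Larger batches amortise per-call overhead but add latency for the first alert in each batch, which is the compromise discussed above.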

Light curve classification (Michael Fulton)

Building photometric classifier to be integrated into Lasair. Transient detection rate far exceeds rate of classification. Will get worse with LSST.
Plan to run FastFinder on LSST and highlight potential transients very quickly.
ML techniques work better with well-sampled light curves, but often too late for real science. Need to prioritise speed, while not losing reliability, so have chosen parameter fitting (feature selection comparison to parameter space of fast transients).
Template Datastore for lots of different fast transients, from LSST, Pan-STARRS, and …
Features including rise time, peak brightness, etc. extracted and used for classification step.
FastFinder produces list of scores for a candidate, based on different features. High-scoring features are favoured.
Approach is simple, but producing promising results, based on sampling with kilonova class transients, even when peak brightness not observed.
- FastFinder in use with Pan-STARRS. Found 2021qvw, for example.
- FastFinder will act as an external annotator (see previous talk from Roy).
- Runs independent of Lasair.
FastFinder outputs could be included in Lasair website, in dedicated section for external annotations.
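
Illustrative sketch of template-based scoring of the kind described above: a candidate's features are compared with each class's region of parameter space and the results are normalised so the scores add up to 100. This is not FastFinder's algorithm. The Gaussian form, the feature names and every number in `TEMPLATES` are hypothetical values chosen for the example.

```python
import math

# Hypothetical per-class template summaries: feature -> (mean, sigma).
# rise_rate is in mag/day (negative means brightening).
TEMPLATES = {
    "kilonova": {"rise_rate": (-1.0, 0.4), "peak_abs_mag": (-16.0, 1.0)},
    "sn_ia": {"rise_rate": (-0.2, 0.1), "peak_abs_mag": (-19.0, 0.5)},
}

def gaussian_pdf(x, mean, sigma):
    """Normal probability density at x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def score_candidate(features, templates=TEMPLATES):
    """Return class -> score, with scores normalised to sum to 100.

    Only features present in both the candidate and the template are used,
    so a candidate with a single measured feature still gets a score.
    """
    raw = {}
    for cls, template in templates.items():
        density = 1.0
        for name, value in features.items():
            if name in template:
                density *= gaussian_pdf(value, *template[name])
        raw[cls] = density
    total = sum(raw.values())
    if total == 0.0:
        return {cls: 0.0 for cls in raw}
    return {cls: 100.0 * density / total for cls, density in raw.items()}

scores = score_candidate({"rise_rate": -0.9, "peak_abs_mag": -16.2})
```

As with the scores discussed in the talk, these values rank classes relative to one another and should not be read as calibrated probabilities.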

Eric Q to Michael “can you say more about how you are fitting/estimating the light curves features (peak time, rise and fall times)”.
- A. Rise time is derived from the peak magnitude, the starting magnitude and the time between the two. A polynomial fit is applied. This depends on the number of points available.
Julien P Q to Michael - Are predictions done using the aggregated light curve or the last alert information? What about the evolution of score, do you report the full history of classification or just the last score?
- A. Prediction is based on position in parameter space of candidate. This is done for all features and results normalised to create score.
- A. Lasair will only store the most recent classification from FastFinder, but hope to have option to click through to history of FastFinder classifications.
Cosimo asked whether FastFinder needs information from different bands to be on same day or in 3--4 day timeframe?
- Features are extracted as they become available, and information is derived as is possible based on available features. So, need to advise how many features have been used to create classification, as indication of relative reliability of classification.
Jakob noted using absolute magnitude, which relies on good redshift. How sensitive is the classifier? Also why not run on ZTF?
- Redshifts have been measured from Sherlock, so errors are attributable to Sherlock. If host nearby does not have redshift then object is ignored. Risk is minimal if have secure association with galaxy of known redshift. Likely dominated by errors in photometry (if spectroscopic). If not, then photometry is likely source of error, so make assumption know distance and filter on that. Idea is to filter down to small number of objects to follow up quickly. Michael noted absolute magnitudes do contain redshift error bars, so FastFinder can consider that as part of prediction.
- FastFinder is being run on ZTF.
Andy L noted that this is important as it resolves the argument of whether or not to provide a light curve classifier within the main Lasair service. Michael has provided an alternative method and, by incorporating it into Lasair, means it is easily available. Question about how widely to make this option available.
Andy asked if scores are just a ranking method or do they have quantitative meaning (e.g. probabilities).
- Michael noted not true probabilities but derived from probability function of template space. Scores add up to 100, so could be pictured as percentages, but they are not. Michael plans to improve scoring so quantitatively meaningful.
- Andy believes this should be straightforward to do.
Cosimo also asked is there plan to increase the database [of templates] to deal with new discoveries?
Cosimo Q on chat “Do you need a data point in each band or not? What are the plans to increase the data templates.”
- StephenS answered on chat: “one band is enough to get a score. The data templates are the most work, and yes we do plan to increase templates”
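
Illustrative sketch of the rise-time feature described in Michael's answer to Eric above: take the starting magnitude, the peak magnitude and the time between them, and apply a polynomial fit whose degree depends on how many points are available. This is not FastFinder code; the function name, the returned keys and the cap of degree 2 are assumptions for the example.

```python
import numpy as np

def rise_features(mjd, mag, max_degree=2):
    """Extract simple rise features from a single-band light curve.

    Magnitudes are smaller when brighter, so the peak is the minimum magnitude.
    """
    mjd = np.asarray(mjd, dtype=float)
    mag = np.asarray(mag, dtype=float)
    order = np.argsort(mjd)
    mjd, mag = mjd[order], mag[order]

    i_peak = int(np.argmin(mag))
    rise_time = mjd[i_peak] - mjd[0]  # days from first detection to peak
    rise_rate = (mag[i_peak] - mag[0]) / rise_time if rise_time > 0 else float("nan")

    # Fewer points allow only a lower-degree fit.
    degree = min(max_degree, len(mjd) - 1)
    coeffs = np.polyfit(mjd - mjd[0], mag, degree) if degree >= 1 else None

    return {
        "n_points": len(mjd),
        "rise_time": rise_time,
        "rise_rate": rise_rate,  # mag/day, negative while brightening
        "poly_degree": degree,
        "poly_coeffs": coeffs,
    }

features = rise_features([0, 1, 2, 3, 4], [20.0, 19.0, 18.0, 18.5, 19.0])
```

Reporting `n_points` alongside the features follows the point made to Cosimo above that users need to know how much data a classification rests on.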

Session-4 Science processing and functionality (chair: Meg Schwamb)
Adding value: Sherlock (Dave Young)

Overview of Sherlock and how it works with Lasair
Purpose of Lasair annotations
Sherlock attempts to predict nature of an object based on cross-matches found
Sherlock is survey agnostic but the algorithm can be tweaked for specific surveys
Code is Python but modularised so the algorithm can be written in plain text
After intelligent cross-matching, Sherlock merges the associations. Ranking algorithm attempts to identify the best association.
Ultimately the resulting prediction is only as good as the underlying data. Expect LSST data therefore to improve many situations.
The better and the greater the number of annotations the more powerful the algorithms.
JulienP asked “What would be the cost of changing the contextual classification? Is this something envisaged in the future?”
- Dave Y noted that could build in sub-classes, though would not recommend changing top-level classifiers. E.g., could try to specify supernova type. Also scope to make classification more fine-grained.
- Julien follow-on Q - What will happen to the past data?
  - Meg - let’s answer this later.
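
Illustrative sketch of the cross-match and ranking steps described in Dave's talk above: find catalogue sources near a transient and order the associations so the best one comes first. This is not Sherlock's code. Ranking purely by angular separation, the 10 arcsec radius and the dictionary keys are assumptions standing in for Sherlock's own ranking algorithm.

```python
import math

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0

def crossmatch(transient, catalogue, radius_arcsec=10.0):
    """Return catalogue sources within the radius, nearest first."""
    matches = []
    for source in catalogue:
        sep = angular_separation_arcsec(
            transient["ra"], transient["dec"], source["ra"], source["dec"]
        )
        if sep <= radius_arcsec:
            matches.append({**source, "separation_arcsec": sep})
    return sorted(matches, key=lambda m: m["separation_arcsec"])

catalogue = [
    {"name": "star_B", "type": "star", "ra": 150.002, "dec": 0.0},
    {"name": "galaxy_A", "type": "galaxy", "ra": 150.0, "dec": 0.001},
    {"name": "galaxy_C", "type": "galaxy", "ra": 151.0, "dec": 0.0},
]
ranked = crossmatch({"ra": 150.0, "dec": 0.0}, catalogue)
```

The first entry of `ranked` plays the role of the best association; a fuller ranking would also weigh catalogue quality and source type.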

Annotation, features (Roy Williams)

DMTN-118 says what Rubin is providing.
In addition to ZTF, have added eg different ways of looking at the position, timings, first detection, …. Not interested in periodic data.
Difficult to change the set of features since it would need to be rebuilt in the relational DB. What can be added to the LSST list?
There are potential Sherlock attributes.
External annotators: From Lasair, a query pushes out a Kafka stream of candidates to the external annotator, which runs its annotation on them. Results are sent back from the external annotator and taken into Lasair.
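
Illustrative sketch of the external annotator round trip described above: candidates arrive on a stream, the annotator processes each one, and the results are sent back. In-memory queues stand in for the Kafka topics so the example runs without a broker. The message fields (`objectId`, `mag`, `classification`), the annotator name and the brightness rule are hypothetical, not the Lasair schema.

```python
import json
from queue import Queue

def annotate(candidate):
    """Hypothetical annotator: label a candidate by its magnitude."""
    label = "bright" if candidate["mag"] < 19.0 else "faint"
    return {
        "objectId": candidate["objectId"],
        "annotator": "example_annotator",
        "classification": label,
    }

def run_annotator(inbound, outbound):
    """Read JSON candidates from inbound and write JSON annotations to outbound."""
    while not inbound.empty():
        candidate = json.loads(inbound.get())
        outbound.put(json.dumps(annotate(candidate)))

# Stand-ins for the outgoing query stream and the returning annotation stream.
to_annotator, to_lasair = Queue(), Queue()
to_annotator.put(json.dumps({"objectId": "obj1", "mag": 18.2}))
to_annotator.put(json.dumps({"objectId": "obj2", "mag": 20.5}))
run_annotator(to_annotator, to_lasair)
results = [json.loads(to_lasair.get()) for _ in range(2)]
```

Keeping the annotator as a separate process that only exchanges messages is what lets it run independently of Lasair.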

Databases and storage : SQL, Cassandra, CephFS (Ken Smith)

Nic Wolf (Antares team) joined the session to accompany Ken.
Galera/MariaDB/ Cassandra in Lasair (Need slides from Ken - messaged him)
DB dump is not sustainable in the future due to sheer number of rows etc. Galera offers replication where all nodes are equal. Reading/writing tasks can be distributed to the different nodes.
Galera well integrated with MariaDB but not so well with MySQL hence the MariaDB choice.
Many of the Galera tools are free including the cluster control interface.
Detections originally stored as files on CephFS but worried this will not scale.
Cassandra writes better than reads. Widely adopted. Other brokers use it eg. Antares and core LSST (see DMTN-184).
NoSQL - not only SQL according to Cassandra authors.
Have been operating since Feb 21 hence the problem with light curves (see Matt Nicholl talk) prior to this. This will be solved.
Decided to use Cassandra since over the 10 years there will be 30 billion by the end. Relational DB clunky at that level.
For Cassandra no advantage of SSD over spinning disk so may devote the SSD to the relational DB.

Classification with FINK (Julien Peloton)

Broad overview of classification in FINK
In Fink all alerts are processed by science modules where each module is focussed on a specific science objective.
Modules can be cross-match, filters, or more complex, even ML
Most modules are provided by the community
Alert classification is produced based on the module processing.
Most classifications are matched with known objects or solar system objects.
In June 2021, 45% of ZTF alerts are unclassified ie unknown.
Use the classification process to identify candidates for follow-up
Can of course focus on the unclassified objects instead.
Adapting user code to work in the cloud-based environment often needs to be done by Fink team
Training sets not often representative, looking at active learning to remedy this.
Combining classification labels is being investigated

Multi-messenger science with AMPEL (Jakob Nordin)

(need to get slides)
Will cover topics discovered during Ampel development cycle and what we considered doing.
Until now reproducibility unnecessary in astronomy due to instrument growth but after LSST not sure if that will still be case.
Most alerts are faint and hence junk
Reusable software has not been a priority in science, particularly due to PhD cycle, but with projects running for 10+ years this can no longer be the case.
Reproducibility hard to do due to the black box (isolated) nature of the various steps in studies/pipelines.
Users create an analysis schema that runs across the four different information tiers in Ampel.
Users prefer a UI but the Ampel approach is analysis(code) focussed
Q Roy to Julien - How good are your classifications? How many false +ve/-ve?
- Julien answer - difficult question to answer, using TNS since Nov 2020 about 300 candidates have been reported. Half of these were spectroscopically followed-up. More than 80% of these were true SNe. This is a good result but this is not the only thing to look at. Now trying to make reliable detection faster. Currently have to wait 7 days but would prefer 3 days.
Jakob asked Dave - Sherlock is very nice, but how do people see maintaining and updating software over next 15 years.
- Dave noted default algorithm now sits with code (individual users can adjust, but default is version controlled). When you run Sherlock, the version of the classifier is stored for reproducibility.
- Jakob asked whether Dave would be around to update it
- Dave hoped so, but believed someone could be trained up to manage it in around two weeks, though noting code documentation needs to be improved.
Dave asked Julien if Fink classifier is based on light curve data alone.
- Yes, for Type 1a Supernovae.
- Dave concerned that Type 1a SNe cannot be classified with just two or three data points
- Julien agrees, but believes representative training set can help.
In chat, Q to Ken from AndyL, “if Cassandra is widely used in industry, how come we don’t know how to use it? Is it because in Industry its used internally, not by independent users?”
Andy asked Ken about Cassandra and noted concern that not sure how users will interface with Cassandra. Can we learn from Facebook use?
- Matter of choice of technology, and past experience of relational databases. Use of Cassandra has been learned from scratch. Facebook, etc. also use MySQL, etc. Ken has been sampling community use of databases, to see how well supported Cassandra might be. Cassandra just means outside of database. Could be file system, or something else.
Andy believes need to consider user requirements in more detail. Ken asked whether Dave had a sense of how many objects could classify per second (the Q in Zoom chat “do you have a feel about how many transients Sherlock can classify per second?”)
- Dave noted recent speed tests, which achieved ~10k per second.
In zoom chat Q Eric to all : “most of the examples we’re discussing here are the explosive transients. Is that reflecting the actual or desired user communities of Lasair, FINK, Ampel? What are the ambitions ( or not) to support variables/AGN/solar system science?”
Eric noted focus on extragalactic transients, which is a mature field, but asked whether there was any ambition/ interest in supporting solar system, variable star, or AGN communities?
- Meg noted that this is her interest in Lasair, and ability to exploit Cassandra for Solar System alerts. Ambition to create Sherlock peer to classify solar-system alerts, and intent to pursue funding for this.
- AndyL notes same issue for variable stars, but believes Lasair should be focused on transients. Mistake to try to reverse engineer for other applications. Integration with DRs and RSP should allow users to undertake variability science, alongside Lasair-focused transients. This is an unsolved problem.
- Sara stated in Zoom chat “There is interest from the White Dwarf community in using Lasair to look for drop outs/eclipses in WD lightcurves.”
- Roy noted difference magnitudes in ZTF make it difficult to compute proper variables.
- Eric noted Rubin would provide more rigorous forced photometry, which would help.
- Meg noted less urgent need for turn-around for solar-system science. Next night is good enough.
Answer from Ampel side from Jakob in Zoom chat - “Our current user base mainly involve extragalactic groups, so AGN modelling is included there. So it is not that we do not want to do it, but we do not have a lot of feedback from eg solar system groups regarding their LSST needs. would be happy to get that, though.”
Stephen noted intent to help the solar-system community find objects in Lasair, but then link that to the DR light curve in Rubin, as useful.
JulienP Q in Zoom chat : “ For Fink: the LSST-FR community is, for historical reason highly focussed on explosive transients (incl. SN and MMA). But things change. We now support more and more the SSO science, with a large group of experts joining the team. However, we clearly lack of variable star experts (although we would love working on this as they seem to make most of the stream).”

Session-5: User Interface (chair: Dave Young)
Queries, streaming, mining (Roy Williams)

Will cover how the query builder, streaming system, and the light curve mining all work.
Lasair query builder uses very SQL-like syntax
Actions on query depend upon whether or not you are logged in and if it belongs to you or is public.
Checker confirms if the likely run-time will exceed limits
You can ask the Lasair team to promote your query to be public on the Lasair site
Email alert for the query is limited to one per 24 hours.
Soon we will provide annotation capability and if liked can be enabled to allow annotations to be pushed back into Lasair.
Mining light curves system constituent parts will all run on the same cloud.
Julien Q - how do you avoid a SQL injection through the web interface? (Julien Q in Zoom chat : “what about SQL code injection? any internal LIMIT to avoid running full data retrieval? I’m always afraid to put a query builder public and unsupervised …”)
- Roy answer - suggests Julien try and hack and report back. Also there is a validation system that checks for keywords.
- DaveY also answers - DB is also read-only
- Roy answer in Zoom chat - “There is a LIMIT 1000 added to all queries, and an execution timeout.”
Jakob Q - will you be able to make queries based on other people's annotations?
- Roy answer - yes, but there will be delay.
- Jakob Q - Is there an internal log system on when queries were run?
- Roy answer - currently tracking all queries being run but not sure what to do with that information.
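
Illustrative sketch of the safeguards mentioned in the SQL injection discussion above: a keyword check on user-supplied query parts and a LIMIT 1000 appended to every query. This is not the Lasair validator. The forbidden-word list, the function name and the example column names are assumptions. A keyword filter alone is not a complete defence, which is why the read-only database and the execution timeout noted above matter.

```python
import re

# Assumed list of keywords to reject; the real validator's list is not in these notes.
FORBIDDEN = {
    "insert", "update", "delete", "drop", "alter",
    "create", "grant", "truncate", "union", "into",
}
MAX_ROWS = 1000

def build_query(select, tables, where=""):
    """Assemble a simple SELECT from user-supplied parts, or raise ValueError."""
    for part in (select, tables, where):
        if ";" in part or "--" in part or "/*" in part:
            raise ValueError("statement separators and comments are not allowed")
        # Whole-word match, so a column such as update_time is not rejected.
        words = set(re.findall(r"[a-z_]+", part.lower()))
        bad = words & FORBIDDEN
        if bad:
            raise ValueError(f"forbidden keyword(s): {sorted(bad)}")
    sql = f"SELECT {select} FROM {tables}"
    if where.strip():
        sql += f" WHERE {where}"
    return f"{sql} LIMIT {MAX_ROWS}"

safe_sql = build_query("objectId, ramean", "objects", "ncand > 5")
```

The same function can back both the on-demand and the streaming form of a query, since the two share one definition.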

Web interface (Andy Lawrence)

Lasair philosophy is to keep it simple. Focus on functionality rather than appearance.
Beginning to define final user interface for start of operations.
Keen to hear comments and guidance from participants.
Functionality
- canned queries – e.g., cone search
- query builder – custom queries
- documentation - user guidance examples on website and a cook book on LSST:UK wiki
- helpdesk - email (also GitHub [tickets], though not for regular users)
- Embedded image viewer/ interactive plotter for visualising rich information that is available.
Open questions
- Is navigation and user flow obvious?
- Should we provide general information for public?
- Should we have more sophisticated documentation?
- Should we aim for more-interactive web pages, forms, etc.?
Sara Q in Zoom chat “I wonder whether some of the information for non-scientists/ interactive stuff could be put together in one of the LSST proposals to the current call”. At NAM there are keen amateurs using Lasair and so UI improvements might be helpful
- Meg - the call is only 30K USD (~3 months of post-doc)
- Roy helps establish a collaboration
- Meg - should you be focussing on Lasair ZTF or is the goal Rubin Lasair? Would the call be better spent elsewhere?
Jakob likes how Lasair information is organised, on a single webpage, very easy to find and scan.
- Eric answer - for own science often go to Lasair initially since it is convenient.
George noted “For documentation, I wondered if we could follow a Rubin lead? We should find out what kinds of documentation Rubin will provide. By mirroring their style and tools we can provide something seamless and familiar.”
- Eric answer - More documentation is good. Rubin has dedicated team for this.

1710 API and notebooks (Ken Smith)

(need slides - messaged him)
Lasair team has introduced REST APIs using Django Rest Framework (looks to be de facto standard)
- See http://lasair-iris.roe.ac.uk/api for initial notes from Roy
Effectively, these are machine-readable versions of functions provided (interactive) on webpages
- /api/cone -
- /api/query
- …
Python wrapper “lasair” (available via pip install) to help use API.
Plan to add support for querying Cassandra directly
Also have Jupyter Notebook examples, hosted on Google Colab
- Need user account on Lasair to access
- Ken provided live demo of cone-search notebook.
API throttled, based on different levels of token
- Action taken in response to a user who was submitting thousands of queries, and putting strain on service (now using more efficient watch-list approach).
  - Anonymous use limited to 10 calls per hour, 1000 rows per query, …
Eric B Q on zoom chat - “ how are you handling auth for the public kafka service?”
- Roy answer on zoom chat - “Anyone can read it. There will be a username/password for annotators who write into it. Here is the notebook for reading Kafka https://colab.research.google.com/drive/1sV-JGzzVdZrP86P1tGu-naUQcMSSXAi7?usp=sharing”
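
Illustrative sketch of calling the cone-search endpoint listed above with a token. The request is only constructed, not sent. The `Authorization: Token …` header is the standard Django Rest Framework token scheme; the parameter names `ra`, `dec` and `radius`, the use of GET and the function name are assumptions, so check the API notes linked above before relying on them. The “lasair” Python wrapper is the more convenient route; it is not used here because its call signatures are not recorded in these notes.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ROOT = "http://lasair-iris.roe.ac.uk/api"  # URL from the notes above

def cone_request(ra, dec, radius_arcsec, token):
    """Build (but do not send) a GET request for a cone search.

    ra and dec are in degrees; parameter names are assumed, see lead-in.
    """
    params = urlencode({"ra": ra, "dec": dec, "radius": radius_arcsec})
    request = Request(f"{API_ROOT}/cone?{params}")
    request.add_header("Authorization", f"Token {token}")
    return request

request = cone_request(194.494, 48.851, 240, "YOUR_TOKEN")
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Because anonymous use is throttled to a small number of calls per hour, scripts that loop over many positions are better served by a watch-list than by repeated cone searches.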

1720 Tools interface (Andy Lawrence)

Different interfaces – need to prioritise
- Webpage
- Scripts (Python, primarily)
- Other projects (website)
- iDAC/ RSP interface – opportunity in UK, as IDAC is next to Lasair broker, but needs requirement analysis and design work
- Topcat – need TAP
- Personal storage – e.g., MyDB, VOSpace, …
Eric asked how Kafka authentication is being handled.
- Roy noted hope not overwhelmed with requests [for credentials]
Ken noted experience of writing TAP service, which could be useful (Guy Rickson and Thomas Marquat (sp?)).
Jakob asked if annotation is same as a classification
- Roy noted one type of annotation, but other classifications were possible.
Julien asked about usage split between REST and Kafka. (Q in zoom chat “how is the usage split between the REST API and Kafka? as an example in Fink, most of users use the API, and very few Kafka.”)
- Roy noted that people are very conservative and stick to interfaces they know.
- Julien hopes, over time, people will become familiar with new technologies, which will have benefits for users.
- Andy notes absence of user tools is a barrier to uptake of Kafka. Generally speaking Kafka and Cassandra are not targeted at end users: they are behind the scenes, meaning astronomy is unusual.
- Dave noted size of data to be handled. Kafka can handle a stream which would cause APIs to fall over.
- Ken noted not intended to expose Cassandra to end users. Vision is to have a wrapper (web page, for example) to hide Cassandra. Pan-STARRS does something similar, for cone-search query.
Jakob asked what Lasair would do if one of Ken, Dave, or Roy moved on.
- Stephen answer - there would be a serious delay to replace staff
- In Zoom chat George B answered “We are in discussions with Software Sustainability Institute (http://www.software.ac.uk ) on good practice to try and ensure we are less affected by staff turn-over. Open development, code reuse, multi-role appointments/ rotations, all help.”
- In Zoom chat Dave Y answered “paired programming has been excellent for developing the SOXS pipeline. 2 developers with shared knowledge of the code. One of us could leave and the project would not lose much knowledge (tacit or explicit).”
Eric Q asked what are the plans in the next cycles regarding scientific integration with Rubin as it starts to come online. What do you need and what can Rubin provide? Data, schemas?
- Roy - various services that LSST is going to offer. It does not need to be good data since already have the dummy data. Would like to have the Kafka stream so can see how fast to read. It would be good to check the plumbing.
- AndyL answer - simulated streams and commissioning data in order to exercise the system. Some sort of simulacrum of the DRs in the RSP would help test the integration of the broker and DAC in order to do variability science. Do need test data.
- Eric - all very possible. There are the ongoing data previews. Is anyone here engaged?
- Jakob - was under the impression that this is not for the alert streams
- Eric - correct but useful for understanding the RSP and interfaces. DP0.2 will still be simulated data. Will take as action if can get preferential lane for the community brokers to ensure have easy access to that. On alerts, working on production Kafka stream but do not have timings yet. Definitely expect some of what Roy wants in the next year. Some of the other services will be a little harder to mock up.
- Jakob - follow-up to simulated alerts. Q to Julien - how to integrate with the data elastic challenge?
- Julien - yes looking at this. Have not heard back though.
- Jakob - interested in the elastic to parse it in order to test throughput
- Julien - agree it is plumbing work
Ken Q - when do Ampel/Fink transition from ZTF alerts? Do you envisage having to run two parallel brokers?
- Roy - not paid to do ZTF
- Ken - but we are not going to stop ZTF during the transition period.
- Jakob - not decided yet. Have designed a system that can run in parallel and ingest both streams but may run separately next to each other
- Julien - simpler since not tied to ZTF. Broker part is survey-agnostic but processing part is different. On DB side, need to investigate more. Process live with translation at the beginning
AndyL - what about maintenance? This is where the effort goes. May keep ZTF running but with no promises about maintenance. If important enough for users, then will need the money to do so. Q to Eric - what about the partial transition?
Eric - ZTF current NSF funding for public stream is to end 2023. That is before Rubin full operations in early 2024. Overlap is therefore unlikely and depends on funding. ZTF unlikely to be thrown away but long tail of maintenance unlikely. On Rubin, it will be a slow ramp-up to full alert volume, even in 1st year of operations.
Roy - perhaps get funding for a multi-survey transient system.
Julien Q to Roy - what is on the screen behind
- Roy - the ZTF transient stream
AndyL - need to digest the findings. Many thanks to the externals.

Terry add link to the meeting page.

1730: Discussion