...

Attendees: Andy Lawrence, Roy Williams, Terry Sloan, Cosimo Inserra, Stephen Smartt, Sara Casewell, Dave Young, Michael Fulton, Julien Peloton, Ken Smith, Jakob Nordin, Gareth Francis, Matt Nicholl, Eric C Bellm, Stelios Voutsinas, Meg Schwamb, George Beckett, Nic Wolf

Apologies:

Meeting public page

...

  • Today’s meeting is primarily concerned with the revised 4-year plan for the remainder of phase B.

  • Issues from today will be noted and fed into the plan.

  • Overall LSST:UK phases have not changed relative to the Rubin schedule.

  • Current Cycle 4 of the Lasair plan is taking stock prior to build.

  • Key change - the original plan was that the ZTF prototype would be frozen at V2; however, further ZTF development has continued.

  • Overall Lasair architecture now believed to be set.

  • A number of key technology changes including assuming that users will be using Google Colab.

1325: Discussion

  • Cosimo Q - for Stephen regarding the spectroscopic addition to Lasair; confused about the various projects. 4MOST TiDES does not do target-of-opportunity observations, so a spectrum might come 20 to 100 days after the LSST discovery. Can understand how SOXS can contribute, but not so much 4MOST. So how much can these spectra actually contribute? What spectral information will be available, and in what format?

    • StephenS A - correct that 4MOST pointing cannot be controlled, but it is likely to follow the LSST footprints. It is possible to control the fibre pointing within the 4MOST fields. Anything brighter than magnitude 22 in a 4MOST field will have a fibre put on it. Spectra of 20K to 30K transients plus 30K host galaxies are expected, but we cannot control exactly when the spectra will be collected. The challenge is to get objects out of the alert stream and onto the 4MOST schedule. If a fibre can be put on them, then when an object is observed by 4MOST, LSST:UK WP3.3 will recognise that the fibre is in place, make estimates, and the information will then go back to Lasair as an annotation. All of the data will be public in ESO archives. We will get information back into the broker as soon as possible.

  • Cosimo Q - do you plan the same thing for SOXS?

    • Stephen S A - yes, there will be some link from Lasair.

  • Jakob Q - How well defined is the UK programme? Have the spectroscopic programmes discussed what they require in terms of triggering from LSST? To what degree has the LSST:UK community specified its LSST plans, such that these can be used as Lasair requirements?

    • Stephen S A - yes, these discussions are happening at the conceptual level. The triggers are objects classified as supernovae by Sherlock (not an AGN, not a variable star…). Essentially, if bright enough for 4MOST to take a spectrum, then put a fibre on it.

  • Jakob Q - sounds like the UK will only do supernovae?

    • Stephen A - actually mean extragalactic transients.

  • Question from JulienP - Funding context was highlighted, but the focus was on PI/developers. What about the hardware funding status and plan?

    • Stephen S answer - This will be discussed in the afternoon. Top-level funding is secured. Lasair is a small part of the UK computing. LSST:UK is a small part of UK IRIS. So overall Lasair is not the dominant partner and hence we believe Lasair is safe. Andy agrees and explained that the plan for UK expected requirements is being put in to UKRI. This is for the 10 years of operation and includes Lasair.
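
The fibre-allocation rule described in the discussion above (inside a 4MOST field, brighter than magnitude 22, not an AGN or variable star) can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Candidate` record, the class labels and the object names are hypothetical, and only the magnitude limit of 22 comes from the notes.

```python
from dataclasses import dataclass

FIBRE_MAG_LIMIT = 22.0  # limit quoted in the discussion
# Hypothetical labels for the contextual classes that are excluded.
EXCLUDED_CLASSES = {"AGN", "VS"}


@dataclass
class Candidate:
    object_id: str
    magnitude: float      # latest apparent magnitude
    context_class: str    # contextual label, e.g. "SN" (hypothetical values)
    in_field: bool        # True if the object lies in a scheduled 4MOST field


def wants_fibre(c: Candidate) -> bool:
    """Return True if the candidate would be proposed for a fibre."""
    brighter_than_limit = c.magnitude < FIBRE_MAG_LIMIT  # smaller = brighter
    return (
        c.in_field
        and brighter_than_limit
        and c.context_class not in EXCLUDED_CLASSES
    )


candidates = [
    Candidate("obj-a", 20.5, "SN", True),
    Candidate("obj-b", 23.1, "SN", True),    # too faint
    Candidate("obj-c", 19.0, "AGN", True),   # excluded class
    Candidate("obj-d", 21.0, "SN", False),   # outside the field
]
selected = [c.object_id for c in candidates if wants_fibre(c)]
print(selected)
```

Keeping the rule as one small predicate would make it easy to change the limit or the excluded classes once the programme requirements are firmer.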

...

Light curve classification (Michael Fulton)

  • Building photometric classifier to be integrated into Lasair. Transient detection rate far exceeds rate of classification. Will get worse with LSST.

  • Plan to run FastFinder on LSST and highlight potential transients very quickly.

  • ML techniques work better with well-sampled light curves, but often too late for real science. Need to prioritise speed, while not losing reliability, so have chosen parameter fitting (feature selection comparison to parameter space of fast transients).

  • Template Datastore for lots of different fast transients, from LSST, Pan-STARRS, and …

  • Features including rise time, peak brightness, etc. extracted and used for classification step.

  • FastFinder produces list of scores for a candidate, based on different features. High-scoring features are favoured.

  • Approach is simple, but producing promising results, based on sampling with kilonova class transients, even when peak brightness not observed.

    • FastFinder in use with Pan-STARRS. Found 2021qvw, for example.

    • FastFinder will act as an external annotator (see previous talk from Roy).

    • Runs independent of Lasair.

  • FastFinder outputs could be included in Lasair website, in dedicated section for external annotations.
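
As a rough illustration of scoring a candidate against a template parameter space, the sketch below assigns each template class a Gaussian density per feature and normalises the per-class results so that they sum to 100. It is not FastFinder's code: the Gaussian form, the template values and the feature names are all assumptions made for the example.

```python
import math

# Hypothetical template parameter space: per class, per feature (mean, std).
TEMPLATES = {
    "kilonova": {"rise_rate": (-1.0, 0.4), "peak_abs_mag": (-15.5, 1.0)},
    "supernova": {"rise_rate": (-0.1, 0.1), "peak_abs_mag": (-18.5, 1.0)},
}


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def score_candidate(features, templates=TEMPLATES):
    """Return (scores, n_used): scores sum to 100 across template classes.

    Only features that are present are used, so n_used indicates how much
    information the scores rest on.
    """
    known = set().union(*(params.keys() for params in templates.values()))
    used = [name for name in features if name in known]
    raw = {}
    for cls, params in templates.items():
        density = 1.0
        for name in used:
            if name in params:
                density *= gaussian_pdf(features[name], *params[name])
        raw[cls] = density
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("candidate lies outside every template")
    scores = {cls: 100.0 * d / total for cls, d in raw.items()}
    return scores, len(used)


# A fast-rising candidate with only one feature measured so far.
scores, n_used = score_candidate({"rise_rate": -0.9})
print(scores, n_used)
```

Scores built this way rank the classes but are not calibrated probabilities, which matches the caveat raised in the discussion.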

...

  • Eric Q to Michael “can you say more about how you are fitting/estimating the light curves features (peak time, rise and fall times)”.

    • A. Rise time is derived from the peak magnitude, the starting magnitude, and the time between the two. A polynomial fit is applied; this depends on the number of points available.

  • Julien P Q to Michael - Are predictions done using the aggregated light curve or the last alert information? What about the evolution of the score: do you report the full history of classification or just the last score?

    • A. Prediction is based on position in parameter space of candidate. This is done for all features and results normalised to create score.

    • A. Lasair will only store the most recent classification from FastFinder, but hope to have an option to click through to the history of FastFinder classifications.

  • Cosimo asked whether FastFinder needs information from different bands to be on same day or in 3-4 day timeframe?

    • Features are extracted as they become available, and information is derived as is possible based on available features. So, need to advise how many features have been used to create classification, as indication of relative reliability of classification.

  • Jakob noted the use of absolute magnitude, which relies on a good redshift. How sensitive is the classifier? Also, why not run on ZTF?

    • Redshifts have been measured from Sherlock, so errors are attributable to Sherlock. If host nearby does not have redshift then object is ignored. Risk is minimal if have secure association with galaxy of known redshift. Likely dominated by errors in photometry (if spectroscopic). If not, then photometry is likely source of error, so make the assumption that the distance is known and filter on that. Idea is to filter down to a small number of objects to follow up quickly. Michael noted absolute magnitudes do contain redshift error bars, so FastFinder can consider that as part of prediction.

    • FastFinder is being run on ZTF.

  • Andy L noted that this is important as it resolves the argument of whether or not to provide a light-curve classifier within the main Lasair service. Michael has provided an alternative method which, by being incorporated into Lasair, is easily available. Question about how widely to make this option available.

  • Andy asked if scores are just a ranking method or do they have quantitative meaning (e.g. probabilities).

    • Michael noted they are not true probabilities, but are derived from the probability function of the template space. Scores add up to 100, so could be pictured as percentages, but they are not. Michael plans to improve the scoring so that it is quantitatively meaningful.

    • Andy believes this should be straightforward to do.

  • Cosimo also asked whether there is a plan to increase the database [of templates] to deal with new discoveries.

  • Cosimo Q on chat “Do you need a data point in each band or not? What are the plans to increase the data templates.”

    • StephenS answered on chat: “One band is enough to get a score. The data templates are the most work, and yes we do plan to increase templates.”
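
The rise-time description given in the answers above (starting magnitude, peak magnitude, time between the two, with a polynomial fit whose order depends on the number of points) could look something like the following. This is a minimal sketch, not FastFinder's implementation: the function name, the choice of at most a quadratic, and the dense evaluation grid are assumptions.

```python
import numpy as np


def rise_features(times, mags, max_degree=2):
    """Estimate rise features from a single-band light curve.

    Fits a polynomial whose degree is limited by the number of points,
    then reads off the starting magnitude, the peak (brightest) magnitude
    and the time between the two. Returns None with fewer than two points.
    """
    t = np.asarray(times, dtype=float)
    m = np.asarray(mags, dtype=float)
    if t.size < 2:
        return None
    order = np.argsort(t)
    t, m = t[order], m[order]
    t_rel = t - t[0]                      # days since first detection
    degree = min(max_degree, t.size - 1)  # fewer points -> simpler fit
    coeffs = np.polyfit(t_rel, m, degree)
    grid = np.linspace(0.0, t_rel[-1], 201)
    fitted = np.polyval(coeffs, grid)
    i_peak = int(np.argmin(fitted))       # brightest = smallest magnitude
    rise_time = float(grid[i_peak])
    start_mag = float(fitted[0])
    peak_mag = float(fitted[i_peak])
    # Rise rate in mag/day; negative values mean the object is brightening.
    rise_rate = (peak_mag - start_mag) / rise_time if rise_time > 0 else None
    return {
        "n_points": int(t.size),
        "rise_time": rise_time,
        "start_mag": start_mag,
        "peak_mag": peak_mag,
        "rise_rate": rise_rate,
    }


# Example: brightens by one magnitude over two days, then fades.
features = rise_features([0, 1, 2, 3, 4], [20.0, 19.25, 19.0, 19.25, 20.0])
print(features)
```

Reporting `n_points` alongside the features gives a simple indication of how reliable the derived values are when the light curve is sparsely sampled.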

...

  • (need to get slides)

  • Will cover topics discovered during Ampel development cycle and what we considered doing.

  • Until now reproducibility unnecessary in astronomy due to instrument growth but after LSST not sure if that will still be case.

  • Most alerts are faint and hence junk

  • Reusable software has not been a priority in science, particularly due to PhD cycle, but with projects running for 10+ years this can no longer be the case.

  • Reproducibility hard to do due to the black box (isolated) nature of the various steps in studies/pipelines.

  • Users create an analysis schema that runs across the four different information tiers in Ampel.

  • Users prefer a UI but the Ampel approach is analysis(code) focussed

  • Q Roy to Julien - How good are your classifications? How many false +ve/-ve?

    • Julien answer - difficult question to answer. Using TNS, since Nov 2020 about 300 candidates have been reported. Half of these were spectroscopically followed up. More than 80% of these were true SNe. This is a good result, but this is not the only thing to look at. Now trying to make reliable detection faster. Currently have to wait 7 days but would prefer 3 days.

  • Jakob asked Dave - Sherlock is very nice, but how do people see maintaining and updating software over the next 15 years?

    • Dave noted the default algorithm now sits with the code (individual users can adjust, but the default is version controlled). When you run Sherlock, the version of the classifier is stored for reproducibility.

    • Jakob asked whether Dave would be around to update it.

    • Dave hoped so, but believed someone could be trained up to manage it in around two weeks, though noting code documentation needs to be improved.

  • Dave asked Julien if the Fink classifier is based on light curve data alone.

    • Yes, for Type Ia supernovae.

    • Dave concerned that Type Ia SNe cannot be classified with just two or three data points.

    • Julien agrees, but believes representative training set can help.

  • In chat, Q to Ken from AndyL: “If Cassandra is widely used in industry, how come we don’t know how to use it? Is it because in industry it’s used internally, not by independent users?”

  • Andy asked Ken about Cassandra and noted concern that not sure how users will interface with Cassandra. Can we learn from Facebook use?

    • Matter of choice of technology, and past experience of relational databases. Use of Cassandra has been learned from scratch. Facebook, etc. also use MySQL, etc. Ken has been sampling community use of databases, to see how well supported Cassandra might be. Cassandra just means outside of database. Could be file system, or something else.

  • Andy believes need to consider user requirements in more detail. Ken asked whether Dave had a sense of how many objects Sherlock could classify per second (the Q in Zoom chat: “Do you have a feel about how many transients Sherlock can classify per second?”)

    • Dave noted recent speed tests, which achieved ~10k per second.

  • In Zoom chat Q Eric to all: “Most of the examples we’re discussing here are the explosive transients. Is that reflecting the actual or desired user communities of Lasair, FINK, Ampel? What are the ambitions (or not) to support variables/AGN/solar system science?”

  • Eric noted focus on extragalactic transients, which is a mature field, but asked whether there was any ambition/ interest in supporting solar system, variable star, or AGN communities?

    • Meg noted that this is her interest in Lasair, and ability to exploit Cassandra for Solar System alerts. Ambition to create Sherlock peer to classify solar-system alerts, and intent to pursue funding for this.

    • AndyL notes the same issue for variable stars, but believes Lasair should be focused on transients. Mistake to try to reverse engineer for other applications. Integration with DRs and RSP should allow users to undertake variability science, alongside Lasair-focused transients. This is an unsolved problem.

    • Sara stated in Zoom chat: “There is interest from the White Dwarf community in using Lasair to look for drop outs/eclipses in WD lightcurves.”

    • Roy noted difference magnitudes in ZTF make it difficult to compute proper variables.

    • Eric noted Rubin would provide more rigorous forced photometry, which would help.

    • Meg noted less urgent need for turn-around for solar-system science. Next night is good enough.

  • Answer from Ampel side from Jakob in Zoom chat - “Our current user base mainly involves extragalactic groups, so AGN modelling is included there. So it is not that we do not want to do it, but we do not have a lot of feedback from e.g. solar system groups regarding their LSST needs. Would be happy to get that, though.”

  • Stephen noted intent to help the solar-system community find objects in Lasair, and then link that to the DR light curve in Rubin, as useful.

  • JulienP Q in Zoom chat: “For Fink: the LSST-FR community is, for historical reasons, highly focussed on explosive transients (incl. SN and MMA). But things change. We now support more and more the SSO science, with a large group of experts joining the team. However, we clearly lack variable star experts (although we would love working on this as they seem to make most of the stream).”

...

  • Will cover how the query builder, streaming system, and the light curve mining all work.

  • Lasair query builder uses very SQL-like syntax

  • Actions on query depend upon whether or not you are logged in and if it belongs to you or is public.

  • Checker confirms if the likely run-time will exceed limits

  • You can ask the Lasair team to promote your query to be public on the Lasair site

  • Email alert for the query is limited to one per 24 hours.

  • Soon we will provide an annotation capability which, if wanted, can be enabled to allow annotations to be pushed back into Lasair.

  • The constituent parts of the light-curve mining system will all run on the same cloud.

  • Julien Q - how do you avoid a SQL injection through the web interface? (Julien Q in Zoom chat : “what about SQL code injection? any internal LIMIT to avoid running full data retrieval? I’m always afraid to put a query builder public and unsupervised …”)

    • Roy answer - suggests Julien try and hack and report back. Also there is a validation system that checks for keywords.

    • DaveY also answers - DB is also read-only

    • Roy answer in Zoom chat - “There is a LIMIT 1000 added to all queries, and an execution timeout.”

  • Jakob Q - will you be able to make queries based on other people's annotations?

    • Roy answer - yes, but there will be delay.

    • Jakob Q - Is there an internal log system on when queries were run?

    • Roy answer - currently tracking all queries being run but not sure what to do with that information.
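
The safeguards mentioned in the answers above (keyword validation and a LIMIT 1000 appended to every query) can be sketched as below. This is an assumption-laden illustration, not the Lasair code: the function name `guard_query`, the deny-list and the table and column names are hypothetical. A deny-list on its own is not a complete defence, which is why the read-only database and the execution timeout noted in the discussion matter as well.

```python
import re

# Hypothetical deny-list of SQL keywords that a read-only query never needs.
FORBIDDEN = ("insert", "update", "delete", "drop", "alter",
             "create", "grant", "truncate")
MAX_ROWS = 1000  # row cap quoted in the discussion


def guard_query(select_clause, tables, where_clause=""):
    """Validate user-supplied clauses and assemble a capped SELECT."""
    for part in (select_clause, tables, where_clause):
        # Reject statement separators and comment markers outright.
        if ";" in part or "--" in part or "/*" in part:
            raise ValueError("statement separators and comments are not allowed")
        lowered = part.lower()
        for word in FORBIDDEN:
            if re.search(rf"\b{word}\b", lowered):
                raise ValueError(f"forbidden keyword: {word}")
    sql = f"SELECT {select_clause} FROM {tables}"
    if where_clause.strip():
        sql += f" WHERE {where_clause}"
    # The cap is always appended, whatever the user asked for.
    return f"{sql} LIMIT {MAX_ROWS}"


# Hypothetical table and column names, for illustration only.
safe_sql = guard_query("objectId", "objects", "ncand > 5")
print(safe_sql)
```

Building the statement from separate SELECT, FROM and WHERE parts, rather than accepting free-form SQL, keeps the cap and the validation in one place.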

...

  • Lasair philosophy is to keep it simple. Focus on functionality rather than appearance.

  • Beginning to define final user interface for start of operations.

  • Keen to hear comments and guidance from participants.

  • Functionality

    • canned queries – e.g., cone search

    • query builder – custom queries

    • documentation - user guidance examples on website and a cook book on LSST:UK wiki

    • helpdesk - email (also GitHub [tickets], though not for regular users)

    • Embedded image viewer/ interactive plotter for visualising rich information that is available.

  • Open questions

    • Is navigation and user flow obvious?

    • Should we provide general information for public?

    • Should we have more sophisticated documentation?

    • Should we aim for more-interactive web pages, forms, etc.?

  • Sara Q in Zoom chat: “I wonder whether some of the information for non-scientists/interactive stuff could be put together in one of the LSST proposals to the current call”. At NAM there are keen amateurs using Lasair, so UI improvements might be helpful.

    • Meg said the call is only 30K USD (~3 months of post-doc).

    • Roy said this can help establish a collaboration.

    • Meg asked whether Lasair should be focussing on Lasair ZTF, or whether the goal is Rubin Lasair. Would effort on the call be better spent elsewhere?

  • Jakob likes how Lasair information is organised, on a single webpage, very easy to find and scan.

    • Eric answer - for own science often go to Lasair initially since it is convenient.

  • George noted “For documentation, I wondered if we could follow a Rubin lead? We should find out what kinds of documentation Rubin will provide. By mirroring their style and tools we can provide something seamless and familiar.”

    • Eric answer - More documentation is good. Rubin has dedicated team for this.


1710: API and notebooks (Ken Smith)

  • (need slides - messaged him)

  • Lasair team has introduced REST APIs using Django REST framework (looks to be the de facto standard)

  • Effectively, these are machine-readable versions of functions provided (interactively) on webpages

    • /api/cone

    • /api/query

  • Python wrapper “lasair” (installable with pip) to help use the API.

  • Plan to add support for querying Cassandra directly

  • Also have Jupyter Notebook examples, hosted on Google Colab

    • Need user account on Lasair to access

    • Ken provided live demo of cone-search notebook.

  • API throttled, based on different levels of token

    • Action taken in response to a user who was submitting thousands of queries, and putting strain on the service (now using more efficient watch-list approach).

      • Anonymous use limited to 10 calls per hour, 1000 rows per query, …

  • Eric B Q on Zoom chat - “How are you handling auth for the public Kafka service?”
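
To illustrate token-level throttling of the kind described above, here is a minimal sliding-window limiter. It is a sketch only and not how the Lasair service implements it: the `Throttle` class and the "registered" tier with its limit are hypothetical, and only the anonymous figure of 10 calls per hour comes from the notes.

```python
import time
from collections import defaultdict, deque

# Calls allowed per hour for each token level. Only the anonymous figure
# is from the notes; the "registered" tier is a hypothetical example.
HOURLY_LIMITS = {"anonymous": 10, "registered": 100}
WINDOW_SECONDS = 3600.0


class Throttle:
    """Sliding-window rate limiter keyed by API token."""

    def __init__(self, limits=HOURLY_LIMITS, clock=time.monotonic):
        self.limits = limits
        self.clock = clock                 # injectable for testing
        self.calls = defaultdict(deque)    # token -> timestamps of recent calls

    def allow(self, token, level="anonymous"):
        """Record the call and return True if it is within the limit."""
        now = self.clock()
        history = self.calls[token]
        # Forget calls that have fallen out of the one-hour window.
        while history and now - history[0] >= WINDOW_SECONDS:
            history.popleft()
        if len(history) >= self.limits[level]:
            return False
        history.append(now)
        return True


# Example with a fake clock so the behaviour is easy to follow.
fake_now = [0.0]
throttle = Throttle(clock=lambda: fake_now[0])
first_ten = [throttle.allow("anon-token") for _ in range(10)]
eleventh = throttle.allow("anon-token")
fake_now[0] = 3600.0                       # one hour later
after_window = throttle.allow("anon-token")
print(first_ten, eleventh, after_window)
```

Django REST framework ships its own throttle classes, so a production service would more likely configure those than hand-roll a limiter like this one.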

1720: Tools interface (Andy Lawrence)

  • Different interfaces – need to prioritise

    • Webpage

    • Scripts (Python, primarily)

    • Other projects (website)

    • IDAC/RSP interface – opportunity in UK, as IDAC is next to Lasair broker, but needs requirements analysis and design work

    • Topcat – need TAP

    • Personal storage – e.g., MyDB, VOSpace, …

  • Eric asked how Kafka authentication is being handled.

    • Roy hopes not to be overwhelmed with requests [for credentials].

  • Ken noted experience of writing TAP service, which could be useful (Guy Rickson and Thomas Marquat (sp?)).

  • Jakob asked if an annotation is the same as a classification.

    • Roy noted one type of annotation, but other classifications were possible.

  • Julien asked about usage split between REST and Kafka. (Q in Zoom chat: “How is the usage split between the REST API and Kafka? As an example in Fink, most of users use the API, and very few Kafka.”)

    • Roy noted that people are very conservative and stick to interfaces they know.

    • Julien hopes, over time, people will become familiar with new technologies, which will have benefits for users.

    • Andy notes absence of user tools is a barrier to uptake of Kafka. Generally speaking Kafka and Cassandra are not targeted at end users: they are behind the scenes, meaning astronomy is unusual.

    • Dave noted size of data to be handled. Kafka can handle a stream which would cause APIs to fall over.

    • Ken noted not intended to expose Cassandra to end users. Vision is to have a wrapper (web page, for example) to hide Cassandra. Pan-STARRS does something similar, for cone-search query.

  • Jakob asked what Lasair would do if one of Ken, Dave, or Roy moved on.

    • Stephen answer - there would be a serious delay to replace staff.

    • In Zoom chat George B answered “We are in discussions with Software Sustainability Institute (http://www.software.ac.uk ) on good practice to try and ensure we are less affected by staff turn-over. Open development, code reuse, multi-role appointments/ rotations, all help.”

    • In Zoom chat Dave Y answered “paired programming has been excellent for developing the SOXS pipeline. 2 developers with shared knowledge of the code. One of us could leave and the project would not lose much knowledge (tacit or explicit).”

  • Eric Q asked what are the plans in the next cycles regarding scientific integration with Rubin as it starts to come online. What do you need and what can Rubin provide? Data, schemas?

    • Roy answered - various services that LSST is going to offer. It does not need to be good data since already have the dummy data. Would like to have the Kafka stream so can see how fast to read. It would be good to check the plumbing.

    • AndyL answered - simulated streams and commissioning data in order to exercise the system. Some sort of simulacrum of the DRs in the RSP would help test the integration of the broker and DAC in order to do variability science. Do need test data.

    • Eric answered - these are all very possible. There are the ongoing data previews. Is anyone here engaged?

  • Jakob said - was under the impression that DPs are not for the alert streams.

    • Eric answered - correct, but useful for understanding the RSP and interfaces; DP0.2 will still be simulated data. [ACTION] Eric - will take as an action to see if can get preferential lane for the community brokers to ensure have easy access to that. On alerts, currently working on production Kafka stream but do not have timings yet. Definitely expect some of what Roy wants in the next year. Some of the other services will be a little harder to mock up.

  • Jakob - follow-up regarding simulated alerts. Q to Julien - how to integrate with the data elastic challenge?

    • Julien answered - yes looking at this. Have not heard back though.

    • Jakob said - interested in parsing the elastic data in order to test throughput.

    • Julien said - agree, this is plumbing work.

  • Ken Q - when do Ampel/Fink transition from ZTF alerts? Do you envisage having to run two parallel brokers?

    • Roy said - Lasair not paid to do ZTF

    • Ken said - but we are not going to stop ZTF during the transition period.

    • Jakob answered - AMPEL not decided yet. Have designed a system that can run in parallel and ingest both streams but may run separately next to each other

    • Julien answered - regarding FINK, the situation is simpler since not tied to ZTF. Broker part is survey-agnostic but processing part is different. On DB side, need to investigate more. Currently process data live with translation at the beginning.

  • AndyL Q asked - what about maintenance? This is where the effort goes. May keep ZTF running but with no promises about maintenance. If important enough for users, then will need the money to do so. Q to Eric - what about the partial transition?

    • Eric - ZTF current NSF funding for public stream is to end 2023. That is before Rubin full operations in early 2024. Overlap is therefore unlikely and depends on funding. ZTF unlikely to be thrown away but long tail of maintenance unlikely. On Rubin, it will be a slow ramp-up to full alert volume, even in the first year of operations.

    • Roy said - perhaps get funding for a multi-survey transient system.

  • Julien Q to Roy - what is on the screen behind you?

    • Roy - the ZTF transient stream

  • AndyL - great meeting. Need to digest the findings. Many thanks to the externals.

[ACTION] Terry to add notes link to the meeting agenda page.

1730: Discussion