...

Attendees: Andy Lawrence, Roy Williams, Terry Sloan, Cosimo Inserra, Stephen Smartt, Sara Casewell, Dave Young, Michael Fulton, Julien Peloton, Ken Smith, Jakob Nordin, Gareth Francis, Matt Nicholl, Eric C Bellm, Stelios Voutsinas, Meg Schwamb, George Beckett, Nic Wolf

Apologies:

Meeting public page

...

  • Eric Q to Michael: "can you say more about how you are fitting/estimating the light-curve features (peak time, rise and fall times)".

    • A. Rise time is derived from the peak magnitude, the starting magnitude, and the time between the two. A polynomial fit is applied; this depends on the number of points available.
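
A minimal sketch of the approach described above: fit a polynomial to the available photometry and read the rise time off the fit. This is illustrative, not the FastFinder implementation; function and variable names are assumptions.

```python
import numpy as np

def rise_time(times, mags, deg=3):
    """Estimate rise time as the interval between the first detection
    and the fitted peak (brightest point, i.e. minimum magnitude).
    Hypothetical sketch, not the FastFinder code."""
    times = np.asarray(times, dtype=float)
    mags = np.asarray(mags, dtype=float)
    # The usable polynomial degree depends on how many points are available.
    deg = min(deg, len(times) - 1)
    coeffs = np.polyfit(times, mags, deg)
    # Evaluate the fit on a fine grid to locate the peak.
    grid = np.linspace(times.min(), times.max(), 1000)
    fit = np.polyval(coeffs, grid)
    t_peak = grid[np.argmin(fit)]
    return t_peak - times.min()
```

Fall time could be estimated the same way, from the fitted peak to the last detection.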

  • Julien P Q to Michael: Are predictions done using the aggregated light curve or only the latest alert information? What about the evolution of the score: is the full history of classifications reported, or just the latest score?

    • A. Prediction is based on position in parameter space of candidate. This is done for all features and results normalised to create score.

    • A. Lasair will only store the most recent classification from FastFinder, but hope to have an option to click through to the history of FastFinder classifications.

  • Cosimo asked whether FastFinder needs information from different bands to be on the same day, or within a 3-4 day timeframe.

    • Features are extracted as they become available, and classifications are derived from whatever features are present. So, need to advise how many features were used to create a classification, as an indication of its relative reliability.

  • Jakob noted that using absolute magnitude relies on a good redshift: how sensitive is the classifier to this? Also, why not run on ZTF?

    • Redshifts have been measured from Sherlock, so errors are attributable to Sherlock. If the nearby host does not have a redshift, the object is ignored. The risk is minimal given a secure association with a galaxy of known spectroscopic redshift; in that case errors are likely dominated by the photometry. If the redshift is photometric, then it is the likely source of error, so assume the distance is known and filter on that. The idea is to filter down to a small number of objects to follow up quickly. Michael noted absolute magnitudes do carry redshift error bars, so FastFinder can consider that as part of the prediction.

    • FastFinder is being run on ZTF.
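
To make the redshift-sensitivity point above concrete, here is a hedged sketch of how an absolute magnitude and its redshift-driven error might be computed, using the low-redshift approximation d_L ≈ cz/H0 (the constants and function names are assumptions, not FastFinder's actual calculation):

```python
import numpy as np

C_KM_S = 299792.458   # speed of light, km/s
H0 = 70.0             # assumed Hubble constant, km/s/Mpc

def absolute_magnitude(m_app, z, z_err):
    """Absolute magnitude and the error contributed by the redshift
    uncertainty, in the low-z approximation d_L ~ c*z/H0."""
    d_mpc = C_KM_S * z / H0
    mu = 5.0 * np.log10(d_mpc) + 25.0     # distance modulus, d in Mpc
    M = m_app - mu
    # Propagate the redshift error through mu: d(mu)/dz = 5 / (z ln 10)
    M_err = 5.0 * z_err / (z * np.log(10.0))
    return M, M_err
```

The error term shows why a secure spectroscopic redshift matters: M_err scales with z_err/z, so a photometric redshift with a large uncertainty inflates the absolute-magnitude error directly.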

  • Andy L noted that this is important, as it resolves the argument over whether or not to provide a light-curve classifier within the main Lasair service. Michael has provided an alternative method and, by incorporating it into Lasair, it is easily available. Question about how widely to make this option available.

  • Andy asked if scores are just a ranking method or do they have quantitative meaning (e.g. probabilities).

    • Michael noted these are not true probabilities but are derived from a probability function over the template space. Scores add up to 100, so could be pictured as percentages, but they are not. Michael plans to improve the scoring so it is quantitatively meaningful.

    • Andy believes this should be straightforward to do.
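
An illustrative sketch of the scoring described above: per-template likelihoods normalised so the scores sum to 100. They read like percentages but are not calibrated probabilities. This is an assumption about the mechanics, not FastFinder's actual code.

```python
def scores_from_likelihoods(likelihoods):
    """Turn per-template likelihoods into scores summing to 100.
    Hypothetical sketch: the normalisation makes scores look like
    percentages without making them true probabilities."""
    total = sum(likelihoods.values())
    if total == 0:
        # No template matches at all: spread the score evenly.
        n = len(likelihoods)
        return {k: 100.0 / n for k in likelihoods}
    return {k: 100.0 * v / total for k, v in likelihoods.items()}
```

Making such scores quantitatively meaningful would require calibration against a labelled sample, which is presumably what the planned improvement involves.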

  • Cosimo also asked whether there is a plan to increase the database [of templates] to deal with new discoveries.

  • Cosimo Q on chat “Do you need a data point in each band or not? What are the plans to increase the data templates.”

    • StephenS answered on chat: "one band is enough to get a score. The data templates are the most work, and yes we do plan to increase templates".

...

Session-4 Science processing and functionality (chair: Meg Schwamb)
Adding value: Sherlock (Dave Young)

  • Overview of Sherlock and how it works with Lasair

  • Purpose of Lasair annotations

  • Sherlock attempts to predict nature of an object based on cross-matches found

  • Sherlock is survey agnostic but the algorithm can be tweaked for specific surveys

  • Code is Python but modularised so the algorithm can be written in plain text

  • After intelligent cross-matching, Sherlock merges the associations. Ranking algorithm attempts to identify the best association.

  • Ultimately the resulting prediction is only as good as the underlying data. Expect LSST data therefore to improve many situations.

  • The better the annotations, and the greater their number, the more powerful the algorithms.

  • JulienP asked "What would be the cost of changing the contextual classification? Is this something envisaged in the future?"

    • Dave Y noted that could build in sub-classes, though would not recommend changing top-level classifiers. E.g., could try to specify supernova type. Also scope to make classification more fine-grained.

    • Julien follow-on Q - What will happen to the past data?

      • Meg - let’s answer this later.

Annotation, features (Roy Williams)

  • DMTN-118 says what Rubin is providing.

  • In addition to ZTF, have added e.g. different ways of looking at the position, timings, first detection, ... Not interested in periodic data.

  • Difficult to change the set of features, since it would need to be rebuilt in the relational DB. What can be added to the LSST list?

  • There are potential Sherlock attributes.

  • External annotators: from Lasair, a query pushes out a Kafka stream of candidates on which the annotation can be run. Results are sent back from the external annotator and ingested into Lasair.
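
A minimal sketch of the annotator side of this round trip, with the Kafka wiring elided as comments. All field names, topics, and the toy classifier are assumptions for illustration, not Lasair's actual schema.

```python
import json

def annotate(candidate_json):
    """Consume one candidate record (as it might arrive on the outgoing
    Kafka stream) and build an annotation payload to return to Lasair.
    Field names here are hypothetical."""
    candidate = json.loads(candidate_json)
    # ... run the external classifier / measurement on the candidate here;
    # this placeholder just scores on the number of detections.
    score = 0.9 if candidate.get("ndet", 0) >= 3 else 0.2
    annotation = {
        "objectId": candidate["objectId"],
        "classification": "fast_transient" if score > 0.5 else "unknown",
        "score": score,
    }
    # In production this payload would be sent back to Lasair, e.g.
    # produced to a return Kafka topic or posted via an API call.
    return json.dumps(annotation)
```

The key design point from the talk is that Lasair's query defines which candidates reach the annotator, so the external service only processes objects it could usefully annotate.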

Databases and storage : SQL, Cassandra, CephFS (Ken Smith)

  • Nic Wolf (Antares team) joined the session to accompany Ken.

  • Galera/MariaDB/ Cassandra in Lasair (Need slides from Ken - messaged him)

  • DB dumps are not sustainable in the future due to the sheer number of rows etc. Galera offers replication where all nodes are equal, so reading/writing tasks can be distributed across the different nodes.

  • Galera well integrated with MariaDB but not so well with MySQL hence the MariaDB choice.

  • Many of the Galera tools are free including the cluster control interface.

  • Detections originally stored as files on CephFS but worried this will not scale.

  • Cassandra writes better than reads. Widely adopted. Other brokers use it eg. Antares and core LSST (see DMTN-184).

  • NoSQL - not only SQL according to Cassandra authors.

  • Have been operating since Feb 2021, hence the problem with light curves prior to this (see Matt Nicholl's talk). This will be solved.

  • Decided to use Cassandra since, over the 10-year survey, there will be around 30 billion rows by the end. A relational DB is clunky at that level.

  • For Cassandra there is no advantage of SSD over spinning disk, so may devote the SSD to the relational DB.
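
To illustrate why Cassandra suits this append-heavy workload, here is a hypothetical CQL schema for detections (a sketch only; the table, keyspace, and column names are assumptions, not Lasair's actual design):

```sql
-- Hypothetical schema: detections are partitioned by object, so each
-- light curve lives in one contiguous partition. Appending a new
-- detection is a cheap write, and reading a whole light curve is a
-- single-partition query.
CREATE TABLE lightcurves.detections (
    objectid  text,       -- partition key: all detections for an object together
    mjd       double,     -- clustering key: detections ordered by time
    band      text,
    magpsf    double,
    sigmapsf  double,
    PRIMARY KEY (objectid, mjd)
) WITH CLUSTERING ORDER BY (mjd ASC);
```

This layout reflects the "writes better than reads" point above: arbitrary cross-object queries are awkward in Cassandra, which is why a relational DB is kept alongside it for querying.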


Classification with FINK (Julien Peloton)

  • Broad overview of classification in FINK

  • In Fink all alerts are processed by science modules where each module is focussed on a specific science objective.

  • Modules can be cross-matches, filters, or more complex algorithms, even ML

  • Most modules are provided by the community

  • Alert classification is produced based on the module processing.

  • Most classifications are matched with known objects or solar system objects.

  • In June 2021, 45% of ZTF alerts were unclassified, i.e. unknown.

  • Use the classification process to identify candidates for follow-up

  • Can of course focus on the unclassified objects instead.

  • Adapting user code to work in the cloud-based environment often needs to be done by the Fink team

  • Training sets are often not representative; looking at active learning to remedy this.

  • Combining classification labels is being investigated

Multi-messenger science with AMPEL (Jakob Nordin)

  • (need to get slides)

  • Will cover topics discovered during Ampel development cycle and what we considered doing.

  • Until now, reproducibility has been unnecessary in astronomy due to instrument growth, but after LSST it is not clear that will still be the case.

  • Most alerts are faint and hence junk

  • Reusable software has not been a priority in science, particularly due to PhD cycle, but with projects running for 10+ years this can no longer be the case.

  • Reproducibility hard to do due to the black box (isolated) nature of the various steps in studies/pipelines.

  • Users create an analysis schema that runs across the four different information tiers in Ampel.

  • Users prefer a UI but the Ampel approach is analysis(code) focussed

  • Q Roy to Julien: How good are your classifications? How many false positives/negatives?

    • Julien's answer: a difficult question to answer. Using TNS since Nov 2020, about 300 candidates have been reported. Half of these were spectroscopically followed up, and more than 80% of those were true SNe. This is a good result, but not the only thing to look at. Now trying to make reliable detection faster: currently have to wait 7 days, but would prefer 3 days.

  • Jakob asked Dave: Sherlock is very nice, but how do people see maintaining and updating the software over the next 15 years?

    • Dave noted the default algorithm now sits with the code (individual users can adjust it, but the default is version controlled). When you run Sherlock, the version of the classifier is stored for reproducibility.

    • Jakob asked whether Dave would be able to update it

    • Dave hoped so, but believed someone could be trained to manage it in around two weeks, though noted the code documentation needs to be improved.

  • Dave asked Julien if the Fink classifier is based on light-curve data alone.

    • Yes, for Type Ia supernovae.

    • Dave concerned that Type Ia SNe cannot be classified with just two or three data points

    • Julien agrees, but believes representative training set can help.

  • Q to Ken from AndyL about Cassandra: "if Cassandra is widely used in industry, how come we don't know how to use it? Is it because in industry it's used internally, not by independent users?"

1620: Discussion