Gareth Williams, SQL Queries on Data Streams

  • KSQL differs from MySQL specifically, but the level of variation is similar to that between other SQL variants.

  • Cam noted the need for caution in creating a MySQL cache.

  • Cam asked whether users would be creating queries to run on the stream, and whether there is a limit on how many of these we can support.

  • Cam suggested favouring KSQL unless it proves unsuitable, especially if Kafka is the data bus.

  • Dave M believes the web interface approach would allow people to use either, without extra work.

Ken Smith, Making a Super Sherlock

  • Dave M asked if it was worth considering Kafka for transferring RA/Dec data to Sherlock.

Ken Smith, Cassandra vs. MySQL

  • Citus Data has produced a distributed, relational database based on PostgreSQL. Similar to Qserv.

    • Cam noted that a fair portion of the code is open source.

    • Also suggested Cockroach DB is potentially interesting.

  • Cam A noted that if query needs change, then your data model (in Cassandra) needs to change, and there is a risk it is not easy to change it. This has tended to push people away from NoSQL and back to relational databases.

  • Cam A believes group-key indexing should be possible in a relational database.

  • Dave Y clarified that the partition key is the first element of the group key, and asked whether that is a concern, given that the telescope scanning across the sky would typically lead to an imbalanced load on the database for cross-matching.

  • Andy L asked what problem Cassandra is trying to solve:

    • Ken believes the intent of Cassandra is to distribute processing across multiple commodity nodes.

    • Ken noted that there is a replication problem, which is not solved.

    • Dave Y believes blob storage could help us tackle the scalability issues we have with MySQL.

    • Ken suggests we could store light curves in Cassandra, using Object ID as the primary key.
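
A minimal sketch of the load-balance concern raised above, assuming a hash-based placement of partition keys onto nodes. The node count, the `sky_cell` helper, the cell size and the synthetic detections are all hypothetical, and MD5 is only a stand-in for whatever partitioner the database actually uses. It contrasts partitioning by a coarse sky position with partitioning by Object ID for one night of detections from a small patch of sky.

```python
import hashlib
from collections import Counter

NUM_NODES = 4  # hypothetical cluster size


def node_for_key(partition_key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a partition key to a node by hashing (stand-in for a real partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def sky_cell(ra_deg: float, dec_deg: float, cell_deg: float = 10.0) -> str:
    """Hypothetical coarse sky cell label used as a position-based partition key."""
    return f"{int(ra_deg // cell_deg)}:{int((dec_deg + 90.0) // cell_deg)}"


# Synthetic detections (object_id, ra, dec): one night, one small patch of sky.
detections = [
    (f"obj{i:06d}", 150.0 + (i % 50) * 0.1, 2.0 + (i % 40) * 0.1)
    for i in range(2000)
]

# Rows per node when the partition key is the sky cell.
by_cell = Counter(node_for_key(sky_cell(ra, dec)) for _, ra, dec in detections)

# Rows per node when the partition key is the Object ID.
by_object = Counter(node_for_key(obj_id) for obj_id, _, _ in detections)

print("partition by sky cell :", dict(by_cell))
print("partition by object ID:", dict(by_object))
```

In this toy setup every detection falls in the same sky cell, so a single node takes the whole night's writes, whereas keying on Object ID spreads the rows across the nodes. The trade-off is that an Object ID key no longer keeps neighbouring sky positions together, which matters for cross-matching.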

Gareth F, Storage Technologies

  • George asked if the performance issue with overwriting a file was a problem, given we have a write-once, read-many workload.

    • Gareth wasn’t sure. We might need to modify light curves, though these are large, so the relative overhead would be less.

    • Nigel asked if there was a risk of transferring the same information multiple times, for continuously varying objects that alerted each time.

    • Stephen agreed this could be the case.

    • Nigel wondered if it would be possible to edit out the repeated data.

    • Gareth was concerned that de-duplicating that data created an implicit serialisation, as found for ZTF.

    • Ken questioned whether this was the case, given that subsequent detections only contain the 30-day forced photometry, so it may not be such a big deal.
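
A rough sketch of the de-duplication idea discussed above, assuming each alert carries a list of detections identified by Object ID and epoch. The tuple layout, the `new_detections` function and the sample values are hypothetical and are not taken from any alert schema.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical detection record: (object_id, mjd, flux)
Detection = Tuple[str, float, float]


def new_detections(
    alert: Iterable[Detection], seen: Dict[str, Set[float]]
) -> List[Detection]:
    """Return only the detections not already recorded in `seen`, updating `seen`."""
    fresh: List[Detection] = []
    for object_id, mjd, flux in alert:
        epochs = seen.setdefault(object_id, set())
        if mjd not in epochs:
            epochs.add(mjd)
            fresh.append((object_id, mjd, flux))
    return fresh


seen: Dict[str, Set[float]] = {}

# First alert for an object: two epochs of history.
first = [("objA", 59000.1, 12.0), ("objA", 59001.1, 12.5)]
# Second alert repeats both earlier epochs and adds one new detection.
second = first + [("objA", 59002.1, 13.1)]

stored_first = new_detections(first, seen)
stored_second = new_detections(second, seen)

print(len(stored_first), "stored from first alert")
print(len(stored_second), "stored from second alert")
```

The shared `seen` state is where the serialisation concern shows up: alerts for the same object have to consult and update it in order, so they cannot be processed fully independently.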