...
Gareth Williams, SQL Queries on Data Streams
KSQL differs from MySQL specifically, but no more than other SQL variants differ from one another.
Cam noted the need for caution in creating a MySQL cache.
Cam asked whether users would be creating queries to run on the stream, and whether there is a limit on how many such queries we can support.
Cam suggested favouring KSQL unless it proves unsuitable, especially if Kafka is the Data Bus.
Dave M believes the web interface approach would allow people to use either, without extra work.
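As a rough illustration of the kind of user-registered continuous query discussed above (a sketch only — the topic name and columns are invented for illustration, not part of any agreed schema):

```sql
-- Hypothetical KSQL sketch: declare a stream over a Kafka topic of alerts.
CREATE STREAM alerts (objectId VARCHAR, ra DOUBLE, decl DOUBLE, magpsf DOUBLE)
  WITH (KAFKA_TOPIC = 'alerts', VALUE_FORMAT = 'JSON');

-- A continuously running query selecting bright detections into a new stream.
CREATE STREAM bright_alerts AS
  SELECT objectId, ra, decl, magpsf
  FROM alerts
  WHERE magpsf < 18.0;
```

Each such query runs continuously against the stream, which is why the number of queries that can be supported matters.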
Ken Smith, Making a Super Sherlock
Dave M asked if it was worth considering Kafka for transferring RA/Dec data to Sherlock.
Ken Smith, Cassandra vs. MySQL
Citus Data has produced a distributed relational database based on PostgreSQL, similar to Qserv.
Cam noted that a fair portion of the code is open source.
Also suggested CockroachDB is potentially interesting.
Cam A noted that if query needs change, then your data model (in Cassandra) needs to change, and there is a risk it is not easy to change it. This has tended to push people away from NoSQL and back to relational databases.
Cam A believes group-key indexing should be possible in a relational database.
Dave Y clarified that the partition key is the first component of the group key, and asked whether that is a concern, given that the telescope scanning across the sky would typically lead to an imbalance in load on the database for cross-matching.
Andy L asked what the problem is that Cassandra is trying to solve:
Ken believes intent for Cassandra is to distribute processing across multiple commodity nodes.
Ken noted that there is a replication problem, which is not solved.
Dave Y believes blob storage could help us tackle the scalability issues we have with MySQL.
Ken suggests we could store light curves in Cassandra, using Object ID as the primary key.
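A minimal sketch of what such a Cassandra table might look like (the table and column names are assumptions for illustration, not a settled design): the object ID as partition key keeps each light curve together on one node, with a clustering column ordering detections by epoch.

```sql
-- Hypothetical CQL sketch: one partition per object, rows clustered by time.
CREATE TABLE lightcurves (
    object_id text,    -- partition key: all detections for an object stored together
    mjd       double,  -- clustering column: detections ordered by epoch
    mag       double,
    magerr    double,
    PRIMARY KEY (object_id, mjd)
);

-- Fetching a full light curve is then a single-partition read:
SELECT mjd, mag, magerr FROM lightcurves WHERE object_id = 'example-object';
```

This is the pattern Cassandra favours: the data model is built around the known query (fetch a light curve by object ID), which is also why a later change in query needs can force a change in the data model.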
Gareth F, Storage Technologies
George asked if the performance issue with overwriting a file was a problem, given we have a write-once, read-many workload?
Gareth wasn’t sure. Light curves might need to be modified, though as they were large the overhead was proportionally smaller.
Nigel asked if there was a risk of transferring the same information multiple times, for continuously varying objects that alerted each time.
Stephen agreed this could be the case.
Nigel wondered if it would be possible to edit out the repeated data.
Gareth was concerned that de-duplicating that data created an implicit serialisation, as found for ZTF.
Ken questioned whether this was the case, given that subsequent detections contain only the 30-day forced photometry, so it may not be such a big deal.