Cone Searching & HTM
...
Partition Keys and Clustering Keys
Cassandra will NOT allow you to query any old column with any constraint. Unlike a relational database, where you can filter on arbitrary columns, Cassandra requires you to design the column families (tables) around the queries you intend to run; otherwise you can't easily pull out the information you need. Let's take a look at a row of the Gaia DR2 table. The unique identifier column is “source_id”, but on its own this doesn't mean much in terms of organising the data.
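To make the partition key / clustering key idea concrete, here is a toy, in-memory model of a Cassandra table written in plain Python. It is not Cassandra or its Python driver, just an illustration. Rows are grouped into partitions by a partition key and kept sorted by a clustering key inside each partition, and filtering on any other column is refused, mimicking CQL's default behaviour. The layout in the usage lines is a hypothetical one, not necessarily the scheme used in this post: partition by the coarse HTM level-10 trixel and cluster by (HTM16 ID, source_id). The source_id value is made up.

```python
from collections import defaultdict


class ToyWideTable:
    """Toy in-memory model of a Cassandra table (illustration only, not the driver API).

    Rows live in partitions chosen by a partition key; inside a partition they are
    ordered by a clustering key. Writing an existing (partition, clustering) key
    overwrites the row, like a Cassandra upsert.
    """

    def __init__(self):
        # partition key -> {clustering key -> row}
        self._partitions = defaultdict(dict)

    def insert(self, partition_key, clustering_key, row):
        self._partitions[partition_key][clustering_key] = row

    def select(self, partition_key, ck_min=None, ck_max=None):
        """Read ONE partition, optionally restricted to a clustering-key range."""
        partition = self._partitions.get(partition_key, {})
        return [
            partition[ck]
            for ck in sorted(partition)
            if (ck_min is None or ck >= ck_min) and (ck_max is None or ck <= ck_max)
        ]

    def select_where(self, column, value):
        """Filtering on a non-key column is refused, as CQL does by default."""
        raise ValueError(
            f"cannot filter on {column!r} without the partition key "
            "(Cassandra would need ALLOW FILTERING or a secondary index)"
        )


# Hypothetical layout: partition by the HTM level-10 trixel, cluster by
# (HTM16 ID, source_id), so one partition holds one small patch of sky.
htm16 = 54680902005   # HTM16 ID of the Gaia DR2 row discussed in this section
htm10 = htm16 >> 12   # drop 6 levels x 2 bits -> HTM10 ID 13349829
table = ToyWideTable()
table.insert(htm10, (htm16, 1001), {"htm16": htm16, "source_id": 1001})  # made-up source_id
print(table.select(htm10))
```

The point of the model is the asymmetry: `select` is cheap because it goes straight to one partition and walks a sorted range, while `select_where` on an arbitrary column has no such path, which is why the table has to be designed around the query.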
...
Level | Decimal | Binary | Base4 |
---|---|---|---|
HTM16 | 54680902005 | 110010111011001111000101100101110101 | N02323033011211311 |
HTM13 | 854389093 | 110010111011001111000101100101 | N02323033011211 |
HTM10 | 13349829 | 110010111011001111000101 | N02323033011 |
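The three representations in the table are related by simple bit manipulation: an HTM ID starts with `11` (north) or `10` (south), followed by 2 bits per base-4 digit, and a coarser level is obtained by dropping 2 bits per level. The helper names below are hypothetical, and this sketch only converts between existing IDs; it does not compute an HTM ID from RA/Dec.

```python
def htm_level(htm_id: int) -> int:
    """HTM level of a numeric ID: level 0 uses 4 bits, each extra level adds 2."""
    if htm_id < 8 or htm_id.bit_length() % 2:
        raise ValueError(f"{htm_id} is not a valid HTM ID")
    return (htm_id.bit_length() - 4) // 2


def htm_name(htm_id: int) -> str:
    """Base-4 name: leading '11' -> 'N', '10' -> 'S', then one digit per 2 bits."""
    htm_level(htm_id)  # validates the ID
    bits = format(htm_id, "b")
    hemisphere = "N" if bits[:2] == "11" else "S"
    digits = "".join(str(int(bits[i:i + 2], 2)) for i in range(2, len(bits), 2))
    return hemisphere + digits


def htm_parent(htm_id: int, level: int) -> int:
    """ID of the containing trixel at a coarser level: drop 2 bits per level."""
    shift = 2 * (htm_level(htm_id) - level)
    if level < 0 or shift < 0:
        raise ValueError("target level must be between 0 and the ID's own level")
    return htm_id >> shift


# Reproduces the HTM16, HTM13 and HTM10 rows of the table from the HTM16 ID alone.
htm16 = 54680902005
for level in (16, 13, 10):
    parent = htm_parent(htm16, level)
    print(level, parent, format(parent, "b"), htm_name(parent))
```

Because coarsening is just a right shift, the fine ID can be stored once and any coarser trixel (for example, a candidate partition key) derived from it on the fly.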
...
I expect these numbers to improve massively once we build a properly distributed Cassandra cluster (still to be done). Various statements online suggest that, for example, a 15-node cluster can cope with up to 120,000 inserts per second.
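As a back-of-the-envelope check on what that quoted figure would mean, the sketch below converts it into a per-node rate and an ingest time for a full Gaia DR2 load (1,692,919,135 sources). It assumes, loudly, that the unverified 120,000 inserts/s figure holds for sustained writes and that throughput scales linearly with node count; real clusters rarely scale perfectly.

```python
GAIA_DR2_ROWS = 1_692_919_135  # number of sources in Gaia DR2
CLUSTER_RATE = 120_000         # inserts/s quoted online for a 15-node cluster (unverified)
NODES = 15


def ingest_hours(n_rows: int, inserts_per_second: float) -> float:
    """Wall-clock hours to insert n_rows at a sustained insert rate."""
    return n_rows / inserts_per_second / 3600.0


# Assumes linear scaling, so each node contributes an equal share of the rate.
per_node_rate = CLUSTER_RATE / NODES
print(f"per node: {per_node_rate:.0f} inserts/s")
print(f"Gaia DR2 on {NODES} nodes: {ingest_hours(GAIA_DR2_ROWS, CLUSTER_RATE):.1f} h")
print(f"Gaia DR2 on 1 node: {ingest_hours(GAIA_DR2_ROWS, per_node_rate):.1f} h")
```

Under these assumptions the full catalogue would take roughly 4 hours on 15 nodes versus about 59 hours on a single node.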
Cassandra and Lightcurves
An alternative to spatial indexing is to store each lightcurve in Cassandra, indexed by objectID and candidateID. This is actually easier to implement in Cassandra, but the same caveats about how the data can be queried still apply: a lightcurve can only be retrieved efficiently if you already know its objectID.
ObjectID | CandidateID | ra (deg) | decl (deg) | magpsf (mag) | fid |
---|---|---|---|---|---|
ZTF20aauwhfa | 1200382622615015029 | 218.1603106 | 31.6709366 | 18.3553 | 1 |
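A minimal sketch of that lightcurve layout, again as a plain-Python toy rather than real Cassandra code: objectID plays the role of the partition key and candidateID the clustering key, so one object's detections sit together, sorted by candidate. Treating candidateID as the clustering column is an assumption about the schema. The first detection is the row from the table above; the other two (including the object `ZTF20zzzzzzz`) are hypothetical. `fid` is the ZTF filter ID (1 = g, 2 = r).

```python
from collections import defaultdict
from typing import NamedTuple


class Detection(NamedTuple):
    object_id: str
    candidate_id: int
    ra: float
    decl: float
    magpsf: float
    fid: int  # ZTF filter ID: 1 = g, 2 = r


def build_lightcurves(detections):
    """Group by object (partition key), sort by candidate ID (clustering key)."""
    curves = defaultdict(dict)
    for d in detections:
        # Same (object, candidate) key overwrites, like a Cassandra upsert.
        curves[d.object_id][d.candidate_id] = d
    return {obj: [rows[c] for c in sorted(rows)] for obj, rows in curves.items()}


def lightcurve(curves, object_id, fid=None):
    """Fetch one object's lightcurve, optionally keeping a single filter."""
    return [d for d in curves.get(object_id, []) if fid is None or d.fid == fid]


detections = [
    Detection("ZTF20aauwhfa", 1200382622615015029, 218.1603106, 31.6709366, 18.3553, 1),
    # Hypothetical detections, for illustration only.
    Detection("ZTF20aauwhfa", 1199000000000000000, 218.1603110, 31.6709360, 18.5000, 2),
    Detection("ZTF20zzzzzzz", 1200000000000000000, 10.0, -5.0, 19.0, 1),
]
curves = build_lightcurves(detections)
for d in lightcurve(curves, "ZTF20aauwhfa"):
    print(d.candidate_id, d.fid, d.magpsf)
```

Fetching a lightcurve by objectID is a single-partition read here; asking "which objects lie near this RA/Dec?" has no key to go through, which is exactly the caveat mentioned above.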
Conclusions & Further Work
...