Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Cone Searching & HTM

Many of the queries we want to do are related to the position of objects in the sky. We use HTM - Hierarchical Triangular Mesh - to index the sky and utilise the API to augment our SQL queries. In this document I will attempt to use HTM with Cassandra. This may of course be a bad match of technologies (see issues below), and there may be other spatial indexing technologies that tie in better with Cassandra.

...

Decimal

Binary

Base4

HTM16

54680902005

110010111011001111000101100101110101

N02323033011211311

HTM13

854389093

110010111011001111000101100101

N02323033011211

HTM10

13349829

110010111011001111000101

N02323033011

It turns out that in Cassandra, you can have multiple clustering columns provided the order is preserved when doing a query. In the cassandra query language (CQL) it is possible to have more than one column in an IN query.

In Cassandra, we can reorganise the above table, using elements of the (e.g.) base 4 HTM.

E.g.

HTM10

HTM13

HTM16

source_id

ra

dec

phot_g_mean_mag

N02323033011

211

311

4309278608071975424

288.70392

9.99498

17.5

Now, we can query using the HTM10 field on its own (the partition key), the HTM10 and HTM13 fields (= HTM level 13) and the HTM10 and HTM13 and HTM16 fields (= HTM level 16). (And the full set of fields + the source_id.)

E.g.

HTM10 query

Code Block
select * from tcs_cat_gaia_dr2 where htm10 IN ('S23023222101','S23023222132');

HTM13 query

Code Block
select * from tcs_cat_gaia_dr2 where htm10 IN ('S23023222101','S23023222132')
AND (htm13) IN
('000','001','003','000','002','003');

HTM16 query

Code Block
select * from tcs_cat_gaia_dr2 where htm10 IN ('S23023222101','S23023222132')
AND (htm13,htm16) IN
(('000','100'),('001','200'),
('003','200'),('000','200'),
('002','100'),('003','100'));

Cassandra Loading and Querying Data

...

I’m certain these numbers will massively improve if we build a properly distributed Cassandra system. To be done.

Conclusions & Further Work

Cassandra CAN be used for data loading and cone searching using the mechanisms above. Additionally the HTM interface also allows for other types of spatial searches - not just cones. All we need are the triangle IDs.

Primary keys need to be chosen very carefully!

Extraction of data based on ranges of other columns require that the data be copied into separate tables and primary keys chosen accordingly.

This should be deployed as a group of machines into a properly distributed infrastructure (e.g. a group of openstack machines) to test both loading speed AND query speed.