...

In Cassandra, replication is built in, and the replication factor determines how many nodes in total hold a copy of the data. In the following example, objects with a partition key hash of 10 would be stored in Node 1. If the replication factor is 3, then the data is also replicated (clockwise) to Nodes 2 and 3. Likewise, an object with a partition key hash of 83 would be stored in Node 4, and replicated (wrapping around the ring) to Nodes 1 and 2.
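The clockwise placement described above can be sketched in a few lines of Python. This is only a toy model, not how Cassandra is implemented: it assumes four nodes that split a 0-99 hash space evenly (Node 1 owns 0-24, Node 2 owns 25-49, and so on), which is consistent with the two examples in the text but is otherwise my assumption. The names `RING_STARTS`, `NODES` and `replicas` are made up for illustration.

```python
from bisect import bisect_right

# Assumed layout: four nodes splitting a 0-99 hash space evenly.
# RING_STARTS[i] is the first hash value owned by NODES[i].
RING_STARTS = [0, 25, 50, 75]
NODES = [1, 2, 3, 4]


def replicas(key_hash, replication_factor):
    """Return the nodes holding a copy of the data, primary owner first."""
    if not 0 <= key_hash < 100:
        raise ValueError("key_hash must be in the range 0-99")
    if not 1 <= replication_factor <= len(NODES):
        raise ValueError("replication_factor must be between 1 and the node count")
    # Find the node whose range contains the hash.
    owner = bisect_right(RING_STARTS, key_hash) - 1
    # Walk clockwise around the ring, wrapping past the last node.
    return [NODES[(owner + i) % len(NODES)] for i in range(replication_factor)]


print(replicas(10, 3))  # [1, 2, 3]
print(replicas(83, 3))  # [4, 1, 2]
```

Note that the replication factor counts the primary copy as well, so a factor of 3 means three nodes in total.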

...

My installation of Cassandra is just a single node on my laptop. However, it seems surprisingly robust, if not exactly quick. I’m still figuring out how to insert and read data with multiple threads, so for the time being all access is single-threaded. I’m sure the times (below) can be improved. To do: test on a distributed system.

Partition Keys and Clustering Keys

Cassandra will not allow you to query any old column with any constraint. Unlike relational databases, you have to build the column families (tables) with the query in mind, otherwise you can’t easily pull out the information you need. Let’s take a look at the Gaia DR2 table. The unique identifier column is “source_id”, but this doesn’t mean much in terms of organising the data.

source_id

ra

dec

HTM16