Spark, Cassandra and Python

In this post we briefly touch on Apache Spark, a cluster computing framework that supports a number of drivers for piping data in, and that owes much of its stunning performance to its architectural foundation, the resilient distributed dataset (RDD). In this hands-on guide, we expand on how to configure Spark and use Python to …

DataStax Python Driver

For someone with a relational database background, analyzing data in Cassandra isn’t intuitive. There are two reasons. First, rows in a Cassandra table are rarely updated or deleted, in order to avoid tombstones. Insertion is the only operation on the table, so multiple versions of each record are all stored in the same table, which makes the table much longer …
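Because every change is written as a new row, the first analysis step is usually to collapse each record down to its newest version. The sketch below does that in plain Python on rows that have already been fetched (for example through the DataStax Python driver). The row shape `(record_id, version_ts, payload)` and the name `latest_versions` are hypothetical; it assumes the table carries some version or timestamp column that orders the versions of a record.

```python
from typing import Any, Dict, Iterable, Tuple

# Hypothetical row shape: (record_id, version_ts, payload)
Row = Tuple[str, int, Dict[str, Any]]


def latest_versions(rows: Iterable[Row]) -> Dict[str, Row]:
    """Keep only the newest version of each record (highest version_ts wins)."""
    latest: Dict[str, Row] = {}
    for row in rows:
        record_id, version_ts, _ = row
        current = latest.get(record_id)
        # Replace the stored row only if this one is a newer version.
        if current is None or version_ts > current[1]:
            latest[record_id] = row
    return latest


if __name__ == "__main__":
    rows = [
        ("order-1", 100, {"status": "placed"}),
        ("order-1", 200, {"status": "shipped"}),
        ("order-2", 150, {"status": "placed"}),
    ]
    for record_id, row in sorted(latest_versions(rows).items()):
        print(record_id, row[2]["status"])  # order-1 shipped, then order-2 placed
```

In a real workload the same idea could be pushed into the data model itself (for instance a clustering column ordered by time), so the newest version comes back first; the in-memory version above just makes the logic explicit.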

Cassandra data model (as opposed to relational model)

A badly designed data model in Cassandra causes chronic pain as an application scales. I re-read the material on data model design in “Cassandra: The Definitive Guide” and kept my notes and thoughts in this post. Relational data modelling is drilled into every student coming out of university. It embraces several things: Entity-Relation: …

Storage Nitty-Gritty 5 of 5 – Replication

Replication terms:
PIT (point-in-time) replica – a snapshot of the source at a specific timestamp;
Continuous replica – always in sync with the production data;
Recoverability – enables restoration of data from the replica to the source if data loss or corruption occurs;
Restartability – enables restarting business operations using the replicas.
Local replication use case: alternative source …
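To make the difference between the two replica types concrete, here is a toy in-memory model. The class `Source` and its methods are hypothetical and stand in for a storage array; real local replication works on volumes or LUNs, not Python dictionaries. The sketch shows a continuous replica mirroring every write, a PIT replica frozen at the moment it is taken, and recoverability as restoring the source from a replica.

```python
import copy
from typing import Dict, List

Store = Dict[str, str]


class Source:
    """Toy production data store (hypothetical, in-memory only)."""

    def __init__(self) -> None:
        self.data: Store = {}
        self.continuous_replicas: List[Store] = []

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        # Continuous replica: every write is applied to each replica immediately,
        # so the replica stays in sync with production.
        for replica in self.continuous_replicas:
            replica[key] = value

    def pit_snapshot(self) -> Store:
        # PIT replica: an independent copy frozen at the moment of the call.
        return copy.deepcopy(self.data)

    def restore_from(self, replica: Store) -> None:
        # Recoverability: restore the source's contents from a replica.
        self.data = copy.deepcopy(replica)


if __name__ == "__main__":
    src = Source()
    mirror: Store = {}
    src.continuous_replicas.append(mirror)
    src.write("acct", "100")
    snap = src.pit_snapshot()
    src.write("acct", "CORRUPTED")
    print(mirror["acct"])    # CORRUPTED
    print(snap["acct"])      # 100
    src.restore_from(snap)
    print(src.data["acct"])  # 100
```

The usage example illustrates why both types are useful: the continuous replica faithfully copies the corrupting write, while the PIT replica still holds the earlier state and can be used to restore the source.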

Relational Database Normalization

If a database migration involves a schema redesign, we need to follow certain normal forms. I have a good instinct for what these forms are. However, when I got some questions about particular normal forms, I realized that I could not spell them all out clearly, even though I learned them all in university (10+ years ago). I decided to …
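The normal forms beyond 1NF are defined in terms of functional dependencies (FDs), so a small checker helps when reasoning about them. The sketch below, with hypothetical helpers `fd_holds` and `project` and a made-up `employees` table, tests whether `X -> Y` holds in a set of rows and shows a classic 3NF violation: `emp_id -> dept_id -> dept_name` is a transitive dependency on the key, removed by splitting out a departments table. Note that sample data can only show that a dependency is violated; whether it should hold is a schema design decision.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Row = Dict[str, Hashable]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen: Dict[Tuple, Tuple] = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # Same lhs values mapped to different rhs values: dependency violated.
        if seen.setdefault(key, val) != val:
            return False
    return True


def project(rows: Iterable[Row], cols: Sequence[str]) -> List[Row]:
    """Relational projection onto cols, dropping duplicate tuples."""
    out: List[Row] = []
    seen = set()
    for row in rows:
        t = tuple(row[c] for c in cols)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(cols, t)))
    return out


if __name__ == "__main__":
    employees = [
        {"emp_id": 1, "dept_id": 10, "dept_name": "Sales"},
        {"emp_id": 2, "dept_id": 10, "dept_name": "Sales"},
        {"emp_id": 3, "dept_id": 20, "dept_name": "R&D"},
    ]
    # emp_id -> dept_id and dept_id -> dept_name: a transitive dependency (not 3NF).
    print(fd_holds(employees, ["emp_id"], ["dept_id"]))      # True
    print(fd_holds(employees, ["dept_id"], ["dept_name"]))   # True
    # Decompose: employees(emp_id, dept_id) and departments(dept_id, dept_name).
    print(len(project(employees, ["emp_id", "dept_id"])))    # 3
    print(len(project(employees, ["dept_id", "dept_name"]))) # 2
```

After the split, "Sales" is stored once instead of once per employee, which is exactly the update anomaly 3NF is meant to prevent.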

High level steps for database migration

Database migrations come in many types, such as hardware platform migration, moving from one technology to another (relational to relational, relational to NoSQL), on-premises to cloud, etc. In this post, I outline at a high level some key steps to a successful database migration, assuming the migration involves changing the DBMS platform. Assessment and discovery – this is …

Cassandra Architecture Summary

Disclaimer: much of the content here comes from “Cassandra: The Definitive Guide”.
Gossip and Failure Detection
Cassandra uses a gossip protocol that allows each node to keep track of state information about the other nodes in the cluster. The gossiper runs every second on a timer. Gossip protocols assume a faulty network and are commonly employed in …
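The core idea of gossip can be sketched in a few lines: on every timer tick each node bumps its own heartbeat, exchanges its view with a randomly chosen peer, and keeps the newer version of every entry. This is a deliberately simplified, single-process model with hypothetical names (`Node`, `gossip_round`, `converged`). Cassandra's actual gossiper uses generation and version numbers with a three-message SYN/ACK/ACK2 exchange, and feeds heartbeats into a Phi Accrual failure detector, none of which is modelled here.

```python
import random
from typing import Dict, List


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        # Latest known heartbeat version for every node this node has heard of.
        self.heartbeats: Dict[str, int] = {name: 0}

    def tick(self) -> None:
        # Once per round (once per second in Cassandra), bump our own heartbeat.
        self.heartbeats[self.name] += 1

    def merge(self, remote: Dict[str, int]) -> None:
        # Keep whichever view of each node is newer (higher version wins).
        for node, version in remote.items():
            if version > self.heartbeats.get(node, -1):
                self.heartbeats[node] = version


def gossip_round(nodes: List[Node], rng: random.Random) -> None:
    for node in nodes:
        node.tick()
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        # Push-pull exchange: both sides end up with the merged view.
        peer.merge(node.heartbeats)
        node.merge(peer.heartbeats)


def converged(nodes: List[Node]) -> bool:
    """True once every node knows about every other node."""
    names = {n.name for n in nodes}
    return all(set(n.heartbeats) == names for n in nodes)


if __name__ == "__main__":
    rng = random.Random(42)
    cluster = [Node(f"node{i}") for i in range(5)]
    rounds = 0
    while not converged(cluster):
        gossip_round(cluster, rng)
        rounds += 1
    print(f"every node knows every other node after {rounds} round(s)")
```

Because each node talks to only one peer per round, no node needs a full view of the cluster up front, yet state still spreads to everyone within a few rounds; a node whose heartbeat stops advancing in its peers' views is what a failure detector would then flag.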