Spark, Cassandra and Python

In this post we touch briefly on Apache Spark, a cluster computing framework that supports a number of drivers for piping data in and owes much of its performance to the resilient distributed dataset (RDD), its architectural foundation. In this hands-on guide, we expand on how to configure Spark and use Python to …

Intro to Big Data Projects

Modern applications produce datasets too large for traditional data-processing applications to handle. Big data is a discipline that specializes in processing such data, for example through analysis and information extraction. The scale of large datasets grows well beyond the capacity of a single computer, which calls for the computing power of multi-node clustered systems. Intensive …

Host legacy application in Docker

These are my notes from containerizing a legacy application with Docker Compose. We have to run multiple instances of our application on a single VM because we are unable to secure additional VMs for this education environment. The application is a good target for containerization because running multiple instances of it would otherwise require mass reconfiguration (mostly of TCP ports). We …

Zookeeper Summary

Distributed systems: A distributed system involves independent computing entities linked together by a network. The components communicate and coordinate with each other to achieve a common goal. In the early days, designers and developers often made some assumptions (a.k.a. fallacies) about distributed computing: the network is reliable; latency is zero; bandwidth is infinite; the network is secure; topology …

Virtualization 4 of 4 – Networking

Virtual LAN (VLAN): Although VLANs emerged before virtualization and are not strictly part of the virtualization topic, I'd like to start here as a refresher. Suppose we have computers from the finance department and computers from the sales department all connected to a single layer-2 switch. There are at least three problems: 1) too many devices on …

Virtualization 3 of 4 – Containers

In broad terms, virtualization of computing resources is about isolating resources at different levels. We covered hypervisor-based virtualization in another post. In this article, we dive into OS-level virtualization. Remember that the gist of virtualization is isolation of resources. To support OS-level virtualization, the OS must have …

Cloud storage overview

In a narrow sense, cloud storage refers to object storage. In a broader sense, it refers to any storage service (block, file or object level) provided by cloud vendors under a cloud business model. The underlying storage technology is the same whether in the cloud or on-premises. Block storage File storage object Interaction …

Java Garbage Collection

Tuning the garbage collector is the most important thing that can be done to improve the performance of a Java application. OpenJDK has three collectors suitable for production, each with different performance characteristics. To study GC behaviour in an application, it is important to turn on GC logging. The detailed step is different in …

Virtualization 2 of 4 – Graphics Computing

We covered hypervisors in the previous post. In this article we focus on the virtualization of graphics computing resources. GPU vs CPU: A GPU is a specialized type of microprocessor primarily designed for quick image rendering. GPUs appeared as a response to graphically intense applications that put a burden on the CPU and degraded computer performance. They …

Virtualization 1 of 4 – Hypervisor

In broad terms, virtualization of computing resources is about isolating resources at different levels. There are five levels of virtualization: application level, such as JVM and .NET CLR; library (user-level API) level; operating system level, such as LXC, Docker and OpenVZ; hardware abstraction layer (HAL) level, such as VMware, Xen, etc.; instruction set architecture (ISA) level …