Welcome To C2FO.io

Official website for the developers of C2FO.com

Exploring Apache Spark: Understanding the RDD

The Resilient Distributed Dataset (RDD): Developed at UC Berkeley in 2009 and eventually donated to the Apache Software Foundation, Spark RDDs implement a... [Read More]

Apache Spark: Config Cheatsheet (Part 2)

(Part 2) Client Mode: This post covers client-mode-specific settings; for cluster-mode-specific settings, see Part 1. In my previous post, I explained how manually configuring your Apache Spark settings could increase the efficiency of your Spark jobs and, in some circumstances, allow you to use... [Read More]

Apache Spark: Config Cheatsheet

(Part 1) Cluster Mode: This post covers cluster-mode-specific settings; for client-mode-specific settings, see Part 2. The Problem: One morning, while doing some back-of-the-envelope calculations, I discovered that we could lower our AWS costs by using clusters of fewer, more powerful machines... [Read More]

Protecting your product with npm 'save-exact'

There’s no question that npm and node have a massive open source ecosystem backing them. Each day brings hundreds of new packages and thousands of updates to existing ones. With a simple npm install we can grab any package we want. npm install... [Read More]

Understanding Programmatic Side-Effects

What are Side-Effects? Side-effects are a concept I’ve been introduced to recently, and when I examined my code I was surprised to see how much disruption they can cause. To better understand a programmatic side-effect, we should start with a textbook definition of what this means in computer science.... [Read More]