Data Lake

  • AWS Lake Formation - An accelerator for setting up and securing data lakes. It is based on the orchestration of AWS services such as S3, Redshift, Athena, and EMR, alongside Apache Spark.
  • Amazon S3 - Amazon S3 is the de facto blob (object) storage solution in AWS. The top-level containers in S3 are called buckets.
  • Hadoop - Apache Hadoop is an open-source big data suite based on the MapReduce programming model. It was designed for massive distributed storage and processing. Its core component is the Hadoop Distributed File System (HDFS), which splits files into blocks spread across nodes (sharding) and replicates those blocks. HDFS is typically accessed through APIs but may be mounted as a regular file system, with some limitations.
  • Qubole - Qubole is essentially a managed (SaaS) Hadoop offering (on top of AWS) with additional bells and whistles.
  • Snowflake - A cloud-based data lake and analytics suite.
  • Yellowbrick - A niche data warehousing solution focused on large-scale, real-time analytics. On-prem, it uses a high-speed proprietary hardware architecture; the cloud/SaaS version uses carefully chosen and optimised AWS primitives.
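
The sharding and replication mentioned in the Hadoop entry can be illustrated with a toy model. The sketch below is not HDFS code and does not use any Hadoop API: the names `Block`, `place_blocks`, and `read_back` are hypothetical, the round-robin placement is an assumption made for simplicity (real HDFS placement is rack-aware), and the tiny block size is only for demonstration (HDFS defaults to 128 MB blocks and a replication factor of 3).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    """One shard of a file plus the nodes holding a copy of it (hypothetical type)."""
    index: int
    data: bytes
    replicas: list[str]


def split_into_blocks(payload: bytes, block_size: int) -> list[bytes]:
    """Cut the payload into fixed-size chunks; the last chunk may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [payload[i:i + block_size] for i in range(0, len(payload), block_size)]


def place_blocks(payload: bytes, block_size: int,
                 nodes: list[str], replication: int) -> list[Block]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumption: plain round-robin stands in for a real placement policy.
    """
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication must be between 1 and the number of nodes")
    blocks = []
    for index, chunk in enumerate(split_into_blocks(payload, block_size)):
        replicas = [nodes[(index + r) % len(nodes)] for r in range(replication)]
        blocks.append(Block(index, chunk, replicas))
    return blocks


def read_back(blocks: list[Block], failed_nodes: set[str]) -> bytes:
    """Reassemble the file, failing only if some block has no surviving replica."""
    out = bytearray()
    for block in sorted(blocks, key=lambda b: b.index):
        if all(node in failed_nodes for node in block.replicas):
            raise IOError(f"block {block.index} lost: all replicas are down")
        out += block.data
    return bytes(out)


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    blocks = place_blocks(b"abcdefghij", block_size=4, nodes=nodes, replication=3)
    for b in blocks:
        print(b.index, b.data, b.replicas)
    # Two failed nodes still leave at least one replica of every block.
    print(read_back(blocks, failed_nodes={"n1", "n2"}))
```

With three replicas per block, any two nodes can fail without data loss in this model; losing all three nodes that hold the same block makes the file unreadable, which is why the replication factor trades storage cost against durability.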
