Data Analysis


  • Amazon Athena - A serverless, interactive query service for analysing data directly in Amazon S3 using standard SQL, without first requiring an ingestion/ETL stage.
  • Apache Arrow - A language-independent columnar memory format for flat and hierarchical data, organised for efficient analytic operations on modern hardware like CPUs and GPUs. It provides bindings for most mainstream languages, such as Java, C#, and Python.
  • Apache Flink - A framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
  • Apache Spark - Like Hadoop, Apache Spark is a distributed computing solution, but it focuses on compute rather than storage. Unlike solutions based on the MapReduce model, Spark supports iterative programs, written in a functional/higher-order style, that can, among other things, visit the same data multiple times.
  • BigQuery - A petabyte-scale data warehouse by Google designed to ingest, store, analyse, and visualise data. BigQuery supports an ANSI-compliant standard SQL dialect. Unlike Google Bigtable, BigQuery is ideal for queries that scan a large table, such as sums, averages, counts, and groupings, or even queries that create machine learning models.
  • BigQuery Omni - A multi-cloud (AWS, Azure, etc.) solution, based on Google Anthos, for accessing and analysing data through the BigQuery user interface.
  • CDF - Cloudera DataFlow (CDF) is a scalable, real-time streaming data platform with an emphasis on streaming analytics.
  • Dataproc - A platform for running open source data and analytics workloads (Apache Spark, Hadoop, etc.) on GCP.
  • Databricks - A web-based platform for working with Spark that provides automated cluster management and IPython-style notebooks.
  • Microsoft Power BI - A service that provides interactive visualisations and business intelligence capabilities with an interface simple enough for end users to create their own reports and dashboards.
  • Ray - A universal Python and Java API for building distributed, machine-learning-based applications. It helps parallelise single-machine code with little to no code changes.
  • Snowflake - A cloud-based data lake and analytics suite.
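
Several entries above (Amazon Athena, BigQuery) centre on scan-style SQL: sums, averages, counts, and groupings over a large table. As a rough, local illustration of that query shape only, the sketch below uses Python's built-in sqlite3 module as a stand-in for a real warehouse. The `sales` table, its columns, the sample rows, and the `summarise_sales` helper are all hypothetical and do not come from any of the listed products.

```python
import sqlite3

# Hypothetical sample data: (region, amount) rows standing in for a large table.
ROWS = [
    ("emea", 120.0),
    ("emea", 80.0),
    ("apac", 50.0),
    ("amer", 200.0),
    ("amer", 100.0),
    ("amer", 30.0),
]


def summarise_sales(rows):
    """Return (region, order_count, total, average) per region, sorted by region."""
    # An in-memory SQLite database is an assumption made for the sake of a
    # runnable example; a warehouse would run the same kind of SELECT remotely.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            """
            SELECT region, COUNT(*), SUM(amount), AVG(amount)
            FROM sales
            GROUP BY region
            ORDER BY region
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, count, total, average in summarise_sales(ROWS):
        print(region, count, total, average)
```

The aggregate functions and GROUP BY clause used here are standard SQL, so the query text should carry over largely unchanged; connection handling and data loading would differ per service.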
