Data Analysis
- Amazon Athena - A serverless, interactive query service for analysing data directly in Amazon S3 using standard SQL, without first going through an ingestion/ETL stage.
- Apache Arrow - A language-independent columnar memory format for flat and hierarchical data, organised for efficient analytic operations on modern hardware such as CPUs and GPUs. It provides bindings for most mainstream languages, including Java, C#, and Python.
- Apache Flink - A framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
- Apache Spark - Like Hadoop, a distributed computing solution, but one that focuses on compute rather than storage. Unlike solutions based on the MapReduce model, Spark supports iterative programs written in a functional, higher-order style, which allows, among other things, visiting the same data multiple times.
- BigQuery - A petabyte-scale data warehouse by Google designed to ingest, store, analyse, and visualise data. BigQuery supports a standard, ANSI-compliant SQL dialect. Unlike Google Bigtable, BigQuery is best suited to queries that scan a large table, such as sums, averages, counts, and groupings, as well as queries that create machine learning models.
- BigQuery Omni - A multi-cloud (AWS, Azure, etc.) solution based on Google Anthos to access and analyse data using the BigQuery user interface.
- CDF - Cloudera DataFlow (CDF) is a scalable, real-time streaming data platform with an emphasis on streaming analytics.
- Dataproc - A platform for running open source data and analytics workloads (Apache Spark, Hadoop, etc.) on GCP.
- Databricks - A web-based platform for working with Spark that provides automated cluster management and IPython-style notebooks.
- Microsoft Power BI - A service that provides interactive visualisations and business intelligence capabilities with an interface simple enough for end users to create their own reports and dashboards.
- Ray - A universal Python and Java API for building distributed machine learning-based applications. It helps parallelise single-machine code with little to no code changes.
- Snowflake - A cloud-based data lake and analytics suite.
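
Several of the services listed here (Amazon Athena, BigQuery) are queried with standard SQL, and the BigQuery entry mentions scan-style queries such as sums, averages, counts, and groupings. As a rough illustration of what such a query looks like, the sketch below runs one locally using Python's built-in `sqlite3` module. This is only a stand-in: it is not the Athena or BigQuery API, and the `sales` table, its columns, and the `summarise_sales` helper are hypothetical names made up for the example.

```python
import sqlite3

# Hypothetical rows (region, amount), used only for illustration.
ROWS = [
    ("north", 120.0),
    ("north", 80.0),
    ("south", 50.0),
    ("south", 150.0),
    ("south", 100.0),
]


def summarise_sales(rows):
    """Return (region, count, total, average) per region using standard SQL."""
    # In-memory SQLite database as a local stand-in for a cloud query service.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # A full-table scan with grouping and aggregates.
        cursor = conn.execute(
            """
            SELECT region, COUNT(*), SUM(amount), AVG(amount)
            FROM sales
            GROUP BY region
            ORDER BY region
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, n, total, average in summarise_sales(ROWS):
        print(region, n, total, average)
```

The same `SELECT ... GROUP BY` statement should carry over largely unchanged to an ANSI-compliant engine; what differs between services is how the connection is made and where the data lives.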