Kylo
Kylo is an open source, enterprise-ready data lake management platform for self-service data ingest and data preparation, with integrated metadata management, governance, security, and best practices inspired by Think Big's 150+ big data implementation projects. It offers self-service data ingest with data cleansing, validation, and automatic profiling. Users can wrangle data with visual SQL and interactive transforms through a simple user interface. They can also search and explore data and metadata, view lineage, and profile statistics. Operators can monitor the health of feeds and services in the data lake, track SLAs, and troubleshoot performance. Batch or streaming pipeline templates are designed in Apache NiFi and registered with Kylo to enable user self-service. Organizations can expend significant engineering effort moving data into Hadoop yet still struggle to maintain governance and data quality. Kylo dramatically simplifies data ingest by shifting it to data owners through a simple guided UI.
Google Cloud Lakehouse
Google Cloud Lakehouse is a storage engine designed to unify data warehouses and data lakes into a single, cohesive platform. It allows organizations to access and manage data in open formats such as Apache Iceberg, Parquet, and ORC. The platform enables users to work with a single copy of data without needing to duplicate or move it across systems. It provides fine-grained security controls to ensure proper data governance and access management. Google Cloud Lakehouse simplifies data operations by integrating analytics and storage capabilities. It supports modern data workflows, including big data processing and analytics. The platform is built to scale with enterprise data needs while maintaining performance and flexibility. Overall, it helps organizations streamline data management and unlock insights more efficiently.
AWS Lake Formation
AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis. A data lake lets you break down data silos and combine different types of analytics to gain insights and guide better business decisions. Setting up and managing data lakes today involves a lot of manual, complicated, and time-consuming tasks. This work includes loading data from diverse sources, monitoring those data flows, setting up partitions, turning on encryption and managing keys, defining transformation jobs and monitoring their operation, reorganizing data into a columnar format, deduplicating redundant data, and matching linked records. Once data has been loaded into the data lake, you need to grant fine-grained access to datasets, and audit access over time across a wide range of analytics and machine learning (ML) tools and services.
IOMETE
IOMETE is a self-hosted data lakehouse platform built on Apache Iceberg, Apache Spark, and Kubernetes. Run it on-premises or in your private cloud — your infrastructure, your data, your control.
Built for enterprises in regulated industries, IOMETE eliminates third-party ICT risk at the data layer by architecture — not by contract. No SaaS dependencies. No data leaving your perimeter. Compliance with GDPR, DORA, and NIS2 is structural, not contractual.
Included in one platform:
- Data Lakehouse(s)
- Data Catalog
- SQL Editor
- Apache Spark Jobs
- ML Notebooks
- Orchestration Engine
- Spark Connect
Key capabilities: Apache Iceberg-native storage, Kubernetes-native deployment (K8s + OpenShift), row/column/tag-based access control, Data Mesh support, air-gapped and zero-trust compatible.
Transparent pricing — CPU-based, no per-query fees, no billing surprises.