Based on our records, Apache Parquet seems to be a lot more popular than Apache Kylin. While we know about 25 links to Apache Parquet, we've tracked only 1 mention of Apache Kylin. We track product recommendations and mentions on various public social media platforms and blogs. They can help you identify which product is more popular and what people think of it.
If there was a way to package and compress the Excel spreadsheet in a web-friendly format, then there's nothing stopping us from loading the entire dataset in the browser! Sure enough, the Parquet file format was specifically designed for efficient portability. - Source: dev.to / about 1 month ago
Iceberg decouples storage from compute. That means your data isn't trapped inside one proprietary system. Instead, it lives in open file formats (like Apache Parquet) and is managed by an open, vendor-neutral metadata layer (Apache Iceberg). - Source: dev.to / 6 months ago
Data prep kit github repository: https://github.com/data-prep-kit/data-prep-kit?tab=readme-ov-file Quick start guide: https://github.com/data-prep-kit/data-prep-kit/blob/dev/doc/quick-start/contribute-your-own-transform.md Provided samples and examples: https://github.com/data-prep-kit/data-prep-kit/tree/dev/examples Parquet: https://parquet.apache.org/. - Source: dev.to / 6 months ago
Deliver nice ready-to-use data as duckdb, parquet and csv. - Source: dev.to / 6 months ago
Push the dataset to hugging face in parquet format. - Source: dev.to / 11 months ago
A Kafka-based data integration platform will be a good fit here. The services can add events to different topics in a broker whenever there is a data update. Kafka consumers corresponding to each of the services can monitor these topics and make updates to the data in real-time. It is also possible to create a unified data store through the same integration platform. Developers can implement a unified store either... - Source: dev.to / about 3 years ago
Apache Arrow - Apache Arrow is a cross-language development platform for in-memory data.
Google BigQuery - A fully managed data warehouse for large-scale data analytics.
Apache Spark - Apache Spark is an engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
ClickHouse - ClickHouse is an open-source column-oriented database management system that allows generating analytical data reports in real time.
DuckDB - DuckDB is an in-process SQL OLAP database management system.
Spring Batch - Level up your Java code and explore what Spring can do for you.