Software Alternatives & Reviews

Command-line data analytics made easy

DSQ OctoSQL Observable
  1. DSQ

    Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
    SPyQL is really cool and its design is very smart, with it being able to leverage normal Python functions! As far as similar tools go, I recommend taking a look at DataFusion[0], dsq[1], and OctoSQL[2]. DataFusion is a very (very very) fast command-line SQL engine, but with limited support for data formats. dsq is based on SQLite, which means it has to load data into SQLite first, but it then gives you the whole breadth of SQLite. It also supports many data formats, but is slower. OctoSQL is faster, extensible through plugins, and supports incremental query execution, so you can, e.g., calculate a running group by + count while tailing a log file. It also supports normal databases, not just file formats, so you can, e.g., join with a Postgres table. [0]: https://github.com/apache/arrow-datafusion [1]: https://github.com/multiprocessio/dsq [2]: https://github.com/cube2222/octosql Disclaimer: Author of OctoSQL.

    #Application And Data #Languages & Frameworks #Shell Utilities 11 social mentions

  2. OctoSQL

    OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.

    #Databases #Big Data #Relational Databases 22 social mentions

  3. Observable

    Interactive code examples/posts
    Pricing:
    • Open Source
    It could be the NDJSON parser (DF source: [0]) or it could be a variety of other factors. Looking at the ROAPI release archive [1], it doesn't ship with the definitive `columnq` binary from your comment, so it could also have something to do with compile-time flags. FWIW, we use the Parquet format with DataFusion and get very good speeds, similar to DuckDB [2], e.g. 1.5s to run a more complex aggregation query `SELECT date_trunc('month', tpep_pickup_datetime) AS month, COUNT(*) AS total_trips, SUM(total_amount) FROM tripdata GROUP BY 1 ORDER BY 1 ASC` on a 55M row subset of NY Taxi trip data. [0]: https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/datasource/file_format/json.rs [1]: https://github.com/roapi/roapi/releases/tag/roapi-v0.8.0 [2]: https://observablehq.com/@seafowl/benchmarks

    #Data Visualization #Data Dashboard #Data Science Notebooks 286 social mentions
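
The mentions above contrast two query styles: loading a file into SQLite before querying it (as described for dsq) and keeping a running group by + count that updates as lines arrive (as described for OctoSQL). The Python sketch below illustrates both ideas using only the standard library. It is not how either tool is implemented. The helper names `query_csv_via_sqlite` and `running_counts`, the table name `logs`, and the sample log data are all hypothetical and chosen only for illustration.

```python
import csv
import io
import sqlite3
from collections import Counter


def query_csv_via_sqlite(csv_text, table, sql):
    """Load CSV text into an in-memory SQLite table, then run one SQL query.

    Hypothetical helper: it only illustrates the "load first, query later" idea.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    try:
        cols = ", ".join(f'"{name}"' for name in header)
        marks = ", ".join("?" for _ in header)
        # Columns are declared without types, so the inserted strings stay text.
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def running_counts(lines, key):
    """Yield an updated (key, count) pair after every incoming line.

    Hypothetical helper: a running group by + count over a stream of lines.
    """
    counts = Counter()
    for line in lines:
        k = key(line)
        counts[k] += 1
        yield k, counts[k]


# Made-up sample data, not taken from any of the tools above.
LOG = "level,message\nINFO,started\nERROR,disk full\nINFO,stopped\n"

# Batch style: the whole file is loaded before the query runs.
batch = query_csv_via_sqlite(
    LOG, "logs", "SELECT level, COUNT(*) FROM logs GROUP BY level ORDER BY level"
)
print(batch)

# Incremental style: the count for a level is updated after each line.
stream = list(running_counts(LOG.splitlines()[1:], key=lambda l: l.split(",")[0]))
print(stream)
```

The batch version pays the load cost up front but then has all of SQLite's SQL available, while the incremental version never needs the whole input and can emit a result after each line, which is what makes it usable on a log file that is still being written.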
