Integrate this into Grafana as an app plugin and you’d have me. I don’t want to leave Grafana where I have all my other operational dashboards for this.
Love the tool, but it's not practical in the enterprise world to have yet another dashboard service to look at just for metrics. It would be great if this played well with Grafana or OTel collectors.
OTOH, monitoring long-running background jobs on a CH cluster is very valuable to have. It's a real pain to verify whether parent and child queries have executed correctly. I would suggest doubling down on features that users cannot readily get via Grafana or OTel.
"not practical" for who? If you need to debug your clickhouse clusters, you look at the clickhouse tool. That's it. This isn't an alerting/monitoring solution, it's a specialized tool for debugging and fixing issues with running clusters.
that kind of thinking (that it's too hard to learn a second tool) is how datadog gets away with charging $$$$ for mediocre versions of 10 different products that cost an order of magnitude more than they would individually. the benefits you get from combining everything into one tool are vastly overstated compared to the benefits you get from having the in-house expertise to use the right tool for the job.
Imagine you have a small business that tracks on the order of tens to hundreds of millions of events (pageviews, clicks, whatever), and you have reporting you want to run. Doing this in PG/MySQL would likely require materialized views so your reports don't take a long time to run. You could store your event data in CH directly, or use an ELT/ETL process to sync/copy it into ClickHouse just for reporting. Then your queries would be very fast. It's much faster for certain types of queries, mainly time-series queries or queries involving aggregation of many rows, because of how the data is stored on disk. It's NOT good for fetching/updating/deleting single rows, however.
It was originally designed to handle hundreds of columns and billions of rows, but I think it can still apply to much smaller use cases that value performance. I'm implementing it currently in a similar scenario, and I'm using the Airbyte OSS version to ELT from Postgres. Then I'm using Tableau or some other BI tool to analyze that data much more effectively (I will be trying to perform complex aggregation/group-by reports on 100M rows).
Row-based databases are optimized for accessing complete rows and for joins. Columnar storage is optimized for accessing all, or many, of a column's values across rows. This makes aggregates and transformation logic faster with columnar storage than with row-based storage. I.e., columnar stores are great for data warehouses and other analytical workloads.
Less about the data itself and more about the specific operations you want to do on it.
Large aggregations, massive datasets, large joins, and workloads that are read-heavy and eschew row-level mutations.
They get used for data analysis frequently; time series data and the associated analysis mesh quite nicely too. ClickHouse itself was originally built to support arbitrary analytical queries on clickstream data at pretty massive scale. Cloudflare uses it for live analytics, Uber uses it for logs.
This is an overly simplistic but also correct answer: clickhouse was developed for analytics on clickstreams.
Technically the overall idea is that if you have lots of queries that only read certain columns and your database stores rows contiguously it's a waste to read a whole row and then discard columns.
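To make that concrete, here's a rough sketch in Python. The table layout, column names, and byte sizes are made up for illustration, and this says nothing about how ClickHouse actually lays data out on disk. It only counts how many bytes a "sum one column" query has to touch in a row layout versus a column layout.

```python
import struct

# Hypothetical event table: (timestamp, user_id, url_id, duration_ms).
rows = [(1_700_000_000 + i, i % 50, i % 7, 100 + i % 13) for i in range(1000)]

ROW_FMT = "<qiii"                     # int64 + 3 x int32, no padding
ROW_SIZE = struct.calcsize(ROW_FMT)   # 20 bytes per row

# Row layout: all fields of a row are stored next to each other.
row_store = b"".join(struct.pack(ROW_FMT, *r) for r in rows)

# Column layout: all values of one column are stored next to each other.
n = len(rows)
col_store = {
    "timestamp": struct.pack(f"<{n}q", *(r[0] for r in rows)),
    "user_id": struct.pack(f"<{n}i", *(r[1] for r in rows)),
    "url_id": struct.pack(f"<{n}i", *(r[2] for r in rows)),
    "duration_ms": struct.pack(f"<{n}i", *(r[3] for r in rows)),
}


def sum_duration_row_store(buf):
    """Sum duration_ms by decoding every full row and discarding 3 fields."""
    total = 0
    bytes_read = 0
    for offset in range(0, len(buf), ROW_SIZE):
        _, _, _, duration = struct.unpack_from(ROW_FMT, buf, offset)
        total += duration
        bytes_read += ROW_SIZE
    return total, bytes_read


def sum_duration_col_store(cols):
    """Sum duration_ms by reading only that column's buffer."""
    buf = cols["duration_ms"]
    values = struct.unpack(f"<{len(buf) // 4}i", buf)
    return sum(values), len(buf)


row_total, row_bytes = sum_duration_row_store(row_store)
col_total, col_bytes = sum_duration_col_store(col_store)
print(row_total == col_total, row_bytes, col_bytes)
```

Same answer either way, but the row layout has to walk through every byte of the table while the column layout only touches the one column the query asks for.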
Also, compression (such as run-length or delta encoding, or even zstd) often works better if you give it a block of data that's from one column (such as a timestamp or tag value).
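A toy illustration of the compression point, assuming a made-up timestamp column with a fixed 10-second step and a tag column with long runs of the same value. The helper names are mine, and real codecs are a lot more sophisticated than this. It just shows delta and run-length encoding collapsing per-column data, and getting nothing out of the same values interleaved row by row.

```python
from itertools import groupby


def delta_encode(values):
    """Keep the first value, then store each value as a difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical columns: evenly spaced timestamps and a low-cardinality tag.
timestamps = [1_700_000_000 + 10 * i for i in range(1000)]
tags = ["eu"] * 600 + ["us"] * 400

# Column-wise: each column is encoded on its own.
ts_runs = run_length_encode(delta_encode(timestamps))
tag_runs = run_length_encode(tags)

# Row-wise: values from both columns are interleaved, so no two neighbours match.
interleaved = [value for pair in zip(timestamps, tags) for value in pair]
mixed_runs = run_length_encode(interleaved)

print(len(ts_runs), len(tag_runs), len(mixed_runs))
```

The two columns shrink to a handful of runs each, while the interleaved stream stays at one run per value, which is the "give it a block from one column" effect in miniature.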
MDX describes data through a multidimensional structure, which brings the semantic model it presents closer to the real business and makes more complex queries easier to express on top of that model. SQL can provide similar capabilities, but complex queries may be laborious or even extremely difficult to write. MDX also has a disadvantage compared to SQL: thoroughly understanding the multidimensional data model takes more learning than understanding the SQL table model.