基于 Prometheus、Grafana 的 EMQ X 物联网 MQTT 服务器可视化运维监控
时间: 2020-05-13来源:OSCHINA
前景提要
Prometheus 是由 SoundCloud 开源监控告警解决方案,支持多维 数据模型 (时序由 metric 名字和 k/v 的 labels 构成),具备灵活的查询语句( PromQL ),支持多种数据采集 exporters ;支持告警管理,基于指标实现告警监控;支持多种统计数据模型,图形化展示友好,图形展示除了内置的浏览器,也支持 Grafana 集成。
物联网 MQTT 服务器 EMQ X 提供 emqx_statsd 插件,用于将 EMQ X 运行指标及 Erlang 虚拟机状态数据输出到第三方的监控系统如 Prometheus 中。通过 Prometheus 自带的 node-exporter 还可以采集 Linux 服务器相关指标,实现服务器 + EMQ X 整体运维监控。
本文提供了 Prometheus + Grafana 整套 EMQ X 运维监控方案搭建过程。
安装与准备
Docker 镜像下载 # Docker 镜像包下载 docker pull prom/node-exporter docker pull prom/prometheus docker pull prom/pushgateway
启动 node-exporter
可选,用于收集服务器指标如 CPU、内存、网络等,如果使用 Docker 安装则需要映射目标服务器响应的状态文件: docker run -d -p 9100:9100 -v "/proc:/host/proc:ro" -v "/sys:/host/sys:ro" -v "/:/rootfs:ro" --net="host" prom/node-exporter
启动 pushgateway
pushgateway 用于接收 EMQ X 指标推送数据, 需要保证 EMQ X 能够访问 : docker run -d -p 9091:9091 prom/pushgateway
启动 Prometheus
指定配置文件与监听端口以启动 Prometheus: # 指定配置文件并启动 docker run -p 9090:9090 -v $PWD/prometheus.yaml:/etc/prometheus/prometheus.yaml -d prom/prometheus --config.file=/etc/prometheus/prometheus.yaml
Prometheus 配置文件 prometheus.yaml 样例如下,详细含义请参考 Prometheus 文档 : # prometheus.yaml global: scrape_interval: 10s # 默认抓取时间 evaluation_interval: 10s # 每10秒评估一次rules # 在本机上每一条时间序列上都会默认产生的,主要可以用于联合查询、远程存储、Alertmanager时使用。 external_labels: monitor: 'emqx-monitor' # 加载规则,依据 evaluation_interval 来定期评估rule rule_files: # - "first.rules" # - "second.rules" - "/etc/prometheus/rules/*.rules" # 数据拉取配置 scrape_configs: # 表示在这个配置内的时间序例,每一条都会自动添加上这个{job_name:"prometheus"}的标签 - job_name: 'prometheus' scrape_interval: 5s static_configs: - targets: ['127.0.0.1:9090'] # 服务器物理机监控 - job_name: 'node-exporter' scrape_interval: 5s static_configs: # node-exporter 根据实际情况填写 - targets: ['192.168.6.11:9100'] labels: instance: wivwiv-local # EMQ X Pushgateway 监控 - job_name: 'pushgateway' scrape_interval: 5s honor_labels: true static_configs: # pushgateway 根据实际情况填写 - targets: ['192.168.6.11:9091']
启动 EMQ X statsd 插件
打开 etc/emqx_statsd.conf ,确认以下配置: ## pushgateway 地址 statsd.push.gateway.server = http://127.0.0.1:9091 ## 数据采集/推送周期(毫秒) statsd.interval = 15000
启动插件:
./bin/emqx_ctl load plugins emqx_statsd
效果查看
通过 docker ps -a 命令查看组件是否成功运行,等待数个推送周期后,打开 http://localhost:9090 Prometheus 控制面板查看采集数据。 Prometheus 仅提供简单图表数据展示,如需更精美的可视化展示请结合 Grafana 使用。

集成 Grafana
Grafana 是一个开源、通用的度量分析与可视化展示工具,通过数据源(如各类数据库、开源组件),展示自定义报表、显示图表等。
启动 Grafana
通过 Docker 拉取并启动 Grafana 镜像: docker run -d --name=grafana -p 3000:3000 grafana/grafana
启动成功后,浏览器访问 http://127.0.0.1:3000 打开 Dashboard 控制台。
配置 Prometheus 数据源
在 Grafana 中添加数据源,选择 Prometheus 并填写正确的地址完成数据源添加。

导入 Grafana 模板数据
emqx_statsd 插件提供了 Grafana 的 Dashboard 的模板文件,这些模板包含了大部分 EMQ X 监控数据的展示。用户可直接导入到 Grafana 中,用以显示 EMQ X 的监控状态的图标。
模板文件位于 emqx_statsd/grafana_template 中, 因 EMQ X 版本差异问题,可能存在部分图表数据显示错误的情况,请用户手动调整适配。
点击 Upload.json file 按钮,导入后选择对应的文件夹与数据源即可。

效果展示
完成整套系统搭建并运行一段时间后,Prometheus 收集到的数据将展示在 Grafana 上,默认模板展示效果如下: EMQ Dashboard:包含连接、消息、主题、吞吐量历史统计 EMQ:包含客户端数、订阅数、主题数、消息数、报文数等业务信息历史统计 ErlangVM:每个 EMQ X 节点 Erlang 虚拟机进程/线程数量,ETS/Mnesia 数据库使用情况历史统计
如有其他需求,可以参照 「附:emqx-statsd 所有指标」并结合 Grafana 进行图标数据编排展示。



告警管理
Prometheus 与 Grafana 均支持指标告警功能,配置告警规则后,服务器会不断评估设置的规则与当前指标数据,在规则条件符合的时候发送出通知。
篇幅有限,告警相关配置与实践请关注后续文章。
附:emqx-statsd 所有指标
EMQ X MQTT 服务器通过 Prometheus push gateway 推送指标数据,支持的指标项如下: # TYPE erlang_vm_ets_limit gauge erlang_vm_ets_limit 256000 # TYPE erlang_vm_logical_processors gauge erlang_vm_logical_processors 4 # TYPE erlang_vm_logical_processors_available gauge erlang_vm_logical_processors_available NaN # TYPE erlang_vm_logical_processors_online gauge erlang_vm_logical_processors_online 4 # TYPE erlang_vm_port_count gauge erlang_vm_port_count 16 # TYPE erlang_vm_port_limit gauge erlang_vm_port_limit 1048576 # TYPE erlang_vm_process_count gauge erlang_vm_process_count 320 # TYPE erlang_vm_process_limit gauge erlang_vm_process_limit 2097152 # TYPE erlang_vm_schedulers gauge erlang_vm_schedulers 4 # TYPE erlang_vm_schedulers_online gauge erlang_vm_schedulers_online 4 # TYPE erlang_vm_smp_support untyped erlang_vm_smp_support 1 # TYPE erlang_vm_threads untyped erlang_vm_threads 1 # TYPE erlang_vm_thread_pool_size gauge erlang_vm_thread_pool_size 4 # TYPE erlang_vm_time_correction untyped erlang_vm_time_correction 1 # TYPE erlang_vm_statistics_context_switches counter erlang_vm_statistics_context_switches 20767 # TYPE erlang_vm_statistics_garbage_collection_number_of_gcs counter erlang_vm_statistics_garbage_collection_number_of_gcs 3924 # TYPE erlang_vm_statistics_garbage_collection_words_reclaimed counter erlang_vm_statistics_garbage_collection_words_reclaimed 6751048 # TYPE erlang_vm_statistics_garbage_collection_bytes_reclaimed counter erlang_vm_statistics_garbage_collection_bytes_reclaimed 54008384 # TYPE erlang_vm_statistics_bytes_received_total counter erlang_vm_statistics_bytes_received_total 23332 # TYPE erlang_vm_statistics_bytes_output_total counter erlang_vm_statistics_bytes_output_total 21266 # TYPE erlang_vm_statistics_reductions_total counter erlang_vm_statistics_reductions_total 18413181 # TYPE erlang_vm_statistics_run_queues_length_total gauge erlang_vm_statistics_run_queues_length_total 0 # TYPE erlang_vm_statistics_runtime_milliseconds counter erlang_vm_statistics_runtime_milliseconds 1782 # TYPE erlang_vm_statistics_wallclock_time_milliseconds counter erlang_vm_statistics_wallclock_time_milliseconds 68277 # TYPE erlang_vm_memory_atom_bytes_total gauge erlang_vm_memory_atom_bytes_total{usage="used"} 1507142 erlang_vm_memory_atom_bytes_total{usage="free"} 18787 # TYPE erlang_vm_memory_bytes_total gauge erlang_vm_memory_bytes_total{kind="system"} 63949544 erlang_vm_memory_bytes_total{kind="processes"} 45457848 # TYPE erlang_vm_dets_tables gauge erlang_vm_dets_tables 0 # TYPE erlang_vm_ets_tables gauge erlang_vm_ets_tables 115 # TYPE erlang_vm_memory_processes_bytes_total gauge erlang_vm_memory_processes_bytes_total{usage="used"} 45457696 erlang_vm_memory_processes_bytes_total{usage="free"} 152 # TYPE erlang_vm_memory_system_bytes_total gauge erlang_vm_memory_system_bytes_total{usage="atom"} 1525929 erlang_vm_memory_system_bytes_total{usage="binary"} 104504 erlang_vm_memory_system_bytes_total{usage="code"} 26779999 erlang_vm_memory_system_bytes_total{usage="ets"} 7685312 erlang_vm_memory_system_bytes_total{usage="other"} 27853800 # TYPE erlang_mnesia_held_locks gauge erlang_mnesia_held_locks 0 # TYPE erlang_mnesia_lock_queue gauge erlang_mnesia_lock_queue 0 # TYPE erlang_mnesia_transaction_participants gauge erlang_mnesia_transaction_participants 0 # TYPE erlang_mnesia_transaction_coordinators gauge erlang_mnesia_transaction_coordinators 0 # TYPE erlang_mnesia_failed_transactions counter erlang_mnesia_failed_transactions 21 # TYPE erlang_mnesia_committed_transactions counter erlang_mnesia_committed_transactions 128 # TYPE erlang_mnesia_logged_transactions counter erlang_mnesia_logged_transactions 3 # TYPE erlang_mnesia_restarted_transactions counter erlang_mnesia_restarted_transactions 0 # TYPE emqx_connections_count gauge emqx_connections_count 0 # TYPE emqx_connections_max gauge emqx_connections_max 0 # TYPE emqx_sessions_count gauge emqx_sessions_count 0 # TYPE emqx_sessions_max gauge emqx_sessions_max 0 # TYPE emqx_topics_count gauge emqx_topics_count 0 # TYPE emqx_topics_max gauge emqx_topics_max 0 # TYPE emqx_suboptions_count gauge emqx_suboptions_count 0 # TYPE emqx_suboptions_max gauge emqx_suboptions_max 0 # TYPE emqx_subscribers_count gauge emqx_subscribers_count 0 # TYPE emqx_subscribers_max gauge emqx_subscribers_max 0 # TYPE emqx_subscriptions_count gauge emqx_subscriptions_count 0 # TYPE emqx_subscriptions_max gauge emqx_subscriptions_max 0 # TYPE emqx_subscriptions_shared_count gauge emqx_subscriptions_shared_count 0 # TYPE emqx_subscriptions_shared_max gauge emqx_subscriptions_shared_max 0 # TYPE emqx_routes_count gauge emqx_routes_count 0 # TYPE emqx_routes_max gauge emqx_routes_max 0 # TYPE emqx_retained_count gauge emqx_retained_count 3 # TYPE emqx_retained_max gauge emqx_retained_max 3 # TYPE emqx_vm_cpu_use gauge emqx_vm_cpu_use 12.029950083194677 # TYPE emqx_vm_cpu_idle gauge emqx_vm_cpu_idle 87.97004991680532 # TYPE emqx_vm_run_queue gauge emqx_vm_run_queue 1 # TYPE emqx_vm_process_messages_in_queues gauge emqx_vm_process_messages_in_queues 0 # TYPE emqx_bytes_received counter emqx_bytes_received 0 # TYPE emqx_bytes_sent counter emqx_bytes_sent 0 # TYPE emqx_packets_received counter emqx_packets_received 0 # TYPE emqx_packets_sent counter emqx_packets_sent 0 # TYPE emqx_packets_connect counter emqx_packets_connect 0 # TYPE emqx_packets_connack_sent counter emqx_packets_connack_sent 0 # TYPE emqx_packets_connack_error counter emqx_packets_connack_error 0 # TYPE emqx_packets_connack_auth_error counter emqx_packets_connack_auth_error 0 # TYPE emqx_packets_publish_received counter emqx_packets_publish_received 0 # TYPE emqx_packets_publish_sent counter emqx_packets_publish_sent 0 # TYPE emqx_packets_publish_inuse counter emqx_packets_publish_inuse 0 # TYPE emqx_packets_publish_error counter emqx_packets_publish_error 0 # TYPE emqx_packets_publish_auth_error counter emqx_packets_publish_auth_error 0 # TYPE emqx_packets_publish_dropped counter emqx_packets_publish_dropped 0 # TYPE emqx_packets_puback_received counter emqx_packets_puback_received 0 # TYPE emqx_packets_puback_sent counter emqx_packets_puback_sent 0 # TYPE emqx_packets_puback_inuse counter emqx_packets_puback_inuse 0 # TYPE emqx_packets_puback_missed counter emqx_packets_puback_missed 0 # TYPE emqx_packets_pubrec_received counter emqx_packets_pubrec_received 0 # TYPE emqx_packets_pubrec_sent counter emqx_packets_pubrec_sent 0 # TYPE emqx_packets_pubrec_inuse counter emqx_packets_pubrec_inuse 0 # TYPE emqx_packets_pubrec_missed counter emqx_packets_pubrec_missed 0 # TYPE emqx_packets_pubrel_received counter emqx_packets_pubrel_received 0 # TYPE emqx_packets_pubrel_sent counter emqx_packets_pubrel_sent 0 # TYPE emqx_packets_pubrel_missed counter emqx_packets_pubrel_missed 0 # TYPE emqx_packets_pubcomp_received counter emqx_packets_pubcomp_received 0 # TYPE emqx_packets_pubcomp_sent counter emqx_packets_pubcomp_sent 0 # TYPE emqx_packets_pubcomp_inuse counter emqx_packets_pubcomp_inuse 0 # TYPE emqx_packets_pubcomp_missed counter emqx_packets_pubcomp_missed 0 # TYPE emqx_packets_subscribe_received counter emqx_packets_subscribe_received 0 # TYPE emqx_packets_subscribe_error counter emqx_packets_subscribe_error 0 # TYPE emqx_packets_subscribe_auth_error counter emqx_packets_subscribe_auth_error 0 # TYPE emqx_packets_suback_sent counter emqx_packets_suback_sent 0 # TYPE emqx_packets_unsubscribe_received counter emqx_packets_unsubscribe_received 0 # TYPE emqx_packets_unsubscribe_error counter emqx_packets_unsubscribe_error 0 # TYPE emqx_packets_unsuback_sent counter emqx_packets_unsuback_sent 0 # TYPE emqx_packets_pingreq_received counter emqx_packets_pingreq_received 0 # TYPE emqx_packets_pingresp_sent counter emqx_packets_pingresp_sent 0 # TYPE emqx_packets_disconnect_received counter emqx_packets_disconnect_received 0 # TYPE emqx_packets_disconnect_sent counter emqx_packets_disconnect_sent 0 # TYPE emqx_packets_auth_received counter emqx_packets_auth_received 0 # TYPE emqx_packets_auth_sent counter emqx_packets_auth_sent 0 # TYPE emqx_messages_received counter emqx_messages_received 0 # TYPE emqx_messages_sent counter emqx_messages_sent 0 # TYPE emqx_messages_qos0_received counter emqx_messages_qos0_received 0 # TYPE emqx_messages_qos0_sent counter emqx_messages_qos0_sent 0 # TYPE emqx_messages_qos1_received counter emqx_messages_qos1_received 0 # TYPE emqx_messages_qos1_sent counter emqx_messages_qos1_sent 0 # TYPE emqx_messages_qos2_received counter emqx_messages_qos2_received 0 # TYPE emqx_messages_qos2_sent counter emqx_messages_qos2_sent 0 # TYPE emqx_messages_publish counter emqx_messages_publish 0 # TYPE emqx_messages_dropped counter emqx_messages_dropped 0 # TYPE emqx_messages_dropped_expired counter emqx_messages_dropped_expired 0 # TYPE emqx_messages_dropped_no_subscribers counter emqx_messages_dropped_no_subscribers 0 # TYPE emqx_messages_forward counter emqx_messages_forward 0 # TYPE emqx_messages_retained counter emqx_messages_retained 2 # TYPE emqx_messages_delayed counter emqx_messages_delayed 0 # TYPE emqx_messages_delivered counter emqx_messages_delivered 0 # TYPE emqx_messages_acked counter emqx_messages_acked 0 # TYPE emqx_delivery_dropped counter emqx_delivery_dropped 0 # TYPE emqx_delivery_dropped_no_local counter emqx_delivery_dropped_no_local 0 # TYPE emqx_delivery_dropped_too_large counter emqx_delivery_dropped_too_large 0 # TYPE emqx_delivery_dropped_qos0_msg counter emqx_delivery_dropped_qos0_msg 0 # TYPE emqx_delivery_dropped_queue_full counter emqx_delivery_dropped_queue_full 0 # TYPE emqx_delivery_dropped_expired counter emqx_delivery_dropped_expired 0 # TYPE emqx_client_connected counter emqx_client_connected 0 # TYPE emqx_client_authenticate counter emqx_client_authenticate 0 # TYPE emqx_client_auth_anonymous counter emqx_client_auth_anonymous 0 # TYPE emqx_client_check_acl counter emqx_client_check_acl 0 # TYPE emqx_client_subscribe counter emqx_client_subscribe 0 # TYPE emqx_client_unsubscribe counter emqx_client_unsubscribe 0 # TYPE emqx_client_disconnected counter emqx_client_disconnected 0 # TYPE emqx_session_created counter emqx_session_created 0 # TYPE emqx_session_resumed counter emqx_session_resumed 0 # TYPE emqx_session_takeovered counter emqx_session_takeovered 0 # TYPE emqx_session_discarded counter emqx_session_discarded 0 # TYPE emqx_session_terminated counter emqx_session_terminated 0 版权声明: 本文为 EMQ 原创,转载请注明出处。
原文链接: https://www.emqx.io/cn/blog/emqx-prometheus-grafana

科技资讯:

科技学院:

科技百科:

科技书籍:

网站大全:

软件大全:

热门排行