Pitfalls I Hit While Installing Nezha Monitoring (哪吒监控)

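Preface

This is a casual post written as I went along, though it has some reference value. It is not a tutorial, so please do not repeat my steps one by one. I suggest reading the whole post before deciding what to take from it. The error output can be skimmed; pick out the parts that are useful to you.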

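Installing the dashboard (monitoring side)

At first it would not install no matter what I did; the error said there was no TencentOS branch: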

Status code: 404 for http://mirrors.tencentyun.com/tlinux/3/TencentOS/x86_64/repodata/repomd.xml (IP: *.*.*.*)

Error: Failed to download metadata for repo 'TencentOS': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried

After adding a Docker mirror repo myself, I got this error:

Errors during downloading metadata for repository 'docker-ce-stable':

- Status code: 404 for https://mirrors.cloud.tencent.com/docker-ce/linux/centos/3.1/x86_64/stable/repodata/repomd.xml (IP: *.*.*.*)

Error: Failed to download metadata for repo 'docker-ce-stable': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
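A likely reason for this 404, judging from the URL itself: yum substitutes $releasever and $basearch into the repo's baseurl, and here the release version apparently expanded to 3.1, a path the CentOS docker-ce mirror does not seem to have. The sketch below only reproduces that substitution. expand_baseurl is a hypothetical helper, and the baseurl template is my assumption about what the repo file contained, reconstructed from the failing URL.

```shell
# Hypothetical helper: expand the yum variables $releasever and $basearch
# in a baseurl template, the way yum does before fetching repomd.xml.
# Usage: expand_baseurl TEMPLATE RELEASEVER BASEARCH
expand_baseurl() {
  printf '%s\n' "$1" |
    sed -e "s|[\$]releasever|$2|g" -e "s|[\$]basearch|$3|g"
}

# Assumed template (not copied from the author's machine):
# expand_baseurl 'https://mirrors.cloud.tencent.com/docker-ce/linux/centos/$releasever/$basearch/stable' 3.1 x86_64
```

Called with 3.1 and x86_64, the template expands to the same .../centos/3.1/x86_64/stable path that appears in the error message.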

Then I found an issue in the TencentOS repository. A lifesaver!

docker 安装问题 (Docker installation problem)

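Following it produced an error as well. Only then did I realize that I had broken my yum repos. Go to the yum repo configuration directory: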

cd /etc/yum.repos.d

Running ls showed that there was indeed a repo file for docker-ce-stable, so I deleted it with rm.
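Instead of reading through the ls output, the stale file can also be located by its section header, since each repo in a .repo file is introduced by an [id] line. A minimal sketch: find_repo_files is a hypothetical helper name, and it only prints matches so that they can be reviewed before anything is deleted.

```shell
# Hypothetical helper: print the .repo files in a directory that define
# the given repo id (matched literally as "[id]").
# Usage: find_repo_files DIR REPO_ID
find_repo_files() {
  grep -l -F -- "[$2]" "$1"/*.repo 2>/dev/null
}

# Example (review the output before removing anything):
# find_repo_files /etc/yum.repos.d docker-ce-stable
```

Printing rather than deleting is deliberate: removing the wrong repo file is exactly how a yum setup gets broken further.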

After running yum update, I followed the GitHub issue above:

yum -y install tencentos-release-docker-ce

yum -y install docker-ce

Then run the script again:

sudo ./nezha.sh

It no longer complained that it could not connect to Docker.

1637550967926.png

Install the dashboard: option 1.

1637551035963.png

I followed the prompts. No problems so far, and I was quite happy.

1637551077771.png

And then it blew up.

I added the current user to the docker group and switched users to try again:

sudo gpasswd -a ${USER} docker

sudo su

su ${USER}

1637551226908.png

Still the same error.
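Whether the new group membership is active in the current shell can be checked directly, which is the reason for switching users after gpasswd. A small sketch, assuming the standard id utility; session_in_group is a hypothetical helper. Since the error persisted, membership was evidently not the whole story here, but the check is cheap.

```shell
# Hypothetical helper: succeed if the current process (not just the
# account database) has the given group among its groups.
# Without a user argument, id reports the groups of the calling process.
# Usage: session_in_group GROUP
session_in_group() {
  id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

# Example:
# session_in_group docker && echo "docker group is active in this shell"
```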

After installing docker-compose I started the panel, and it failed:

> 启动面板

Traceback (most recent call last):

File "urllib3/connectionpool.py", line 677, in urlopen

File "urllib3/connectionpool.py", line 392, in _make_request

File "http/client.py", line 1277, in request

File "http/client.py", line 1323, in _send_request

File "http/client.py", line 1272, in endheaders

File "http/client.py", line 1032, in _send_output

File "http/client.py", line 972, in send

File "docker/transport/unixconn.py", line 43, in connect

FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "requests/adapters.py", line 449, in send

File "urllib3/connectionpool.py", line 727, in urlopen

File "urllib3/util/retry.py", line 410, in increment

File "urllib3/packages/six.py", line 734, in reraise

File "urllib3/connectionpool.py", line 677, in urlopen

File "urllib3/connectionpool.py", line 392, in _make_request

File "http/client.py", line 1277, in request

File "http/client.py", line 1323, in _send_request

File "http/client.py", line 1272, in endheaders

File "http/client.py", line 1032, in _send_output

File "http/client.py", line 972, in send

File "docker/transport/unixconn.py", line 43, in connect

urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "docker/api/client.py", line 214, in _retrieve_server_version

File "docker/api/daemon.py", line 181, in version

File "docker/utils/decorators.py", line 46, in inner

File "docker/api/client.py", line 237, in _get

File "requests/sessions.py", line 543, in get

File "requests/sessions.py", line 530, in request

File "requests/sessions.py", line 643, in send

File "requests/adapters.py", line 498, in send

requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "docker-compose", line 3, in

File "compose/cli/main.py", line 81, in main

File "compose/cli/main.py", line 200, in perform_command

File "compose/cli/command.py", line 70, in project_from_options

File "compose/cli/command.py", line 153, in get_project

File "compose/cli/docker_client.py", line 43, in get_client

File "compose/cli/docker_client.py", line 170, in docker_client

File "docker/api/client.py", line 197, in __init__

File "docker/api/client.py", line 222, in _retrieve_server_version

docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

[2064673] Failed to execute script docker-compose

启动失败,请稍后查看日志信息

I rebooted.

Starting it again produced an error:

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

I had forgotten to enable Docker to start on boot:

sudo systemctl enable docker.service

sudo systemctl enable containerd.service

Then start Docker:

service docker start

Finally, it started successfully.

1637567938818.png
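Both failures in this section point at the same place: the docker-compose traceback ends in a FileNotFoundError raised from docker/transport/unixconn.py, and the later message names unix:///var/run/docker.sock. A quick way to tell whether the daemon's socket exists before launching the panel is sketched below. docker_socket_ready is a hypothetical helper, and the default path is taken from the error message.

```shell
# Hypothetical helper: succeed only if the Docker daemon's Unix socket
# exists at the given path (default taken from the error message).
# Usage: docker_socket_ready [SOCKET_PATH]
docker_socket_ready() {
  [ -S "${1:-/var/run/docker.sock}" ]
}

# Example: start the daemon only when the socket is missing.
# docker_socket_ready || sudo systemctl start docker
```

On systemd machines, sudo systemctl enable --now docker.service should cover both enabling at boot and starting the service in one step, which would have avoided the surprise after the reboot.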

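Adding a reverse proxy

WebSocket traffic has to be proxied as well; otherwise real-time monitoring does not work.

Create a new site in BT Panel (宝塔) and use the following reverse proxy configuration.

I worked this configuration out by trial and error. There is very likely a better one, but this is not an area I know well.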

location /
{
    proxy_pass http://127.0.0.1:8008;
    proxy_set_header Host $host;
}

location /ws
{
    proxy_pass http://127.0.0.1:8008;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header Host $host;
}

location /terminal
{
    proxy_pass http://127.0.0.1:8008;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header Host $host;
}

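Installing the agent (monitored side)

Installing the first agent was quick, since the agent simply ran on the same server as the dashboard.

When I installed the agent on the second server, it never came online no matter what I tried.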

I opened nmap and scanned the port; to my surprise, the port was closed.

1637575217400.png

Checking in BT Panel showed that the port was not in use! That means no program was listening on it.

1637575372251.png

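Yet the log showed that the agent had started successfully.

When in doubt, reboot.

That did not solve the problem.

So I suspected SELinux, but Ubuntu does not have it.

Never mind; I would install it, then disable it, and see whether that helped.

Started the agent: no luck.

So let's run it manually: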

/opt/nezha/agent/nezha-agent -s <server2-IP>:5555 -p <agent-secret> -d

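It did not work, but at least there was now a clear error (上报系统信息失败 in the log means "failed to report system information"):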

➜ admin /opt/nezha/agent/nezha-agent -s <server2-IP>:5555 -p <agent-secret> -d
NEZHA@2021-11-22 19:39:29>> 检查更新: 0.11.6
NEZHA@2021-11-22 19:39:29>> 上报系统信息失败: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp <server2-IP>:5555: connect: connection refused"
NEZHA@2021-11-22 19:39:29>> Error to close connection ...
NEZHA@2021-11-22 19:39:39>> Try to reconnect ...
NEZHA@2021-11-22 19:39:39>> 上报系统信息失败: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp <server2-IP>:5555: connect: connection refused"
NEZHA@2021-11-22 19:39:39>> Error to close connection ...

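A Baidu search turned up similar problems that were caused by the firewall and fixed by turning it off. So, on the server that was working correctly, I ran: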

systemctl status firewalld.service

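The firewall there was running normally, which puzzled me. The failing server runs Ubuntu, and sudo ufw status verbose showed that port 5555 was open.

The next day I had a sudden idea: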

➜ admin docker

zsh: command not found: docker

Docker was not installed. (In fact, the agent side does not need Docker at all.)

I installed it and ran the agent again: same error.

➜ ~ /opt/nezha/agent/nezha-agent -s <server2-IP>:5555 -p <agent-secret> -d
NEZHA@2021-11-23 18:53:14>> 检查更新: 0.11.6
NEZHA@2021-11-23 18:53:14>> 上报系统信息失败: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp <server2-IP>:5555: connect: connection refused"
NEZHA@2021-11-23 18:53:14>> Error to close connection ...

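Eventually it dawned on me that this was wrong: the address should point to the dashboard!

So the command should not be /opt/nezha/agent/nezha-agent -s <server2-IP>:5555 -p <agent-secret> -d but rather /opt/nezha/agent/nezha-agent -s <dashboard-IP>:5555 -p <agent-secret> -d.

Sure enough, it ran normally! I really am a fool, having wasted a huge amount of time on a mistake that should never have happened.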
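"connection refused" generally means nothing is listening at the address being dialed, which fits the agent having been pointed at its own server instead of the dashboard. A small sketch for checking reachability from the agent machine before starting the agent: port_open is a hypothetical helper, it assumes bash (for /dev/tcp) and the coreutils timeout command are available, and dashboard.example.com is a placeholder.

```shell
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be
# opened within 3 seconds. Relies on bash's /dev/tcp pseudo-device.
# Usage: port_open HOST PORT
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example (placeholder host; use the dashboard server's real address):
# port_open dashboard.example.com 5555 && echo "dashboard port reachable"
```

Run against the second server's own address, a check like this would have failed right away, matching what nmap reported.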

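Summary

Installing the dashboard: install Docker in advance (if your system is not a mainstream distribution), then follow the prompts; it is quick and simple.

Installing the agent: follow the prompts; it is simple. The domain/IP must be the domain/IP of the server hosting the dashboard, and it must not be behind a CDN.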