目录

Filebeat输出日志格式处理

使用filebeat可以在收集过程中进行一些简单的处理,如丢弃日志等,给后面的kafka等减少压力

普通文本日志格式

原始日志格式:

{"log":"2022-03-15 14:53:48.972 [http-nio-8080-exec-10] o.s.c.c.c.ConfigServicePropertySourceLocator-[227]-[INFO]-Connect Timeout Exception on Url - http://localhost:8888. Will be trying the next url if available\n","stream":"stdout","time":"2022-03-15T06:53:48.972854745Z"}

这里的原始日志是指要收集的日志文件的格式,上面的这个日志是被Kubernetes处理过的,真正程序输出的日志应该是log字段。

对应的filebeat配置文件如下:

filebeat.inputs:

- type: log
    symlinks: true
    enabled: true
    json.keys_under_root: false     # keys_under_root可以让字段位于根节点,默认为false
    json.overwrite_keys: false      # 对于同名的key,覆盖原有key值
    json.add_error_key: true        # 将解析错误的消息记录储存在error.message字段中
    tail_files: true                # 如果此选项设置为true,Filebeat开始在每个文件的末尾读取新文件,默认设置是false。
    paths:
    - /var/log/containers/*_dev_*.log
    # exclude_files:                    # 排除的文件路径
    #   - 'ingress-nginx-controller\.*'
    #   - '\.*_dev_\.*

    processors:
    - drop_event:
        when:
            or:
            - regexp:
                json.log: '定时任务task'
            - regexp:
                json.log: '定时任务执行成功'

    fields:
    log_topic: k8s-pod-logs
    type: "kube-logs"
    # multiline.negate: true
    # multiline.match: after

经过filebeat处理后输出的内容:
所以回过头来看上面的配置文件,drop_event regexp 下面 针对 json.log 做正则匹配,包含指定字符就丢弃。

{
    "@timestamp": "2022-03-15T07:04:44.917Z",
    "@metadata": {
        "beat": "filebeat",
        "type": "_doc",
        "version": "7.3.2",
        "topic": "k8s-pod-logs"
    },
    "log": {
        "offset": 6799764,
        "file": {
            "path": "/var/log/containers/idk-oms-server-7d95759554-9xqjs_dev_idk-oms-server-d43d4364fa14ac1657d4ee1d4ef2dda64acce26025bb1c7c605844dc906efbef.log"
        }
    },
    "json": {
        "log": "[2022-03-15 15:04:37.637] http-nio-8080-exec-8 com.ingeek.idk.oms.b-[-1]-[DEBUG]-[TID:null NAME:APPNAME ENV:dev INS:172.20.37.197] CORSFilter is work~~~\n",
        "stream": "stdout",
        "time": "2022-03-15T07:04:37.638273461Z"
    },
    "input": {
        "type": "log"
    },
    "fields": {
        "type": "kube-logs",
        "log_topic": "k8s-pod-logs"
    },
    "ecs": {
        "version": "1.0.1"
    },
    "host": {
        "name": "localhost.localdomain",
        "architecture": "x86_64",
        "os": {
            "platform": "centos",
            "version": "7 (Core)",
            "family": "redhat",
            "name": "CentOS Linux",
            "kernel": "5.4.113-1.el7.elrepo.x86_64",
            "codename": "Core"
        },
        "containerized": false,
        "hostname": "localhost.localdomain"
    },
    "agent": {
        "ephemeral_id": "d004fcc4-901c-4cc9-89c2-88ff0717c16d",
        "hostname": "localhost.localdomain",
        "id": "e7750a81-c8a1-43a6-a49c-db2be0ac4a8b",
        "version": "7.3.2",
        "type": "filebeat"
    }
}

针对json格式日志处理

原始日志:

{"@timestamp": "2022-03-15T14:15:14+08:00","server_addr":"192.168.13.120","remote_addr":"10.200.4.70","scheme":"https","request_method":"POST","request_uri": "/ingeek-analysis/apis/v3/collect","request_length": "25545","uri": "/ingeek-analysis/apis/v3/collect","request_time":0.004,"body_bytes_sent":833,"bytes_sent":1072,"status":"200","upstream_host":"172.20.17.40:8084","domain":"gemalto-tam-dk.ingeek.com","http_referer":"-","http_user_agent":"-","http_app_id":"3bdd8886a6261935","x_forwarded":"-","up_r_time":"0.003","up_status":"200","os_plant":"android","os_version":"11","app_version":"4.0.4","app_build":"97","guid":"b1d05ee5-b45a-40ef-92de-43008ee5eccb","resolution_ratio":"1080*2193","ip":"fe80::8070:92ff:fe99:902a%dummy0","imsi":"b1d05ee5-b45a-40ef-92de-43008ee5eccb","listen_port":"443"}

某一行包含xx就丢弃

比如我想丢弃所有request_uri/actuator/infohttp_user_agentGo-http-client/2.0的日志

filebeat.yml配置文件如下:

- type: log
    symlinks: true
    enabled: true
    json.keys_under_root: true
    json.overwrite_keys: true
    json.add_error_key: true
    tail_files: true
    paths:
    - /var/log/containers/ingress-nginx-controller*.log
    processors:
    - decode_json_fields:           # 解析二级json
        fields: ['log']             # 一级json的log键
        target: ""
        overwrite_keys: false
        process_array: false
        max_depth: 1
    - drop_event:
        when:
            or:
            - regexp:
                http_user_agent: 'Go-http-client/2.0'
            - regexp:
                request_uri: '/actuator/info'
    - drop_fields:
        # when: 可以设置去除的条件
        #   condition
        fields: ["log","host"]     # 要去除的字段
        ignore_missing: false
        
    fields:
    log_topic: "ingress-k8s"
    type: "ingress"

因为日志本身就是个json,经过filebeat后又会包装一个json,所以我们需要对日志做二级解析。