Troubleshooting Harvest

Checklists for Harvest

A set of steps to go through when something goes wrong.

What version of ONTAP do you have?

Run the following command, replacing <poller> with the name of a poller from your harvest.yml:

./bin/harvest zapi -p <poller> show system

Copy and paste the output into your issue. Here's an example:

./bin/harvest zapi -p infinity show system
connected to infinity (NetApp Release 9.8P2: Tue Feb 16 03:49:46 UTC 2021)
[results]                             -                                   *
  [build-timestamp]                   -                          1613447386
  [is-clustered]                      -                                true
  [version]                           - NetApp Release 9.8P2: Tue Feb 16 03:49:46 UTC 2021
  [version-tuple]                     -                                   *
    [system-version-tuple]            -                                   *
      [generation]                    -                                   9
      [major]                         -                                   8
      [minor]                         -                                   0
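If you only need the short version number (for example, to see which versioned template directory such as conf/zapi/cdot/9.8.0 is relevant), the output above can be condensed with a small filter. This is a minimal sketch: ontap_version is a hypothetical helper name, not part of Harvest, and it assumes the output has the same layout as the example above.

```shell
# Hypothetical helper (not part of Harvest): reads "show system" output on
# stdin and prints generation.major.minor, assuming the layout shown above.
ontap_version() {
  awk '
    /\[generation\]/ { gen = $NF }   # last field on the line is the value
    /\[major\]/      { maj = $NF }
    /\[minor\]/      { min = $NF }
    END { if (gen != "") printf "%s.%s.%s\n", gen, maj, min }
  '
}
```

Usage: ./bin/harvest zapi -p <poller> show system | ontap_version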

Install fails

I tried to install and ...

How do I tell if Harvest is doing anything?

You believe Harvest is installed fine, but it's not working.

Post the contents of your harvest.yml

Try validating your harvest.yml with yamllint, like so:

yamllint -d relaxed harvest.yml

If you do not have yamllint installed, see the yamllint installation instructions.

There should be no errors - warnings like the following are fine:

harvest.yml
  64:1      warning  too many blank lines (3 > 0)  (empty-lines)
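To check quickly that the output contains only warnings, you can count the lines whose severity column reads "error". The following is a rough sketch; count_yamllint_errors is a hypothetical helper name, and it assumes yamllint's plain (non-colored) output format as shown above, which is what you get when the output is piped.

```shell
# Hypothetical helper: reads yamllint output on stdin and prints the number
# of lines whose second column (the severity) is "error".
count_yamllint_errors() {
  awk '$2 == "error" { n++ } END { print n + 0 }'
}
```

Usage: yamllint -d relaxed harvest.yml | count_yamllint_errors

A result of 0 means yamllint reported nothing worse than warnings.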

How did you start Harvest?
What do you see in the log files under /var/log/harvest/?
What does ps aux | grep poller show?
If you are using Prometheus, try hitting Harvest's Prometheus endpoint like so:

curl http://machine-this-is-running-harvest:prometheus-port-in-harvest-yaml/metrics
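The endpoint returns plain text in the Prometheus exposition format, where lines starting with # are HELP/TYPE comments and every other non-empty line is one metric sample. A quick way to see whether Harvest is exporting anything is to count the samples. This is a sketch only; count_samples is a hypothetical helper name.

```shell
# Hypothetical helper: reads Prometheus text-format output on stdin and
# prints the number of sample lines (skips blank lines and # comments).
count_samples() {
  awk 'NF > 0 && $0 !~ /^#/ { n++ } END { print n + 0 }'
}
```

Usage: curl -s http://machine-this-is-running-harvest:prometheus-port-in-harvest-yaml/metrics | count_samples

A count of 0 suggests the poller is running but has not collected or exported any metrics yet.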

How do I start Harvest in debug mode?

Use the --debug flag when starting a poller. In debug mode, the poller collects metrics but does not write them to databases. Another useful flag is --foreground, which writes all log messages to the terminal. Note that you can only start one poller in foreground mode.

Finally, if you want to see a lot of log messages, use --loglevel=1 or --verbose. For even more, use --loglevel=0 or --trace.

Example:

harvest start my_poller --foreground --debug --loglevel=0

which is equivalent to:

harvest start my_poller -fdt

How do I start Harvest in foreground mode?

See How do I start Harvest in debug mode?

How do I start my poller with only one collector?

A poller starts a large number of collectors (each collector-object pair is treated as a collector), so it is often hard to find the issue you are looking for among the many log messages. When troubleshooting, it can therefore be useful to start a single collector-object pair. You can use the --collectors and --objects flags for that. For example, to start only the ZapiPerf collector with the SystemNode object:

harvest start my_poller --collectors ZapiPerf --objects SystemNode

(To find the correct object name, check the collector's conf/COLLECTOR/default.yaml file.)

Errors in the log file

Some of my clusters are not showing up in Grafana

The logs show these errors:

context deadline exceeded (Client.Timeout or context cancellation while reading body)

and then, for each volume:

skipped instance [9c90facd-3730-48f1-b55c-afacc35c6dbe]: not found in cache

Workarounds

context deadline exceeded (Client.Timeout or context cancellation while reading body)

means Harvest is timing out when talking to your cluster. This sometimes happens when you have a large number of resources (e.g. volumes).
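To get a sense of how often this happens, you can count both messages in the logs. The following is a minimal sketch; summarize_log is a hypothetical helper name, and it assumes the log lines contain the message text exactly as quoted above.

```shell
# Hypothetical helper: counts timeout errors and skipped instances in the
# given log files (or on stdin when no file is given).
summarize_log() {
  awk '
    /context deadline exceeded/              { t++ }
    /skipped instance .*not found in cache/  { s++ }
    END { printf "timeouts=%d skipped=%d\n", t, s }
  ' "$@"
}
```

Usage: summarize_log /var/log/harvest/*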

There are a few parameters you can change to prevent this from happening. Add them to the subtemplate of the affected resource, for example conf/zapiperf/cdot/9.8.0/volume.yaml or conf/zapi/cdot/9.8.0/volume.yaml. If the errors occur for most resources, add them to the collector's main template instead (conf/zapi/default.yaml or conf/zapiperf/default.yaml) so that they apply to all objects.

`client_timeout`

Increase the client_timeout value by adding a client_timeout line at the beginning of the template, like so:

# increase the timeout to 60 seconds
client_timeout: 60

`batch_size`

Decrease the batch_size value by adding a batch_size line at the beginning of the template. The default value of this parameter is 500. With a smaller value, the collector fetches fewer instances in each API request. Example:

# decrease number of instances to 200 for each API request
batch_size: 200
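The trade-off is more, smaller requests. As a rough illustration, assuming the collector issues one API request per batch (an assumption made here for the arithmetic, not a statement about Harvest internals), the number of requests per poll is the instance count divided by batch_size, rounded up. requests_needed is a hypothetical helper name.

```shell
# Hypothetical helper: prints ceil(instances / batch_size), assuming one
# API request per batch.
# $1 = number of instances, $2 = batch_size
requests_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}
```

For example, requests_needed 1200 500 prints 3, while requests_needed 1200 200 prints 6: each request returns less data and is less likely to hit the timeout, but the cluster is queried more often per poll.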

`schedule`

If nothing else helps, you can increase the data poll interval of the collector (default is 60s for ZapiPerf and 180s for Zapi). You can do this either by adding a schedule attribute to the template or, if it already exists, by changing the - data line.

Example for ZapiPerf:

# increase data poll interval to 2 minutes
schedule:
  - counter: 1200s
  - instance: 600s
  - data: 120s

Example for Zapi:

# increase data poll interval to 5 minutes
schedule:
  - instance: 600s
  - data: 300s
