Troubleshooting Harvest

Chris Grindstaff edited this page Jul 23, 2021 · 28 revisions

Checklists for Harvest

A set of steps to go through when something goes wrong.

What version of ONTAP do you have?

Run the following, replacing <poller> with the name of a poller from your harvest.yml:

./bin/harvest zapi -p <poller> show system

Copy and paste the output into your issue. Here's an example:

./bin/harvest zapi -p infinity show system
connected to infinity (NetApp Release 9.8P2: Tue Feb 16 03:49:46 UTC 2021)
[results]                             -                                   *
  [build-timestamp]                   -                          1613447386
  [is-clustered]                      -                                true
  [version]                           - NetApp Release 9.8P2: Tue Feb 16 03:49:46 UTC 2021
  [version-tuple]                     -                                   *
    [system-version-tuple]            -                                   *
      [generation]                    -                                   9
      [major]                         -                                   8
      [minor]                         -                                   0

Install fails

I tried to install and ...

How do I tell if Harvest is doing anything?

You believe Harvest is installed fine, but it's not working.

  • Post the contents of your harvest.yml

Try validating your harvest.yml with yamllint, like so: yamllint -d relaxed harvest.yml. If you do not have yamllint installed, see the yamllint documentation for installation instructions.

There should be no errors. Warnings like the following are fine:

harvest.yml
  64:1      warning  too many blank lines (3 > 0)  (empty-lines)

  • How did you start Harvest?

  • What do you see in /var/log/harvest/*?

  • What does ps aux | grep poller show?

  • If you are using Prometheus, try hitting Harvest's Prometheus endpoint like so:

curl http://machine-this-is-running-harvest:prometheus-port-in-harvest-yaml/metrics
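When you hit the endpoint, the raw output can be long, so it can help to reduce it to a single number. The sketch below is a convenience and not part of Harvest: `count_metrics` is a hypothetical helper that counts sample lines in Prometheus text output, and `HARVEST_HOST` and `PROM_PORT` in the usage comment are placeholder variables for your own host and the Prometheus port from your harvest.yml.

```shell
# Hypothetical helper: count metric samples in Prometheus text format read
# from stdin, ignoring blank lines and '#' comment lines (HELP/TYPE).
count_metrics() {
    grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# Usage (HARVEST_HOST and PROM_PORT are placeholders you set yourself):
#   curl -s "http://$HARVEST_HOST:$PROM_PORT/metrics" | count_metrics
```

A result of 0 suggests the endpoint answered but the poller is not exporting any metrics yet, which is worth mentioning in your issue.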

How do I start Harvest in debug mode?

Use the --debug flag when starting a poller. In debug mode, the poller collects metrics but does not write them to databases. Another useful flag is --foreground, which writes all log messages to the terminal. Note that you can start only one poller in foreground mode.

Finally, use --loglevel=1 or --verbose to see more log messages. For even more detail, use --loglevel=0 or --trace.

Example:

harvest start my_poller --foreground --debug --loglevel=0

which is equivalent to:

harvest start my_poller -fdt

How do I start Harvest in foreground mode?

See "How do I start Harvest in debug mode?" above.

How do I start my poller with only one collector?

A poller starts a large number of collectors (each collector-object pair is treated as a collector), so it is often hard to find the issue you are looking for among all the log messages. When troubleshooting, it can therefore be useful to start a single collector-object pair. Use the --collectors and --objects flags for that. For example, to start only the ZapiPerf collector with the SystemNode object:

harvest start my_poller --collectors ZapiPerf --objects SystemNode

(To find the correct object name, check the conf/COLLECTOR/default.yaml file of the collector.)
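One quick way to list candidate object names is to print the keys under `objects:` in that file. This is only a sketch: it assumes the file has a top-level `objects:` mapping with one indented `Name: template` entry per line, and `list_objects` is a hypothetical helper, so adjust it if your file is laid out differently.

```shell
# Hypothetical helper: print the keys listed under a top-level "objects:"
# mapping in a collector's default.yaml (layout assumed, see above).
list_objects() {
    awk '
        /^objects:/   { in_objects = 1; next }   # start of the objects mapping
        /^[^ \t#]/    { in_objects = 0 }         # any other top-level key ends it
        in_objects && /^[ \t]+[^# \t]/ {         # indented, uncommented entry
            name = $1
            sub(/:.*/, "", name)                 # keep the key, drop the value
            print name
        }
    ' "$1"
}

# Usage:
#   list_objects conf/zapi/default.yaml
```

The names it prints are the values you would pass to --objects.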

Errors in the log file

Some of my clusters are not showing up in Grafana

The logs show these errors:

context deadline exceeded (Client.Timeout or context cancellation while reading body)

and then, for each volume:

skipped instance [9c90facd-3730-48f1-b55c-afacc35c6dbe]: not found in cache

Workaround

context deadline exceeded (Client.Timeout or context cancellation while reading body)

means Harvest is timing out when talking to your cluster. This sometimes happens when you have a large number of resources (e.g. volumes).
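To see how often a poller is hitting this timeout, you can count the matching lines in its log. A minimal sketch: `count_timeouts` is a hypothetical helper, and it assumes the error text appears in the log exactly as shown above.

```shell
# Hypothetical helper: count log lines that report the client timeout.
# Reads the files given as arguments, or stdin when none are given.
count_timeouts() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function's exit status clean.
    grep -c 'context deadline exceeded' "$@" || true
}

# Usage:
#   count_timeouts /var/log/harvest/*
```

With several files, grep prints one `file:count` line per file, which makes it easy to spot the pollers that are timing out.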

You can increase Harvest's client_timeout by editing conf/zapi/default.yaml and adding a client_timeout line around line 9, like so:

# increase the timeout to 60 seconds
client_timeout: 60
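After editing, a quick check like the following can confirm the value is present in the file. It is a sketch only: `client_timeout_of` is a hypothetical helper, and it assumes `client_timeout` is an uncommented top-level key on its own line, as in the snippet above.

```shell
# Hypothetical helper: print the value of the first uncommented, top-level
# client_timeout key in the given file, or "not set" if there is none.
client_timeout_of() {
    value=$(sed -n 's/^client_timeout:[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" | head -n 1)
    echo "${value:-not set}"
}

# Usage:
#   client_timeout_of conf/zapi/default.yaml
```

Restart the poller after changing the file so the new timeout is picked up.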
