Troubleshooting Harvest
A set of steps to go through when something goes wrong.
Run the following, replacing <poller> with the name of a poller from your harvest.yml:
./bin/harvest zapi -p <poller> show system
Copy and paste the output into your issue. Here's an example:
./bin/harvest zapi -p infinity show system
connected to infinity (NetApp Release 9.8P2: Tue Feb 16 03:49:46 UTC 2021)
[results] - *
[build-timestamp] - 1613447386
[is-clustered] - true
[version] - NetApp Release 9.8P2: Tue Feb 16 03:49:46 UTC 2021
[version-tuple] - *
[system-version-tuple] - *
[generation] - 9
[major] - 8
[minor] - 0
I tried to install and ...
You believe Harvest is installed fine, but it's not working.
- Post the contents of your harvest.yml.
Try validating your harvest.yml with yamllint like so: yamllint -d relaxed harvest.yml
If you do not have yamllint installed, look here.
There should be no errors; warnings like the following are fine:
harvest.yml
64:1 warning too many blank lines (3 > 0) (empty-lines)
- How did you start Harvest?
- What do you see in /var/log/harvest/*?
- What does ps aux | grep poller show?
- If you are using Prometheus, try hitting Harvest's Prometheus endpoint like so:
curl http://machine-this-is-running-harvest:prometheus-port-in-harvest-yaml/metrics
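When reporting what the endpoint returns, a sample count is often easier to share than the full output. The following is a minimal sketch, not part of Harvest: count_samples is a hypothetical helper that assumes the endpoint serves the standard Prometheus text format, where comment lines start with #.

```shell
# Hypothetical helper, not shipped with Harvest.
# Reads Prometheus text-format output on stdin and prints the number of
# sample lines, skipping comment lines (starting with '#') and blank lines.
count_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# Usage (host and port are the placeholders from the curl example above):
#   curl -s http://machine-this-is-running-harvest:prometheus-port-in-harvest-yaml/metrics | count_samples
```

A count of 0 from a poller that is running suggests that nothing is being exported, which is worth mentioning in your issue.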
Use the --debug flag when starting a poller. In debug mode, the poller collects metrics but does not write them to databases. Another useful flag is --foreground, which writes all log messages to the terminal. Note that you can only start one poller in foreground mode.
Finally, you can use --loglevel=1 or --verbose, if you want to see a lot of log messages. For even more, you can use --loglevel=0 or --trace.
Example:
harvest start my_poller --foreground --debug --loglevel=0
which is equivalent to:
harvest start my_poller -fdt
See How do I start Harvest in debug mode?
Since a poller starts a large number of collectors (each collector-object pair is treated as a collector), it is often hard to find the issue you are looking for among the many log messages. When troubleshooting, it can therefore be useful to start a single collector-object pair. You can use the --collectors and --objects flags for that. For example, to start only the ZapiPerf collector with the SystemNode object:
harvest start my_poller --collectors ZapiPerf --objects SystemNode
(To find the correct object name, check the conf/COLLECTOR/default.yaml file of the collector.)
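Looking up object names can be scripted. The sketch below is not part of Harvest: list_objects is a hypothetical helper, and it assumes the collector's default.yaml has a top-level objects: key whose indented entries look like Name: template.yaml. If your file is laid out differently, read it directly instead.

```shell
# Hypothetical helper, not shipped with Harvest.
# Prints the keys listed under a top-level "objects:" map in a YAML file.
# Assumes entries are indented "Name: file.yaml" lines; commented-out
# entries are skipped.
list_objects() {
  awk '
    /^objects:/ { in_objects = 1; next }            # start of the objects map
    in_objects && /^[^ \t#]/ { in_objects = 0 }      # next top-level key ends it
    in_objects && /^[ \t]+[A-Za-z]/ {
      sub(/^[ \t]+/, "")                             # drop the indentation
      sub(/:.*/, "")                                 # keep only the key
      print
    }
  ' "$1"
}

# Usage (path follows the conf/COLLECTOR/default.yaml pattern above):
#   list_objects conf/zapiperf/default.yaml
```

Each printed name is a candidate value for the --objects flag.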
The logs show these errors:
context deadline exceeded (Client.Timeout or context cancellation while reading body)
and then for each volume
skipped instance [9c90facd-3730-48f1-b55c-afacc35c6dbe]: not found in cache
The error context deadline exceeded (Client.Timeout or context cancellation while reading body) means Harvest is timing out when talking to your cluster. This sometimes happens when you have a large number of resources (e.g. volumes).
You can increase Harvest's client_timeout by editing conf/zapi/default.yaml and adding a client_timeout line around line 9, like so:
# increase the timeout to 60 seconds
client_timeout: 60
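To see whether the timeouts persist after a change like this, you can count how often the error appears in each log file. This is a minimal sketch, not part of Harvest: count_timeouts is a hypothetical helper, and the log location is taken from the /var/log/harvest/* path mentioned earlier.

```shell
# Hypothetical helper, not shipped with Harvest.
# For each file given, prints "<count> <file>", where count is the number
# of lines containing the timeout error.
count_timeouts() {
  for f in "$@"; do
    # grep -c exits non-zero when there are no matches; "|| true" keeps
    # the helper usable under "set -e" while still capturing the 0 count.
    n=$(grep -c 'context deadline exceeded' "$f" || true)
    printf '%s %s\n' "$n" "$f"
  done
}

# Usage:
#   count_timeouts /var/log/harvest/*
```

The counts are cumulative per file, so compare them before and after the change rather than expecting them to drop to zero.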