stats/opentelemetry: Introduce Tracing API#7852
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7852 +/- ##
==========================================
+ Coverage 82.09% 82.25% +0.16%
==========================================
Files 379 387 +8
Lines 38261 39041 +780
==========================================
+ Hits 31409 32114 +705
- Misses 5549 5595 +46
- Partials 1303 1332 +29
|
|
@aranjans please update this with latest main branch to get rid of all go.mod and go.sum changes |
378ddd3 to
8cb8222
Compare
|
@purnesh42H Thanks for your review, I have addressed all your comments and this PR is ready for another pass. |
purnesh42H
left a comment
There was a problem hiding this comment.
Some more non-test comments. Will review tests in next pass
purnesh42H
left a comment
There was a problem hiding this comment.
Some more non-test comments. Will review tests in next pass
|
@aranjans you should link the gRFC and concise your description |
|
@purnesh42H I have addressed all the comments, and updated the description to link the grfc proposal. |
|
@purnesh42H I have addressed all your comments, and updated the PR. Kindly review the PR. |
| * limitations under the License. | ||
| */ | ||
|
|
||
| package experimental |
There was a problem hiding this comment.
The directory name is opentelemetry so the package name should match.
This package should clearly indicate that it will be removed, with the contents moved to stats/opentelemetry, and should also indicate when that will happen.
| } | ||
| err := invoker(ctx, method, req, reply, cc, opts...) | ||
| h.perCallMetrics(ctx, err, startTime, ci) | ||
| h.perCallTracesAndMetrics(ctx, err, startTime, ci, span) |
There was a problem hiding this comment.
How much code & data is shared between the tracing and metrics implementations? I don't believe they share almost anything?
Would it make sense to implement a separate set of interceptors for tracing & metrics so that we can avoid checking to see which one is implemented for every event in the life of an RPC? I think the code would end up simpler/cleaner/easier to review that way, too.
There was a problem hiding this comment.
@dfawley they only share the DialOption and ServerOption as per proposal. Are you suggesting to have 4 interceptors internally? Both stream and unary each for metrics and traces? If/When we add logging, we will add 2 more then. But yeah we will have separate path for each and then don't have to check if the other one is implemented as well.
There was a problem hiding this comment.
Yes, I think it would helpful to have separate handlers for the different functionality, since they share nothing in common and can be enabled/disabled independently.
| if h.options.isTracingEnabled() && ts != nil { | ||
| s := status.Convert(err) | ||
| if s.Code() == grpccodes.OK { | ||
| (*ts).SetStatus(otelcodes.Ok, s.Message()) |
There was a problem hiding this comment.
Does ts.SetStatus not work? Go should auto-deref.
There was a problem hiding this comment.
No, it does not auto dereference them.
There was a problem hiding this comment.
Oh, that's because trace.Span is an interface. Why are you passing it by pointer? That seems wrong.
| return ctx, nil | ||
| } | ||
|
|
||
| mn := "Attempt." + strings.Replace(removeLeadingSlash(rti.FullMethodName), "/", ".", -1) |
There was a problem hiding this comment.
attemptInfo appears to already have the method name without the leading slash. Use that one instead?
| if h.options.isTracingEnabled() && ts != nil { | ||
| s := status.Convert(err) | ||
| if s.Code() == grpccodes.OK { | ||
| (*ts).SetStatus(otelcodes.Ok, s.Message()) |
There was a problem hiding this comment.
Oh, that's because trace.Span is an interface. Why are you passing it by pointer? That seems wrong.
| // span into gRPC Metadata, if TextMapPropagator is provided in the trace | ||
| // options. if TextMapPropagator is not provided, it returns the context as is. | ||
| func (h *clientStatsHandler) traceTagRPC(ctx context.Context, rti *stats.RPCTagInfo, ai *attemptInfo) (context.Context, *attemptInfo) { | ||
| mn := "Attempt." + strings.Replace(removeLeadingSlash(ai.method), "/", ".", -1) |
There was a problem hiding this comment.
ai.method already had the leading slash removed, so remove the call to attempt it again?
Oh wait.. it looks like it's removed in the server stats handler but not the client stats handler?
Is that intentional or just a mistake?
There was a problem hiding this comment.
Ah I see, it got missed. I have corrected it now, PTAL.
| // message id - "must be calculated as two different counters starting | ||
| // from one for sent messages and one for received messages." | ||
| mi := ai.countRecvMsg + 1 | ||
| ai.countRecvMsg = ai.countRecvMsg + 1 |
| case *stats.InPayload: | ||
| // message id - "must be calculated as two different counters starting | ||
| // from one for sent messages and one for received messages." | ||
| mi := ai.countRecvMsg + 1 |
There was a problem hiding this comment.
Why? Just read ai.countRecvMsg later directly?
| mi := ai.countSentMsg + 1 | ||
| ai.countSentMsg = ai.countSentMsg + 1 |
|
@dfawley I have addressed all your comments including the nits. PTAL. |
| } | ||
|
|
||
| func (o *Options) isTracingEnabled() bool { | ||
| return o.TraceOptions.TextMapPropagator != nil |
There was a problem hiding this comment.
This is the same topic as this other comment thread, right?:
dfawley
left a comment
There was a problem hiding this comment.
Just a few more small things otherwise LGTM
| case *stats.End: | ||
| if rs.Error != nil { | ||
| s := status.Convert(rs.Error) | ||
| span.SetStatus(otelcodes.Error, s.Message()) |
There was a problem hiding this comment.
It seems there is missing coverage for this (RPC error) and PickerUpdated - could you add test cases for them, please?
There was a problem hiding this comment.
@dfawley I have added the test for rpc error, PTAL.
| } | ||
|
|
||
| // Log an error if one of the options is missing. | ||
| if (h.options.TraceOptions.TextMapPropagator == nil) != (h.options.TraceOptions.TracerProvider == nil) { |
There was a problem hiding this comment.
This check should happen in the constructor for the otel plugin. Not on every RPC.
| o.MetricsOptions.pluginOption = po | ||
| // Log an error if one of the options is missing. | ||
| if (o.TraceOptions.TextMapPropagator == nil) != (o.TraceOptions.TracerProvider == nil) { | ||
| logger.Error("traceOptions are not set properly: one of TextMapPropagator or TracerProvider is missing.") |
There was a problem hiding this comment.
This pull request adds the OpenTelemetry tracing API to the grpc-go opentelemetry plugin as outlined in proposal A72.
RELEASE NOTES:
stats/opentelemetry/experimental. This includes the addition ofTraceOptionsin theOptionsstruct to allow users to specify theTraceProviderandTextMapPropagator.