Skip to content

stats/opentelemetry: Introduce Tracing API#7852

Merged
purnesh42H merged 31 commits into
grpc:masterfrom
aranjans:a72
Jan 30, 2025
Merged

stats/opentelemetry: Introduce Tracing API#7852
purnesh42H merged 31 commits into
grpc:masterfrom
aranjans:a72

Conversation

@aranjans

@aranjans aranjans commented Nov 18, 2024

Copy link
Copy Markdown
Contributor

This pull request adds the OpenTelemetry tracing API to the grpc-go opentelemetry plugin as outlined in proposal A72.

RELEASE NOTES:

  • stats/opentelemetry: Introduces an experimental API for enabling and configuring OpenTelemetry tracing within gRPC-go opentelemetry plugin under stats/opentelemetry/experimental. This includes the addition of TraceOptions in the Options struct to allow users to specify the TraceProvider and TextMapPropagator.

@aranjans aranjans added this to the 1.69 Release milestone Nov 18, 2024
@aranjans aranjans added Type: Feature New features or improvements in behavior Area: Observability Includes Stats, Tracing, Channelz, Healthz, Binlog, Reflection, Admin, GCP Observability labels Nov 18, 2024
@codecov

codecov Bot commented Nov 18, 2024

Copy link
Copy Markdown

Codecov Report

Attention: Patch coverage is 88.97059% with 15 lines in your changes missing coverage. Please review.

Project coverage is 82.25%. Comparing base (56a14ba) to head (374b933).
Report is 46 commits behind head on master.

Files with missing lines Patch % Lines
stats/opentelemetry/trace.go 83.33% 7 Missing and 1 partial ⚠️
stats/opentelemetry/client_tracing.go 76.47% 3 Missing and 1 partial ⚠️
stats/opentelemetry/opentelemetry.go 62.50% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7852      +/-   ##
==========================================
+ Coverage   82.09%   82.25%   +0.16%     
==========================================
  Files         379      387       +8     
  Lines       38261    39041     +780     
==========================================
+ Hits        31409    32114     +705     
- Misses       5549     5595      +46     
- Partials     1303     1332      +29     
Files with missing lines Coverage Δ
stats/opentelemetry/client_metrics.go 89.39% <100.00%> (+1.46%) ⬆️
stats/opentelemetry/server_metrics.go 89.82% <100.00%> (+0.44%) ⬆️
stats/opentelemetry/server_tracing.go 100.00% <100.00%> (ø)
stats/opentelemetry/opentelemetry.go 75.00% <62.50%> (-0.87%) ⬇️
stats/opentelemetry/client_tracing.go 76.47% <76.47%> (ø)
stats/opentelemetry/trace.go 83.33% <83.33%> (ø)

... and 59 files with indirect coverage changes

@aranjans aranjans requested a review from purnesh42H November 18, 2024 08:22
@purnesh42H purnesh42H self-assigned this Nov 19, 2024
Comment thread examples/features/opentelemetry/client/main.go Outdated
Comment thread gcp/observability/go.mod Outdated
Comment thread clientconn.go Outdated
Comment thread interop/xds/go.sum Outdated
Comment thread stats/opentelemetry/client_metrics.go Outdated
Comment thread stats/opentelemetry/client_metrics.go Outdated
Comment thread stats/opentelemetry/client_metrics.go Outdated
Comment thread stats/opentelemetry/client_metrics.go Outdated
Comment thread stats/opentelemetry/trace.go Outdated
Comment thread stats/opentelemetry/opentelemetry.go Outdated
@purnesh42H

Copy link
Copy Markdown
Contributor

@aranjans please update this with latest main branch to get rid of all go.mod and go.sum changes

@purnesh42H purnesh42H assigned aranjans and unassigned purnesh42H Nov 20, 2024
@aranjans aranjans force-pushed the a72 branch 3 times, most recently from 378ddd3 to 8cb8222 Compare November 21, 2024 17:44
@aranjans aranjans assigned purnesh42H and unassigned aranjans Nov 22, 2024
@aranjans

Copy link
Copy Markdown
Contributor Author

@purnesh42H Thanks for your review, I have addressed all your comments and this PR is ready for another pass.

@purnesh42H purnesh42H left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more non-test comments. Will review tests in next pass

Comment thread clientconn.go Outdated
Comment thread stats/handlers.go Outdated
Comment thread stats/opentelemetry/client_metrics.go Outdated
Comment thread stream.go Outdated
Comment thread stats/opentelemetry/client_metrics.go
Comment thread stats/opentelemetry/client_metrics.go
Comment thread stats/opentelemetry/client_metrics.go Outdated
Comment thread stats/opentelemetry/client_metrics.go
Comment thread stats/opentelemetry/client_metrics.go Outdated

@purnesh42H purnesh42H left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more non-test comments. Will review tests in next pass

@purnesh42H purnesh42H assigned aranjans and unassigned purnesh42H Nov 22, 2024
@purnesh42H purnesh42H changed the title Implement A72: OpenTelemetry Tracing stats/opentelemetry: Introduce Tracing API Nov 22, 2024
@purnesh42H

Copy link
Copy Markdown
Contributor

@aranjans you should link the gRFC and concise your description

@aranjans

aranjans commented Nov 23, 2024

Copy link
Copy Markdown
Contributor Author

@purnesh42H I have addressed all the comments, and updated the description to link the grfc proposal.
Feel free to close the thread which are resolved now.

@aranjans aranjans assigned purnesh42H and unassigned aranjans Nov 23, 2024
Comment thread stats/opentelemetry/client_metrics.go Outdated
Comment thread stats/opentelemetry/client_metrics.go Outdated
Comment thread stats/opentelemetry/client_metrics.go Outdated
Comment thread stats/opentelemetry/client_metrics.go Outdated
Comment thread stats/opentelemetry/client_metrics.go Outdated
Comment thread stats/opentelemetry/opentelemetry.go Outdated
Comment thread stats/opentelemetry/server_metrics.go Outdated
Comment thread stats/opentelemetry/trace.go Outdated
Comment thread stats/opentelemetry/trace.go Outdated
@aranjans

Copy link
Copy Markdown
Contributor Author

@purnesh42H I have addressed all your comments, and updated the PR. Kindly review the PR.

@purnesh42H purnesh42H requested a review from dfawley January 9, 2025 05:10
* limitations under the License.
*/

package experimental

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The directory name is opentelemetry so the package name should match.

This package should clearly indicate that it will be removed, with the contents moved to stats/opentelemetry, and should also indicate when that will happen.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Comment thread experimental/opentelemetry/trace_options.go
}
err := invoker(ctx, method, req, reply, cc, opts...)
h.perCallMetrics(ctx, err, startTime, ci)
h.perCallTracesAndMetrics(ctx, err, startTime, ci, span)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@purnesh42H

How much code & data is shared between the tracing and metrics implementations? I don't believe they share almost anything?

Would it make sense to implement a separate set of interceptors for tracing & metrics so that we can avoid checking to see which one is implemented for every event in the life of an RPC? I think the code would end up simpler/cleaner/easier to review that way, too.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dfawley they only share the DialOption and ServerOption as per proposal. Are you suggesting to have 4 interceptors internally? Both stream and unary each for metrics and traces? If/When we add logging, we will add 2 more then. But yeah we will have separate path for each and then don't have to check if the other one is implemented as well.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it would helpful to have separate handlers for the different functionality, since they share nothing in common and can be enabled/disabled independently.

Comment thread stats/opentelemetry/client_metrics.go Outdated
Comment thread stats/opentelemetry/client_metrics.go Outdated
if h.options.isTracingEnabled() && ts != nil {
s := status.Convert(err)
if s.Code() == grpccodes.OK {
(*ts).SetStatus(otelcodes.Ok, s.Message())

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does ts.SetStatus not work? Go should auto-deref.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it does not auto dereference them.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's because trace.Span is an interface. Why are you passing it by pointer? That seems wrong.

Comment thread stats/opentelemetry/client_tracing.go Outdated
return ctx, nil
}

mn := "Attempt." + strings.Replace(removeLeadingSlash(rti.FullMethodName), "/", ".", -1)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attemptInfo appears to already have the method name without the leading slash. Use that one instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Comment thread stats/opentelemetry/experimental/grpc_trace_bin_propagator.go
Comment thread stats/opentelemetry/trace.go Outdated
Comment thread stats/opentelemetry/trace.go Outdated
Comment thread stats/opentelemetry/server_tracing.go Outdated
@dfawley dfawley assigned purnesh42H and aranjans and unassigned dfawley Jan 9, 2025
@aranjans aranjans assigned dfawley and purnesh42H and unassigned purnesh42H and aranjans Jan 13, 2025
Comment thread experimental/opentelemetry/trace_options.go Outdated
Comment thread stats/opentelemetry/client_metrics.go Outdated
if h.options.isTracingEnabled() && ts != nil {
s := status.Convert(err)
if s.Code() == grpccodes.OK {
(*ts).SetStatus(otelcodes.Ok, s.Message())

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's because trace.Span is an interface. Why are you passing it by pointer? That seems wrong.

Comment thread stats/opentelemetry/client_tracing.go Outdated
// span into gRPC Metadata, if TextMapPropagator is provided in the trace
// options. if TextMapPropagator is not provided, it returns the context as is.
func (h *clientStatsHandler) traceTagRPC(ctx context.Context, rti *stats.RPCTagInfo, ai *attemptInfo) (context.Context, *attemptInfo) {
mn := "Attempt." + strings.Replace(removeLeadingSlash(ai.method), "/", ".", -1)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ai.method already had the leading slash removed, so remove the call to attempt it again?

Oh wait.. it looks like it's removed in the server stats handler but not the client stats handler?

Is that intentional or just a mistake?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, it got missed. I have corrected it now, PTAL.

Comment thread stats/opentelemetry/trace.go Outdated
// message id - "must be calculated as two different counters starting
// from one for sent messages and one for received messages."
mi := ai.countRecvMsg + 1
ai.countRecvMsg = ai.countRecvMsg + 1

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ai.countRecvMsg++ ?

Comment thread stats/opentelemetry/trace.go Outdated
case *stats.InPayload:
// message id - "must be calculated as two different counters starting
// from one for sent messages and one for received messages."
mi := ai.countRecvMsg + 1

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Just read ai.countRecvMsg later directly?

Comment thread stats/opentelemetry/trace.go Outdated
Comment on lines +68 to +69
mi := ai.countSentMsg + 1
ai.countSentMsg = ai.countSentMsg + 1

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

@dfawley dfawley assigned aranjans and unassigned dfawley Jan 14, 2025
@aranjans aranjans removed their assignment Jan 14, 2025
@aranjans

Copy link
Copy Markdown
Contributor Author

@dfawley I have addressed all your comments including the nits. PTAL.

Comment thread stats/opentelemetry/opentelemetry.go Outdated
}

func (o *Options) isTracingEnabled() bool {
return o.TraceOptions.TextMapPropagator != nil

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dfawley @aranjans did we decide to remove checking for TracerProvider? Both are mandatory.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same topic as this other comment thread, right?:

#7852 (comment)

@dfawley dfawley left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more small things otherwise LGTM

Comment thread experimental/opentelemetry/trace_options.go Outdated
Comment thread stats/opentelemetry/client_tracing.go Outdated
Comment thread stats/opentelemetry/opentelemetry.go Outdated
Comment thread stats/opentelemetry/server_tracing.go Outdated
Comment thread experimental/opentelemetry/trace_options.go Outdated
Comment thread experimental/opentelemetry/trace_options.go Outdated
case *stats.End:
if rs.Error != nil {
s := status.Convert(rs.Error)
span.SetStatus(otelcodes.Error, s.Message())

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there is missing coverage for this (RPC error) and PickerUpdated - could you add test cases for them, please?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dfawley I have added the test for rpc error, PTAL.

Comment thread stats/opentelemetry/client_metrics.go Outdated
}

// Log an error if one of the options is missing.
if (h.options.TraceOptions.TextMapPropagator == nil) != (h.options.TraceOptions.TracerProvider == nil) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check should happen in the constructor for the otel plugin. Not on every RPC.

Comment thread stats/opentelemetry/client_metrics.go Outdated
Comment thread stats/opentelemetry/e2e_test.go
Comment thread stats/opentelemetry/opentelemetry.go Outdated
o.MetricsOptions.pluginOption = po
// Log an error if one of the options is missing.
if (o.TraceOptions.TextMapPropagator == nil) != (o.TraceOptions.TracerProvider == nil) {
logger.Error("traceOptions are not set properly: one of TextMapPropagator or TracerProvider is missing.")

@purnesh42H purnesh42H Jan 28, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aranjans How about "Tracing will not be recorded because traceOptions are not set properly: one of TextMapPropagator or TracerProvider is missing"? Also, may be it should be warning instead of error because we are only treating it as no-op for tracing but continuing with setup? @dfawley cc

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Area: Observability Includes Stats, Tracing, Channelz, Healthz, Binlog, Reflection, Admin, GCP Observability Type: Feature New features or improvements in behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants