mirror of
https://github.com/klzgrad/naiveproxy.git
synced 2024-11-28 00:06:09 +03:00
229 lines
11 KiB
Markdown
229 lines
11 KiB
Markdown
|
# Chrome Network Bug Triage : Suggested Workflow
|
||
|
|
||
|
[TOC]
|
||
|
|
||
|
## Identifying unlabeled network bugs on the tracker
|
||
|
|
||
|
* Look at new unconfirmed bugs since noon PST on the last triager's rotation.
|
||
|
[Use this issue tracker
|
||
|
query](https://bugs.chromium.org/p/chromium/issues/list?q=status%3Aunconfirmed&sort=-id&num=1000).
|
||
|
|
||
|
* Read the title of the bug.
|
||
|
|
||
|
* If a bug looks like it might be network related, middle click (or
|
||
|
command-click on OSX) to open it in a new tab.
|
||
|
|
||
|
* If a user provides a crash ID for a crasher for a bug that could be
|
||
|
net-related, look at the crash stack at
|
||
|
[go/crash](https://goto.google.com/crash), and see if it looks to be network
|
||
|
related. Be sure to check if other bug reports have that stack trace, and
|
||
|
mark as a dupe if so. Even if the bug isn't network related, paste the stack
|
||
|
trace in the bug, so no one else has to look up the crash stack from the ID.
|
||
|
* If there's just a blank form and a crash ID, just ignore the bug.
|
||
|
|
||
|
* If network causes are possible, ask for a net-internals log (If it's not a
|
||
|
browser crash) and attach the most specific internals-network label that's
|
||
|
applicable. If there isn't an applicable narrower component, a clear owner
|
||
|
for the issue, or there are multiple possibilities, attach the
|
||
|
Internals>Network component and proceed with further investigation.
|
||
|
|
||
|
* If non-network causes also seem possible, attach those components as well.
|
||
|
|
||
|
## Investigate UMA notifications
|
||
|
|
||
|
For each alert that fires, determine if it's a real alert and file a bug if so.
|
||
|
|
||
|
* Don't file if the alert is coincident with a major volume change. The volume
|
||
|
at a particular date can be determined by hovering the mouse over the
|
||
|
appropriate location on the alert line.
|
||
|
|
||
|
* Don't file if the alert is on a graph with very low volume (< ~200 data
|
||
|
points); it's probably noise, and we probably don't care even if it isn't.
|
||
|
|
||
|
* Don't file if the graph is really noisy (but eyeball it to decide if there is
|
||
|
an underlying important shift under the noise).
|
||
|
|
||
|
* Don't file if the alert is in the "Known Ignorable" list:
|
||
|
* SimpleCache on Windows
|
||
|
* DiskCache on Android.
|
||
|
|
||
|
## Investigating component=Internals>Network bugs
|
||
|
|
||
|
* Note that you may want to investigate Needs-Feedback bugs first, as
|
||
|
that may result in some bugs being added to this list.
|
||
|
|
||
|
* It's recommended that while on triage duty, you subscribe to the
|
||
|
Internals>Network component (but not its subcomponents). To do this, go
|
||
|
to the issue tracker and then click "Saved Queries".
|
||
|
Add a query with these settings:
|
||
|
* Saved query name: Network Bug Triage
|
||
|
* Project: chromium
|
||
|
* Query: component=Internals>Network
|
||
|
* Subscription options: Notify Immediately
|
||
|
|
||
|
* Look through unconfirmed and untriaged component=Internals>Network bugs,
|
||
|
prioritizing those updated within the last week. [Use this issue tracker
|
||
|
query](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=component%3DInternals%3ENetwork+status%3AUnconfirmed,Untriaged+-label:Needs-Feedback&sort=-modified).
|
||
|
|
||
|
* If more information is needed from the reporter, ask for it and add the
|
||
|
Needs-Feedback label.
|
||
|
|
||
|
* While investigating a new issue, change the status to Untriaged.
|
||
|
|
||
|
* If a bug is a potential security issue (Allows for code execution from remote
|
||
|
site, allows crossing security boundaries, unchecked array bounds, etc) mark
|
||
|
it Type-Bug-Security. If it has privacy implication (History, cookies
|
||
|
discoverable by an entity that shouldn't be able to do so, incognito state
|
||
|
being saved in memory or on disk beyond the lifetime of incognito tabs, etc),
|
||
|
mark it with component Privacy.
|
||
|
|
||
|
* For bugs that already have a more specific network component, go ahead and
|
||
|
remove the Internals>Network component to get them off the next triager's
|
||
|
radar and move on.
|
||
|
|
||
|
* Try to figure out if it's really a network bug. See common non-network
|
||
|
components section for description of common components for issues incorrectly
|
||
|
tagged as Internals>Network.
|
||
|
|
||
|
* If it's not, attach appropriate labels/components and go no further.
|
||
|
|
||
|
* If it may be a network bug, attach additional possibly relevant component if
|
||
|
any, and continue investigating. Once you either determine it's a
|
||
|
non-network bug, or figure out accurate more specific network components, your
|
||
|
job is done, though you should still ask for a net-internals dump if it seems
|
||
|
likely to be useful.
|
||
|
|
||
|
* Note that Chrome-OS-specific network-related code (Captive portal detection,
|
||
|
connectivity detection, login, etc) may not all have appropriate more
|
||
|
specific subcomponents, but are not in areas handled by the network stack
|
||
|
team. Just make sure those have the OS-Chrome label, and any more specific
|
||
|
labels if applicable, and then move on.
|
||
|
|
||
|
* Gather data and investigate.
|
||
|
* Remember to add the Needs-Feedback label whenever waiting for the user to
|
||
|
respond with more information, and remove it when not waiting on the
|
||
|
user.
|
||
|
* Try to reproduce locally. If you can, and it's a regression, use
|
||
|
src/tools/bisect-builds.py to figure out when it regressed.
|
||
|
* Ask more data from the user as needed (net-internals dumps, repro case,
|
||
|
crash ID from about:crashes, run tests, etc).
|
||
|
* If asking for an about:net-internals dump, provide this link:
|
||
|
https://sites.google.com/a/chromium.org/dev/for-testers/providing-network-details.
|
||
|
Can just grab the link from about:net-internals, as needed.
|
||
|
|
||
|
* Try to figure out what's going on, and which more specific network component
|
||
|
is most appropriate.
|
||
|
|
||
|
* If it's a regression, browse through the git history of relevant files to try
|
||
|
and figure out when it regressed. CC authors / primary reviewers of any
|
||
|
strongly suspect CLs.
|
||
|
|
||
|
* If you are having trouble with an issue, particularly for help understanding
|
||
|
net-internals logs, email the public net-dev@chromium.org list for help
|
||
|
debugging. If it's a crasher, or for some other reason discussion needs to
|
||
|
be done in private, use chrome-network-debugging@google.com. TODO(mmenke):
|
||
|
Write up a net-internals tips and tricks docs.
|
||
|
|
||
|
* If it appears to be a bug in the unowned core of the network stack (i.e. no
|
||
|
subcomponent applies, or only the Internals>Network>HTTP subcomponent
|
||
|
applies, and there's no clear owner), try to figure out the exact cause.
|
||
|
|
||
|
## Looking for new crashers
|
||
|
|
||
|
1. Go to [go/chromecrash](https://goto.google.com/chromecrash).
|
||
|
|
||
|
2. For each platform, look through the releases for which releases to
|
||
|
investigate. As per [bug-triage.md](bug-triage.md), this should be the most
|
||
|
recent canary, the previous canary (if the most recent is less than a day
|
||
|
old), and any of dev/beta/stable that were released in the last couple of
|
||
|
days.
|
||
|
|
||
|
3. For each release, in the "Process Type" frame, click on "browser".
|
||
|
|
||
|
4. At the bottom of the "Magic Signature" frame, click "limit 1000" (Or reduce
|
||
|
the limit to 100 first, as that's all the triager needs to look at).
|
||
|
Reported crashers are sorted in decreasing order of the number of reports for
|
||
|
that crash signature.
|
||
|
|
||
|
5. Search the page for *"net::"*.
|
||
|
|
||
|
6. For each found signature:
|
||
|
* Ignore signatures that only occur once or twice, as memory corruption can
|
||
|
easily cause one-off failures when the sample size is large enough. Also
|
||
|
ignore crashers that are not in the top 100 for that platform / release.
|
||
|
* If there is a bug already filed, make sure it is correctly describing the
|
||
|
current bug (e.g. not closed, or not describing a long-past issue), and
|
||
|
make sure that if it is a *net* bug, that it is labeled as such.
|
||
|
* Ignore signatures that only come from one or two client IDs, as individual
|
||
|
machine malware and breakage can cause one-off failures.
|
||
|
* Click on the number of reports field to see details of crash. Ignore it
|
||
|
if it doesn't appear to be a network bug.
|
||
|
* Otherwise, file a new bug directly from chromecrash.
|
||
|
* For each bug you file, include the following information:
|
||
|
* The backtrace. Note that the backtrace should not be added to the
|
||
|
bug if Restrict-View-Google isn't set on the bug as it may contain
|
||
|
PII. Filing the bug from the crash reporter should do this
|
||
|
automatically, but check.
|
||
|
* The channel in which the bug is seen (canary/dev/beta/stable), and its
|
||
|
rank among crashers in the channel.
|
||
|
* The frequency of this signature in recent releases. This information
|
||
|
is available by:
|
||
|
1. Clicking on the signature in the "Magic Signature" list
|
||
|
2. Clicking "Edit" on the dremel query at the top of the page
|
||
|
3. Removing the "product.version='X.Y.Z.W' AND" string and clicking
|
||
|
"Update".
|
||
|
4. Clicking "Limit 1000" in the Product Version list in the
|
||
|
resulting page (without this, the listing will be restricted to
|
||
|
the releases in which the signature is most common, which will
|
||
|
often not include the canary/dev release being investigated).
|
||
|
5. Choose some subset of that list, or all of it, to include in the
|
||
|
bug. Make sure to indicate if there is a defined point in the
|
||
|
past before which the signature is not present.
|
||
|
|
||
|
As an alternative to the above, you can use [Eric Roman's new crash
|
||
|
tool](https://ericroman.users.x20web.corp.google.com/www/net-crash-triage/index.html)
|
||
|
(internal link). Note that it isn't a perfect fit with the triage
|
||
|
responsibilities, specifically:
|
||
|
|
||
|
* It's only showing Windows releases; Android, iOS, and WebView are
|
||
|
usually different, and Mac is sometimes different.
|
||
|
* The instructions are to look at the latest canary which has a days
|
||
|
worth of data. If canaries are being pushed fast, that may be more
|
||
|
than one canary into the past, and hence not visible on the tool.
|
||
|
* Eric's tool filters based on files in "src/net" rather than looking
|
||
|
for magic signature's including the string "net::" ("src/net" is
|
||
|
probably the better filter).
|
||
|
|
||
|
## Investigating crashers
|
||
|
|
||
|
* Only investigate crashers that are still occurring, as identified by above
|
||
|
section. If a search on go/crash indicates a crasher is no longer occurring,
|
||
|
mark it as WontFix.
|
||
|
|
||
|
* On Windows, you may want to look for weird dlls associated with the crashes.
|
||
|
This generally needs crashes from a fair number of different users to reach
|
||
|
any conclusions.
|
||
|
* To get a list of loaded modules in related crash dumps, select
|
||
|
modules->3rd party in the left pane. It can be difficult to distinguish
|
||
|
between safe dlls and those likely to cause problems, but even if you're
|
||
|
not that familiar with windows, some may stick out. Anti-virus programs,
|
||
|
download managers, and more gray hat badware often have meaningful dll
|
||
|
names or dll paths (Generally product names or company names). If you
|
||
|
see one of these in a significant number of the crash dumps, it may well
|
||
|
be the cause.
|
||
|
* You can also try selecting the "has malware" option, though that's much
|
||
|
less reliable than looking manually.
|
||
|
|
||
|
* See if the same users are repeatedly running into the same issue. This can
|
||
|
be accomplished by search for (Or clicking on) the client ID associated with
|
||
|
a crash report, and seeing if there are multiple reports for the same crash.
|
||
|
If this is the case, it may be also be malware, or an issue with an unusual
|
||
|
system/chrome/network config.
|
||
|
|
||
|
* Dig through crash reports to figure out when the crash first appeared, and
|
||
|
dig through revision history in related files to try and locate a suspect CL.
|
||
|
TODO(mmenke): Add more detail here.
|
||
|
|
||
|
* Load crash dumps, try to figure out a cause. See
|
||
|
http://www.chromium.org/developers/crash-reports for more information
|