Closed Bug 1087605 Opened 10 years ago Closed 10 years ago

WebRTC video freezes after 5mins

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking


VERIFIED FIXED
mozilla36
blocking-b2g 2.1+
Tracking Status
firefox34 --- fixed
firefox35 --- fixed
firefox36 --- fixed
b2g-v2.1 --- verified
b2g-v2.2 --- verified

People

(Reporter: jaywang, Assigned: jesup)

References

Details

Attachments

(20 files, 5 obsolete files)

3.14 MB, application/x-zip-compressed
Details
3.66 KB, patch
Details | Diff | Splinter Review
9.34 KB, patch
Details | Diff | Splinter Review
6.12 KB, patch
Details | Diff | Splinter Review
1.79 KB, text/plain
Details
1.06 MB, application/x-zip-compressed
Details
2.75 MB, application/x-zip-compressed
Details
15.18 KB, text/plain
Details
16.99 KB, patch
Details | Diff | Splinter Review
27.78 KB, patch
Details | Diff | Splinter Review
2.09 KB, patch
Details | Diff | Splinter Review
183.01 KB, text/plain
Details
5.25 KB, text/plain
Details
5.02 KB, text/plain
Details
439.02 KB, text/plain
Details
49.32 KB, text/plain
Details
7.08 KB, text/plain
Details
1.79 KB, patch
Details | Diff | Splinter Review
1.25 KB, patch
bwc
: review+
Details | Diff | Splinter Review
1.87 KB, patch
Details | Diff | Splinter Review
Attached file webrtc.zip
[Blocking Requested - why for this release]:

Test case:
- Enable the webRTC CSF logging
- Start the webRTC video call
- Let it run for 5 mins
- Observe that the video freezes

It looks like the encoded frame was re-packed to a different size before getting to the decoder, which causes the freeze.

Here, there are four encoded frames sent from the 8x10 device with v2.0, with sizes of 1214, 1053, 962 and 992 bytes, respectively. When the frames get to the 8x26 device with v2.1, the sizes become 1214, 1053, 975 and 1007 instead.

For some reason the last two frame sizes were altered. This happens occasionally throughout the call. This problem was only seen on the v2.1 decoding path, not on v2.0.

You can see this from the attached log. The log was captured from two devices simultaneously during the webRTC call.
 
8x26_v2.1.txt: 8x26 with v2.1 SW
8x10_v2.0.txt: 8x10 with v2.0 SW

I suspect that the fix for https://bugzilla.mozilla.org/show_bug.cgi?id=1068394 did not completely fix the issue, or missed some corner case.

Here is the partial log that shows the discrepancy of frame size between 8x10 encoder output and 8x26 decoder input.

I/PRLog   ( 1444): 2014-10-22 20:19:53.318228 UTC - 11009168[af268b00]: [OMXOutputDrain|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:699: Encoded frame: 1214 bytes, 240x320, is_param 0, is_iframe 0, timestamp 946415800, captureTimeMs 0
I/PRLog   ( 1444): 2014-10-22 20:19:53.354942 UTC - 11009168[af268b00]: [OMXOutputDrain|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:699: Encoded frame: 1053 bytes, 240x320, is_param 0, is_iframe 0, timestamp 946418860, captureTimeMs 0
I/PRLog   ( 1444): 2014-10-22 20:19:53.387776 UTC - 11009168[af268b00]: [OMXOutputDrain|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:699: Encoded frame: 962 bytes, 240x320, is_param 0, is_iframe 0, timestamp 946421560, captureTimeMs 0
I/PRLog   ( 1444): 2014-10-22 20:19:53.409812 UTC - 11009168[af268b00]: [OMXOutputDrain|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:699: Encoded frame: 992 bytes, 240x320, is_param 0, is_iframe 0, timestamp 946424530, captureTimeMs 0


I/PRLog   ( 4179): 2014-10-22 20:19:54.195645 UTC - 20090448[b122f700]: [DecodingThread|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:403: Decoder input: 1214 bytes (NAL 0x41), time 12705511966 (1143496077), flags 0x0
I/PRLog   ( 4179): 2014-10-22 20:19:54.223383 UTC - 20090448[b122f700]: [DecodingThread|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:403: Decoder input: 1053 bytes (NAL 0x41), time 12705545966 (1143499137), flags 0x0
I/PRLog   ( 4179): 2014-10-22 20:19:54.329301 UTC - 20090448[b122f700]: [DecodingThread|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:403: Decoder input: 975 bytes (NAL 0x41), time 12705705966 (1143513537), flags 0x0
I/PRLog   ( 4179): 2014-10-22 20:19:54.351217 UTC - 20090448[b122f700]: [DecodingThread|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:403: Decoder input: 1007 bytes (NAL 0x41), time 12705737966 (1143516417), flags 0x0
Flags: needinfo?(rjesup)
I looked carefully at this, and am still looking.

Using OpenH264, I never see a disparity on sizes.

With OMX, there are two main things going on that can affect the size: on Encode(), we call getNextNALUnit() to separately send each NAL.  This will grab NALs, and may strip extra/long start codes (like 16 0's followed by a 1).  All of these appear to be a single NAL for the frame, and most are smaller than the packet size, so there's no FU-A breaking and reassembly.

Data transferred in RTP has no start codes; they're re-inserted at the receiving end.

After start-code insertion, it should be passed to the decoder, which again (in 2.1) uses getNextNALUnit() to break apart multiple NALs - normally this will occur only on iframes, and the frames where we're seeing this are not iframes.

I'll check more closely into the size generation for reception, but other than start-code insertion it should be straightforward.  

Is this only seen in 2.0->2.1 calls, or 2.1->2.1?
Flags: needinfo?(rjesup)
> Using OpenH264, I never see a disparity on sizes.

Note: that's desktop, m-c -> m-c, though there's no real difference between m-c and beta/34/2.1 here I believe.
> 
> Is this only seen in 2.0->2.1 calls, or 2.1->2.1?

The problem is seen on both 2.0->2.1 and 2.1->2.1.
Ok, in my self-call test, I never see the size disparities you're seeing, so we may still need to look at it.

However, on a Flame running beta (very close to b2g34 still), I see a build up of in-decode frames.  Eventually, the circular map of 30 entries in generic_encoder.cc (_timestampMap) overflows, and newly-decoded frames can't be found in it, and they get dropped on the floor (causing all video output to cease).  (And you can see increasing delay as well)

The cause of the decode buildup is the important part:
At the start of the call, each frame is decoded and rendered almost immediately.  The sequence is:

1) Start decode of frame X: mCodec->queueInputBuffer( ... <timestamp N> )
2) Find that frame X was decoded: mCodec->dequeueOutputBuffer(... <timestamp N> ...)
3) Ask OMX codec to render it in a buffer and call us back: mCodec->renderOutputBufferAndRelease(index);
4) Codec calls OnNewFrame():
  // Will be called when MediaCodec::RenderOutputBufferAndRelease() returns
  // buffers back to native window for rendering.
  void OnNewFrame() {...}
5) OnNewFrame passes the frame up to generic_encoder's callback.

So all that happened very quickly at the start.  As time went by, the delay before step 4 (the OnNewFrame callback) got longer and longer.  Eventually it was happening with more than 30 other frames queued, and at that point the map in generic_encoder overflows and video stops.

The current log I have doesn't have timestamps, but you can see frames coming and going between #3 and #4; in my log, OnNewFrame's line goes from being the very next line after #2/3's to being 2500 lines later (and that's with just some PRLog and jitter-buffer logging, with all B2G OS/etc. lines filtered out).  There were 31 "Decoder output" logs between them, which matches the analysis.

Sotaro: any idea what could cause this?  (Also adding jhlin in case he has any thoughts)  Why would this be different in 2.1?  (and we should verify it is, and that it's not somehow an artifact of my test setup, but I can't see how it would be).

#2/3: media/webrtc/signaling/src/media-conduit/WebrtcOMXH264VideoDecoder.cpp:485 (DrainOutput()) and
#4:   media/webrtc/signaling/src/media-conduit/WebrtcOMXH264VideoDecoder.cpp:527 (OnNewFrame())
Flags: needinfo?(sotaro.ikeda.g)
first of three patches to help debug packetization issues
To use this and debug packetization issues, add this to b2g.sh:

export NSPR_LOG_MODULES=mediapipeline:5,signaling:6
export COMMAND_PREFIX=logwrapper

I use "adb logcat | egrep '(/b2g|/PRLog)' > /tmp/log" to avoid tons of random stuff.
Attached file frame_match.el
emacs keyboard macro to process a logfile per above into two buffers, one of Encoded frame lines and one of Decoder input lines.

I then use "cut -b 135-139 lll > decoder2" and "cut -b 135-139 kkk >encoded2" to get the frame sizes out of there for diffing (adjust the 'cut' params as needed)

Note that iframes will show as NNNN on encoder side, and "17 8 MMMM" on the decode side, where MMMM is ~25 less than NNNN; this is expected.

I then use "ediff-buffers" to compare.
Randell,
This log has your trace_packetization patch. Also, this time I see there is a 1 sec delay from Decoder output to NewFrame.

I/PRLog   ( 1424): 2014-10-24 18:05:09.907252 UTC - 24825424[b1205380]: [DecodingThread|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:403: Decoder input: 1139 bytes (NAL 0x41), time 7922389533 (713015058), flags 0x0
I/PRLog   ( 1424): 2014-10-24 18:05:09.952827 UTC - 24831264[b1329500]: [OMXOutputDrain|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:466: Decoder output: 8 bytes, offset 0, time 7922389533, flags 0x0
I/PRLog   ( 1424): 2014-10-24 18:05:10.879652 UTC - 24830096[b1329100]: [CodecLooper|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:528: Decoder NewFrame: 240 bytes, 320x240, timestamp -5761779888365240000, renderTimeMs 713015058
(In reply to Randell Jesup [:jesup] from comment #4)
> 
> Sotaro: any idea what could cause this?  (Also adding jhlin in case he has
> any thoughts)  Why would this be different in 2.1?  (and we should verify it
> is, and that it's not somehow an artifact of my test setup, but I can't see
> how it would be).

I am going to check what is going on on v2.1 flame.
Flags: needinfo?(sotaro.ikeda.g)
Sotaro: you can test on one device by using http://mozilla.github.io/webrtc-landing/pc_test.html and selecting "one way call" and "Require H.264" before hitting Start.
Attached file webrtc_trace2.zip (obsolete) —
Attached file webrtc_trace3.zip
Attachment #8511309 - Attachment is obsolete: true
This is a log when MediaCodec::renderOutputBufferAndRelease() takes 178 ms. I added some additional logcat logs for investigation.

MediaCodec::renderOutputBufferAndRelease() synchronously waits for MediaCodec::onMessageReceived() on the ALooper thread to handle the kWhatReleaseOutputBuffer message. The kWhatReleaseOutputBuffer handling completes quickly once the message is dispatched to MediaCodec::onMessageReceived().

From the log, MediaCodec already had some messages queued, and handling the other messages takes time. Other WebRTC-related threads also seem to be taking cpu time, so handling of the kWhatReleaseOutputBuffer message seems to get put off.
A lot of threads have priority -14, which is too high for an application. During WebRTC, a lot of threads use a lot of cpu time at very high thread priority (nice=-14). This could badly affect all other parts of the b2g system; they might not work correctly while WebRTC is running.
Among these threads, ALooper's thread priority is very low (nice=0), which is ALooper's default. This seems to cause the problem in comment 14.
By applying the patch, latency becomes smaller. Before applying the patch, MediaCodec's functions often took more than 200 ms; after applying the patch, they normally take less than 20 ms.

But this patch should not be applied to gecko as-is; the thread's priority is too high.
> 
> But this patch should not be applied to gecko as-is; the thread's priority
> is too high.

Correction:
This patch should not be committed to gecko; the thread's priority becomes too high.
The WebRTC-related threads' priority (nice=-14) seems too high; even the compositor thread is nice=-4. Further, WebRTC uses a lot of cpu time. This could have a very bad effect on other parts of the system.
(In reply to Sotaro Ikeda [:sotaro] from comment #22)
> The WebRTC-related threads' priority (nice=-14) seems too high; even the
> compositor thread is nice=-4. Further, WebRTC uses a lot of cpu time. This
> could have a very bad effect on other parts of the system.

The very high priority hurts even WebRTC-related tasks, as in comment 14.
Realtime applications often need to manage thread priorities to maintain quality when there's an overload or to reduce latency/delay. 

On desktop Linux, since Firefox has no special privileges, its class is SCHED_OTHER, and the thread nice values can't be changed by the application.  On Fedora, basically all the webrtc threads run SCHED_OTHER, nice 0.  I breakpointed webrtc: ThreadPosix::Start() grabs the min and max priorities for SCHED_OTHER - and they're 0 on b2g.  So we aren't even calling pthread_setschedparam().  (thread_posix.cc:258)

WebRTC does *request* high priorities from thread_posix.cc (even though it *doesn't* get them on linux), and it does so for a reason - most of the things it's doing are near-realtime or true hard realtime tasks where missing a deadline will cause audio to glitch or break up, or cause latency to build up (as it did here).  The core WebRTC code was designed to be run in embedded devices or on OS's which allow some level of flagging of realtime threads, and so it makes the request to the OS.

There are a bunch of webrtc threads running high nice values (18).  The -14 threads are things like opensl_rec_thread, AudioRecord, CameraThread, AudioTrack, AudioGUM and CodecLooper.  Trace is -14 as well (but idle), as is LatencyLogger.  The only ones that aren't obviously realtime/low-latency threads (and they don't use a lot of CPU anyway) are MediaManager & (one) ProcessThread.

The audio chain really has to run at high-ish priority.  One difference in 2.1 from 2.0 is that MediaStreamGraph is run off audio callbacks from opensl instead of a timer, so somewhat more work is done at high priority.

The other possible priority-inversion is CodecLooper vs ALooper's priorities (nice of -16 and 0).

Summary:
=======
All the prioritized (negative nice) threads are that way because something in the system made them so.  All but a couple of negative-nice (high priority) threads are system threads we have no control over, and most of the rest don't use much CPU time.

I do suggest bumping the priority of ALooper.

It's late, I'm tired, but this does seem like at least part of the answer. I'm not sure yet it's the whole answer, but it may be.


Other notes:

The webrtc.org code would try to run a lot as SCHED_RR, however we disabled this over a year ago due to the impact on FxOS; see https://code.google.com/p/webrtc/issues/detail?id=1956
One could be more selective and only run a few critical threads as RR.
(In reply to Randell Jesup [:jesup] from comment #24)
> Realtime applications often need to manage thread priorities to maintain
> quality when there's an overload or to reduce latency/delay. 
> 
> On desktop Linux, since Firefox has no special privileges, its class is
> SCHED_OTHER, and the thread nice values can't be changed by the application.
> On Fedora, basically all the webrtc threads run SCHED_OTHER, nice 0.  I
> breakpointed webrtc and ThreadPosix::Start() grabs the min and max
> priorities for SCHED_OTHER - and they're 0 on b2g.  So we aren't even
> calling pthread_setschedparam().  (thread_posix.cc:258)
> 
> WebRTC does *request* high priorities from thread_posix.cc (even though it
> *doesn't* get them on linux), and it does so for a reason - most of the
> things it's doing are near-realtime or true hard realtime tasks where
> missing a deadline will cause audio to glitch or break up, or cause latency
> to build up (as it did here).  The core WebRTC code was designed to be run
> in embedded devices or on OS's which allow some level of flagging of
> realtime threads, and so it makes the request to the OS.
> 
> There are a bunch of webrtc threads running high nice values (18).

That seems to be the default thread priority when the application is in the background; when the application is in the foreground it becomes nice=1. Attachment 8511432 [details] is when Firefox Hello is in the foreground; the process does not have nice=18.
One of my concerns is that the application's main thread's nice value also becomes -14 when WebRTC is used.
(In reply to Randell Jesup [:jesup] from comment #24)
> Summary:
> =======
> All the prioritized (negative nice) threads are that way because something
> in the system made them so.  All but a couple of negative-nice (high
> priority) threads are system threads we have no control over, and most of
> the rest don't use much CPU time.
> 
> I do suggest bumping the priority of ALooper.
> 
> It's late, I'm tired, but this does seem like at least part of the answer.
> I'm not sure yet it's the whole answer, but it may be.

Yeah, anyway we need to raise MediaCodec's ALooper to some value. It does not actually consume a lot of cpu time, so raising its priority should not badly affect the rest of the system.
It seems better to also raise the ImageBridgeChild thread priority to less than 0. Its current value is the default (nice=1 in the foreground). It connects video rendering to the compositor.
(In reply to Sotaro Ikeda [:sotaro] from comment #20)
> Created attachment 8511439 [details] [diff] [review]
> temporary patch - Raise ALooper thread priority
> 
> By applying the patch, latency becomes smaller. Before applying the patch,
> MediaCodec's functions often took more than 200 ms; after applying the patch,
> they normally take less than 20 ms.
> 
> But this patch should not be applied to gecko as-is; the thread's priority
> is too high.

Hmm, this priority seems unavoidable. ANDROID_PRIORITY_DISPLAY=-4 does not reduce the MediaCodec function latency enough. This looper thread does not consume much cpu time, so it should not badly affect the system.
(In reply to Sotaro Ikeda [:sotaro] from comment #28)
> 
> Hmm, this priority seems unavoidable. ANDROID_PRIORITY_DISPLAY=-4 does not
> reduce the MediaCodec function latency enough. This looper thread does not
> consume much cpu time, so it should not badly affect the system.

There is another thread whose priority should also be changed.
Attachment #8511439 - Attachment description: temporary patch - Raise ALooper thread priority → patch - Raise ALooper thread priority
With attachment 8511439 [details] [diff] [review] and attachment 8511439 [details] [diff] [review] applied, MediaCodec's delay becomes much shorter than the default, but all of the problems might not be fixed yet. I tested WebRTC using the Firefox Hello app's SimpleWebRTC between 2 flame devices, and I still saw the receiving video's decoding stop.
I got a log by applying attachment 8511439 [details] [diff] [review] and attachment 8511439 [details] [diff] [review]. From the log, the following functions still intermittently take a long time:
- MediaCodec::dequeueInputBuffer()
- MediaCodec::dequeueOutputBuffer()
Attachment #8511432 - Attachment description: log - thread priority of Firefox Hello app durin WebRTC → thread priority of Firefox Hello app durin WebRTC
Sotaro: thanks for digging into this.

Did you see decoding stop, or did you see displayed video stop?  The bug we've identified (increasing decode delay due to thread priorities) causes decode to continue, but playback to stop (after the 30-frame queue for in-decode frames overflows).

NOTE: the Loop app currently runs at a CPU overload (90+%).  Per discussions in the bug on Loop app performance, running tests of simple webrtc code in a browser uses much less CPU; adding a framework like tokbox (loop) or simpleWebrtc adds more overhead, and the loop app itself seems to add more on top of those, ending at 95%+ while a simple webrtc app in the browser might be 80% and much smoother.

Also note on 2.1 I found this bug (priorities) in a simple loopback browser test (http://mozilla.github.io/webrtc-landing/pc_test.html - set "one way call" and "Require H.264"); no overload there.

Production builds for 8x10's will also use the Hardware echo cancellation (AEC) and Noise Suppression, likely cutting at least 5% and likely more like 10% off CPU use (and in particular cutting it off the high-priority audio threads).

(The phone status bar is also using >10% of the CPU due to the Network Activity indicator causing constant updates and reflows - apparently even when hidden by a full-screen app.)
This patch temporarily extends ACodec input buffer size.
Jay, can you re-evaluate by applying the following patches?
- Raise ALooper thread priority : attachment 8511439 [details] [diff] [review]
- Raise OMX CallbackDispatcherThread thread priority: attachment 8511692 [details] [diff] [review] 
- Increase ACodec input buffer: attachment 8511754 [details] [diff] [review]
- Raise OMX hal thread priority: attachment 8511755 [details] [diff] [review]
Flags: needinfo?(jaywang)
This problem seems to be caused by very high cpu usage (more than 90%). The b2g main thread still consumes a lot of cpu time on v2.1; it needs to be reduced.
(In reply to Sotaro Ikeda [:sotaro] from comment #37)
> This problem seems to be caused by very high cpu usage (more than 90%). The
> b2g main thread still consumes a lot of cpu time on v2.1; it needs to be reduced.

MediaCodec's decoding does a lot of thread dispatch tasks. High cpu usage seems to affect it.
"adb shell top -m 10 -d 2 -t" during H.264 WebRTC with Firefox Hello is like the following.

User 59%, System 31%, IOW 0%, IRQ 0%
User 248 + Nice 29 + Sys 148 + Idle 35 + IOW 0 + IRQ 0 + SIRQ 4 = 464

  PID   TID PR CPU% S     VSS     RSS PCY UID      Thread          Proc
  207   710  1  12% R 191556K  79344K     root     Compositor      /system/b2g/b2g
  207   207  0  12% R 191564K  79456K     root     b2g             /system/b2g/b2g
 1394  1394  0   6% S 128444K  41640K     u0_a1394 Firefox Hello:  /system/b2g/b2g
 1394  1905  1   6% S 128444K  41640K  fg u0_a1394 AudioTrack      /system/b2g/b2g
 1394  1896  1   4% S 128444K  41640K     u0_a1394 AudioGUM        /system/b2g/b2g
 1394  1907  1   4% S 128444K  41640K     u0_a1394 opensl_rec_thre /system/b2g/b2g
 1394  1402  0   4% S 128444K  41640K     u0_a1394 Socket Thread   /system/b2g/b2g
  281  1935  0   3% S  35792K   9832K     camera   mm-qcamera-daem /system/bin/mm-qcamera-daemon
  207   524  0   2% S 191564K  79456K     root     Socket Thread   /system/b2g/b2g
 2078  2078  1   2% R   1412K    652K     root     top             top

User 54%, System 36%, IOW 0%, IRQ 1%
User 213 + Nice 34 + Sys 168 + Idle 36 + IOW 0 + IRQ 0 + SIRQ 5 = 456

  PID   TID PR CPU% S     VSS     RSS PCY UID      Thread          Proc
  207   207  0  13% R 191564K  79524K     root     b2g             /system/b2g/b2g
  207   710  1  12% S 191564K  79524K     root     Compositor      /system/b2g/b2g
 1394  1394  1   7% S 128444K  41660K     u0_a1394 Firefox Hello:  /system/b2g/b2g
 1394  1905  1   7% R 128444K  41660K  fg u0_a1394 AudioTrack      /system/b2g/b2g
 1394  1907  1   5% S 128444K  41660K     u0_a1394 opensl_rec_thre /system/b2g/b2g
 1394  1896  0   5% S 128444K  41660K     u0_a1394 AudioGUM        /system/b2g/b2g
 1394  1402  1   5% S 128444K  41660K     u0_a1394 Socket Thread   /system/b2g/b2g
  281  1935  1   3% S  35792K   9832K     camera   mm-qcamera-daem /system/bin/mm-qcamera-daemon
  207   524  0   3% S 191564K  79524K     root     Socket Thread   /system/b2g/b2g
 2078  2078  0   2% R   1412K    652K     root     top             top
Flags: needinfo?(jaywang)
(In reply to Sotaro Ikeda [:sotaro] from comment #38)
> (In reply to Sotaro Ikeda [:sotaro] from comment #37)
> > This problem seems to be caused by very high cpu usage (more than 90%). The
> > b2g main thread still consumes a lot of cpu time on v2.1; it needs to be reduced.
> 
> MediaCodec's decoding does a lot of thread dispatch tasks. High cpu usage
> seems to affect it.

I am starting to think there is another cause of the problem. With the patches applied, the symptoms seem better, but the problem still exists.
With sotaro's patches (but not a full flash, just ./flash gaia && ./flash gecko), I no longer see increasing delay on video decode.  On the order of 4ms from Decoder input to Decoder NewFrame.  (yeah!)   

I do see some fixed delay, and from looking at SendPacket/ReceivePacket times in VideoConduit, it appears it's taking 250-850ms for a packet to go from SendPacket through encryption and STS thread, through e10s, loopback (never hits a network) and down through e10s, STS, decryption and down to VideoConduit.  I doubt encryption or STS thread are a significant hit here; not sure about e10s.

It's possible some of this is caused by higher-priority processes using significant CPU causing problems elsewhere in the system (still not tons, but >15% probably).  We could try proxying higher-priority audio requests down to lower-priority threads, or re-introduce a small buffer to allow most of the processing to happen at a lower priority while still driving it off the output.  I'd really like to avoid this.

The worst thing I see is the b2g thread using *22%* of the CPU... and it seems to keep going up; right now a few minutes later I'm seeing *26-33%* in b2g(!)  Compositor, however, is around 7-8% which is a nice improvement.  In later calls it started at 27-30%.

Note: I'm likely doing some mildly unsupported stuff here; I'm running v188 with a beta gecko and an (older) trunk B2G pull with sotaro's patches applied.  But I don't think that would explain this.  I need to make a 2.1 pull of B2G and dig out some disk space.  I *hope* some of what I'm seeing is due to this, but I don't expect it.


After 1/2 hour plus, there's increased delay, but no decode failures.  Frames are building up somewhere, though.  GetUserMedia stays low-delay.  On a second call, all stayed pretty good for 20 minutes, then suddenly the decode times fell off a cliff - 4ms had drifted up to 60ms, and then over a few minutes went to 3.5 seconds, which did break output.  (Still trying to find if it was abrupt or gradual).  Decoder input to Decoder output was still fast (5-7ms); NewFrame was 3.5 seconds later.  CPU use is still around the same (90%); it could be the b2g process slowly squeezed us until things broke.
I simply compared the b2g v2.0 H.264 WebRTC and v2.1 H.264 WebRTC. I used Firefox Hello app's SimpleWebRTC for comparison.

- b2g-v2.0: video size=(320*240), average cpu usage=around 95%
            4 WebRTC-related threads' ("Firefox Hello:") priorities are raised to nice=-14.

- b2g-v2.1: video size=(320*240), average cpu usage=around 98%
            20 WebRTC-related threads' ("Firefox Hello:") priorities are raised to nice=-14.
Attachment #8511432 - Attachment is obsolete: true
Got the log by using "adb shell top -m 20 -t -d 2".
(In reply to Randell Jesup [:jesup] from comment #41)
> With sotaro's patches (but not a full flash, just ./flash gaia && ./flash
> gecko), I no longer see increasing delay on video decode.  On the order of
> 4ms from Decoder input to Decoder NewFrame.  (yeah!)   

My gecko patch seems relatively safe, but the other gonk and OMX hal patches are very dangerous; it seems better not to use them. Instead, it seems better to change the WebRTC-related threads to a lower priority. The following are b2g v2.0's and b2g v2.1's WebRTC-related threads; the v2.1 threads have very high priority compared to b2g v2.0.
- Application's threads during H.264 WebRTC on b2g v2.0: attachment 8512226 [details]
- Application's threads during H.264 WebRTC on b2g v2.1: attachment 8512227 [details]
Sadly, even when I applied all of the following patches, the problem with a Firefox Hello SimpleWebRTC call between 2 b2g v2.1 flame devices did not disappear. MediaCodec's function call latency problem seems fixed, but video encoding underflow happened occasionally. And during a long WebRTC call, the decoded video frames become static (do not move anymore) even while MediaCodec is still decoding.

- Raise ALooper thread priority : attachment 8511439 [details] [diff] [review]
- Raise OMX CallbackDispatcherThread thread priority: attachment 8511692 [details] [diff] [review]
- Increase ACodec input buffer: attachment 8511754 [details] [diff] [review]
- Raise OMX hal thread priority: attachment 8511755 [details] [diff] [review]

I feel that lowering the priority of some WebRTC-related threads is necessary.
During WebRTC, cpu usage is very high (95%-98%), so even a 1% reduction is important. I confirmed that Bug 1067455 reduced cpu usage by about 1% in the "Firefox Hello app's SimpleWebRTC" use case. It might be better to uplift the fix.
Depends on: 1067455
Agreed. I want to know why I see b2g (in pc_test) using 22-33% of the CPU (!!)

Also: I'm wondering if threads are being started from an audio callback, and inheriting the audio callback's priority....  Since we can't make threads be high priority on b2g.
(In reply to Randell Jesup [:jesup] from comment #50)
> Agreed. I want to know why I see b2g (in pc_test) using 22-33% of the CPU (!!)
> 
> Also: I'm wondering if threads are being started from an audio callback, and
> inheriting the audio callback's priority....  Since we can't make threads be
> high priority on b2g.

In a content process, we can raise the thread priority just by calling setpriority() like the following. This is a characteristic of android application processes. It is not a recommended way on b2g: it could collide with ProcessPriorityManager's priority control if the timing is bad. ProcessPriorityManager basically does not override a thread's priority if its nice is less than or equal to 0.

> setpriority(PRIO_PROCESS, 0, ANDROID_PRIORITY_NORMAL);

Using a hal function is the recommended way on b2g, like the following.

> hal::SetCurrentThreadPriority(hal::THREAD_PRIORITY_COMPOSITOR);

http://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp#
Although a normal linux system does not permit it, android application processes permit it.
Comment on attachment 8511754 [details] [diff] [review]
patch - Increase ACodec input buffer

Set to obsolete based on Comment 47.
Attachment #8511754 - Attachment is obsolete: true
Comment on attachment 8511755 [details] [diff] [review]
patch - Raise OMX hal thread priority

Set to obsolete based on Comment 47.
Attachment #8511755 - Attachment is obsolete: true
Comment on attachment 8511692 [details] [diff] [review]
patch - Raise OMX CallbackDispatcherThread thread priority

Set to obsolete based on Comment 47.
Attachment #8511692 - Attachment is obsolete: true
During sipcc initialization, cprAdjustRelativeThreadPriority() is called and the application's main thread's nice is changed to -14. This main thread's priority then seems to be inherited by other newly created threads.

http://mxr.mozilla.org/mozilla-central/source/media/webrtc/signaling/src/sipcc/cpr/linux/cpr_linux_threads.c
(In reply to Sotaro Ikeda [:sotaro] from comment #52)
> Although a normal linux system does not permit it, android application
> processes permit it.

android limits applications' cpu usage by using cgroups.
 http://androidxref.com/4.4.4_r1/xref/system/core/rootdir/init.rc#100

FYI:
 http://www.androiddesignpatterns.com/2014/01/thread-scheduling-in-android.html
 http://gizmodo.com/5527407/giz-explains-how-multitasking-works-on-a-phone
This sets all SIPCC thread priorities other than the timer's to 0 for Gonk, since on B2G it actually will let us set aggressive priorities, and spawned threads appear to inherit them - and I think after the thread flattening, this might be more of a problem than before.  The thread flattening landed in 34, so this would match its being a b2g 2.1 regression.  Byron: this might be the cause of the b2g perma-orange you were getting, maybe.
There is some possibility of that being the case. I would just remove the thread priority stuff altogether.
So ps -p -t shows that only AudioRecord and AudioTrack are -16 now, and virtually all the webrtc-related threads are 0.

(Note: I still have your patches installed, so ALooper and CodecLooper are -16 also).  With this sipcc thread-priority patch, it seems to work generally well, but we have an eventual buildup of delay between Decoder output and NewFrame (i.e. the delay between calling renderOutputBufferAndRelease() to get the frame rendered and OnNewFrame() being called).  This goes from 4ms to 1.8 seconds over the lifetime of a call.

Byron: comments on what the flattening might have changed from 33 to 34 on b2g?
Comment on attachment 8512420 [details] [diff] [review]
don't bump sipcc thread priorities on B2G

Also, this may affect Android in the same way....
Attachment #8512420 - Flags: feedback?(gpascutto)
Attachment #8512420 - Flags: feedback?(docfaraday)
I'm guessing that once the sipcc logic was moved to main, this priority setting stuff applied there for the first time. I recall removing at least one such call, but I guess I missed this one.
Comment on attachment 8512420 [details] [diff] [review]
don't bump sipcc thread priorities on B2G

Review of attachment 8512420 [details] [diff] [review]:
-----------------------------------------------------------------

Just rip out the thread priority stuff entirely. We're gutting all of this anyway, there's no reason to fix it up.
Attachment #8512420 - Flags: feedback?(docfaraday)
Comment on attachment 8512420 [details] [diff] [review]
don't bump sipcc thread priorities on B2G

Review of attachment 8512420 [details] [diff] [review]:
-----------------------------------------------------------------

Android app threads run with the SCHED_OTHER/SCHED_NORMAL scheduling policy, so this code has no effect there, or it's the same as regular Unix, basically.
Attachment #8512420 - Flags: feedback?(gpascutto) → feedback+
(In reply to Gian-Carlo Pascutto [:gcp] from comment #65)
> Android app threads run with the SCHED_OTHER/SCHED_NORMAL scheduling policy,
> so this code has no effect there, or it's the same as regular Unix,
> basically.

But Sotaro wrote:

> Although a normal linux system does not permit it, an android application process permits it.

We purposely removed the attempts to set SCHED_RR or SCHED_FIFO from the webrtc.org thread code.  The main app should be SCHED_OTHER already, so why is it finding that it can "lower" (effectively raise) the priority?  Normally that's restricted to root/etc. in Linux.  So is there really a difference between Android apps and what we see in b2g?  (Are we running with elevated privs?)

I will note that sipcc was using setpriority(), not pthread_setschedparam() (which is what the webrtc.org code uses, and maps internal priorities to the range sched_get_priority_min(policy) ... sched_get_priority_max(policy)).
Attachment #8512648 - Flags: review?(docfaraday)
From comment 42, total CPU usage during WebRTC seems to have increased from b2g v2.0 to b2g v2.1. This also seems to be a problem.
(In reply to Sotaro Ikeda [:sotaro] from comment #68)
> From comment 42, total cpu usage during WebRTC seems to be increased from
> b2g v2.0 to b2gv2.1. This seems also a problem.

Note that CPU use at the top end is adaptive - frames will be dropped, quality will suffer, audio will underrun at the overload point; so comparisons between 2.0 and 2.1 where the values are >90% are not likely to be valid.  You'd also need to include the video encode/decode/display framerates, and audio quality.

That said: it may well be that 2.1 is using more CPU.  I know in browser tests it appears to be using a HUGE amount of CPU in 'b2g'; from previous tests this appears to be at least partly due to the network activity status icon constantly updating and forcing reflows - even when not visible.  However, there may be more; we need profiles into the b2g process to see that.
Comment on attachment 8512648 [details] [diff] [review]
don't try to set the priority of the CCApp thread (which doesn't exist)

Review of attachment 8512648 [details] [diff] [review]:
-----------------------------------------------------------------

This should do the trick I think. We could be extra sure by making cprAdjustRelativeThreadPriority a no-op, or even removing it entirely (the former is easiest, and it doesn't matter much if it's ugly since it is all going away anyhow).
Attachment #8512648 - Flags: review?(docfaraday) → review+
blocking-b2g: 2.1? → 2.1+
I tested attachment 8512648 [details] [diff] [review] H.264 WebRTC by using Firefox hello app's SimpleWebRTC on 2 Flame devices. I did not see video freeze during the testing.
Comment on attachment 8512648 [details] [diff] [review]
don't try to set the priority of the CCApp thread (which doesn't exist)

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): SDP refactor/thread-flattening (landed in 34)

User impact if declined: Priority issues causing webrtc calls to fail

Testing completed: Tested by Sotaro; tested by myself and shown that priorities are normal with this patch.

Risk to taking this patch (and alternatives if risky): No risk

String or UUID changes made by this patch: none
Attachment #8512648 - Flags: approval-mozilla-b2g34?
(In reply to Sotaro Ikeda [:sotaro] from comment #71)
> I tested attachment 8512648 [details] [diff] [review] H.264 WebRTC by using
> Firefox hello app's SimpleWebRTC on 2 Flame devices. I did not see video
> freeze during the testing.

The problem seems to be fixed if the applications are always in the foreground. But once the Firefox Hello app is put in the background and another task is done, such as launching the Settings app, the receiving video seems to freeze when I bring the Firefox Hello app to the foreground again. This seems to be another problem.

[1] Start "Firefox hello" app.
[2] Start SimpleWebRTC. It starts H.264 WebRTC session.
[3] Continue WebRTC several seconds.
[4] Push the home button for a short time. The foreground application becomes the homescreen.
[5] Start the Settings app, then do something.
[6] Push the home button for a long time; the card view is shown.
[7] Choose the "Firefox hello" app; it becomes the foreground app again.

After [7] the receiving video updates only intermittently, and then it freezes.
This should probably be uplifted to aurora, and beta if possible, since we don't want a webrtc call tweaking the thread priority of main, which will persist until the browser is restarted.
(In reply to Sotaro Ikeda [:sotaro] from comment #71)
> I tested attachment 8512648 [details] [diff] [review] H.264 WebRTC by using
> Firefox hello app's SimpleWebRTC on 2 Flame devices. I did not see video
> freeze during the testing.

Hmm, I updated the code to latest b2gv2.1, even after applying attachment 8512648 [details] [diff] [review], the problem seems not fixed :-(
(In reply to Sotaro Ikeda [:sotaro] from comment #75)
> (In reply to Sotaro Ikeda [:sotaro] from comment #71)
> Hmm, I updated the code to latest b2gv2.1, even after applying attachment
> 8512648 [details] [diff] [review], the problem seems not fixed :-(

I am wondering if Bug 1080755 might be affecting it.
Reduced b2g thread CPU use from 25-30% to <3%
Overall CPU use from 90% to 70-80%
frame rate seems higher
end-to-end delay much lower to start, and *stays* low after 30+ minutes
No freezing
Blocks: 1091240
CCApp fix - was going to land this last night, but the tree closed
https://hg.mozilla.org/integration/mozilla-inbound/rev/e82273c162bf
waiting for inbound/central landing before getting the uplift here.
https://hg.mozilla.org/mozilla-central/rev/e82273c162bf
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
(In reply to Randell Jesup [:jesup] from comment #77)
> Created attachment 8513775 [details] [diff] [review]
> WIP NOT FOR CHECKIN: nuke all traces of the network-activity indicator
> 
> Reduced b2g thread CPU use from 25-30% to <3%
> Overall CPU use from 90% to 70-80%
> frame rate seems higher
> end-to-end delay much lower to start, and and *stays* low after 30+ minutes
> No freezing

I also tried attachment 8513775 [details] [diff] [review] in Firefox Hello's SimpleWebRTC on a v2.1 Flame. The problem is also fixed in this use case. CPU usage changed as follows:
 - total CPU usage: 98% -> 81%
 - b2g main thread: 12% -> 4%
 - b2g Compositor thread: 12% -> 6%
Does this affect desktop Fx as well?
Flags: needinfo?(rjesup)
Comment on attachment 8512648 [details] [diff] [review]
don't try to set the priority of the CCApp thread (which doesn't exist)

Approval Request Comment
[Feature/regressing bug #]: bwc's sipcc thread flattening in 34

[User impact if declined]: see below.  Maybe nothing; perhaps Android/Windows are affected.

[Describe test coverage new/current, TBPL]:  covered by normal tests.  This may make b2g emulator debug usable again on 34+ (or rather, as usable as it ever was) for peerconnection tests.

[Risks and why]: 
b2g: absolutely important
linux: not needed
Mac: unlikely
Android: gcp says no, sotaro says yes.  Needs to be tested
Windows: dunno, maybe

Zero downside to taking it.

[String/UUID change made/needed]: none
Flags: needinfo?(rjesup)
Attachment #8512648 - Flags: approval-mozilla-beta?
Attachment #8512648 - Flags: approval-mozilla-aurora?
Keywords: verifyme
Attachment #8512648 - Flags: approval-mozilla-b2g34? → approval-mozilla-b2g34+
Since we need this for b2g34 uplift, marking status flags accordingly.
Attachment #8512648 - Flags: approval-mozilla-beta?
Attachment #8512648 - Flags: approval-mozilla-beta+
Attachment #8512648 - Flags: approval-mozilla-aurora?
Attachment #8512648 - Flags: approval-mozilla-aurora+
(In reply to Randell Jesup [:jesup] from comment #83)
> [Risks and why]: 
> Android: gcp says no, sotaro says yes.  Needs to be tested

Sotaro is right: https://bugzilla.mozilla.org/show_bug.cgi?id=1090865#c3
This issue is verified fixed on Flame 2.1 and 2.2.

Result: The video did not freeze after 10 minutes following the STRs in comment 0, nor following the STRs in comment 73.

Device: Flame 2.1 (319mb, KK, Shallow Flash)
BuildID: 20141107001205
Gaia: 6295f6acfe91c6ae659712747dd2b9c8f51d0339
Gecko: 8c23b4f2ba29
Version: 34.0 (2.1) 
Firmware Version: v188-1
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Device: Flame 2.2 (319mb, KK, Full Flash)
BuildID: 20141107040206
Gaia: 779f05fead3d009f6e7fe713ad0fea16b6f2fb31
Gecko: 64f4392d0bdc
Gonk: 48835395daa6a49b281db62c50805bd6ca24077e
Version: 36.0a1 (2.2 Master)
Firmware: V188
User Agent: Mozilla/5.0 (Mobile; rv:36.0) Gecko/36.0 Firefox/36.0
Status: RESOLVED → VERIFIED
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Keywords: verifyme
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)