As a moderator at openATV, a forum for Linux set-top boxes, I have seen reports of artefacts during video playback or timeshift, and I sometimes experience them myself.
As the artefacts are not repeatable (rewinding and watching the same passage again shows no artefacts), I can rule out a corrupted video source.
We found that each artefact (up to one per minute on average) correlates exactly with an entry in /var/log/messages like this:
Jun 29 14:54:54 ventonhdx user.warn kernel: enigma2: page allocation failure: order:5, mode:0xd0
Jun 29 14:54:54 ventonhdx user.warn kernel: Call Trace:
Jun 29 14:54:54 ventonhdx user.warn kernel: [<805ff9d0>] dump_stack+0x8/0x34
Jun 29 14:54:54 ventonhdx user.warn kernel: [<80091f0c>] warn_alloc_failed+0xe4/0x124
Jun 29 14:54:54 ventonhdx user.warn kernel: [<80094690>] __alloc_pages_nodemask+0x434/0x6e8
Jun 29 14:54:54 ventonhdx user.warn kernel: [<800cb6c8>] cache_alloc_refill+0x318/0x8c0
Jun 29 14:54:54 ventonhdx user.warn kernel: [<800cbdc4>] __kmalloc+0x154/0x19c
Jun 29 14:54:54 ventonhdx user.warn kernel: [<800a8900>] memdup_user+0x24/0x94
Jun 29 14:54:54 ventonhdx user.warn kernel: [<8045e198>] dvbdmx_write+0x48/0xd0
Jun 29 14:54:54 ventonhdx user.warn kernel: [<800cf8b0>] vfs_write+0x9c/0x184
Jun 29 14:54:54 ventonhdx user.warn kernel: [<800cfcd8>] sys_write+0x50/0xb0
Jun 29 14:54:54 ventonhdx user.warn kernel: [<8000e928>] stack_done+0x20/0x44
Jun 29 14:54:54 ventonhdx user.warn kernel: Mem-Info:
Jun 29 14:54:54 ventonhdx user.warn kernel: Normal per-cpu:
Jun 29 14:54:54 ventonhdx user.warn kernel: CPU 0: hi: 186, btch: 31 usd: 0
Jun 29 14:54:54 ventonhdx user.warn kernel: CPU 1: hi: 186, btch: 31 usd: 173
Jun 29 14:54:54 ventonhdx user.warn kernel: active_anon:11037 inactive_anon:11111 isolated_anon:0
Jun 29 14:54:54 ventonhdx user.warn kernel: active_file:4120 inactive_file:24772 isolated_file:0
Jun 29 14:54:54 ventonhdx user.warn kernel: unevictable:0 dirty:6121 writeback:1050 unstable:0
Jun 29 14:54:54 ventonhdx user.warn kernel: free:12176 slab_reclaimable:1301 slab_unreclaimable:1821
Jun 29 14:54:54 ventonhdx user.warn kernel: mapped:997 shmem:69 pagetables:129 bounce:0
Jun 29 14:54:54 ventonhdx user.warn kernel: Normal free:62716kB min:2876kB low:3592kB high:4312kB active_anon:44148kB inactive_anon:44444kB active_file:16480kB inactive_file:85132kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:518144kB mlocked:0kB dirty:
Jun 29 14:54:54 ventonhdx user.warn kernel: lowmem_reserve[]: 0 0
Jun 29 14:54:54 ventonhdx user.warn kernel: Normal: 3575*4kB 4431*8kB 2239*16kB 78*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 88260kB
Jun 29 14:54:54 ventonhdx user.warn kernel: 21111 total pagecache pages
Jun 29 14:54:54 ventonhdx user.warn kernel: 3465 pages in swap cache
Jun 29 14:54:54 ventonhdx user.warn kernel: Swap cache stats: add 4601, delete 1136, find 7/9
Jun 29 14:54:54 ventonhdx user.warn kernel: Free swap = 14448kB
Jun 29 14:54:54 ventonhdx user.warn kernel: Total swap = 32764kB
Jun 29 14:54:54 ventonhdx user.warn kernel: 131072 pages RAM
Jun 29 14:54:54 ventonhdx user.warn kernel: 58359 pages reserved
Jun 29 14:54:54 ventonhdx user.warn kernel: 14100 pages shared
Jun 29 14:54:54 ventonhdx user.warn kernel: 34321 pages non-shared
Jun 29 14:54:54 ventonhdx user.warn kernel: SLAB: Unable to allocate memory on node 0 (gfp=0xd0)
Jun 29 14:54:54 ventonhdx user.warn kernel: cache: size-131072, object size: 131072, order: 5
Jun 29 14:54:54 ventonhdx user.warn kernel: node 0: slabs: 4/4, objs: 4/4, free: 0
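This seems to indicate severe memory fragmentation. The failing request is order:5, i.e. 32 contiguous 4 kB pages (128 kB). Although enough total memory is available (88260 kB free according to the "Normal:" line), the supply of 128 kB blocks is low: that line lists a single free 128 kB block and nothing larger.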
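To check how often such failures occur and which allocation sizes are affected, the log can be tallied per order. The following is only a sketch: the function name count_alloc_failures is made up for illustration, and the size calculation assumes 4 kB pages (as the "Normal:" line in the log above suggests).

```shell
#!/bin/sh
# Count "page allocation failure" entries per order in a syslog file.
# Assumption: 4 kB pages, so an order-N request needs 2^N * 4 kB of
# physically contiguous memory.
count_alloc_failures() {
    # $1: log file to scan (defaults to /var/log/messages)
    awk '
        /page allocation failure: order:/ {
            s = $0
            sub(/.*order:/, "", s)    # drop everything up to "order:"
            sub(/[^0-9].*/, "", s)    # keep only the leading digits
            n[s]++
        }
        END {
            for (o in n)
                printf "order %d (%d kB contiguous): %d failures\n", o, 4 * 2^o, n[o]
        }
    ' "${1:-/var/log/messages}" | sort -n -k2
}
```

Calling count_alloc_failures without an argument scans /var/log/messages; a rotated log file can be passed as the first argument instead.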
Memory is not always this fragmented. After starting the box, or after "echo 3 > /proc/sys/vm/drop_caches", there is a lot of memory available:
root@gbquad:~# cat /proc/buddyinfo
Node 0, zone Normal 133 314 249 919 1558 655 178 23 0 0 1
Within the next few minutes, the caches fill up until approx. 6 MB of RAM is left. In the "good state", fragmentation is low; note the 4 MB segment:
root@gbquad:~# cat /proc/buddyinfo
Node 0, zone Normal 232 160 0 0 0 0 0 0 0 0 1
In the "bad state", memory is severely fragmented, resulting in allocation failures and playback artefacts:
root@gbquad:~# cat /proc/buddyinfo
Node 0, zone Normal 1409 350 16 1 0 0 0 0 0 0 0
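The buddyinfo snapshots above are easier to compare when reduced to two numbers: total free memory and the number of free blocks that could satisfy an order-5 (128 kB) request. This is a minimal sketch; the function name buddy_summary is hypothetical, and it assumes 4 kB pages, so that the i-th count after the zone name stands for free blocks of 4 kB * 2^i (starting at i = 0).

```shell
#!/bin/sh
# Summarise /proc/buddyinfo: total free memory per zone and the number of
# free blocks big enough for an order-5 (128 kB) allocation.
# Assumption: 4 kB pages; the first count after the zone name is order 0.
buddy_summary() {
    # $1: file in /proc/buddyinfo format (defaults to /proc/buddyinfo)
    awk '
        {
            total_kb = 0; big = 0
            # fields 1-4 are "Node", "0,", "zone", "<name>"; counts start at 5
            for (i = 5; i <= NF; i++) {
                order = i - 5
                total_kb += $i * 4 * 2^order
                if (order >= 5) big += $i
            }
            printf "%s: %d kB free, %d free blocks of order >= 5\n", $4, total_kb, big
        }
    ' "${1:-/proc/buddyinfo}"
}
```

Fed with the "good state" line it reports 6304 kB free and 1 block of order >= 5; for the "bad state" line it reports 8724 kB free and 0 such blocks. So the "bad state" actually has more free memory, just no contiguous 128 kB block.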
Unfortunately, we have not yet found out what is causing the "bad state". I have so far only seen it after configuring the timeshift buffer to be on a USB stick and then moving it back to the HDD.
We have tried a few approaches that treat the symptoms without addressing the root cause:
Clearing caches
"echo 3 > /proc/sys/vm/drop_caches" frees up memory, and running it every 3 minutes from a cron job seems to help.
Many Linux users will say that dropping caches is a bad idea. And yes, dropping them and letting them fill up again in a cycle is definitely a waste of performance, so avoiding or reducing caching in the first place would probably be better. Unlike a Linux PC, which runs the OS and programs from the HDD, these set-top boxes never execute code from the HDD but only from built-in flash memory, so caching of program code is not required. The cache used for video data may be needed for reading ahead, to guarantee a continuous stream, but the data could be dropped once it has been played. That said, dropping caches may actually be critical: there is a risk that data is dropped just before it is due to be played. I know very little about this and cannot say whether it is a real issue. I'm also not sure what other data is being cached; I can see the cache fill up (much more slowly) with timeshift disabled.
Memory compaction
"echo 1 > /proc/sys/vm/compact_memory" (requires CONFIG_COMPACTION=y), run regularly from a cron job, may help as well. However, I have not yet been able to test whether fragmentation actually improves in the "bad state" (on my box, the "bad state" is rare).
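One way to test this the next time the "bad state" shows up is to print the buddyinfo line immediately before and after triggering compaction. The sketch below does just that. The function name compact_and_compare and its two optional path arguments are assumptions for illustration (the arguments only exist so the function can be tried against ordinary files); it has to run as root on a kernel with CONFIG_COMPACTION=y.

```shell
#!/bin/sh
# Trigger memory compaction and print the "Normal" zone from buddyinfo
# before and after, so the effect on high-order blocks can be compared.
compact_and_compare() {
    # $1: buddyinfo file, $2: compaction trigger (both overridable for dry runs)
    buddy=${1:-/proc/buddyinfo}
    trigger=${2:-/proc/sys/vm/compact_memory}
    echo "before: $(grep Normal "$buddy")"
    # Writing 1 asks the kernel to compact all zones
    echo 1 > "$trigger" || return 1
    echo "after:  $(grep Normal "$buddy")"
}
```

If compaction helps, the "after" line should show counts moving from the left-hand columns (small blocks) towards the right-hand columns (large blocks).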
Swap
Some users reported an improvement after installing swap on a USB stick. Other experiments show that swap, though installed, is hardly being used, and I'm also a bit concerned about the access time of swap on a USB stick.
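Whether swap is actually being used can be read from /proc/meminfo. The following is a small sketch with a hypothetical function name, swap_used_kb; it relies only on the standard SwapTotal and SwapFree fields.

```shell
#!/bin/sh
# Print the amount of swap currently in use, in kB (SwapTotal - SwapFree).
swap_used_kb() {
    # $1: file in /proc/meminfo format (defaults to /proc/meminfo)
    awk '
        /^SwapTotal:/ { total = $2 }
        /^SwapFree:/  { free = $2 }
        END           { print total - free }
    ' "${1:-/proc/meminfo}"
}
```

Logging this value together with the time of each artefact would show whether swap usage changes around the allocation failures.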
I would be grateful for thoughts and hints, especially about strategies for finding the root cause of the memory fragmentation, knowing that this may be very difficult without detailed knowledge of the set-top box internals.