Are you still Googling error messages?

I am an intermediate Linux user with basic knowledge of programming (C, Perl, JS ...) and some system troubleshooting (strace, SystemTap, lsof ...), and I am tired of Googling the messages that show up in the Linux logs (/var/log/messages). I would like to improve my Linux kernel knowledge. Since Linux (and its utilities like ssh etc.) is open source, the source code is available somewhere. So my question is: how can I troubleshoot/debug Linux problems at the source code level? Is this even possible for an intermediate Linux user? Where do I begin, and how can I improve my programming skills and Linux kernel knowledge this way? Any best practices are welcome.

Something like this:

  1. Copy your error message
  2. Paste it into an online search engine for kernel source code, which can be found at *
  3. Now you have the name of the file in the kernel tree where the message appears
  4. See the folder and file structure of the kernel here *
  5. You can deduce the purpose of this file within the kernel from its location, or find further documentation here *
  6. Use tool * to find the other kernel files on which the file in question depends
  7. Now read their source until it becomes clear
  8. Now you can see under which conditions this message occurs (if/else statements)
  9. You can use software like * to debug, or write some kind of exception to see when this message appears.
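
Steps 1 to 3 can also be tried offline against a source tree you have already downloaded. What follows is only a minimal sketch, not a replacement for a proper code search tool: the function name `find_message` and the choice of `.c`/`.h` files are my own assumptions. It walks a directory and reports every source line that contains the given text.

```python
import os


def find_message(root, needle, extensions=(".c", ".h")):
    """Yield (path, line_number, line) for each source line containing needle.

    find_message is a hypothetical helper name, not part of any existing tool.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Only look at C sources and headers (an assumption; adjust as needed).
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # Old source files are not always valid UTF-8, so replace bad bytes
            # instead of failing on them.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, number, line.rstrip("\n")
```

Log messages are usually built from printf-style format strings, so search only for the fixed part of the message (for example "Closing connection to") and leave out the parts that change, such as IP addresses, PIDs and timestamps. Calling `list(find_message("/path/to/source", "Closing connection to"))` would then give you the file names and line numbers to start reading from.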

Thank you.

I don't see the point of looking in the source code. It still won't tell you exactly what the message means.

Knowing which module is provoking the error is half the battle. If you know what it is, you can narrow down from there.

Why won't looking into the source tell me what the problem is? I really like the kind of explanation in Tsuna's blog: The "Out of socket memory" error, where the author goes directly into the source from a user's perspective and also touches on the kernel. What do you mean by module? Is there any repository of source code for Linux utilities/commands which I can read online, find the program's error message in, and see what causes that message? I do not have a problem as such; I am just trying to improve my skills beyond classical debugging. For example, if I see something like this in messages

May 19 10:29:57 lonsha10 sshd[32373]: Closing connection to 192.168.1.1

I know that this comes from sshd, so I would like to dig deeper into what causes this error/message. This also applies, for example, when you issue a command and it gives you some message and you do not know what it means. My aim is to better understand the system and to learn some kernel hacking and C through a kind of "reverse engineering" process.
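
To show what I mean by "I know that this comes from sshd": the program name and PID can be pulled out of such a line mechanically, and the program name tells you whose source code to look in. This is only a rough sketch that assumes the traditional syslog line layout shown above (timestamp, host, program[pid]: message); the names `SYSLOG_LINE` and `parse_syslog_line` are made up for this example, and other logging setups format their lines differently.

```python
import re

# Assumed layout: "Mon DD HH:MM:SS host program[pid]: message"
# The "[pid]" part is optional, e.g. kernel messages usually have none.
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_syslog_line(line):
    """Return a dict with timestamp, host, program, pid and message.

    Returns None when the line does not match the assumed layout.
    """
    match = SYSLOG_LINE.match(line)
    if match is None:
        return None
    return match.groupdict()


entry = parse_syslog_line(
    "May 19 10:29:57 lonsha10 sshd[32373]: Closing connection to 192.168.1.1"
)
print(entry["program"], entry["pid"], entry["message"])
```

The `program` field is the part that points at the source package to read, and the `message` field (minus its variable parts) is the text to search for in that source.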

I certainly see value in digging in sshd's source code.

Kernel code is unfortunately another thing again. Educating yourself on the relevant structures and conventions is an entire career. Sometimes you may luck out, often you won't be able to tell what you're looking at.

So what would you recommend to a kernel/C language newbie (hacker) who is just starting out?

Depends what you want to do. You could start with one of the many 'write a linux device driver' tutorials you can find all over the internet, but be sure to pick a current one; an old one probably won't work with a new kernel. A better-than-newbie understanding of the C language would also be a plus.

And keep in mind that the environment inside the kernel is profoundly not what you're used to. Memory layout and memory management will be bizarre and inconsistent and limited and messy. You can't just open a file if you feel like it; you have to jump through many hoops. And any bug in your code can crash or freeze your PC. All the nice things you've come to expect when programming in C are things the kernel gives you, and they are no longer available when you are building the kernel itself.

If you're interested in learning kernel internals, Linux Kernel Development (3rd Edition) by Robert Love is highly regarded. It introduces key subsystems and data structures.

You may also find useful information at kernelnewbies.org

Regards,
Alister


Hi wakatana...

This might be of pure interest value, and will certainly test your programming skills:-

http://wiki.osdev.org/Expanded_Main_Page

I joined the ML some time ago...

It is devoted to those who develop and write OSes for fun...

But boy, there are some seriously knowledgeable guys on there...