
Linux Needs GC Lingua Franca(s) to Win

Posting for special guest columnist Keith Curtis - Source: Tom's Hardware US | 55 comments

Keith Curtis joins us once again for a discussion on Linux. A year ago, he spoke of How Linux Could Achieve Faster World Domination. Now he's back with a more focused view on just what Linux needs to pull ahead.

If we were already talking to our computers, etc. as we should be, I wouldn’t feel a need to write this to you. Given current rates of adoption, Linux still seems a generation away from being the priceless piece of free software useful to every child and PhD. This army your kernel enables has millions of people, but they often lose to smaller proprietary armies, because they are working inefficiently. My mail one year ago listed the biggest work items, but I realize now that I should have focused on one. In a sentence, I have discovered that we need GC lingua franca(s).

Every Linux success builds momentum, but the desktop serves as a powerful daily reminder of the scientific tradition. Many software PhDs publish papers but not source, like Microsoft. I attended a human genomics conference and found that the biotech world is filled with proprietary software. IBM’s Jeopardy-playing Watson is proprietary, like Deep Blue was. This topic is not discussed in any of the news articles, as if the license does not matter. I find widespread fear of having ideas stolen in the software industry, and proprietary licenses encourage this. We need to get these paranoid programmers, hunched in the shadows, scribbled secrets clutched in their fists, working together, for any of them to succeed. Windows is not the biggest problem; it is the proprietary licensing model that has infected computing, and science. Desktop world domination is not necessary, but it is sufficient to get robotic chauffeurs and butlers.

There is, unsurprisingly, a consensus among kernel programmers that usermode is “a mess” today, which suggests there is a flaw in the Linux desktop programming paradigm. Consider the vast cosmic expanse of XML libraries in a Linux distribution. Like computer vision, there are not yet clear places for knowledge to accumulate. It is a shame that the kernel is so far ahead of most of the rest of user mode.

The most popular free computer vision codebase is OpenCV, but it is time-consuming to integrate because it defines an entire world in C++ down to the matrix class. Because C/C++ didn’t define a matrix, nor provide code, countless groups have created their own. It is easier to build your own computer vision library using standard classes that do math, I/O, and graphics, than to integrate OpenCV. Getting productive in that codebase is months of work and people want to see results before then. Building it is a chore, and they have lost users because of that. Progress in the OpenCV core is very slow because the barriers to entry are high. OpenCV has some machine learning code, but they would be better delegating that out to others. They are now doing CUDA optimizations they could get from elsewhere. They also have 3 Python wrappers and several other wrappers as well; many groups spend more time working on wrappers than the underlying code. Using the wrappers is fine if you only want to call the software, but if you want to improve the underlying code, then the programming environment instantly becomes radically different and more complicated.
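To make the "standard classes" point concrete, here is a minimal sketch (the function name and pixel values are invented for illustration) of a vision primitive written against plain Python types; a newcomer can read and change it without first learning a project-specific matrix class:

```python
# A toy vision primitive built from the language's standard types alone.
def threshold(image, level):
    """Binarize a grayscale image given as nested lists of 0-255 values."""
    return [[1 if px > level else 0 for px in row] for row in image]

img = [[10, 200], [30, 180]]
print(threshold(img, 128))  # [[0, 1], [0, 1]]
```

The same routine written against a bespoke C++ matrix class would first require a contributor to learn that class, its build system, and its error handling.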

There is a team working on Strong AI called OpenCog, a C++ codebase created in 2001. They are evolving slowly as they do not have a constant stream of demos. They don’t recognize that their codebase is a small amount of world-changing ideas buried in engineering baggage like the STL. Their GC language for small pieces is Scheme, an unpopular GC language in the FOSS community. Some in their group recommend Erlang. The OpenCog team looks at their core of C++, and over to OpenCV’s core of C++, and concludes the situation is fine. One of the biggest features of ROS (the Robot Operating System), according to its documentation, is a re-implementation of RPC in C++, which is not what robotics was missing. I’ve emailed various groups, and all know of GC, but they are afraid of any decrease in performance, and they do not think they will ever save time. The transition from brooms to vacuum cleaners was disruptive, but we managed.

C/C++ makes it harder to share code amongst disparate scientists than a GC language. It doesn’t matter if there are lots of XML parsers or RSS readers, but it does matter if we don’t have an official computer vision codebase. This is not against any codebase or language, only for free software lingua franca(s) in certain places to enable faster knowledge accumulation. Even language researchers can improve and create variants of a common language, and tools can output it from other domains like math. Agreeing on a standard still gives us an uncountably infinite number of things to disagree over.

Because the kernel is written in C, you’ve strongly influenced the rest of the community. C is fully acceptable for a mature kernel like Linux, but many concepts aren’t so clear in user mode. What is the UI of OpenOffice when speech input is the primary means of control? Many scientists don’t understand the difference between the stack and the heap. Software isn’t buildable if those with the necessary expertise can’t use the tools they are given.

C is a flawed language for user mode because it is missing GC, which was invented a decade before it; C++ added as much as it took away, as each feature came with an added cost of complexity. C++ compilers converting to C was a good idea, but being a superset was not. C/C++ never died in user mode because there are now so many GC replacements that the choice has paralyzed many into inaction, as there seems no clear place to go. Microsoft doesn’t have this confusion: their language, as of 2001, is C#. Microsoft is steadily moving to C#, but it is 10x easier to port a codebase like MySQL than SQL Server, which has an operating system inside. C# is taking over at the edges first, where innovation happens anyway. There is a competitive aspect to this.

Lots of free software technologies have multiple C/C++ implementations, because it is often easier to re-write than to share, plus an implementation in each GC language. We all might not agree on the solution, so let’s start by agreeing on the problem. A good example of GC’s payoff is how a Mac port can go from weeks to hours. GC also prevents code from using memory after freeing it, freeing it twice, and so on, which means user code is less likely to corrupt memory. If everyone in user mode were still writing in assembly language, you would obviously be concerned. If Git had been built in 98% Python and 2% C, it would have become easier to use faster, found ways to speed up Python, and set a good example. It doesn’t matter now, but it was an opportunity in 2005.
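The use-after-free point can be seen from the GC side with a few lines of Python. In this invented sketch, the standard weakref module shows what a stale reference becomes once the collector reclaims an object: it reports None instead of dangling into freed memory:

```python
import weakref

class Buffer:
    """Stand-in for a resource a C program would malloc() and free()."""
    pass

buf = Buffer()
stale = weakref.ref(buf)   # a non-owning view of the object
del buf                    # drop the only strong reference; the object is reclaimed
print(stale() is None)     # True: no dangling pointer, and no way to free twice
```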

You can “leak” memory in GC, but that just means that you are still holding a reference. GC requires the system to have a fuller understanding of the code, which enables features like reflection. It is helpful to consider that GC is a step up for programming like C was to assembly language. In Lisp the binary was the source code; Lisp is free by default. The Baby Boomer generation didn’t bring the tradition of science to computers, and the biggest legacy of this generation is whether we remember it. Boomers gave us proprietary software, C, C++, Java, and the bankrupt welfare state. Lisp and GC were created/discovered by John McCarthy, a mathematician of the WW II greatest generation. He wrote that computers of 1974 were fast enough to do Strong AI. There were plenty of people working on it back then, but not in a group big enough to achieve critical mass. If they had, we’d know their names. If our scientists had been working together in free software and Lisp in 1959, the technology we would have developed by today would seem magical to us. The good news is that we have more scientists than we need.
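The "leak" described above is easy to demonstrate. In this invented sketch, a forgotten cache keeps every result reachable, so the collector can never reclaim them; the fix is dropping the reference, not calling free:

```python
cache = []

def process(data):
    result = [x * 2 for x in data]
    cache.append(result)        # the forgotten reference: this is the "leak"
    return sum(result)

for _ in range(1000):
    process(range(10))

print(len(cache))   # 1000 result lists are still reachable, so still alive
cache.clear()       # dropping the reference is the entire fix
```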

There are a number of good languages, and it doesn’t matter too much which one is chosen, but it seems the Python family (Cython / PyPy) requires the least amount of work to get what we need, as it has the most extensive libraries: http://scipy.org/Topical_Software. I don’t argue that the Python language and implementation are perfect, only good enough, like how the shapes of the letters of the English language are good enough. Choosing and agreeing on a lingua franca will increase the results for the same amount of effort. No one has to understand the big picture; they just have to do their work in a place where knowledge can easily accumulate. A GC lingua franca isn’t a silver bullet, but it is the bottom piece of a solid science foundation and a powerful form of social engineering.
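As a sketch of what "good enough" looks like, here is a naive 2-D filter, a workhorse of computer vision, in plain Python (the function name and toy data are invented for illustration; real code would call SciPy's optimized equivalent and move hot loops to Cython or run under PyPy):

```python
def convolve2d(image, kernel):
    """Naive valid-mode 2-D filter over nested lists (kernel applied
    unflipped, correlation-style): short enough to read in a paper,
    and a clear target for later optimization."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            out[i][j] = sum(image[i + di][j + dj] * kernel[di][dj]
                            for di in range(kh) for dj in range(kw))
    return out

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(convolve2d(image, [[1, 0], [0, -1]]))  # [[-4, -4], [-4, -4]]
```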

The most important thing is to get lingua franca(s) in key fields like computer vision and Strong AI. However, we should also consider a lingua franca for the Linux desktop. This will help, but not solve, the situation of the mass of Linux apps feeling dis-integrated. The Linux desktop is a lot harder because the code here is 100x bigger than computer vision’s, and there is a lot of C/C++ in FOSS user mode today. In fact it seems hopeless to me, and I’m an optimist. It doesn’t matter; every team can move at a different pace. Many groups might not be able to finish a port for 5 years, but agreeing on a goal is more than half of the battle. The little groups can adopt it most quickly.

There are a lot of lurkers around codebases who want to contribute but don’t want to spend months getting up to speed on countless tedious things like learning a new error handling scheme. They would be happy to jump into a port as a way to get into a codebase. Unfortunately, many groups don’t encourage these efforts as they feel so busy. Many think today’s hardware is too slow, and that running any slower would doom the effort; they do not appreciate the steady doublings and forget that algorithm performance matters most. A GC system may add a one-time cost of 5-20%, but it has the potential to be faster, and it gives people more time to work on performance. There are also real-time, incremental, and NUMA-aware collectors. The ultimate in performance is taking advantage of parallelism in specialized hardware like GPUs, and a GC language can handle that because it supports arbitrary bitfields.
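The claim that algorithm performance matters most is easy to check in any GC language. This invented micro-benchmark compares a linear scan against a hash lookup; the asymptotic difference dwarfs any constant-factor runtime overhead:

```python
import timeit

n = 10_000
items = list(range(n))
as_set = set(items)
target = n - 1              # worst case for the linear scan

linear = timeit.timeit(lambda: target in items, number=1000)
hashed = timeit.timeit(lambda: target in as_set, number=1000)
print(hashed < linear)      # the O(1) structure wins, whatever the language overhead
```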

Science moves at demographic speed when knowledge is not being reused among the existing scientists. A lingua franca makes more sense as more adopt it. That is why I send this message to the main address of the free software mothership. The kernel provides code and leadership, you have influence and the responsibility to lead the rest, who are like wandering ants. If I were Linus, I would threaten to quit Linux and get people going on AI ;-) There are many things you could do. I mostly want to bring this to your attention. Thank you for reading this.

-Keith

Curtis spent 11 years as a Software Design Engineer at Microsoft before examining Linux and the open source side of things, which resulted in a change of perspective and a published book. See more about his book here, including a link to a free PDF version.

This content originally appeared on Keith Curtis' blog.

This thread is closed for comments
  • memadmax, April 16, 2011 8:47 PM (score: 0)
    The only thing linux needs is acceptance as an option for an OS on the prebuilt granny computers: DELL/HP/Etc Etc.. Once you have it as an option as standard equipment with those companies... its on.

    Oh, and I suppose another problem is there is too many damn flavors of linux out there. If one can rise above the rest like Ubuntu is starting too, the chances of linux going mainstream go up
  • jhansonxi, April 16, 2011 9:09 PM (score: 0)
    I wonder what he thinks of the D programming language.
  • goldenthunder, April 16, 2011 9:59 PM (score: 1)
    Totally agree. The Linux environment has to converge in some way, a common set of libraries would be a good choice.
    It doesn't have to be a specific language if the library can be used in it (Code in Python, library in C/C++).

    Duplication of core features... imagine you had a set of libraries for everything you want (like VB programmers tell us); coding would be much faster and easier.
  • kcorp2003, April 16, 2011 10:24 PM (score: 0)
    personally i'm creating my own programming language just for fun. its a lot of hard work. gets crazy at times. which makes me appreciate all other languages out there.
  • ivan_chess, April 17, 2011 12:26 AM (score: -2)
    memadmax wrote: The only thing linux needs is acceptance as an option for an OS on the prebuilt granny computers: DELL/HP/Etc Etc.. Once you have it as an option as standard equipment with those companies... its on. Oh, and I suppose another problem is there is too many damn flavors of linux out there. If one can rise above the rest like Ubuntu is starting too, the chances of linux going mainstream go up


    If you want it on "granny" computers you will have to do away with terminal because the target audience is too comfortable with GUIs. To be frank, the terminal is what makes linux great but also what scares the average joe away.
  • pelov, April 17, 2011 12:36 AM (score: 3)
    ivan_chess wrote: If you want it on "granny" computers you will have to do away with terminal because the target audience is too comfortable with GUIs. To be frank, the terminal is what makes linux great but also what scares the average joe away.


    you mean a unix-based OS without a terminal? They have that and it's apple. As far as linux goes, you can always hide the terminal. i mean it is linux... you can do whatever the hell you want.
  • ivan_chess, April 17, 2011 12:51 AM (score: 1)
    pelov wrote: you mean a unix-based OS without a terminal? They have that and it's apple. As far as linux goes, you can always hide the terminal. i mean it is linux... you can do whatever the hell you want.


    True, but there are just too many things that are easier to do in terminal or don't have a GUI component. Just installing packages is usually done with something like apt-get or yum. The first thing the community (or what the average joe will call tech support) tells you to do to fix anything is open a terminal window. Linux just isn't as computer illiterate friendly as it could be.
  • Anonymous, April 17, 2011 2:25 AM (score: 1)
    ivan_chess: Apparently you haven't used Linux in the past few years. You can download Ubuntu, install it through the Ubiquity GUI, then do all of your web browsing, photo viewing, etc... without touching the terminal. In the event something doesn't work correctly, the tech person fixing it may resort to using the command line, however, the same thing applies to Windows, so this isn't really Linux-specific. I do Linux development, and if I use the command-line, it's purely by choice, as there are GUIs for everything.

    The terminal argument is every bit as outdated as the "OMGz, you must get Nvidia if you run Linux, because their drivers R L337", even though in 2010, AMD's Linux drivers piss all over Nvidia, and AMD supports the open driver, whereas Nvidia is opposed to the open source Nouveau driver.
  • Anonymous, April 17, 2011 2:56 AM (score: 2)
    What does GC stand for?
  • STravis, April 17, 2011 3:18 AM (score: 0)
    pelov wrote: you mean a unix-based OS without a terminal? They have that and it's apple. As far as linux goes, you can always hide the terminal. i mean it is linux... you can do whatever the hell you want.

    OS X has a terminal - I use it all the time...
  • Thunderfox, April 17, 2011 3:36 AM (score: 1)
    asplosion wrote: What does GC stand for?


    Garbage Collection. They really should have stated it in the article. I am familiar with the concept but not with the internal politics of Linux development, so I read most of it not knowing what the hell they were talking about until they mentioned C#.
  • randomizer, April 17, 2011 3:54 AM (score: -1)
    ivan_chess wrote: True, but there are just too many things that are easier to do in terminal or don't have a GUI component. Just installing packages is usually done with something like apt-get or yum. The first thing the community (or what the average joe will call tech support) tells you to do to fix anything is open a terminal window. Linux just isn't as computer illiterate friendly as it could be.

    Funny, but that's often what tech support wants you to do on Windows. Having Internet problems? Well, you'll often get told to run a trace route to see if the problem is at a specific hop. Some things just don't receive a GUI because there is no reason to add one. A GUI should be used when needed, not just for the sake of having one. Too much software has a GUI that is so poorly laid out that it would be quicker to learn a few commands and bash (pun intended) them into the command line.
  • agnickolov, April 17, 2011 6:43 AM (score: 1)
    pelov wrote: you mean a unix-based OS without a terminal? They have that and it's apple. As far as linux goes, you can always hide the terminal. i mean it is linux... you can do whatever the hell you want.

    Not true. You can open a terminal in Mac OS X. That's how I do my work there after all... (Though I usually just use an SSH connection to be frank, I definitely prefer Windows.) I'm a software developer if that wasn't clear :) ...
  • agnickolov, April 17, 2011 6:48 AM (score: 2)
    As far as commenting on the article, I certainly try to steer clear of integrating open source code within my code. Working on proprietary codebases and all... :)
    What struck me is the author never defined what GC means even though it's the central theme of the article. For those who haven't yet figured it out (and it took me a long time to figure it out myself even though I'm a software developer), it means Garbage Collection - automatically reclaiming memory that the program no longer uses.
    Finally, I was under the impression that such GC lingua franca already exists -- it's called Java...
  • CyberAngel, April 17, 2011 6:51 AM (score: 0)
    Eiffel was the best language, but the "not invented here" syndrome all but killed it.
    All the goodies of C++, none of the bad sides. Perfect GC.
    The environment, with its incremental compiler, was amazing back then when computers were slow.
    The transparent design/programming is still...
  • dbranko, April 17, 2011 8:30 AM (score: 0)
    At least one user space built on the linux kernel has GC and a clear default set of libraries. It's just that the advantages of this did not occur to the "community", but to another for-profit company: Google. And they don't even call it linux.
    ---
    While we're at it: GC isn't good for everything, because at least the implementations I know of have to "stop the world" for at least some time during collection, which results in an annoying pause while, for example, playing your game. As for leaks, tools like valgrind make it trivial to detect most of them.
  • Tjik, April 17, 2011 8:43 AM (score: 1)
    otacon72 wrote: Linus will never be mainstream and the main reason, amongst many, is too many kernels.

    Linus won't become mainstream. He prefers to spend time with his family and isn't by choice a public figure. ;) 

    Linux actually has the opposite advantage: it's released as one main version, not many as you seem to suggest. Linux releases follow a set schedule. That Linux is easily modified makes it just as suitable for standard as for obscure hardware/implementations. I think you've misunderstood something fundamental about the nature of the Linux kernel. From the kernel source you can compile it to support practically all known platforms; hence you don't need to maintain different kernel sources.

    Microsoft on the other hand actually has a situation of incompatible kernels developed separately, even though they share some elements. That has also been pointed out as a hindrance to how well Microsoft will adapt to the fast-moving markets of smartphones and tablets.

    The reason the Linux kernel itself is coded in C is simple: C doesn't forgive errors. Therefore Linus sees no benefit in coding in a higher-level language that would eventually add more garbage: poor-quality code.
  • haplo602, April 17, 2011 12:29 PM (score: 3)
    after 5 paragraphs I gave up. what the hell is the guy talking about? a good way to start an article is to describe the base you are building upon.

    this seems to be written by a geek that never sees the light of day and has problems forming coherent sentences.

    I guess those 11 years at MS did leave a scar on his soul.
  • descendency, April 17, 2011 1:47 PM (score: 5)
    I'm a software engineer. I received my degree from a top state university. I've used "Linux" (Ubuntu and Redhat mostly), Macs, and Windows. I develop applications on Windows because of what I am about to say. I just want to make it clear, I'm not anti-linux or trying to bash it. There are tons of reasons to love it. However, as an average user there are a lot of reasons why you will be turned off before you even really start to learn it.

    Linux isn't a generation away, it's a revolution away.

    Most of the OSs that are put on top of the kernel are ugly, hard to navigate, and full of other usability nightmares. The learning curve going from a Windows or Mac environment is seriously steep. Sure, you can Google "how to do _________ in (insert distro name here)" and there is probably an idiot proof video on youtube, but you will be surprised how few home users can even do that.

    That isn't a function of what tools are used to develop applications. Linux software has too many engineers working on it and too few designers.

    It's really that simple, and until people on the software side of this understand that, it will never get any closer to Windows or Macintosh OS (as a desktop OS - because cellphones and other devices are flocking to Linux-based OSs).
  • godmodder, April 17, 2011 2:17 PM (score: 2)
    Seriously, I don't think the need for garbage collection is the biggest problem Linux has these days.