Thursday, August 27, 2015

Assembly and C

Sebesta makes an interesting and (to my mind, anyway) highly debatable statement in his review of programming language evolution: that the development of assembly languages had no real impact on high-level languages, so he leaves assembly out of the review.

On the one hand, I see his point. If anything, assembly pre-processors started offering macros borrowing control structures and data definitions from high-level languages in an effort to impose some structure on an unruly code base. However, to view the development of C without consideration for assembly structures misses the whole point of the language.

During the 1960s, application programmers moved away from assembly en masse for obvious reasons. The performance hit was far outweighed by the gains in development time and maintenance. System programmers, on the other hand, stuck with assembly because they needed both the efficiency and the direct access to the hardware. Recognizing this, C built the capabilities of assembly into the semantics of the language itself. If C had not provided things like register variables, absolute memory addressing, and pointer arithmetic, not to mention the radical idea of "in-line" assembly blocks, the language would not have gained acceptance among system programmers.

Consider the following piece of code that exists in just about every system program ever written in C:
while (*target++ = *source++);
We've written this so many times, it's come to be thought of as a high-level structure, but it's not. A high-level structure is something that imposes structure on top of the underlying implementation. This is the opposite. Machine designers figured out very early on that copying strings was both common and expensive. So, they started building in instructions that would perform block memory to memory copies given two starting addresses and a terminating condition (typically a value of zero or a specified byte count). The above "loop" captures all the side effects of such an instruction, so the compiler can replace the construct with the single instruction.
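To make the idiom concrete, here is a minimal sketch of it wrapped in a function. The name copy_string is my own, purely for illustration, and the explicit comparison against '\0' is only there to spell out the terminating condition that the one-liner leaves implicit. Whether a given compiler actually reduces this to a single block-copy instruction depends on the compiler and the target machine, so treat that part as an assumption rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a standard library function): copies the
 * string at source, including its terminating zero byte, to target.
 * The caller must make sure target has enough room. */
char *copy_string(char *target, const char *source)
{
    assert(target != NULL && source != NULL);

    char *start = target;

    /* Copy one byte, advance both pointers, and stop once the byte
     * just copied was the zero terminator. */
    while ((*target++ = *source++) != '\0')
        ;

    return start;
}
```

Calling copy_string(buf, "abc") on a buffer of at least four bytes leaves "abc" plus the terminator in buf, the same contract as the one-line loop above.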

C is loaded with these sorts of things. *p is MOVI; += is ADD; i-- is DECR. Array indices start at zero instead of one because that's how assembly works. Assignments can be made inside conditional expressions because no competent assembly programmer would ever re-test a result they just saved. All these features are mappings of assembly constructs into a high-level context, not the other way around.
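Two of these mappings are easy to show in a few lines. The sketch below uses function names I made up for illustration (element_at and line_length). The first shows that an index is just an offset added to a base address, which is why the first element sits at offset zero. The second saves a value and tests it in the same expression, the way an assembly programmer would test a register right after loading it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: base[i] is defined as *(base + i), so the
 * index is an offset from the base address and the first element
 * lives at offset 0. */
int element_at(const int *base, size_t i)
{
    assert(base != NULL);
    return *(base + i);
}

/* Hypothetical helper: counts the characters before the first
 * newline or the end of the string. Each character is loaded into c
 * once and tested immediately, with no second fetch. */
size_t line_length(const char *s)
{
    assert(s != NULL);

    size_t n = 0;
    char c;

    while ((c = *s++) != '\0' && c != '\n')
        n++;

    return n;
}
```

Neither function is anything special. The point is how little distance there is between each expression and the load, add, and compare steps a machine would perform.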

C still sees heavy use as a systems language, and an enormous application code base is written in its offspring, C++, and its grandchildren, Java and C#. So I'd say that languages, and the programmers who use them, have been very much influenced by the design of assembly instruction sets from the 1960s, even if many of those programmers have never programmed at the machine level.
