Just listened again to Ramsey Nasser's (@nasser@merveilles.town) talk on multilingual programming, "A Personal Computer for Children of All Cultures".
As a (for now) linguistics student, I really like this talk and highly
recommend it. But as a linguistics person coming from a programming
background, it also has me thinking, and I have some questions and ideas
I want to voice. I believe that asking these questions early on in a
project like Ramsey's will help us design solutions such that, in
departing from the domination of English in programming languages and
communities, we don't inadvertently find ourselves in the dominion of
another form of inequity: monolingualism, which itself comes from the
exact same source as English's global dominance and destructive status.
First of all, I think the next step / next big question here is how to enable bilingual programming, code switching in code.
Code switching is extremely common, and it happens in more places than we usually realize. E.g. languages have registers and styles, and we move between these pretty frequently (e.g. formal to informal, programmer jargon to kitchen jargon to just small talk [hehe] vocabulary), besides switching between larger linguistic varieties, like what we call languages and dialects (which are political terms and not linguistically sound, but I'll avoid that discussion here).
Could that happen in code within this framework?
So my languages are Turkish, English, and Italian. With Ramsey's ideas, I can write modules in one language or another, and my whole program can be multilingual. But could a declaration, say, the body of a function, code-switch between Turkish and English? I could of course do that with "local identifiers", to use Ramsey's terminology, but could I also do it with keywords and external identifiers? It's very common for a bilingual community to code-switch not only at the level of a whole conversation or text, and not only between sentences, but even mid-sentence.
So imagine:
    #include <stdio.h>

    int main(void) {
        const char *w = "world";
        printf("hello, %s\n", w);
        return 0;
    }
How could we allow, then:
    sayma_s baş(boş) {
        sabit harf *m = "il mondo";
        printf("ciao, %s\n", m);
        ritorna 0;
    }
which starts out in Turkish but outputs and ends in Italian, with some English identifiers in the middle. (There's also the %s in there, which is a complicating factor since its s definitely comes from the English word "string", but it could probably be replaced entirely by something like string interpolation.)
This is a toy example, of course, but there are real-world situations where this becomes a cultural question. Imagine me, a Turkish speaker, collaborating on a module with an Arabic-, Armenian-, Greek- or Kurdish-speaking programmer. There is a relationship of cultural domination and injustice there, and every time we decide on a module's language it will come into play, because relative to them I am privileged. And it's not only a me-question: this decision is likely to be made in Turkish-dominated spaces, in Turkish-dominated conurbations and political settings.
And then a related question is, of course, which linguistic varieties get to count as a "language" versus a "dialect" versus an "argot/jargon/style/slang" and so on. None of these categories are scientifically sound; they are all political. This is why in linguistics we use terms like "variety" and "register": the political categories seldom line up with structural properties.
This of course leads us to the question of how we encode linguistic
varieties: how do we decide which variety is active for a given snippet
of code at each level, and how do we keep this easy enough that English,
or some other lingua franca, doesn't end up taking over all practical
uses of the solution? Yes, we have international codes for languages
(e.g. ISO 639), but they too are centrally gatekept by institutions of
the Western world, and they carry the same (de)politicising linguistic
ideologies that today govern the statuses and the status quo: which
varieties get to be called languages and which dialects, which get
representation and which are devalued, which are kept around and which
are left to wither.
⁂
Another question is how this maps onto existing ways of combining multiple programming languages, which offer both opportunities and challenges.
E.g. we readily use the ironically named FFIs (foreign function interfaces) to communicate across programming-linguistic boundaries: using extern "C" or its analogue in many programming languages, you can combine them at some level. And there are other facilities, like RPy and Pymacs. I think reworking these a little could really help with crossing human-linguistic boundaries in programming too.
For example, new ABIs could be developed for existing libraries that
use not the English names but other identifiers, hashes, or something
else. I believe (as a fairly inexperienced programmer when it comes to
anything beyond small stuff and scripting, but still) that there should
be ways to incorporate the existing codebase the world has developed
into an emergent multi-human-lingual paradigm of programming without
simply having to rewrite it all.
But we also have other ways of combining multiple programming languages,
or code switching, if you will. These show up in Knuth's literate
programming and in the similar-but-not-identical mechanisms of Emacs'
Org Mode and R Markdown. Could we exploit these systems' ideas to
develop programming environments that combine multiple human languages
and multiple programming languages? Why shouldn't that be possible?
After all, in Org Mode, the system I'm most familiar with at this point,
the programming-language part is already possible, practical, and highly
useful. For example, consider this setup script I have for my Raspberry
Pi, which liberally combines Emacs Lisp and Bourne shell using Org
Mode's mechanisms for doing so. (You can search for begin_src in the
file to explore how the two very different languages are used and
combined in the literate script.)
These literate programming environments could readily be used with any
compiler for a multi-human-lingual programming language or environment;
that part is pretty straightforward. What's food for thought is how such
a system can take advantage of the ideas and tools these literate
environments have developed over the last ~40 years, despite their
relative obscurity, especially among professional programmers.
⁂
This is all I have for now. I am really excited for a future where
programming is customarily multilingual in both the human and the
programming-language dimensions. As someone heading towards a career in
academic scholarship, as a long-time hobbyist programmer, and as a
non-native speaker of English, I have personally experienced how
limiting it can be when programming tools are aimed exclusively at
English-speaking professionals, and what sort of things become possible
once we start breaking those barriers.
I believe Ramsey is doing god's work in breaking some of these barriers
by thinking about how to make programming work for all human linguistic
varieties, and I hope this text contributes some questions and ideas to
consider in such efforts. Really, thank you, Ramsey!