What is the right tech for a web-based chat?

A retrospective on writing ChatStrike’s chat interface

Published: 2022-10-14
Author: Isaac Shapira; Platonic Systems
Tags: elm

ChatStrike is a startup in the recruiting technology space centered around, you guessed it, chat. When Cameron Levy first approached me to work on a chat application for recruiters, choosing the right technology stack took serious consideration. At Platonic Systems, we default to purely functional programming unless there are compelling reasons not to. Because of the unique logic demands of this project, we considered several options:

Shpadoinkle in Haskell
Halogen in PureScript
React in TypeScript
Elm with TypeScript

Why Elm?

Since Shpadoinkle is a project of Platonic Systems, one might have expected us to choose it from the start. Don’t get me wrong: Haskell is a fantastic language for frontend development. But for this project, it had some drawbacks. For example, there is only one pathway to run Haskell code in a browser: GHCJS. While GHCJS produces suitable output for most common business cases, the artifact it produces is large and computationally expensive to run.

ChatStrike has to run on phones—and not with an app, but in the mobile browser. Computationally expensive is a non-starter in a CPU-poor environment. A heavy artifact means slow start times, particularly when there’s poor network connectivity.

While Haskell would have delivered a technically correct application, it would not have served critical technical goals like performance. The value of functional programming in a user interface is to improve the user experience through correctness, not the other way around. So Haskell was not a good choice for this use case.

The same was ultimately true of PureScript. PureScript can be an excellent choice for UI development, but its Halogen artifact sizes are not much smaller than those produced by GHCJS. And with many optimizations still missing from the PureScript compiler (such as “fast curry”), it’s just not going to run well on some old Android phone in a rural community.

So what about the corporate stack with React? React can be written in a purely functional style, and TypeScript does provide some safety. React could also fulfill the technical needs in terms of artifact size and computational efficiency. The problem here is that TypeScript lacks most of the features we use to guarantee correctness in functional programming. While TypeScript is a massive win over JavaScript, its primary value lies in serving as documentation and an IDE assistant, not in providing the type safety and immutability that were critical to developing this production chat client.

We chose Elm not through elimination but by facing all the critical issues involved. First, its type system is robust enough to provide what we need in terms of correctness. It’s nearly impossible for Elm code to throw a runtime error, a boast even Haskell cannot match. Second, its output outperforms React in computational efficiency and artifact size. Neither fact is really surprising, as Elm’s virtual DOM can optimize around assumptions of purity in a way that React simply cannot. And avoiding the node_modules nightmare means artifacts are lighter in practice, not just in theory.

So there we have it: a purely functional language with strong type guarantees, a low error rate, lighter artifacts, and high performance. Elm strikes the perfect balance for a logically intensive UI that can run well in a resource-poor environment.

What’s the downside?

Elm is hexagonal design taken to its logical extreme. It does not allow for any inline execution of side effects. Not even monadic IO is supported. The application architecture is dictated at the language level and not by frameworks in userspace. This architecture comes with many problems, such as high levels of boilerplate and difficulty in reasoning about subsequent updates to the state machine. In short, Elm is highly rigid and does not provide much room for experimentation or improvement.

Does this slow down development compared to, say, Haskell? Absolutely, provided we are talking about boring Haskell. Few things rival the productivity loss of dependently typed programming in Haskell via singletons, and the mountain of boilerplate in Elm is not one of them. The boilerplate in Elm grows sublinearly with application complexity, since not all code reuse involves stateful computation, and writing it is essentially a mechanical process. While it is a regrettable aspect of Elm programming, it’s not too bad.
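
To make the shape of that architecture concrete, here is a minimal sketch of the same pattern, written in TypeScript rather than Elm so it can sit next to our ports code. The `Model`, `Msg`, and `Cmd` types and the message names are hypothetical and are not taken from ChatStrike’s codebase. The point is only that the update step is a pure function and that side effects are returned as data for a runtime to execute.

```typescript
// Hypothetical chat model and messages; names are illustrative only.
type Model = { draft: string; sent: string[] };

type Msg =
  | { kind: "DraftChanged"; text: string }
  | { kind: "SendClicked" }
  | { kind: "SendSucceeded"; text: string };

// Side effects are described as data; a runtime (not shown) would execute them.
type Cmd = { kind: "None" } | { kind: "PostMessage"; text: string };

// The update function is pure: the same message and model always give the same result.
function update(msg: Msg, model: Model): [Model, Cmd] {
  switch (msg.kind) {
    case "DraftChanged":
      return [{ ...model, draft: msg.text }, { kind: "None" }];
    case "SendClicked":
      // No network call happens here; we only describe the request.
      return [{ ...model, draft: "" }, { kind: "PostMessage", text: model.draft }];
    case "SendSucceeded":
      return [{ ...model, sent: [...model.sent, msg.text] }, { kind: "None" }];
  }
}
```

Every new interaction needs a new message variant and a new branch, which is where the boilerplate comes from. The same structure is also what makes each branch easy to test in isolation.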

Builds

We build with Nix. Nix is a functional programming language that provides deterministic reproducible builds, among other things. Nix allows us to describe our build using `bash` by default. With it, we can do just about anything. We can use WebPack or Parcel or any number of build technologies within the project.

So what did we use?

We used bash scripts with direct calls to compilers. Why? Every bundler ultimately failed to provide value. Reasoning about bash scripts with calls to `elm make` or `tsc` was just easier. It also made tracking down errors and modifying the build pipeline trivial. Unfortunately, bundlers are complex software projects that tend to swallow errors and require a deep buy-in. In the end, our bash script approach came out to about 20 lines of code to do everything we would typically get from WebPack.

Thanks to nix-shell hooks, we provided the same build scripts used in the nix build to the nix-shell, along with a comprehensively curated developer environment just as deterministic as our builds.

This allowed for the smooth handling and building of artifacts across many different languages, specifically:

Elm
TypeScript (for “ports” code)
Sass
EJS
SVG

Despite each having its own processing steps, Nix didn’t even break a sweat.

What went wrong?

In over two years of full-time development, what went wrong? Well, plenty went wrong, but much less than I would have expected from a JavaScript application.

Runtime errors occurred almost exclusively in the JavaScript layer. A few errors from Elm’s runtime did happen, but primarily due to browser extensions such as 1Password and Grammarly. These extensions sometimes manipulated the page’s HTML, causing Elm’s virtual DOM algorithm to misfire, sometimes in ways visible to the customer. This was not good but was easy to fix by explicitly disallowing these extensions on `` elements.

Setting up proactive error monitoring for the front end turned out to be valuable. Not many applications have a black box on the client-side to recover errors, but perhaps it should be more common. In addition, we provided visibility into production issues by getting all exceptions thrown on the client to appear in a Slack channel.
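
As a rough illustration of what such a client-side black box can look like, here is a minimal TypeScript sketch. The `ErrorReport` shape, the function names, and the injected `send` transport are all assumptions made for this example. It does not show how ChatStrike actually relays errors to Slack.

```typescript
// Hypothetical report shape; the field names are assumptions for illustration.
interface ErrorReport {
  message: string;
  stack: string | null;
  context: string;
}

// Normalize anything that can be thrown into a report.
function toReport(thrown: unknown, context: string): ErrorReport {
  if (thrown instanceof Error) {
    return { message: thrown.message, stack: thrown.stack ?? null, context };
  }
  return { message: String(thrown), stack: null, context };
}

// The transport is injected, so the sketch does not assume a particular relay.
type Send = (report: ErrorReport) => void;

function makeReporter(send: Send): (thrown: unknown, context: string) => void {
  return (thrown, context) => {
    try {
      send(toReport(thrown, context));
    } catch {
      // Reporting must never throw back into the application itself.
    }
  };
}
```

In a browser, a reporter like this could be called from listeners for the `error` and `unhandledrejection` events on `window`, with `send` posting the report to whatever service forwards it to a chat channel.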

Administrative forms are where we saw breakages, not in terms of errors, but rather in fields not persisting. Somewhere between the API calls and the form view code, something would go wrong and have to be tracked down manually. Administrative / CRUD forms have three critical failure points:

Parsing at the API boundary
Conversion from backend to display
Conversion from display to the backend

The first point is usually best addressed by a type system that shares types between server and client. However, with Elm, this is not possible. So if one cannot share type information or property test/universally quantify JSON encoding and decoding, it’s best to make the parser extremely unforgiving. This way, invariants the client expects are maintained at the furthest possible boundary, allowing the bulk of code to operate with these invariants assumed. Additionally, failing fast means less branching logic, and backend engineers get alerted quickly to data breakages.
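
Here is a minimal sketch of what “extremely unforgiving” can mean in practice, written in TypeScript for illustration. The `Candidate` payload and its fields are hypothetical. In the Elm client the same idea would be expressed with JSON decoders, not with this code.

```typescript
// Hypothetical payload; the field names are assumptions for illustration.
interface Candidate {
  id: number;
  name: string;
  phone: string | null;
}

// Fail fast: any deviation from the expected shape is an error, never a silent default.
function decodeCandidate(json: string): Candidate {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("candidate: expected an object");
  }
  const obj = raw as Record<string, unknown>;

  // Unknown fields are rejected too, so backend changes surface immediately.
  const allowed = ["id", "name", "phone"];
  for (const key of Object.keys(obj)) {
    if (allowed.indexOf(key) === -1) {
      throw new Error(`candidate: unexpected field "${key}"`);
    }
  }

  const { id, name, phone } = obj;
  if (typeof id !== "number" || id % 1 !== 0) {
    throw new Error("candidate.id: expected an integer");
  }
  if (typeof name !== "string" || name.length === 0) {
    throw new Error("candidate.name: expected a non-empty string");
  }
  // A missing phone is an error; only an explicit null means "no phone".
  let phoneValue: string | null;
  if (phone === null) {
    phoneValue = null;
  } else if (typeof phone === "string") {
    phoneValue = phone;
  } else {
    throw new Error("candidate.phone: expected a string or null");
  }
  return { id, name, phone: phoneValue };
}
```

Code downstream of this function can assume a well-formed `Candidate` without re-checking it, and any mismatch with the backend shows up as a single loud failure at the boundary.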

Points 2 and 3 are usually best addressed by Generic Programming, where the shape of the data dictates conversion. In other words, we write conversions generically to work with all forms. Sadly, most programming languages lack support for Generic Programming. Both Haskell and PureScript have this feature, but not Elm, and certainly not any mainstream JavaScript dialects. So while we still have to live with manual testing to ensure forms persist round-trip, this is hardly a significant loss given that the proper resolution doesn’t exist in most systems anyway.

While you can argue that dynamic typing would solve this, it’s not easily done without onerously stressing the backend API to supply type information or substantially increasing the complexity of parsing. However, since Haskell and PureScript are off the table for the reasons mentioned above, we have to live with this compromise. Moreover, it’s a compromise that should be explicitly acknowledged as a weakness of Elm’s compiler.
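
For points 2 and 3, the property we want is that converting a record to its form representation and back changes nothing. Here is a minimal TypeScript sketch of that round-trip property for a single form. The `Job` and `JobForm` types and both conversions are hypothetical and hand-written. This is not Generic Programming, which would derive such conversions from the shape of the data.

```typescript
// Hypothetical backend record and form state; names are illustrative only.
interface Job {
  title: string;
  salaryCents: number;
  remote: boolean;
}

interface JobForm {
  title: string;
  salary: string; // what the user sees and edits, e.g. "1234.56"
  remote: boolean;
}

// Conversion from backend to display.
function toForm(job: Job): JobForm {
  return { title: job.title, salary: (job.salaryCents / 100).toFixed(2), remote: job.remote };
}

// Conversion from display to the backend.
function fromForm(form: JobForm): Job {
  return {
    title: form.title,
    salaryCents: Math.round(parseFloat(form.salary) * 100),
    remote: form.remote,
  };
}

// The round-trip property: nothing is lost or altered on the way through the form.
function roundTrips(job: Job): boolean {
  const back = fromForm(toForm(job));
  return (
    back.title === job.title &&
    back.salaryCents === job.salaryCents &&
    back.remote === job.remote
  );
}
```

Without generic support, a check like `roundTrips` has to be written once per form and kept in sync by hand, which is exactly the cost described above.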

What went right?

Luckily, almost everything about this project can be hailed as a technical success. The chat client has demonstrated a high degree of resistance to technical adversity, operating smoothly in network-, memory-, and processing-constrained environments, such as those of rural users with aging smartphones.

Elm delivers what it promises in terms of runtime errors. In addition, Elm code is extraordinarily pure and side-effect free, even more so than Haskell or PureScript. Elm also proved to be a great language for expressing recovery logic and other algorithms critical to the chat client, with much less effort and greater correctness.

Handoff of functionality to other engineers also proved to be relatively smooth. Elm retains key benefits like referential transparency and equational reasoning, making the functional code easy to read and understand. Elm is also a comparatively simple language. Therefore, it doesn’t incur the readability and knowledge transfer risks that arise when functional programmers reach for fancy tools like simulating Pi-Calculus with singletons.

The readability benefit of Elm’s disallowing fancy type-level programming is a bit of an unsung design win, particularly on a project where future development will require contributions from more junior talent.

In Conclusion

Elm is not a panacea, but it was absolutely the right choice for ChatStrike. Elm cleanly fits our technical evaluation criteria for use within ChatStrike’s specific early-stage startup scenario:

Assurances and guarantees of correctness
Performance constraints
Fit for current and future staff

Elm is not the right choice for every project. But I feel strongly that for Web-based chat clients that need to operate in resource-poor environments, Elm excels.