“To expect the unexpected shows a thoroughly modern intellect.” — Oscar Wilde
Recent history holds many disastrous incidents caused by computer programs, incidents that could have been avoided. In some of these cases the issues would have surfaced far earlier had proper methodologies and practices been followed. These incidents affected people's lives, privacy, and finances, among other things.
These failures are often spectacular and catastrophic. We have smashed spacecraft into distant planets, blown up expensive rockets filled with costly equipment and irreplaceable experiments, built medical machines that delivered lethal doses of radiation to patients because of a race condition, resulting in death and serious injury, and stranded airline travelers on a semi-regular basis.
Secure programming comprises sensible programming concepts and methodologies that put safeguards within the program, to avoid situations where malformed data, intentional or not, makes our programs do things they aren't supposed to do. Using these concepts does require a different mindset from what most of us are used to.
Security as a Concern
It all starts with thinking about security as a concern, not a feature. The difference is that a concern goes beyond what any single feature may address.
For example, say we are building a notes app. A feature request is submitted to add a login screen so that a user must log in before accessing their notes. The developer builds the screen, it goes through testing, and the requirement is satisfied: the app now requires the user to log in at launch.
It then turns out that once a user is logged in, notes are delivered via static URIs, and those URIs do not take the user's authentication into consideration. If another user has a direct link to a note containing sensitive information (credit card numbers, financial details, etc.) that you added under your account, they can view that note, no login needed. The whole point of having a login screen is moot.
The issue here is that the request was formulated as a feature, even though it stemmed from a concern about the security of an individual's notes and who can access them. Merely having a login screen addressed the feature, but not the underlying concern. The concern is addressed only when a note can be viewed solely by a user who is authenticated and is the owner of that note.
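To make the distinction concrete, here is a minimal sketch of the check the feature was missing. The names (`Note`, `NOTES`, `fetch_note`) are hypothetical stand-ins for the notes backend; the point is that ownership is verified on every fetch, not just at the login screen.

```python
from dataclasses import dataclass

@dataclass
class Note:
    note_id: str
    owner_id: str
    body: str

# Hypothetical in-memory store standing in for the notes backend.
NOTES = {"n1": Note("n1", "alice", "card ending in 4242")}

def fetch_note(note_id: str, authenticated_user_id: str) -> Note:
    """Return a note only when the requester owns it."""
    note = NOTES.get(note_id)
    if note is None:
        raise KeyError("no such note")
    if note.owner_id != authenticated_user_id:
        # Being logged in is not enough; ownership must also match.
        raise PermissionError("not the owner of this note")
    return note
```

With this check in place, a direct link in another user's hands yields a `PermissionError` instead of the note's contents.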
Trust Is Earned Not Given
Receiving external input, whether from the user or from an external service, is a reality for most applications. By its nature, external input can take any number of permutations, and it is near impossible to account for every variation. This also opens the door to malicious data. Therefore, our approach is to not trust any piece of data the app receives until it has been validated.
Our apps deal with highly sensitive data, and to earn the user's trust we protect that data as best we can. As part of our "fail fast" methodology, we validate every piece of data at the point where it is received, and when validation fails, that data is immediately discarded. We propagate an error up the chain, where it is handled according to the context in which the failure happened (we talk about this a bit more in the Offensive Programming section below).
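As an illustration of validating at the point of receipt, here is a small sketch. The field name and range are invented for the example; the shape is what matters: bad input never travels further than the boundary.

```python
def parse_age(raw: str) -> int:
    """Validate external input at the point of receipt ("fail fast")."""
    try:
        age = int(raw)
    except ValueError as exc:
        # Discard the bad data and propagate a context-rich error instead.
        raise ValueError(f"age is not an integer: {raw!r}") from exc
    if not 0 <= age <= 150:
        raise ValueError(f"age out of range: {age}")
    return age
```

Callers receive either a value that is known to be valid or an error they can handle according to their own context.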
In some instances we deem the validation failure a "Catastrophic Event": an event we simply cannot recover from, because the information the app needs to proceed is not available and cannot be determined. When such an event happens, we gather metadata and related information about the failure, clear out all sensitive information the app may have stored in memory, and begin a graceful shutdown. We notify the user that we have encountered a situation from which we cannot recover and allow them to close the app.
The presence of the "Catastrophic Event" is very important: it not only prevents the app from entering an undefined state, it also clearly pinpoints the origin of the issue. This practice has tremendous value for testing and for making sure a program behaves correctly and predictably when receiving all kinds of data. As a side effect, it also benefits external services by surfacing issues that were being masked or overlooked, so those services can resolve them.
This validation is not just for the data that is received. As is the case with networking, every connection made to a server is also subject to validation, by way of public key pinning. This practice provides a high level of confidence in the communication with external services, especially in a public setting, by preventing man-in-the-middle (MITM) attacks.
To secure and vet each network connection, the connection needs to happen over a secure protocol (e.g. HTTPS). In addition, for every external service the app uses, the public keys of the certificates configured for that service need to be embedded within the app. When establishing a connection, the server sends its certificate to the app, and the app matches the public key associated with that certificate against those embedded in the app. The connection is allowed to proceed only when the keys match; otherwise it is immediately dropped, because the authenticity of the connection could not be established.
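The pin-matching step can be sketched as follows. This is only the comparison logic, under the assumption that the DER-encoded public key has already been extracted from the TLS handshake; raw bytes stand in for a real key, and the names are hypothetical.

```python
import hashlib

# Hypothetical pin embedded in the app at build time: the SHA-256 digest
# of the service's DER-encoded public key.
PINNED_KEY_SHA256 = hashlib.sha256(b"server-public-key-der").hexdigest()

def connection_allowed(presented_key_der: bytes) -> bool:
    """Permit the connection only when the presented key matches the pin."""
    return hashlib.sha256(presented_key_der).hexdigest() == PINNED_KEY_SHA256
```

In a real app this comparison runs inside the TLS stack's certificate-validation hook, and any mismatch drops the connection before a single request is sent.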
This is all about having situational awareness of the data the app is using and how it is being handled. Much of that data will be deemed sensitive: passwords, financial information, a free-form note field where a user can input sensitive data, and so on. It is our responsibility to be mindful of this information and of how we use, store, and log it.
The use and storage of sensitive information in memory becomes even more critical in an environment that relies on a garbage collector to clean up the memory used by the application. In such an environment the programmer has no control over when the data will be removed from memory. Therefore, once we are done using sensitive information within a particular scope, the expectation is that we fill the variable with garbage data. Doing so prevents another process from reading the sensitive information at the variable's memory address before the garbage collector has had a chance to clean it up after the variable has gone out of scope.
By default, within the codebase, we do not log any passwords, in plain text or otherwise. When a password is provided by the user (e.g. when logging in, or at other clearly marked password input fields) it is immediately placed in secure storage rather than being held in a variable or passed along to other components of the program via functions, methods, or messages to other objects.
When a password does need to be stored in a variable, it is held there only until the end of its scope, after which it is removed. This can be done manually or by using a custom type (if the language allows) that fills the variable with garbage before handing it off to the system for cleanup.
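One way such a custom type can look is sketched below, assuming a language where a mutable buffer can be overwritten in place. The class name is invented for the example; in Python a `bytearray` is used because immutable `str`/`bytes` values cannot be scrubbed.

```python
class SensitiveBytes:
    """Hold a secret in a mutable buffer and scrub it on scope exit."""

    def __init__(self, secret: bytes):
        self.buf = bytearray(secret)

    def __enter__(self):
        return self.buf

    def __exit__(self, *exc_info):
        # Fill the buffer with garbage (zeros) before it goes out of
        # scope, so the plaintext is gone before the collector ever runs.
        for i in range(len(self.buf)):
            self.buf[i] = 0
        return False
```

Usage: `with SensitiveBytes(b"hunter2") as pw: ...` — by the time the `with` block ends, the buffer contains only zeros.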
For network communications, request and response contents are never logged; again, this should be the default. It prevents sensitive information from leaking into plain sight. There are exceptions where the logging mechanism provides some form of secure logging, such as systems that comply with the Health Insurance Portability and Accountability Act (HIPAA) in the United States.
Don't Roll Your Own Crypto
Crypto is hard. This is very important and worth repeating: crypto is hard. So when locking down the data your app uses or creates, don't try to write the encryption yourself. Home-brewed cryptography is more prone to bugs and most likely hasn't been reviewed or tested in the wild.
In the event that you feel you need to reach for a home-brewed crypto solution, don't. Mature, vetted, and well-studied encryption options are available, and it is better to use them. Even then, a comprehensive design of the solution should be followed by a thorough review by your team and by security experts.
Storage and Caching
Any information that is deemed sensitive and needs to be stored must go into secure storage, either one provided by the platform or a custom one developed for the specific purpose of the program using it. All data the app holds in memory (which, for the most part, should be treated as immutable) must be immediately cleaned up and discarded when the user's session is terminated. The same must happen when a Catastrophic Event occurs, which is another benefit of having that mechanism in place.
Covering All Paths
Expecting the unexpected has major benefits in making our programs resilient and keeping them out of undefined states. This practice helps catch subtle bugs that would otherwise have eluded us. As a side benefit, it improves code readability tremendously and helps reviewers spot logic problems. The majority of these subtle bugs happen because not all possible paths are covered. As such, each and every path that can be taken needs to be covered, even a path considered a "this will never happen" path.
Covering all paths makes a couple of key considerations explicit:
- A path was not unintentionally left out.
- A path that should never be taken is clearly called out and handled accordingly.
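The two considerations above can be sketched in a single routing function. The statuses and screen names are hypothetical; the point is that the "do nothing" path carries a comment and the "impossible" path fails loudly.

```python
def screen_for_status(status: str) -> str:
    """Route on every possible path, including the 'impossible' one."""
    if status == "approved":
        return "receipt_screen"
    elif status == "declined":
        return "retry_screen"
    elif status == "pending":
        # Nothing extra to do here; the comment records that this path
        # was considered, not forgotten.
        return "spinner_screen"
    else:
        # A path that should never be taken is called out explicitly.
        raise AssertionError(f"unexpected status: {status!r}")
```

A reviewer reading this function can tell at a glance which paths are intentional no-ops and which are genuine impossibilities.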
The practice of leaving a comment for a path where nothing needs to happen, and invoking a function such as theUnexpectedHasHappened() when a path should never be taken, has served us well in catching bugs that would otherwise have eluded us.
theUnexpectedHasHappened() essentially gathers metadata and related information about the failure, clears out all sensitive information the program may have stored in memory, and begins the graceful shutdown of the app.
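A minimal sketch of such a handler is below, assuming a hypothetical in-memory store of sensitive state; `SystemExit` stands in for whatever graceful-shutdown mechanism the platform provides.

```python
import logging

# Hypothetical in-memory store of sensitive state the app has accumulated.
SENSITIVE_STATE = {"session_token": "abc123", "notes_cache": ["..."]}

def the_unexpected_has_happened(context: str) -> None:
    """Gather failure metadata, scrub sensitive memory, shut down."""
    # 1. Record metadata and related information around the failure.
    logging.error("catastrophic event: %s", context)
    # 2. Clear out all sensitive information held in memory.
    SENSITIVE_STATE.clear()
    # 3. Begin the graceful shutdown (SystemExit is a stand-in for the
    #    platform's real shutdown mechanism).
    raise SystemExit(1)
```

The ordering matters: sensitive memory is scrubbed before shutdown begins, so nothing sensitive survives into crash reports or a lingering process image.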
Offensive Programming
Offensive Programming practices can be implemented alongside Defensive Programming practices, and we are not opposed to doing so. The goal is to differentiate between expected errors and preventable ones. This distinction provides a level of confidence within the codebase that, when all components work as expected, preventable errors will not happen.
This can be described as failing fast. The moment an unexpected error is encountered, it needs to be handled and propagated up, and ultimately the program needs to shut down gracefully. No error should ever be silenced; all errors must be handled, even the unexpected ones, by calling a specific function like theUnexpectedHasHappened() (as mentioned in the previous section) so that appropriate measures and steps can be taken.
Gracefully shutting down is just one way of failing fast. How an error is handled is predicated on the context in which it occurs. Different contexts have different needs, and in some of them gracefully shutting down the app is overkill. For example, when metadata received from an external service fails validation in a production environment, shutting down the app over an optional piece of data would be excessive. The situation can be handled by dropping the metadata, not allowing the user to view the screen where that data would be shown, and instead notifying the user with an error suitable for that context.
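This kind of context-appropriate handling can be sketched as follows. The field names (`rating`, `rating_message`) and the validation rule are invented for the example; the shape is what matters: optional data that fails validation is dropped, and the screen model carries a user-facing message instead.

```python
from typing import Optional

def validate_rating(raw: str) -> int:
    value = int(raw)  # may raise ValueError
    if not 1 <= value <= 5:
        raise ValueError("rating out of range")
    return value

def build_screen_model(title: str, raw_rating: Optional[str]) -> dict:
    """Drop optional data that fails validation instead of shutting down."""
    model = {"title": title}
    if raw_rating is not None:
        try:
            model["rating"] = validate_rating(raw_rating)
        except ValueError:
            # Production context: discard the optional metadata and show
            # a context-appropriate message rather than killing the app.
            model["rating_message"] = "rating unavailable"
    return model
```

A required field failing validation would instead escalate, e.g. to the catastrophic-event handler; only optional data gets this lenient treatment.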
During development and in a test environment you do want to fail fast, because doing so highlights weaknesses in the code and helps make it more resilient, along with surfacing use cases that may not have been considered when the code was initially written.
I have used these practices in real-world applications for a while now. They have had an enormous positive effect on the codebase and have led to higher code quality, better readability, and code that behaves predictably. They have prevented a great number of bugs and contributed to the high stability and quality of the program.
It is definitely an adjustment, a shift in mindset from what we have generally been doing. However, once we come to understand why these practices matter, using them becomes second nature, and it is no more time-consuming than writing bad code.
Further Reading
- The Checklist Manifesto by Atul Gawande
- Code Complete by Steve McConnell
- Secure by Design by Dan Bergh Johnsson, Daniel Deogun, Daniel Sawano