🪨

Hardening guides

Preamble & credits:

Author: Emmanuel Odeke, Orijtech Inc.

Contributors & Editors: Nathan Dias, Jean-Philippe Aumasson, Chainguard Inc by virtue of their Q2 2022 Supply chain assessment for Cosmos

Target audience: Cosmos Ecosystem https://cosmos.network/ and Go developers

Versioning

Version | Start date | End date | Comments | Authors | Editors
v1.0.0 | Saturday-12th-August-2022 | Tuesday-23rd-August-2022 | | Emmanuel Odeke | Nathan Dias, Jean-Philippe Aumasson

Testing:

Fuzzing:

What is fuzzing?

Fuzzing is the art of employing randomized state to generate more test cases than a human can cognitively craft. Most modern fuzzing is coverage-guided: the fuzzer instruments code paths and mutates known inputs, favoring mutations that, like known good inputs, cover new paths. Go 1.18 (released in March 2022) and above has native support for fuzzing per https://go.dev/blog/go1.18 and the official guide to fuzzing in Go is at https://go.dev/doc/fuzz/ We use oss-fuzz to continuously fuzz cosmos/cosmos-sdk and tendermint/tendermint, and get informed of the bugs that it continuously finds.
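As a minimal sketch of what a native Go fuzz target looks like: reverse and FuzzReverse below are hypothetical names for illustration, not code from cosmos-sdk or tendermint. In a real project the fuzz target lives in a file ending in _test.go and is run with go test -fuzz=FuzzReverse; it is shown in a single file here only to keep the sketch self-contained.

```go
package main

import (
	"fmt"
	"testing"
	"unicode/utf8"
)

// reverse returns s with its runes in reverse order.
func reverse(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

// FuzzReverse is a hypothetical fuzz target: the fuzzer mutates the seed
// input and checks a property that must hold for every generated input.
func FuzzReverse(f *testing.F) {
	f.Add("hello, cosmos") // seed corpus entry that the fuzzer mutates
	f.Fuzz(func(t *testing.T, s string) {
		if !utf8.ValidString(s) {
			t.Skip("the round-trip property only holds for valid UTF-8")
		}
		if got := reverse(reverse(s)); got != s {
			t.Errorf("reverse(reverse(%q)) = %q", s, got)
		}
	})
}

func main() {
	fmt.Println(reverse("hello, cosmos"))
}
```

The property checked (reversing twice gives back the input) is the kind of invariant that a fuzzer can hammer on without a human having to enumerate the inputs.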

Fuzz trophies:

The impetus to fuzz is the wide spectrum of bugs that can be uncovered just by guiding the fuzzer’s genetic algorithm with inputs. Here is a glimpse into some of the long lists of critical bugs that go-fuzz by the excellent Dmitry Vyukov has uncovered https://github.com/dvyukov/go-fuzz#trophies in a variety of critical libraries.

Goroutine leaks:

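// doWorkGood stops when jobsCh is closed or when ctx is cancelled,
// so the goroutine running it can always be reclaimed.
func doWorkGood(ctx context.Context, jobsCh <-chan *W, do func(context.Context, *W)) {
    for {
        select {
        case w, stillOpen := <-jobsCh:
            if !stillOpen {
                return
            }
            // Otherwise carry on doing the work
            do(ctx, w)
        case <-ctx.Done():
            return
        }
    }
}

// doWorkBad blocks forever if jobsCh is never closed: there is no way
// to cancel it, so the goroutine running it leaks.
func doWorkBad(jobsCh <-chan *W, do func(*W)) {
    for w := range jobsCh {
        do(w)
    }
}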

File permissions:

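UNIX file permissions are expressed in octal (base 8), because each octal digit encodes the 3 permission bits (read, write, execute) for one of the 3 classifications of users: the current user/owner, the group, and others.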

Please see https://en.wikipedia.org/wiki/File-system_permissions#Traditional_Unix_permissions

Example permissions are 0o755 (or just 0755) and 0600. Splaying each octal digit into its 3 bits, they are interpreted as:

Permissions | Splayed | Current user/owner | Group | Others
0755 | 111-101-101 | Read-Write-Execute | Read, Execute | Read, Execute
0600 | 110-000-000 | Read-Write | No access | No access

[Figure: an illustration of what octal UNIX permissions are like and how to interpret them]
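The splaying of octal digits into bits can also be checked in code. This is a small sketch; splay and symbolic are hypothetical helper names, and only os.FileMode comes from the standard library.

```go
package main

import (
	"fmt"
	"os"
)

// splay renders the 9 permission bits as owner-group-others triplets.
func splay(mode uint32) string {
	return fmt.Sprintf("%03b-%03b-%03b", (mode>>6)&7, (mode>>3)&7, mode&7)
}

// symbolic renders the same bits the way "ls -l" does, via os.FileMode.
func symbolic(mode uint32) string {
	return os.FileMode(mode).String()
}

func main() {
	for _, mode := range []uint32{0o755, 0o600} {
		fmt.Printf("%04o %s %s\n", mode, splay(mode), symbolic(mode))
	}
	// Output:
	// 0755 111-101-101 -rwxr-xr-x
	// 0600 110-000-000 -rw-------
}
```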

Signal handling:

Data races & synchronization:

Building binaries:

Container images:

Integer overflows and underflows:

Integer overflows occur when a numeric value is shoved into the limited register width (precision/range) of an integer type and thereby changes sign or interpreted value due to wrapping around. Integers are categorized as signed or unsigned. Signed means that a value can be negative or positive, while unsigned means that it is at minimum equal to 0 and thus only non-negative within its maximum range. The register width determines how data is interpreted for the various primitive types. For signed values, the most significant bit of the register serves as the sign bit, while unsigned values have no sign bit and use every bit for magnitude, which is why the maximum of an unsigned type is roughly twice that of the signed type of the same width:

Type | Minimum value | Maximum value
Signed int 32 bits | -2147483648 | 2147483647
Unsigned int 32 bits | 0 | 4294967295
Signed int 64 bits | -9223372036854775808 | 9223372036854775807
Unsigned int 64 bits | 0 | 18446744073709551615

If we think of a register width as a clock, wrapping happens by continuing clockwise past the maximum value and arriving back at the minimum value.

The minimum value of a 32-bit signed integer is -(1<<31), that is -2147483648, and the maximum value is ((1<<31)-1), that is 2147483647. Thus if a value greater than ((1<<31)-1) is shoved into a 32-bit signed integer for storage, it wraps around: the excess beyond the maximum continues counting from the minimum, flipping the sign, until the value fits within that range.

Integer overflows can occur when uint* values are cast to int* values and likewise when int* values are cast to uint* values. This happens when the value is outside the range of the destination type.

Original type & value | Cast type and resulting value | Overflow | Reason
int64(-1) | uint64(x) = 18446744073709551615 | Yes | The bit pattern of -1 is 0b1111111111111111111111111111111111111111111111111111111111111111, which when interpreted as an unsigned value is the maximum unsigned value of 0xffffffffffffffff, due to the wrapping around
uint64(1<<63) | int64(x) = -9223372036854775808 | Yes | The value x overflows the max range of int64: when wrapped past ((1<<63)-1), it falls on the minimum value of int64 due to the extra +1
int32(-1) | uint32(x) = 4294967295 | Yes | The value x is negative, and when its bit pattern is interpreted and stored in the register for a uint32 it becomes the maximum uint32 value of 0xffffffff
int64(2000) | uint64(x) = 2000 | No | The value x is in the range of uint64

Fix

To protect yourself against overflows, avoid blindly casting values between signed and unsigned types. Ensure you set guards against the minimum and maximum values and reject values outside the correct ranges. We have found bugs that resulted from carelessly casting int* values to uint* values in code with unchecked bounds.
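A minimal sketch of both the wrap-around and the guard follows. The helper names int64ToUint64 and uint64ToInt64 are hypothetical; the point is only that the range check happens before the conversion.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

var errOutOfRange = errors.New("value out of range for the destination type")

// int64ToUint64 rejects negative values instead of letting them wrap.
func int64ToUint64(x int64) (uint64, error) {
	if x < 0 {
		return 0, errOutOfRange
	}
	return uint64(x), nil
}

// uint64ToInt64 rejects values above the int64 maximum.
func uint64ToInt64(x uint64) (int64, error) {
	if x > math.MaxInt64 {
		return 0, errOutOfRange
	}
	return int64(x), nil
}

func main() {
	x := int64(-1)
	fmt.Println(uint64(x)) // blind cast wraps: 18446744073709551615

	y := uint64(1) << 63
	fmt.Println(int64(y)) // blind cast wraps: -9223372036854775808

	_, err := int64ToUint64(x)
	fmt.Println(err) // guarded cast reports the problem instead
}
```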

Periodic supply chain analysis for dependencies:

Our supply chain security partner and vendor Chainguard Inc periodically conducts supply chain security audits of the Cosmos ecosystem and many related supply chains. For July 2022 aka Q2 2022, they produced a comprehensive report per https://drive.google.com/file/d/1BCDUSZ3cSdO8FTD9A-nA21_iViONoFln/view which contains plenty of gems related not only to the Cosmos ecosystem but also to software supply chain security in general. We highly recommend that everyone read the report and internalize its suggestions, which include tightening security and checks using much better processes like more secure containers, Chainguard Enforce, and OpenSSF Scorecards to increase “Supply chain Levels for Software Artifacts” aka SLSA compliance levels.

Maps:

var m map[string]any

// Then later on in code, accidentally set without
// initialization
m["key"] = "value"

will panic per https://go.dev/play/p/iAy0L-Iykm4 with

panic: assignment to entry in nil map

goroutine 1 [running]:
main.set(...)
	/tmp/sandbox2664608898/prog.go:4
main.main()
	/tmp/sandbox2664608898/prog.go:10 +0x49

The proper fix is to initialize the map, for example with make, before assigning to it, per https://go.dev/play/p/fr4jp8sQxv0
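A minimal sketch of that fix, assuming a hypothetical set helper that lazily initializes the map (this is an illustration, not necessarily the code behind the playground link):

```go
package main

import "fmt"

// set initializes m if it is nil before writing to it,
// and returns the (possibly newly allocated) map.
func set(m map[string]any, key string, value any) map[string]any {
	if m == nil {
		m = make(map[string]any)
	}
	m[key] = value
	return m
}

func main() {
	var m map[string]any // nil map: reads are fine, writes panic
	m = set(m, "key", "value")
	fmt.Println(m["key"])
}
```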

Permissions on already opened file handles might not reflect even after modifying operations like .Chmod, .Close

TL;DR: If I have Read-Write-Execute permissions on a file and I successfully opened a handle with those permissions, then until I close or discard that handle and re-acquire it, the initial permissions are retained: even if you change the permissions to no access, I’ll still be able to read, write and execute that file.

Here is an exhibit in a variety of programming languages https://gist.github.com/odeke-em/4bf530f3dc6bb4993a1f1d12dac72d46 of the problem
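The same behavior can be sketched in Go, assuming a UNIX-like system. readAfterChmod is a hypothetical helper: it keeps a handle open, revokes all permissions on the path, and then still reads through the handle.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// readAfterChmod writes to a temporary file, removes every permission
// from its path, and then reads the content back via the open handle.
func readAfterChmod() (string, error) {
	f, err := os.CreateTemp("", "perm-demo-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if _, err := f.WriteString("secret"); err != nil {
		return "", err
	}
	// Revoke all permissions on the path while the handle is still open.
	if err := os.Chmod(f.Name(), 0); err != nil {
		return "", err
	}
	// Access was checked when the handle was opened, so it keeps working.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return "", err
	}
	data, err := io.ReadAll(f)
	return string(data), err
}

func main() {
	got, err := readAfterChmod()
	fmt.Println(got, err)
}
```

The practical consequence is that revoking access requires making sure that no stale handles remain open, not just changing the mode bits.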

Understand the order of package loading in Go:

TL;DR: When auditing supply chains to figure out attack susceptibility, ensure you understand the order in which packages are loaded and initialized per the Go specification

In the Go specification and compliant implementations, package-level variables are initialized in declaration order but after any of the variables that they depend on, so the fewer uninitialized variables a declaration depends on, the sooner it is initialized. Imported packages are initialized before the packages that import them, and across multiple files the order is basically determined by the order in which files are presented to the compiler, per https://go.dev/ref/spec#Package_initialization
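A small sketch of the within-package ordering, using hypothetical variables a and b: a is declared first but depends on b, so b is initialized first, then a, then init, and only then main.

```go
package main

import "fmt"

// order records the sequence in which initialization steps run.
var order []string

var a = record("a", b+1) // declared first, but depends on b
var b = record("b", 1)   // therefore initialized before a

func record(name string, v int) int {
	order = append(order, name)
	return v
}

func init() {
	// init functions run after all package-level variables are initialized.
	order = append(order, "init")
}

func main() {
	order = append(order, "main")
	fmt.Println(order) // [b a init main]
}
```

An init function in a malicious dependency runs before your own main, which is why this ordering matters when auditing.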

Cryptographic randomness:

TL;DR: Always prefer to use crypto/rand.Reader or crypto/rand.* to seed functions that require strong randomness. Avoid math/rand functions as much as you can, because they create predictable sequences with low entropy. For example, use cryptographic randomness for: generating cryptographic keys, secret tokens, temporary passphrases, session IDs.
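For example, a session token can be drawn straight from crypto/rand. This is a minimal sketch; newToken is a hypothetical helper name.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newToken returns nBytes of cryptographically strong randomness,
// hex encoded so that it is safe to place in headers or URLs.
func newToken(nBytes int) (string, error) {
	b := make([]byte, nBytes)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

func main() {
	token, err := newToken(32)
	if err != nil {
		panic(err)
	}
	fmt.Println(token)
}
```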

Randomness backdoor sanity test:

In Go, exported package-level variables can be swapped out at runtime. The very important and highly venerated crypto/rand.Reader is used by cryptographic functions in the Go standard library and by many important libraries plus core infrastructure. Given that crypto/rand.Reader is just like any other variable, it is mutable by changing its reference, so the following code is valid and will have the actual effect of changing every use of crypto/rand.Reader to have the new behavior. PLEASE beware of this supply chain attack!

import "crypto/rand"

type panickingReader int
func (pr *panickingReader) Read(b []byte) (int, error) {
     panic("all I do is panic!")
}

func init() {
    rand.Reader = new(panickingReader)
}

Fix/Suggestion:

To guard against such issues, if feeling extra paranoid, it is highly recommended to perform a sanity check on crypto/rand.Reader before critical functions, such as in your cmd/*/main.go file, which can’t be imported by other packages and is therefore a terminal point that you control. For an example, please see https://go.dev/play/p/Zfk0-rXwZcZ
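One possible shape for such a check is sketched below. This is an assumption about what a sanity test could look like, not necessarily the code behind the playground link: checkRandReader is a hypothetical helper that reads twice from crypto/rand.Reader and rejects errors, panics, all-zero output and repeated output. It is only a heuristic and will not catch a carefully crafted replacement.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"errors"
	"fmt"
	"io"
)

// checkRandReader performs a heuristic sanity check on crypto/rand.Reader.
func checkRandReader() (err error) {
	defer func() {
		// A swapped-in reader might panic; report that as an error.
		if r := recover(); r != nil {
			err = fmt.Errorf("crypto/rand.Reader panicked: %v", r)
		}
	}()

	a := make([]byte, 32)
	b := make([]byte, 32)
	if _, err := io.ReadFull(rand.Reader, a); err != nil {
		return fmt.Errorf("first read failed: %w", err)
	}
	if _, err := io.ReadFull(rand.Reader, b); err != nil {
		return fmt.Errorf("second read failed: %w", err)
	}
	if bytes.Equal(a, make([]byte, 32)) || bytes.Equal(a, b) {
		return errors.New("crypto/rand.Reader output looks predictable")
	}
	return nil
}

func main() {
	if err := checkRandReader(); err != nil {
		panic(err)
	}
	fmt.Println("crypto/rand.Reader passed the sanity check")
}
```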

crypto/tls: avoid using tls.Config.InsecureSkipVerify

Many servers out there that use TLS might be tempted to use crypto/tls.Config.InsecureSkipVerify so as to allow for server tests to run locally, but then mistakenly share that code with production services. With that flag set, any certificate can be mis/presented and this allows MITM attacks due to no verifications :-(

Fix

Instead, prefer to use net/http/httptest.NewTLSServer in tests for TLS servers like this
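A minimal sketch of that approach: the test server generates its own certificate, and the client returned by its Client method is configured to trust it, so InsecureSkipVerify is never needed. fetchOverTLS is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchOverTLS starts a local TLS test server and fetches from it
// with full certificate verification enabled.
func fetchOverTLS() (string, error) {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello over TLS")
	}))
	defer srv.Close()

	// srv.Client() trusts the test server's certificate.
	res, err := srv.Client().Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	return string(body), err
}

func main() {
	body, err := fetchOverTLS()
	fmt.Println(body, err)
}
```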

Use html/template to generate HTML to avoid security exploits like XSS

It might be tempting and seemingly most convenient to generate HTML just by string concatenation, but it is much safer to use html/template, which is built to allow template pipelines and to be safe against code injection. This package allows HTML to be dynamically generated by passing what could be untrusted data into template.Execute as data inputs. The html/template package has been hardened for security and escapes its inputs contextually; here is a demo that is runnable locally

Exhibit

Results:

Remedy:

Use html/template to generate HTML and NOT text/template nor string concatenation to build HTML content.
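The difference can be sketched with a hostile name value. renderUnsafe and renderSafe are hypothetical helpers for illustration: the first concatenates strings, the second goes through html/template.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

var page = template.Must(template.New("greet").Parse(`<p>Hello, {{.}}!</p>`))

// renderUnsafe builds HTML by concatenation: the input lands verbatim.
func renderUnsafe(name string) string {
	return "<p>Hello, " + name + "!</p>"
}

// renderSafe lets html/template escape the untrusted input.
func renderSafe(name string) (string, error) {
	var sb strings.Builder
	if err := page.Execute(&sb, name); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	hostile := "<script>alert(1)</script>"
	fmt.Println(renderUnsafe(hostile))
	// <p>Hello, <script>alert(1)</script>!</p>

	safe, err := renderSafe(hostile)
	if err != nil {
		panic(err)
	}
	fmt.Println(safe)
	// <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</p>
}
```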

Iterate on extracted and sorted map keys instead of directly on map iteration which is non-deterministic

For code that’s supposed to be deterministic, avoid iterating on the map itself and instead extract and sort the keys, for example with golang.org/x/exp/maps.Keys and golang.org/x/exp/slices.Sort. The Go specification leaves the iteration order over maps unspecified, and it is not guaranteed to be the same from one iteration to the next, per https://go.dev/ref/spec#For_statements

The reason why this is very important: given the same code and the same sequences, getting entirely different results dependent on an uncontrollable factor like a randomization seed causes lots of problems for systems that are trying to reach consensus.

Exhibit of the problem https://go.dev/play/p/-STb7u5QzhN or inlined below

which produces different hashes in 5 iterations, yet given the same code

Iteration #1 : 40fd9fe62e7f550849279076cb931d153983cb6fe2775e686b3c35f76df8d629
Iteration #2 : 78681526b89bfb9e9f89cf5d779f5c820f92883edc88699c680a3d1af5825d19
Iteration #3 : f5b1a57dabf1fe02f9d629cd3d70ae387c85c03e4c2087e7a76d74ea62cc1321
Iteration #4 : 3c126d11f061e16e0f8827c3b11e77f1bfa09c380c357d9b3b6e1242f6c29eca
Iteration #5 : 97414e29d6462ac46c6337cd0b5d90aa3523917ae21869c3df0be83bd1b8d7f4

Fix

The correct fix for this problem is to use the extracted keys and sort them
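A minimal sketch of the fix is below. To stay self-contained it uses sort.Strings from the standard library rather than the golang.org/x/exp packages mentioned above; deterministicHash is a hypothetical helper name.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// deterministicHash hashes the map's entries in sorted key order,
// so the digest does not depend on map iteration order.
func deterministicHash(m map[string]string) string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%s=%s\n", k, m[k])
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	m := map[string]string{"a": "1", "b": "2", "c": "3", "d": "4"}
	for i := 1; i <= 5; i++ {
		// Every iteration prints the same hash.
		fmt.Printf("Iteration #%d : %s\n", i, deterministicHash(m))
	}
}
```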

Aim for 2+ person code reviews before merge:

The more eyes that look at code, the shallower bugs get, as paraphrased from “Linus’ Law” https://en.wikipedia.org/wiki/Linus's_law Getting a second, third or further opinion on code before submission brings in a diversity of experiences: others with different eyes can point out what you might have missed and, even more, reviews build trust. Always strive for 2+ person code reviews.

Avoid rolling out your own algorithms if battle-tested libraries exist:

Writing software involves taking ideas and interpreting them by writing code which a computer executes. The cognitive load of writing code means that almost every line of code is subject to misinterpretation unless carefully tested and audited.

Binary search is CS-101 material whose amazing result is that, for sorted sequences, it is guaranteed to find a match or report its absence in about log2(n) steps, so if n = 1 million it’ll run in about 20 steps. It is useful and one of the most studied algorithms, one that almost everyone can recite in their sleep. It sounds pretty trivial, and the algorithm’s description on Wikipedia, and as commonly described, is:

Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

When put into steps:

Most folks will implement the algorithm as is, but there is a subtle and appalling bug: an integer overflow when naively computing the midpoint mid, such as:

mid := (start + end)/2

The result can overflow because (start + end) can exceed the maximum integer value in situations with large lengths of seq: (1<<31) - 1 on 32-bit systems or (1<<63) - 1 on 64-bit systems.

The fix for this is to first convert the result of the addition to a uint value and then perform the division, so

mid := int(uint(start + end)/2)

or

mid := int(uint(start + end)>>1)

and NOT

mid := (start + end)/2

Fix:

Please use the standard library and well written, battle-tested packages.
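For binary search in particular, the standard library's sort package already provides a tested implementation. A minimal sketch, with indexOf as a hypothetical wrapper:

```go
package main

import (
	"fmt"
	"sort"
)

// indexOf returns the index of target in the sorted slice, or -1.
// sort.SearchInts returns the insertion point, so a match must be
// confirmed by comparing the element found there.
func indexOf(sorted []int, target int) int {
	i := sort.SearchInts(sorted, target)
	if i < len(sorted) && sorted[i] == target {
		return i
	}
	return -1
}

func main() {
	seq := []int{1, 3, 5, 7, 9}
	fmt.Println(indexOf(seq, 7)) // 3
	fmt.Println(indexOf(seq, 4)) // -1
}
```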

Avoid rolling out your own cryptography:

Cryptography is a very intricate and highly complex subject; a tip is to find popular and well recommended cryptographic libraries that are battle tested, such as those from the Go standard library, and use them. Cryptography is critical to modern secure communications, yet broken cryptography is much easier to build than sound cryptography, and subtly broken code can be the difference between security and insecurity. Cryptographers continually attack cryptographic schemes, find novel attacks and then retire algorithms; please see for example the CRIME attack, present even in well audited and critical infrastructure. There are very subtle/stealthy attacks, such as timing attacks, that can’t be detected except by highly sophisticated tracing, and that are virtually impossible to figure out unless you intentionally set up a honey pot to investigate attackers. Use audited libraries that label themselves as constant time. The high attention and maintenance burden of trying to keep up with the latest attacks means that your best defence is relying on well maintained cryptographic libraries.

Test your web server handles in hermetic setups, without having to access the non-local network (internet)

Understanding the behaviour of your systems can be further improved by thorough tests anticipating and asserting on end-to-end results. Most folks don’t test their HTTP/web handler logic because they don’t know how to wire up web servers nor inputs. It is highly important that tests be diverse and hermetic so that they can be fast, reliable and reproducible. Here is an example of how a server handler can be tested in a hermetic setup
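A minimal sketch of a hermetic handler test using net/http/httptest, which never touches the network. greetHandler and callHandler are hypothetical names; in a real project the calls and assertions would live in a _test.go file.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// greetHandler is the handler under test.
func greetHandler(w http.ResponseWriter, r *http.Request) {
	name := r.URL.Query().Get("name")
	if name == "" {
		http.Error(w, "missing name", http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "Hello, %s!", name)
}

// callHandler invokes the handler in-process with a synthetic request
// and returns the recorded status code and body.
func callHandler(target string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, target, nil)
	rec := httptest.NewRecorder()
	greetHandler(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := callHandler("/greet?name=Cosmos")
	fmt.Println(code, body) // 200 Hello, Cosmos!

	code, _ = callHandler("/greet")
	fmt.Println(code) // 400
}
```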

You can learn more about how to tame HTTP in the Go programming language per this article on our engineering blog

Use TLS with your services to avoid MITM attacks plus eavesdropping

Transport Layer Security (TLS) uses public key cryptography and encryption to protect your data in transit. Gone are the days when SSL certificates cost thousands of dollars per month to have. These days, Let’s Encrypt from the Internet Security Research Group provides free and renewable certificates at scale, with clients available in a variety of languages.

Consensus: avoid time.Now() which is localized and instead use a consensus aware clock

When distributed systems communicate and try to reach consensus, they need a commonly agreed time reference. Using time.Now() subjects systems to serious problems like clock drift and NTP time attacks, in which an attacker controls your clock to make you reject specific proposals or to cause chaos; these can partition the network because consensus can’t be settled upon. This issue manifested as a vulnerability in the Cosmos ecosystem.

Fix:

Please use consensus aware clocks. For Cosmos code, please always use block.HeaderTime

Large distributed systems use highly accurate clocks, such as in protocols like TrueTime; there are also open source protocols like RoughTime.

Prevent replay attacks of cryptographic objects by using unique identifiers

Replay attacks (of, say, an encrypted payload) can happen in stateless systems, where data received can't be checked to be fresh, and not copied from a previous session. The reuse/lack of a nonce is an attack that’s basically been done even before the advent of computers.

Use a unique nonce in cryptographic algorithms that require it (AES-GCM, ChaCha20, etc.)

A nonce (number used only once) is a non-secret input to certain ciphers that, if repeated, can annihilate the security of the cipher. This is particularly true of stream cipher-like constructions: AES-CTR, AES-GCM, ChaCha20, Salsa20, and so on. The nonce does not need to be random, it only has to be unique (that is, a given nonce must never be reused in combination with the same key). If random nonces are used, then they must be large enough to make the risk of collisions negligible.
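A minimal sketch with AES-GCM from the standard library follows. seal and open are hypothetical helper names; seal draws a fresh random nonce for every message and prepends it to the ciphertext, since the nonce is not secret.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
	"io"
)

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key) // key must be 16, 24 or 32 bytes
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext with a fresh random nonce and
// returns nonce || ciphertext.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open splits off the nonce, then authenticates and decrypts.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("sealed message too short")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := io.ReadFull(rand.Reader, key); err != nil {
		panic(err)
	}
	sealed, err := seal(key, []byte("attack at dawn"))
	if err != nil {
		panic(err)
	}
	plain, err := open(key, sealed)
	fmt.Printf("%s %v\n", plain, err)
}
```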

Salt your passwords before hashing and storing them in databases

Good hash functions, such as SHA-256, compute a digest of information with very low collision rates. However, hashing is deterministic: if the value “password” is hashed with SHA-256, it always produces the value 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8, and it is unlikely that other values will produce the same hash (aka a collision). If an attacker collects a listing of plain values and their resulting one-way hashes, they can build a dictionary that maps password hashes back to plaintext; without some obscurity, it becomes just a problem of collecting a large enough and diverse corpus. Some folks building web applications can be tempted to just store the hash of a password, say after hashing it with MD5 or SHA256. Unfortunately, when bare hashes are stored, they can be broken using rainbow tables and dictionary lookups, which perform pre-saved password hash look-ups. Salting is a technique that adds randomized and fresh data to the data that’ll be hashed; the salt is stored alongside the hash so that the original password can still be authenticated. This guarantees with high probability that even if everyone shared the same password, the resulting hashes won’t be the same, hence it makes it prohibitively expensive to try to map them against rainbow tables and pre-saved hash→plaintext mappings!
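The effect of a salt can be sketched with the standard library alone. The helper names below are hypothetical, and plain salted SHA-256 is used only to illustrate the idea; for production password storage, a purpose-built password hashing function such as bcrypt, scrypt or Argon2 (available under golang.org/x/crypto) is generally a better choice.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// unsaltedHash is deterministic: the same password always maps
// to the same digest, which is what rainbow tables exploit.
func unsaltedHash(password string) string {
	sum := sha256.Sum256([]byte(password))
	return hex.EncodeToString(sum[:])
}

// newSalt returns 16 fresh random bytes to store next to the hash.
func newSalt() ([]byte, error) {
	salt := make([]byte, 16)
	_, err := rand.Read(salt)
	return salt, err
}

// saltedHash hashes salt || password.
func saltedHash(salt []byte, password string) []byte {
	h := sha256.New()
	h.Write(salt)
	h.Write([]byte(password))
	return h.Sum(nil)
}

// verify recomputes the salted hash and compares in constant time.
func verify(salt, want []byte, password string) bool {
	return subtle.ConstantTimeCompare(saltedHash(salt, password), want) == 1
}

func main() {
	fmt.Println(unsaltedHash("password"))

	salt, err := newSalt()
	if err != nil {
		panic(err)
	}
	stored := saltedHash(salt, "password")
	fmt.Println(verify(salt, stored, "password"), verify(salt, stored, "guess"))
}
```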

Use the latest Go releases & subscribe to the mailing list plus official social media

Go’s security team is very active and hard working; they post software updates transparently and disclose vulnerabilities responsibly. We’ve reported a couple of vulnerabilities. Whenever a security release is made, we implore you to upgrade immediately.

Do not accept compressed payloads from the world as they can contain vulnerability sequences such as zip bombs

Accepting compressed data from the wild, say from customers, can be a recipe for disaster. There are vulnerabilities such as zip bombs that can’t be mitigated easily except by carefully comparing the size of the inputs against the memory consumed so far.
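One way to keep track of the memory consumed so far is to cap how much decompressed data is ever read. A minimal sketch for gzip payloads, where decompressBounded and gzipBytes are hypothetical helpers and the limit is whatever is sane for your service:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
)

var errTooLarge = errors.New("decompressed payload exceeds the limit")

// decompressBounded inflates a gzip payload but refuses to produce
// more than maxBytes of output.
func decompressBounded(compressed []byte, maxBytes int64) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()

	// Read one byte past the limit to tell "at the limit" from "over it".
	data, err := io.ReadAll(io.LimitReader(zr, maxBytes+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > maxBytes {
		return nil, errTooLarge
	}
	return data, nil
}

// gzipBytes compresses data; it is only here to build demo inputs.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	bomb, err := gzipBytes(make([]byte, 1<<20)) // 1 MiB of zeros compresses to very little
	if err != nil {
		panic(err)
	}
	_, err = decompressBounded(bomb, 1024)
	fmt.Println(len(bomb), err)
}
```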

Don’t accept arbitrary inputs for memory allocation: set sane bounds checks and avoid arbitrary integer type casts

When allocating memory such as with primitives like make, ensure that you carefully check user inputs, rejecting out-of-range values such as x ≤ 0 and x ≥ max. Some code naively assumes that inputs will be in the range x ≥ 0, but attackers will take advantage of such assumptions. During a supply chain audit for cosmos-sdk, we found a curious bug in a popular downstream dependency that could be exploited by passing in a negative value (there wasn’t a bounds check for negative values); later on, the code just blindly converted that value into a uint, and uint(x) where x < 0 evaluates to maxUint - abs(x) + 1, a huge number. This vulnerability was reported by us at

https://github.com/libp2p/go-buffer-pool/issues/26

The subtlety of that bug is that one can use it to cause “death by a thousand cuts”, in which an attacker stealthily and progressively sends many requests of sizes that won’t individually blow up memory but will each cause fairly large allocations; eventually, due to an overwhelming amount of allocated memory, the server can freeze. In that bug report I demonstrated a way to freeze a machine.
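A minimal sketch of a bounds-checked allocation follows. allocBuffer and the maxAlloc limit of 1 MiB are hypothetical; pick a limit that is sane for your own service.

```go
package main

import "fmt"

// maxAlloc is an assumed upper bound for a single request.
const maxAlloc = 1 << 20

// allocBuffer validates n before it ever reaches make, so negative,
// zero and oversized requests are rejected instead of converted.
func allocBuffer(n int) ([]byte, error) {
	if n <= 0 || n > maxAlloc {
		return nil, fmt.Errorf("refusing to allocate %d bytes: want 1..%d", n, maxAlloc)
	}
	return make([]byte, n), nil
}

func main() {
	if _, err := allocBuffer(-1); err != nil {
		fmt.Println(err)
	}
	buf, err := allocBuffer(4096)
	fmt.Println(len(buf), err)
}
```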

Reject arbitrary PIDs being passed into process killers like os.Process.Kill

We’ve found code that implements Command & Control (C&C) systems that innocently accepted string values via RPC, parsed those values into an integer and sadly didn’t perform any range checks before passing the value on to be killed. Luckily, in Go one first has to invoke os.FindProcess(PID), which always succeeds on UNIX, and os.Process.Kill then checks the process state. However, if that code had used a shell to invoke kill, it would be a rain of vulnerabilities.
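A minimal sketch of the missing range check follows. parsePID is a hypothetical helper, and the choice to reject everything at or below 1 is an assumption for illustration: with the kill(2) system call, 0 and negative values address process groups or every process, and PID 1 is the init process. A real service would likely also restrict kills to processes that it started itself.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePID parses an untrusted string and rejects values that must
// never reach a process killer.
func parsePID(s string) (int, error) {
	pid, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid PID %q: %w", s, err)
	}
	if pid <= 1 {
		return 0, fmt.Errorf("refusing PID %d: must be greater than 1", pid)
	}
	return pid, nil
}

func main() {
	for _, in := range []string{"4242", "-1", "0", "abc"} {
		pid, err := parsePID(in)
		fmt.Println(pid, err)
	}
}
```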

Use memory safe languages like Go, Rust, Java

Memory safety bugs occur in languages that don’t enforce separation between memory regions and that allow user manipulation of sensitive memory in arbitrary sections of RAM, without any bounds checks. Languages such as C, C++ and Assembly variants allow this kind of manipulation, and the attack surface that they expose can outweigh the gains of having gone to bare metal. Modern languages like Go & Rust are usually sufficient & expressive enough to write most programs, please use them. Heavy C++ production code bases like Chromium report that around 70% of their serious security bugs are memory safety bugs https://www.chromium.org/Home/chromium-security/memory-safety/

Conclusion:

Securing systems requires perpetual attention to detail and an intentional, concerted effort to prevent vulnerabilities. As long as human beings are involved in writing software, the ways in which we express and write software are bound to have weaknesses, because we are human and because something gets lost when translating human thoughts into software. However, we can enhance security by leveraging rules from computer systems and using static analyzers, dynamic/runtime tooling and bug checkers.

References

Resource | Comments | URL
Go race detector | Just run go test -race | https://go.dev/doc/articles/race_detector
Google Chromium 70% of bugs are memory safety bugs | Whenever possible, please use memory safe languages | https://www.chromium.org/Home/chromium-security/memory-safety/
SLSA | | https://slsa.dev/
Chainguard Q3 2022 July 2022, Cosmos Software Supply chain assessment | Please read through this important report to garner new skills & fixes that’ll make a variety of supply chains much more secure | https://drive.google.com/file/d/1BCDUSZ3cSdO8FTD9A-nA21_iViONoFln/view