Mutation Testing with Taquito

Mutation testing aims to improve your test coverage by checking that unit tests will fail when code is modified. Code changes that do not break existing tests are called “surviving mutants.” Testers hope to identify and eliminate problematic mutants by adding additional tests.
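As a minimal sketch of the idea (the function `isAdult` and both checks are hypothetical and not part of Taquito): a mutation tool might change `>=` to `>`. A test that only probes values far from the boundary lets that mutant survive, while a boundary test kills it.

```typescript
// Hypothetical example, not Taquito code.
type Check = (age: number) => boolean;

// Original implementation.
const isAdult: Check = (age) => age >= 18;

// A mutant a tool might generate: ">=" replaced by ">".
const isAdultMutant: Check = (age) => age > 18;

// A weak test: passes for both the original and the mutant.
const weakTest = (fn: Check): boolean => fn(30) === true && fn(5) === false;

// A boundary test: passes for the original only.
const boundaryTest = (fn: Check): boolean => fn(18) === true;

console.log(weakTest(isAdult), weakTest(isAdultMutant));         // true true  -> mutant survives
console.log(boundaryTest(isAdult), boundaryTest(isAdultMutant)); // true false -> mutant killed
```

The weak test gives the line full coverage, yet only the boundary test makes the change visible.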

I decided to try mutation testing on the Taquito package ‘taquito-remote-signer’. It is a relatively small package that is easy to exercise through unit tests. Taquito uses Jest to execute unit tests, and I added Stryker to perform the mutation testing.

Stryker Configuration

Stryker is configured through a single file. Much of the configuration is straightforward, and a stryker init command generates a starting point. A few settings, however, need to be adjusted for the project at hand.

Since I am focused only on ‘taquito-remote-signer’, I can declare in the stryker.conf.json file that:

"mutate": [
        "~/taquito/packages/taquito-remote-signer/src/*.ts"
],

An essential consideration for the configuration of Stryker is whether to include type checking. As the number of type check errors and warnings can be large, and since correct typing is an independent topic to be resolved elsewhere, we can specify:

"disableTypeChecks": "~/taquito/packages/**/**/*.{js,ts,jsx,tsx,html,vue}",

Finally, we can limit the mutators that are applied. Many mutants are not worth killing. To get to the interesting missing tests quickly, we should exclude the mutators that produce too much noise. In this case, I have:

"mutator": {
        "plugins": null,
        "excludedMutations": [
                "BlockStatement",
                "StringLiteral",
                "ArrayDeclaration",
                "ObjectLiteral"
        ]
},

In some cases, it is also helpful to exclude ConditionalExpression.

There is a complete commented configuration file listing at the end of this article.

Selecting interesting mutators

Are these mutants really things that tests have to detect?

In this example, demonstrating the mutator BlockStatement, an empty block {} can replace {throw new InvalidKeyHashError(this.pkh);} and no test will fail. Therefore the change is undetectable by the unit tests, and running the tests after making the change will not make anyone aware of it.


But is that a bug? It could be, if someone inadvertently deleted the content of the block. The code would compile and the tests would pass, but a valued client would scream when they use their favourite but invalid key hash and see {} on their screen.
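A small sketch of what a BlockStatement mutant looks like and the kind of test that kills it. The error class is a local stand-in for Taquito's, and `looksValid`, `checkKeyHash` and the invalid input are hypothetical; the real validation is more involved.

```typescript
// Local stand-in for Taquito's error class, so the sketch is self-contained.
class InvalidKeyHashError extends Error {
  constructor(public keyHash: string) {
    super(`Invalid key hash: ${keyHash}`);
  }
}

// Hypothetical, simplified validity check.
const looksValid = (pkh: string): boolean => /^tz[1-3]/.test(pkh);

// Original: the block throws on an invalid key hash.
function checkKeyHash(pkh: string): void {
  if (!looksValid(pkh)) {
    throw new InvalidKeyHashError(pkh);
  }
}

// BlockStatement mutant: the block body has been emptied.
function checkKeyHashMutant(pkh: string): void {
  if (!looksValid(pkh)) {
  }
}

// A test that expects a throw on invalid input distinguishes the two.
const throwsOnInvalid = (fn: (pkh: string) => void): boolean => {
  try {
    fn('not-a-key-hash');
    return false;
  } catch {
    return true;
  }
};

console.log(throwsOnInvalid(checkKeyHash));       // true
console.log(throwsOnInvalid(checkKeyHashMutant)); // false
```

In Jest the same check would typically be an `expect(...).toThrow(...)` assertion with an invalid key hash.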

Sampling variations of mutators and looking for missing but necessary tests

After the mutation testing experiment on the Taquito remote signer, the mutation score was:

---------|----------|-----------|------------|---------|--------
 % score | # killed | # timeout | # survived | # nocov | # error
---------|----------|-----------|------------|---------|--------
   75.00 |       21 |         0 |          7 |       0 |       9

Including BlockStatement and rerunning:

---------|----------|-----------|------------|---------|--------
 % score | # killed | # timeout | # survived | # nocov | # error
---------|----------|-----------|------------|---------|--------
   67.50 |       27 |         0 |         10 |       3 |      19

Three BlockStatement mutants in taquito-remote-signer.ts are not covered by any test. The existing tests do kill some BlockStatement mutants, since the number killed has increased by six, while three more BlockStatement mutants survive.
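The scores in the two tables are consistent with the ratio of detected mutants (killed plus timeout) to all valid mutants (detected plus survived plus not covered), with errored mutants left out. A short sketch that reproduces both figures; the type and function names are illustrative only.

```typescript
interface MutantCounts {
  killed: number;
  timeout: number;
  survived: number;
  noCoverage: number;
  errors: number; // reported, but not part of the score
}

// score = detected / (detected + undetected) * 100, where
// detected = killed + timeout and undetected = survived + noCoverage.
const mutationScore = (c: MutantCounts): number => {
  const detected = c.killed + c.timeout;
  const undetected = c.survived + c.noCoverage;
  return (detected / (detected + undetected)) * 100;
};

const firstRun: MutantCounts = { killed: 21, timeout: 0, survived: 7, noCoverage: 0, errors: 9 };
const withBlockStatement: MutantCounts = { killed: 27, timeout: 0, survived: 10, noCoverage: 3, errors: 19 };

console.log(mutationScore(firstRun).toFixed(2));           // 75.00
console.log(mutationScore(withBlockStatement).toFixed(2)); // 67.50
```

Note that the score dropped even though more mutants were killed, because the newly included mutator also added survivors and uncovered mutants.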

You can start with several mutators excluded to keep your task small. When the file reaches a 100% mutation score, re-enable another mutator and kill the new surviving mutants by adding to or adapting the tests.

Add code coverage and kill mutants

Stryker produces an HTML report of its findings. In that report, I observed the following:

Finding a mutation test result in a Stryker report

The selected code checks the validity of the signature by examining its first characters, and if they match the allowed prefix ‘sig’, the signature check proceeds. However, Stryker is telling us that while there is a unit test that checks a three-character signature prefix, there is no unit test that checks a five-character prefix such as ‘edsig’.

When we add a test for a signature with the prefix ‘edsig’, Stryker shows a test now covers the line. The code coverage report generated by Jest also would have shown this lack of coverage, but it would have been harder to see the specific issue. After covering the line with a test, we can also check if Stryker shows a mutation is possible. Typically when you add coverage, you find you have opened the door for yet more mutants.


We have also killed a mutant by adding the test. Stryker shows us that substituting just “signature” for the substring of the signature breaks the test we just added, Should sign messages with five-digit ‘edsig’ prefix. Since the test fails with that substitution, the mutant is killed.

In the Stryker ClearTextReporter output, we can see that the original test had killed the mutant for ‘? signature.substring(0, 3)’, and the test we added now kills the mutant for ‘: signature.substring(0, 5)’.

The test coverage now includes the conditional line, but it is also a more robust test since we can now be sure that if someone drops the substring condition, this test will catch that change.
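To make the two branches concrete, here is a hypothetical reconstruction. Only the two substring branches appear in the report; the condition, the function names and the sample values are assumptions made for illustration.

```typescript
// Hypothetical reconstruction of the conditional; not the actual Taquito source.
const signaturePrefix = (signature: string): string =>
  signature.substring(0, 3) === 'sig'
    ? signature.substring(0, 3)
    : signature.substring(0, 5);

// Mutant of the second branch: the whole signature is returned instead of its prefix.
const signaturePrefixMutant = (signature: string): string =>
  signature.substring(0, 3) === 'sig'
    ? signature.substring(0, 3)
    : signature;

const shortPrefixed = 'sigAAAA';   // made-up value with a three-character prefix
const longPrefixed = 'edsigAAAA';  // made-up value with a five-character prefix

// A 'sig' input never reaches the second branch, so it cannot tell the variants apart.
console.log(signaturePrefix(shortPrefixed), signaturePrefixMutant(shortPrefixed)); // sig sig

// An 'edsig' input exercises the second branch and exposes the mutant.
console.log(signaturePrefix(longPrefixed), signaturePrefixMutant(longPrefixed));   // edsig edsigAAAA
```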

Modify existing tests to kill mutants

Another example of a killed mutation: by adding a test with slashes appended to the end of an HTTP address, we can cover this code with a test (explicitly requiring the trailing slashes to be removed) and simultaneously kill two mutants derived from changes in the regex. The code as written uses:

'/\/+$/g'

Stryker notes that these regexes are potential mutants for this code:

'/\/+/g' and '/\/$/g'

Without a test explicitly requiring the trailing slashes to be stripped, nothing forces a choice among these regexes. The test added is here:

it('Should strip trailing slashes when creating URL because it is assumed to be included in path', async (done) => {
      const signer = new RemoteSigner(
        'tz1iD5nmudc4QtfNW14WWaiP7JEDuUHnbXuv',
        'http://127.0.0.1:6732///',
        {},
        httpBackend as any
      );
      httpBackend.createRequest
        .mockResolvedValueOnce({
          signature:
            'sigiGUGvWRkoYuf7ReH3wWAYnpgBFTa2DJ4Nxi7v1Wy5KqS7sZaxNhRiW6ivuoSUdKZnyGTABVk23WnppatuYqHty7uDtWRY',
        })
        .mockResolvedValueOnce({
          public_key: 'edpkuAhkJ81xGyf4PcmRMHLSaQGbDEpkGhNbcjNVnKWKR8kqkgQR3f',
        });
      const signed = await signer.sign(
        '0365cac93523b8c10346c0107cfea5e12ff3c759459020e532f299e2f41082f7cb6d0000f68c4abfa21dfc0c9efcf588190388cac85d9db60f81d6038b79d8030000000000b902000000b405000764045b0000000a2564656372656d656e74045b0000000a25696e6372656d656e740501035b0502020000008503210317057000010321057100020316072e020000002b032105700002032105710003034203210317057000010321057100020316034b051f020000000405200002020000002b0321057000020321057100030342032103170570000103210571000203160312051f0200000004052000020321053d036d0342051f020000000405200003000000020000'
      );
      expect(httpBackend.createRequest.mock.calls[0][0]).toEqual({
        method: 'POST',
        url: 'http://127.0.0.1:6732/keys/tz1iD5nmudc4QtfNW14WWaiP7JEDuUHnbXuv',
        headers: undefined,
      });
      expect(httpBackend.createRequest.mock.calls[0][1]).toEqual(
        '0365cac93523b8c10346c0107cfea5e12ff3c759459020e532f299e2f41082f7cb6d0000f68c4abfa21dfc0c9efcf588190388cac85d9db60f81d6038b79d8030000000000b902000000b405000764045b0000000a2564656372656d656e74045b0000000a25696e6372656d656e740501035b0502020000008503210317057000010321057100020316072e020000002b032105700002032105710003034203210317057000010321057100020316034b051f020000000405200002020000002b0321057000020321057100030342032103170570000103210571000203160312051f0200000004052000020321053d036d0342051f020000000405200003000000020000'
      );
      expect(httpBackend.createRequest.mock.calls[1][0]).toEqual({
        method: 'GET',
        url: 'http://127.0.0.1:6732/keys/tz1iD5nmudc4QtfNW14WWaiP7JEDuUHnbXuv',
        headers: undefined,
      });
      expect(signed).toEqual({
        bytes:
          '0365cac93523b8c10346c0107cfea5e12ff3c759459020e532f299e2f41082f7cb6d0000f68c4abfa21dfc0c9efcf588190388cac85d9db60f81d6038b79d8030000000000b902000000b405000764045b0000000a2564656372656d656e74045b0000000a25696e6372656d656e740501035b0502020000008503210317057000010321057100020316072e020000002b032105700002032105710003034203210317057000010321057100020316034b051f020000000405200002020000002b0321057000020321057100030342032103170570000103210571000203160312051f0200000004052000020321053d036d0342051f020000000405200003000000020000',
        prefixSig:
          'sigiGUGvWRkoYuf7ReH3wWAYnpgBFTa2DJ4Nxi7v1Wy5KqS7sZaxNhRiW6ivuoSUdKZnyGTABVk23WnppatuYqHty7uDtWRY',
        sbytes:
          '0365cac93523b8c10346c0107cfea5e12ff3c759459020e532f299e2f41082f7cb6d0000f68c4abfa21dfc0c9efcf588190388cac85d9db60f81d6038b79d8030000000000b902000000b405000764045b0000000a2564656372656d656e74045b0000000a25696e6372656d656e740501035b0502020000008503210317057000010321057100020316072e020000002b032105700002032105710003034203210317057000010321057100020316034b051f020000000405200002020000002b0321057000020321057100030342032103170570000103210571000203160312051f0200000004052000020321053d036d0342051f0200000004052000030000000200009b00f3b02e3760092415595548e2bd6532b0223d8ff282d31b6f30e35592b6b91a4c37fb0a6a24cc8b5176cc497e204ab722a4711803121ff51dc5a450cfd10b',
        sig: 'sigiGUGvWRkoYuf7ReH3wWAYnpgBFTa2DJ4Nxi7v1Wy5KqS7sZaxNhRiW6ivuoSUdKZnyGTABVk23WnppatuYqHty7uDtWRY',
      });
      done();
    });
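To see why a root URL with several trailing slashes separates the three regexes, the following sketch applies each of them to the address used in the test above. The variable names are illustrative.

```typescript
const rootUrl = 'http://127.0.0.1:6732///';

// Regex as written: strips the whole trailing run of slashes.
const stripped = rootUrl.replace(/\/+$/g, '');

// Mutant without the "$" anchor: removes every run of slashes, not just the trailing one.
const withoutAnchor = rootUrl.replace(/\/+/g, '');

// Mutant without the "+" quantifier: removes only the final slash.
const withoutQuantifier = rootUrl.replace(/\/$/g, '');

console.log(stripped);          // http://127.0.0.1:6732
console.log(withoutAnchor);     // http:127.0.0.1:6732
console.log(withoutQuantifier); // http://127.0.0.1:6732//
```

Only the first result produces the URL that the test's expectation on the request requires, so both mutants fail the new test.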

Practical considerations

Mutation testing can be time-consuming. Running the mutations automatically is easy; considering and reasoning about what the results mean for the test suite is what takes time. If a run stops identifying necessary missing tests, it may no longer be worth that time.

Stryker offers this optimistic view:

"Bugs, or mutants, are automatically inserted into your production code. Your tests are run for each mutant. If your tests fail then the mutant is killed. If your tests passed, the mutant survived. The higher the percentage of mutants killed, the more effective your tests are."

Stryker Configuration file for Taquito

Stryker will help you start a configuration file if you cd into a project and run

stryker init

The configuration file starts with some header info:

{
        "$schema": "./node_modules/@stryker-mutator/core/schema/stryker-schema.json",
        "_comment": "This config was generated using 'stryker init'. Please take a look at: https://stryker-mutator.io/docs/stryker-js/configuration/ for more information",

Next we select the source files we wish to mutate. For the Taquito package:

        "mutate": [
                "/home/mike/taquito/packages/taquito/src/**/*.ts"
        ],

And for the additional Taquito packages (a separate run):

        "mutate": [
                "/home/mike/taquito/packages/**/src/**/*.ts"
        ],

There are several environment parameters:

        "packageManager": "npm",
        "checkers": [
                "typescript"
        ],
        "tsconfigFile": "tsconfig.json",
        "reporters": [
                "html",
                "clear-text",
                "progress"
        ],
        "testRunner": "jest",
        "coverageAnalysis": "perTest",

Here we choose not to be told of all the type check failures that can occur. For the Taquito package:

"disableTypeChecks": "/home/mike/taquito/packages/taquito/**/**/*.{js,ts,jsx,tsx,html,vue}",

And for the other packages:

"disableTypeChecks": "/home/mike/taquito/packages/**/**/**/*.{js,ts,jsx,tsx,html,vue}",

Some more general parameters:

"fileLogLevel": "trace",
        "logLevel": "debug",
        "allowConsoleColors": true,
        "checkerNodeArgs": [],
        "maxTestRunnerReuse": 0,
        "commandRunner": {
                "command": "npm test"
        },
        "clearTextReporter": {
                "allowColor": true,
                "logTests": true,
                "maxTestsToLog": 3
        },
        "dashboard": {
                "baseUrl": "https://dashboard.stryker-mutator.io/api/reports",
                "reportType": "full"
        },
        "eventReporter": {
                "baseDir": "reports/mutation/events"
        },
        "ignorePatterns": [],
        "ignoreStatic": false,
        "inPlace": false,
        "maxConcurrentTestRunners": 9007199254740991,

Here we want to specify which mutators we do not want to hear from:

        "mutator": {
                "plugins": null,
                "excludedMutations": [
                        "BlockStatement",
                        "StringLiteral"
                ]
        },

And then the configuration file concludes with some more general parameters:

        "plugins": [
                "@stryker-mutator/*"
        ],
        "appendPlugins": [],
        "htmlReporter": {
                "fileName": "reports/mutation/mutation.html"
        },
        "jsonReporter": {
                "fileName": "reports/mutation/mutation.json"
        },
        "symlinkNodeModules": true,
        "tempDirName": ".stryker-tmp",
        "cleanTempDir": true,
        "testRunnerNodeArgs": [],
        "thresholds": {
                "high": 80,
                "low": 60,
                "break": null
        },
        "timeoutFactor": 1.5,
        "timeoutMS": 5000,
        "dryRunTimeoutMinutes": 5,
        "warnings": true,
        "disableBail": false
}

100% coverage but 75% mutation score

taquito-local-forging/src/validator.ts

has 100% code coverage by unit tests, but Stryker finds two mutants that survive these tests.

const deleteArrayElementByValue = (array: string[], item: string) => {
   return array.filter((e) => e !== item);
};

One is a MethodExpression mutant:

const deleteArrayElementByValue = (array: string[], item: string) => {
   return array;
};

And the other is a ConditionalExpression mutant:

const deleteArrayElementByValue = (array: string[], item: string) => {
   return array.filter((e) => true);
};

Does either of these changes in the code create bugs? Are the changes just wrong? Then tests should catch that.
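One way to answer that is to ask which input would make the three variants disagree. The sketch below copies the function and its two mutants from above and applies them to an illustrative key list; the `Remover` type, the variable names and the sample keys are assumptions for the example.

```typescript
type Remover = (array: string[], item: string) => string[];

// Original, as in validator.ts.
const deleteArrayElementByValue: Remover = (array, item) => array.filter((e) => e !== item);

// MethodExpression mutant: the filter call is dropped.
const methodExpressionMutant: Remover = (array, _item) => array;

// ConditionalExpression mutant: the predicate is replaced by true.
const conditionalExpressionMutant: Remover = (array, _item) => array.filter((_e) => true);

const sampleKeys = ['kind', 'source', 'fee']; // illustrative input

// A direct assertion on the returned array separates the original from both mutants.
const removesKind = (fn: Remover): boolean =>
  JSON.stringify(fn(sampleKeys, 'kind')) === JSON.stringify(['source', 'fee']);

console.log(removesKind(deleteArrayElementByValue));   // true
console.log(removesKind(methodExpressionMutant));      // false
console.log(removesKind(conditionalExpressionMutant)); // false
```

A test can only kill these mutants if its assertions depend, directly or indirectly, on the item having been removed.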

This function is Covered by 402 tests (yet still survived)
☂️ Forge and parse operations default protocol Common test: Delegation (packages/taquito-local-forging/test/taquito-local-forging.spec.ts:15:5)
☂️ Forge and parse operations default protocol Common test: Reveal (packages/taquito-local-forging/test/taquito-local-forging.spec.ts:15:5)
☂️ Forge and parse operations default protocol Common test: Ballot (packages/taquito-local-forging/test/taquito-local-forging.spec.ts:15:5)
☂️ Forge and parse operations default protocol Common test: Seed nonce revelation (packages/taquito-local-forging/test/taquito-local-forging.spec.ts:15:5)
☂️ Forge and parse operations default protocol Common test: Proposals (packages/taquito-local-forging/test/taquito-local-forging.spec.ts:15:5)
☂️ Forge and parse operations default protocol Common test: transaction (packages/taquito-local-forging/test/taquito-local-forging.spec.ts:15:5)
    
etc.

We can look at

packages/taquito-local-forging/test/taquito-local-forging.spec.ts

We can see why there are 402 tests. It’s from the .forEach!

describe('Forge and parse operations default protocol', () => {
  const localForger = new LocalForger();
  commonCases.forEach(({ name, operation, expected }) => {
    test(`Common test: ${name}`, async (done) => {
      const result = await localForger.forge(operation);
      expect(await localForger.parse(result)).toEqual(expected || operation);
      done();
    });
  });
});

But none of these tests failed when we substituted the MethodExpression and ConditionalExpression mutants in the code.

Does this matter? What does it tell us about the code?

It looks like the tests are indifferent to whether the returned array has been filtered or not. What would happen if the item were not removed from the returned array? Would this create a bug elsewhere? What consumes the array, and what does it expect?

deleteArrayElementByValue is used here:

/**
 *  returns 0 when the two array of properties are identical or the passed property
 *  does not have any missing parameters from the corresponding schema
 *
 *  @returns array element differences if there are missing required property keys
 */
export const validateMissingProperty = (operationContent: OperationContents) => {
  const kind = operationContent.kind as OperationKind;
  const keys = Object.keys(operationContent);
  const cleanKeys = deleteArrayElementByValue(keys, 'kind');
  const schemaKeys = Object.keys(OperationKindMapping[kind]);
  return getArrayDifference(cleanKeys, schemaKeys);
};

And then validateMissingProperty gets used by the local forger:

const diff = validateMissingProperty(content);
      if (diff.length === 1) {
        if (content.kind === 'delegation' && diff[0] === 'delegate') {
          continue;
        } else if (content.kind === 'origination' && diff[0] === 'delegate') {
          continue;
        } else if (content.kind === 'transaction' && diff[0] === 'parameters') {
          continue;
        } else if (
          content.kind === ('tx_rollup_submit_batch' as unknown) &&
          diff[0] === 'burn_limit'
        ) {
          continue;
        } else {
          throw new InvalidOperationSchemaError(
            `Missing properties: ${diff.join(', ').toString()}`
          );
        }
      } else if (diff.length > 1) {
        throw new InvalidOperationSchemaError(`Missing properties: ${diff.join(', ').toString()}`);
      }
    }
    const forged = this.codec.encoder(params).toLowerCase();
    return Promise.resolve(forged);
  }

I suspect we can kill these mutants with a test of a local forger that asserts something about the keys that have been cleaned by deleteArrayElementByValue.