
ADR-0007: Proposal How to Mark Findings With Hashes to Find Duplicates

Status: PROPOSED
Date: 2020-11-25
Author(s): Sven Strittmatter <Sven.Strittmatter@iteratec.com>

NOTE: The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Context

We need a way to identify duplicate findings. One use case is that we want to accept a finding and ignore the same finding in the future.

Assumptions

  • The execution order of hooks is unspecified.
  • Whether a finding’s hash is a duplicate MUST NOT be stored or maintained in the SCB S3 storage.
  • The SCB MUST NOT remove findings: read-write-hooks may alter them, but never delete or filter them out.
    • A read-hook MAY decide not to store a finding in an external system.

Decision

  • We generate a hash for each finding so we can compare findings by the hash and identify duplicates.
  • This hash MUST be mutable and MAY be altered by read-write-hooks, because we don’t want to introduce an exception to what a read-write-hook can alter.
  • The parser MUST generate the initial hash of a finding from some of its attributes (e.g. name, location, category, …), as sketched below this list.
    • Each scanner MUST define a default set of attributes used for the hashing.
    • This set of hashed attributes MAY be overwritten.
  • Each read-write-hook MUST update the hash as its last step, because the hook MAY have changed a hashed attribute.
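
A minimal sketch of how such an initial hash could be computed on the parser side. The `computeFindingHash` helper, the default attribute set (name, location, category), and the use of SHA-256 are illustrative assumptions and not mandated by this ADR:

```typescript
import { createHash } from "crypto";

// Illustrative shape of a finding; only the fields relevant for hashing
// are spelled out, everything else is covered by the index signature.
interface Finding {
  name: string;
  location: string;
  category: string;
  hash?: string;
  [key: string]: unknown;
}

// Assumed default set of attributes used for hashing; each scanner would
// define its own default and this set MAY be overwritten.
const DEFAULT_HASHED_ATTRIBUTES = ["name", "location", "category"];

// Compute a deterministic hash over the selected attributes of a finding.
function computeFindingHash(
  finding: Finding,
  hashedAttributes: string[] = DEFAULT_HASHED_ATTRIBUTES
): string {
  const hash = createHash("sha256");
  for (const attribute of hashedAttributes) {
    // Include the attribute name so that swapped values of two
    // different attributes do not produce the same hash.
    hash.update(`${attribute}=${String(finding[attribute] ?? "")};`);
  }
  return hash.digest("hex");
}
```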

We will implement the hashing step in the parser first, behind a feature flag, to evaluate this proposal.
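
To illustrate the last decision point, a hedged sketch of a read-write-hook that alters a hashed attribute and therefore recomputes the hash as its last step. It reuses the `Finding` type and `computeFindingHash` helper from the sketch above; the hook signature and the `remapCategory` step are made up for illustration and do not reflect the actual hook SDK:

```typescript
// Sketch of a read-write-hook: it may alter findings, but it MUST
// update the hash as its last step because a hashed attribute changed.
function exampleReadWriteHook(findings: Finding[]): Finding[] {
  return findings.map((finding) => {
    // Hypothetical change to a hashed attribute (category).
    const updated = { ...finding, category: remapCategory(finding.category) };
    // Last step: recompute the hash over the (possibly changed) attributes.
    return { ...updated, hash: computeFindingHash(updated) };
  });
}

// Hypothetical category normalization performed by this hook.
function remapCategory(category: string): string {
  return category.toLowerCase();
}
```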

Consequences

  • We don’t need to introduce an ordering for the read-write-hooks.
  • The duplicate detection/handling MUST be done in another service with its own data storage. This is because there is no stable hash until the read-hooks are executed, and these MUST NOT alter the data in the SCB itself; but a read-hook MAY decide not to store data in an external system, as sketched below.
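
As a rough illustration of this consequence, such a service (or a read-hook feeding an external system) could keep its own store of already accepted hashes and skip findings it has seen before. The `AcceptedHashStore` interface below is hypothetical; the backing storage could be any database, but per the assumptions it would not be the SCB S3 storage:

```typescript
// Hypothetical storage owned by the duplicate-handling service.
interface AcceptedHashStore {
  contains(hash: string): Promise<boolean>;
  add(hash: string): Promise<void>;
}

// Decide whether a finding should be forwarded to an external system.
// A read-hook MAY skip findings whose hash was accepted earlier.
async function shouldForwardFinding(
  finding: { hash: string },
  store: AcceptedHashStore
): Promise<boolean> {
  return !(await store.contains(finding.hash));
}
```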