Coordination between C++ indexer jobs
Closed, Resolved, Public

Description

Without coordination between jobs, the C++ indexer generates a lot of globally redundant output. (It should not generate much locally redundant output.) We have two mechanisms for dealing with this:

  • static claiming, a closed and reproducible method in which headers are assigned to compilation units
  • hash caching, an open and non-reproducible method which works by hashing blocks of indexer output
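As a rough illustration of the two mechanisms above, here is a minimal Python sketch. It is not the indexer's actual implementation: the function `assign_static_claims`, the class `HashCache`, the "smallest compilation unit name wins" rule, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib


def assign_static_claims(includes):
    """Assign each header to exactly one compilation unit.

    `includes` maps a compilation unit name to the headers it includes.
    Hypothetical rule: a header goes to the lexicographically smallest
    unit that includes it, so the result is reproducible but requires
    knowing every unit up front (a closed set).
    """
    claims = {}
    for unit in sorted(includes):
        for header in includes[unit]:
            claims.setdefault(header, unit)
    return claims


class HashCache:
    """Suppress blocks of indexer output whose hash was already seen.

    Which job emits a block first depends on run order, so the
    output is not reproducible from run to run.
    """

    def __init__(self):
        self._seen = set()

    def should_emit(self, block: bytes) -> bool:
        digest = hashlib.sha256(block).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

In this sketch, static claiming decides ownership before any job runs, while the hash cache decides at emission time and needs no advance knowledge of the other jobs.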

We're looking at developing a third method: a dynamic, non-reproducible one that deduplicates based on the keys we use for static claiming. We may also consider loosening the guarantees on the resulting database, so that some data that would be present with the feature turned off is not emitted when it is turned on.
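One way such a dynamic method could work is a shared claim table in which the first job to ask for a key wins it. The Python sketch below is only an assumption about the shape of the idea: `DynamicClaimTable`, `try_claim`, and the in-process lock standing in for a shared service are all hypothetical.

```python
import threading


class DynamicClaimTable:
    """First-come, first-served claims keyed by a claim key.

    Hypothetical stand-in for a service shared between indexer jobs;
    here a lock-protected dict plays that role within one process.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._owners = {}

    def try_claim(self, key, job):
        """Return True if `job` owns `key`, claiming it if unowned."""
        with self._lock:
            owner = self._owners.setdefault(key, job)
        return owner == job
```

A job that fails to claim a key would skip emitting output for it. Because the winner depends on which job asks first, the result is not reproducible across runs.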

schroederc created this task. Via Web, Oct 12 2015, 4:04 PM
schroederc assigned this task to zarko.
schroederc added subscribers: amshali, juanch, schroederc.
schroederc added a project: C++.
schroederc changed the edit policy of this Maniphest Task from "All Users" to "Core Team (Project)".
zarko changed the title from "C++ indexer emits an enormous amount of duplicate facts" to "Coordination between C++ indexer jobs". Via Web, Oct 12 2015, 4:26 PM
zarko edited the task description.
zarko added a revision: Restricted Differential Revision. Via Web, Oct 16 2015, 4:23 PM
schroederc changed the visibility of this Maniphest Task from "All Users" to "Public (No Login Required)". Via Web, May 16 2016, 3:26 PM
zarko closed this task as "Resolved". Via Web, Mar 24 2017, 11:22 AM

This is actually working pretty well.
