Examples
- Extracting Compilations
- Extracting Compilations using Bazel
- runextractor tool
- Extracting Gradle based repositories
- Extracting projects built with make
- Indexing Compilations
- Indexing the Kythe Repository
- Using Cayley to explore a GraphStore
- Serving data over HTTP
This document assumes that the latest release archive from https://github.com/kythe/kythe/releases has been unpacked into /opt/kythe/. See /opt/kythe/README.md for more information.
Extracting Compilations
Extracting Compilations using Bazel
Kythe uses Bazel to build itself and has implemented Bazel action_listeners that use Kythe’s Java and C++ extractors. This effectively allows Bazel to extract each compilation as it is run during the build.
Extracting the Kythe repository
Add the flag
--experimental_action_listener=@io_kythe//kythe/extractors:extract_kzip_java
to make Bazel extract Java compilations and
--experimental_action_listener=@io_kythe//kythe/extractors:extract_kzip_cxx
to do the same for C++.
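As a rough sketch of where the flag goes on the command line, the helper below prints a Bazel invocation for either language. The helper name kythe_bazel_cmd, the use of bazel build, and the //... target pattern are assumptions made for illustration; only the two listener labels come from the text above.

```shell
# Print (rather than run) the Bazel command so it can be reviewed first.
# $1 is "java" or "cxx", matching the two listener targets named above.
# Remaining arguments are the Bazel targets to build (hypothetical here).
kythe_bazel_cmd() {
  lang="$1"
  shift
  printf '%s\n' "bazel build --experimental_action_listener=@io_kythe//kythe/extractors:extract_kzip_${lang} $*"
}

kythe_bazel_cmd java //...
kythe_bazel_cmd cxx //...
```

Printing the command keeps the sketch safe to paste into any shell; copy the printed line to run the actual build.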
Extracting other Bazel based repositories
You can use the Kythe release to extract compilations from other Bazel based repositories.
runextractor tool
runextractor is a generic extraction tool that works with any build system capable of emitting a compile_commands.json file. It invokes an extractor for each compilation action listed in compile_commands.json and generates a kzip in the output directory for each.
Build systems capable of emitting a compile_commands.json include CMake, Ninja, gn, waf, and others.
runextractor configuration
runextractor is configured via a set of environment variables:
- KYTHE_ROOT_DIRECTORY: The absolute path for file input to be extracted. This is generally the root of the repository. All files extracted will be stored relative to this path.
- KYTHE_OUTPUT_DIRECTORY: The absolute path for storing output.
- KYTHE_CORPUS: The corpus label for extracted files.
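Because a missing or relative value is easy to overlook, it may help to check the three variables before invoking runextractor. The following is a minimal sketch; the function name check_kythe_env is hypothetical and not part of the Kythe release.

```shell
# Verify the three variables listed above before running runextractor.
check_kythe_env() {
  status=0
  for name in KYTHE_ROOT_DIRECTORY KYTHE_OUTPUT_DIRECTORY KYTHE_CORPUS; do
    eval "value=\${$name:-}"          # read the variable named by $name
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      status=1
    fi
  done
  # The two directory variables are documented as absolute paths.
  for name in KYTHE_ROOT_DIRECTORY KYTHE_OUTPUT_DIRECTORY; do
    eval "value=\${$name:-}"
    case "$value" in
      /*|"") ;;                       # absolute, or already reported as unset
      *) echo "$name must be an absolute path: $value" >&2; status=1 ;;
    esac
  done
  return "$status"
}

if check_kythe_env; then
  echo "environment looks complete"
else
  echo "fix the variables reported above before running runextractor"
fi
```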
Extracting from a compile_commands.json file
This example uses Ninja, but the first step can be adapted for others.
- Begin by building your project with compile_commands.json enabled. For Ninja, the command is
ninja -t compdb > compile_commands.json
- Set the environment variables described in the runextractor configuration section above.
- Invoke runextractor:
/opt/kythe/tools/runextractor compdb -extractor /opt/kythe/extractors/cxx_extractor
- If successful, the output directory should contain one kzip for each compilation action. An optional last step is to merge these into one kzip with
/opt/kythe/tools/kzip merge --output $KYTHE_OUTPUT_DIRECTORY/merged.kzip $KYTHE_OUTPUT_DIRECTORY/*.kzip
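One way to sanity-check the "one kzip for each compilation action" expectation from the steps above is to compare the two counts. This is only a sketch: compare_counts is a hypothetical helper, and it assumes the database is pretty-printed with each entry's "directory" key on its own line. Run it before the optional merge step, since merged.kzip would otherwise be counted too.

```shell
# compare_counts COMPDB OUTDIR
# Compare the number of compilation actions with the number of kzip files.
compare_counts() {
  # Every compile_commands.json entry carries exactly one "directory" key.
  actions=$(grep -c '"directory"' "$1")
  kzips=$(find "$2" -maxdepth 1 -type f -name '*.kzip' | wc -l | tr -d ' ')
  echo "actions=$actions kzips=$kzips"
  [ "$actions" -eq "$kzips" ]
}

# Run the check only when both inputs exist, so the snippet is safe to paste.
if [ -f compile_commands.json ] && [ -d "${KYTHE_OUTPUT_DIRECTORY:-}" ]; then
  compare_counts compile_commands.json "$KYTHE_OUTPUT_DIRECTORY" ||
    echo "counts differ; check the runextractor output for failed actions"
fi
```

A mismatch is a hint rather than proof of failure; some actions may legitimately produce no kzip.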
Extracting CMake based repositories
The runextractor tool has a convenience subcommand for CMake-based repositories that first invokes CMake to generate a compile_commands.json, then processes the listed compilation actions. However, the same result can be achieved by invoking CMake manually and then using the generic runextractor compdb command.
These instructions assume your environment is already set up to successfully run cmake for your repository.
$ export KYTHE_ROOT_DIRECTORY="/absolute/path/to/repo/root"
$ export KYTHE_CORPUS="github.com/myproject/myrepo"
$ export KYTHE_OUTPUT_DIRECTORY="/tmp/kythe-output"
$ mkdir -p "$KYTHE_OUTPUT_DIRECTORY"
# $CMAKE_ROOT_DIRECTORY is passed into the -sourcedir flag. This value should be
# the directory that contains the top-level CMakeLists.txt file. In many
# repositories this path is the same as $KYTHE_ROOT_DIRECTORY.
$ export CMAKE_ROOT_DIRECTORY="/absolute/path/to/cmake/root"
$ /opt/kythe/tools/runextractor cmake \
    -extractor=/opt/kythe/extractors/cxx_extractor \
    -sourcedir="$CMAKE_ROOT_DIRECTORY"
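For comparison, the manual route mentioned above (run CMake yourself, then use runextractor compdb) might be sketched as follows. The sketch assumes CMake 3.13 or newer for the -S and -B options, that runextractor compdb reads compile_commands.json from the current directory as in the Ninja example, and that the KYTHE_* variables are already exported. The function name extract_via_compdb and the RUNEXTRACTOR and CXX_EXTRACTOR variables are hypothetical names introduced here.

```shell
RUNEXTRACTOR="${RUNEXTRACTOR:-/opt/kythe/tools/runextractor}"
CXX_EXTRACTOR="${CXX_EXTRACTOR:-/opt/kythe/extractors/cxx_extractor}"

# extract_via_compdb SOURCE_DIR BUILD_DIR
extract_via_compdb() {
  src="$1"
  build="$2"
  [ -x "$RUNEXTRACTOR" ] || { echo "runextractor not found at $RUNEXTRACTOR" >&2; return 1; }
  command -v cmake >/dev/null 2>&1 || { echo "cmake not found on PATH" >&2; return 1; }
  # Ask CMake to write compile_commands.json into the build directory.
  cmake -S "$src" -B "$build" -DCMAKE_EXPORT_COMPILE_COMMANDS=ON || return 1
  # Run the generic subcommand from the directory holding the database.
  ( cd "$build" && "$RUNEXTRACTOR" compdb -extractor "$CXX_EXTRACTOR" )
}

# Usage (not run here):
#   extract_via_compdb "$CMAKE_ROOT_DIRECTORY" /tmp/cmake-build
```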
Extracting Gradle based repositories
- Install compiler wrapper
Extraction works by intercepting all calls to javac and saving the compiler arguments and inputs to a “compilation unit”, which is stored in a .kzip file. The javac-wrapper.sh script forwards javac calls to the Java extractor and then calls javac. Add this to the end of your project’s build.gradle:
allprojects {
  gradle.projectsEvaluated {
    tasks.withType(JavaCompile) {
      options.fork = true
      options.forkOptions.executable = '/opt/kythe/extractors/javac-wrapper.sh'
    }
  }
}
- VName configuration
Next, you will need to create a vnames.json mapping file, which tells the extractor how to assign vnames to files based on their paths. A basic vnames config for a Gradle project looks like:
[
  {
    "pattern": "(build/[^/]+)/(.*)",
    "vname": {
      "corpus": "MY_CORPUS",
      "path": "@2@",
      "root": "@1@"
    }
  },
  {
    "pattern": ".*/.gradle/caches/(.*)",
    "vname": {
      "corpus": "MY_CORPUS",
      "path": "@1@",
      "root": ".gradle/caches"
    }
  },
  {
    "pattern": "(.*)",
    "vname": {
      "corpus": "MY_CORPUS",
      "path": "@1@"
    }
  }
]
(Note: change “MY_CORPUS” to the actual corpus for your project.)
You can test your vname config using the vnames command line tool. For example:
bazel build //kythe/go/util/tools/vnames
echo "some/test/path.java" | ./bazel-bin/kythe/go/util/tools/vnames/vnames apply-rules --rules vnames.json
> {
>   "corpus": "MY_CORPUS",
>   "path": "some/test/path.java"
> }
- Extraction
# note: you may want to use a different javac depending on your install
export REAL_JAVAC="/usr/bin/javac"
export JAVA_HOME="$(readlink -f $REAL_JAVAC | sed 's:/bin/javac::')"
export JAVAC_EXTRACTOR_JAR="/opt/kythe/extractors/javac_extractor.jar"
export KYTHE_VNAMES="$PWD/vnames.json"
export KYTHE_ROOT_DIRECTORY="$PWD" # paths in the compilation unit will be made relative to this
export KYTHE_OUTPUT_DIRECTORY="/tmp/extracted_gradle_project"
mkdir -p "$KYTHE_OUTPUT_DIRECTORY"
./gradlew clean build -x test -Dno_werror=true
# merge all kzips into one
/opt/kythe/tools/kzip merge --output $KYTHE_OUTPUT_DIRECTORY/merged.kzip $KYTHE_OUTPUT_DIRECTORY/*.kzip
- Examine results
If extraction was successful, the final kzip should be at $KYTHE_OUTPUT_DIRECTORY/merged.kzip. The kzip tool can be used to inspect the result.
$ kzip info --input merged.kzip | jq . # view summary information
$ kzip view merged.kzip | jq . # view all compilation units in the kzip
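To build intuition for how the vnames.json rules from the VName configuration step map a path to a vname, the sketch below emulates the three example rules with grep and sed. It is an approximation, not the extractor's implementation: it assumes rules are tried in order, that the first match wins, and that each pattern must match the whole path (hence the added ^ and $). The function name apply_rules is hypothetical; use the real vnames tool to validate an actual config.

```shell
# apply_rules PATH: emulate the three example rules, in order.
apply_rules() {
  p="$1"
  if printf '%s\n' "$p" | grep -Eq '^(build/[^/]+)/(.*)$'; then
    # Rule 1: @1@ becomes the root, @2@ becomes the path.
    root=$(printf '%s\n' "$p" | sed -E 's#^(build/[^/]+)/(.*)$#\1#')
    path=$(printf '%s\n' "$p" | sed -E 's#^(build/[^/]+)/(.*)$#\2#')
    echo "corpus=MY_CORPUS root=$root path=$path"
  elif printf '%s\n' "$p" | grep -Eq '^.*/.gradle/caches/(.*)$'; then
    # Rule 2: everything after .gradle/caches/ becomes the path.
    path=$(printf '%s\n' "$p" | sed -E 's#^.*/.gradle/caches/(.*)$#\1#')
    echo "corpus=MY_CORPUS root=.gradle/caches path=$path"
  else
    # Rule 3: catch-all, the whole input is the path.
    echo "corpus=MY_CORPUS path=$p"
  fi
}

apply_rules "build/generated/src/Foo.java"
# prints: corpus=MY_CORPUS root=build/generated path=src/Foo.java
apply_rules "/home/me/.gradle/caches/modules-2/guava.jar"
# prints: corpus=MY_CORPUS root=.gradle/caches path=modules-2/guava.jar
apply_rules "some/test/path.java"
# prints: corpus=MY_CORPUS path=some/test/path.java
```

The order of the rules matters: the catch-all pattern must come last, or it would shadow the more specific ones.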
Extracting projects built with make
Projects built with make can be extracted by substituting the C/C++ compiler with a wrapper script that invokes both Kythe’s cxx_extractor binary and the actual C/C++ compiler.
Given a simple example project:
# main.cc
#include <iostream>
int main(int argc, char** argv) {
std::cout << "Hello" << std::endl;
}
# makefile
all: bin
bin: main.cc
	$(CXX) main.cc -o bin
# cxx_wrapper.sh
#!/bin/bash -e
$KYTHE_RELEASE_DIR/extractors/cxx_extractor "$@" & pid=$!
/usr/bin/c++ "$@"
wait "$pid"
Extraction is done by setting the CXX make variable as well as some environment variables that configure cxx_extractor.
# directory where kythe release has been installed
export KYTHE_RELEASE_DIR=/opt/kythe-v0.0.50
# parameters for cxx_extractor
export KYTHE_CORPUS=mycorpus
export KYTHE_ROOT_DIRECTORY="$PWD"
export KYTHE_OUTPUT_DIRECTORY=/tmp/extract
export CXX="$PWD/cxx_wrapper.sh"  # the wrapper must be executable (chmod +x cxx_wrapper.sh)
mkdir -p "$KYTHE_OUTPUT_DIRECTORY"
make
If all goes well, this will populate $KYTHE_OUTPUT_DIRECTORY with kzip files, one for each compiler invocation. These files can be inspected with the kzip tool distributed as part of the Kythe release. For example:
kzip view $KYTHE_OUTPUT_DIRECTORY/some.file.kzip | jq
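The pattern used by cxx_wrapper.sh (extractor in the background, compiler in the foreground, then wait) can be tried in isolation with stand-in commands. The sketch below is a variation, not the wrapper itself: run_both is a hypothetical function, true and false stand in for the extractor and compiler, and unlike the bash -e wrapper it always waits for the extractor and then reports a failure from either process.

```shell
# run_both EXTRACTOR COMPILER ARGS...
run_both() {
  extractor="$1"
  compiler="$2"
  shift 2
  "$extractor" "$@" &        # extractor runs concurrently with the compiler
  pid=$!
  "$compiler" "$@"
  compiler_status=$?
  wait "$pid"                # collect the extractor's exit status
  extractor_status=$?
  # Fail if either process failed, preferring the compiler's status.
  [ "$compiler_status" -eq 0 ] || return "$compiler_status"
  return "$extractor_status"
}

# Stand-ins: "true" and "false" ignore their arguments and only set a status.
run_both true true main.cc -o bin && echo "both succeeded"
run_both false true main.cc -o bin || echo "extractor failure is reported"
run_both true false main.cc -o bin || echo "compiler failure is reported"
```

Running both processes with the same arguments is what lets the extractor see exactly what the compiler sees, without slowing the build by the sum of the two run times.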
Indexing Compilations
All Kythe indexers analyze compilations emitted from extractors as .kzip files. The indexers then emit a delimited stream of entry protobufs that can be stored in a GraphStore.
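As a sketch of that flow for a C++ compilation, the function below pipes the indexer's entry stream straight into a GraphStore writer. Treat the details as assumptions to verify against your release: the tool locations under /opt/kythe, the --ignore_unimplemented and --graphstore flags, and the hypothetical names index_kzip, CXX_INDEXER, and WRITE_ENTRIES.

```shell
CXX_INDEXER="${CXX_INDEXER:-/opt/kythe/indexers/cxx_indexer}"
WRITE_ENTRIES="${WRITE_ENTRIES:-/opt/kythe/tools/write_entries}"

# index_kzip KZIP GRAPHSTORE_DIR
index_kzip() {
  kzip="$1"
  graphstore="$2"
  for tool in "$CXX_INDEXER" "$WRITE_ENTRIES"; do
    [ -x "$tool" ] || { echo "missing tool: $tool" >&2; return 1; }
  done
  [ -f "$kzip" ] || { echo "no such kzip: $kzip" >&2; return 1; }
  # The indexer writes delimited entry protobufs to stdout; the writer
  # reads them from stdin and stores them in the GraphStore.
  "$CXX_INDEXER" --ignore_unimplemented "$kzip" |
    "$WRITE_ENTRIES" --graphstore "$graphstore"
}

# Usage (not run here):
#   index_kzip "$KYTHE_OUTPUT_DIRECTORY/merged.kzip" /tmp/gs
```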
Indexing the Kythe Repository
Using Cayley to explore a GraphStore
Install Cayley if necessary: https://github.com/google/cayley/releases
// Get all file nodes
cayley> g.V().Has("/kythe/node/kind", "file").All()
// Get definition anchors for all record nodes
cayley> g.V().Has("/kythe/node/kind", "record").Tag("record").In("/kythe/edge/defines").All()
// Get the file(s) defining a particular node
cayley> g.V("node_ticket").In("/kythe/edge/defines").Out("/kythe/edge/childof").Has("/kythe/node/kind", "file").All()
Serving data over HTTP
The http_server tool can be run over the serving table created with the write_tables binary (see above).