Installing KHOJ AI assistant using docker compose (local only)
posted on 2023-08-10
Introduction
Below is a reference guide on how to install a local-only instance (i.e. your data won’t leave your environment) of the KHOJ AI assistant using docker compose, and how to point Khoj’s Emacs plugin at that local server instance.
Disk space requirements
There are some pretty hefty disk space requirements, noted here just for completeness. On my computer the installation took north of 11GB of disk space:
Disk space used:
% du -hs Applications/khoj/
5.4G    Applications/khoj/
plus the size of docker images:
% docker images
REPOSITORY             TAG      IMAGE ID       CREATED       SIZE
ghcr.io/khoj-ai/khoj   latest   5641a8f9b32d   9 hours ago   5.94GB
The RAM usage starts at about 3.2GB but grows over time, and queries can max out all of the CPU cores while an answer is being composed.
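If you want to cap that resource usage, docker compose can limit the container’s memory and CPU. A minimal sketch, assuming Docker Compose v2 (which applies deploy.resources.limits outside of Swarm); the numbers are placeholders rather than recommendations, and the snippet would be merged into the server service of the compose file shown later in this guide:

services:
  server:
    # ... rest of the service definition as in the compose file below ...
    deploy:
      resources:
        limits:
          cpus: "4"    # cap how many cores query answering may saturate
          memory: 8G   # cap RAM; too low a limit may make indexing or chat fail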
Assumptions to configuration
Some assumptions for the instructions below:
~/Application/khoj/
is the directory that will hold the docker compose project and all of KHOJ’s non-transient data.
I keep the org-mode files and PDF files that I want indexed in:
/home/adam/Cloud/GTD/Getting Things Done
and
/home/adam/Cloud/GTD/Indexed PDF files
respectively. Substitute the directories below with ones matching your own files and directory structure.
I keep a pristine clone of the KHOJ source code repository in ~/Src/opensource/khoj/.
KHOJ docker compose guide
Clone the source repository:
cd ~/Src/opensource/
git clone https://github.com/khoj-ai/khoj
Make a directory for the modified sample docker compose project and its data:
mkdir -p ~/Application
cd ~/Application
mkdir khoj
cd khoj
cp ~/Src/opensource/khoj/docker-compose.yml .
cp ~/Src/opensource/khoj/config/khoj_docker.yml .
Modify the copied KHOJ configuration file ~/Application/khoj/khoj_docker.yml to reflect the contents below.
The config also disables the on-by-default telemetry, and changes the encoder to use one better suited for multilingual documents:
app:
  should_log_telemetry: false
content-type:
  github: null
  notion: null
  org:
    compressed-jsonl: /data/embeddings/notes.jsonl.gz
    embeddings-file: /data/embeddings/note_embeddings.pt
    index_heading_entries: false
    input-files: null
    input-filter:
      - /data/org/**/*.org
  pdf:
    compressed-jsonl: /data/embeddings/pdf.jsonl.gz
    embeddings-file: /data/embeddings/pdf_embeddings.pt
    input-files: null
    input-filter:
      - /data/pdf/**/*.pdf
  plugins: null
processor:
  conversation:
    conversation-logfile: /data/embeddings/conversation_logs.json
    enable-offline-chat: true
    openai: null
search-type:
  asymmetric:
    cross-encoder: cross-encoder/ms-marco-MiniLM-L-6-v2
    encoder: paraphrase-multilingual-MiniLM-L12-v2
    model_directory: /data/models/asymmetric
  image:
    encoder: sentence-transformers/clip-ViT-B-32
    model_directory: /data/models/image_encoder
  symmetric: null
version: 0.0.0
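Before wiring this into docker compose, it can be worth checking that the edited file is still valid YAML. An optional check, assuming Python 3 with PyYAML is available on the host:

python3 -c 'import yaml; yaml.safe_load(open("khoj_docker.yml")); print("valid YAML")'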
Next, modify the copied sample docker compose file ~/Application/khoj/docker-compose.yml to use that config and to mount the directories holding your notes and PDF documents for indexing. You definitely want to change the /home/adam/Cloud/GTD/... paths below to reflect yours.
version: "3.9" services: server: image: ghcr.io/khoj-ai/khoj:latest # if you would like to have the server running even after restart. #restart: unless-stopped ports: # Default Khoj port, changing it requires changing khoj-server-url Emacs package variable - "42110:42110" working_dir: /app volumes: - .:/app # Volumes pointing to org-mode (and other) files for indexing. # the provided khoy_docker.yml config expects data in: # /data/org and /data/pdf directories. # Please change the left side to point to your own directories # holding org-mode and pdf files for indexing. - /home/adam/Cloud/GTD/Getting Things Done:/data/org/ - /home/adam/Cloud/GTD/Indexed PDF files:/data/pdf/ # Embeddings and models are populated after the first run # You can set these volumes to point to empty directories on host - ./data/embeddings/:/data/embeddings/ - ./data/models/:/root/.khoj/search/ # Models and cache # (as we don't want the 3GB+ model files being downloaded each time we restart) - ./data/cache:/root/.cache/ - ./data/models:/data/models/ # Use 0.0.0.0 to explicitly set the host ip for the service on the container. https://pythonspeed.com/articles/docker-connection-refused/ command: --host="0.0.0.0" --port=42110 -c=khoj_docker.yml -vv
The last step is to launch the KHOJ AI assistant using docker compose:
cd ~/Application/khoj/
docker compose up -d
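The first start downloads the 3GB+ model files into ./data, so it can take a while. You can follow the startup of the server service (its name in the compose file above) with:

docker compose logs -f server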
After a while you should be able to access the Khoj instance at: http://localhost:42110/
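A quick, browser-free way to confirm the instance is responding is to request that URL from the shell; assuming curl is installed:

curl -I http://localhost:42110/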
Khoj Emacs guide
Now we can set up KHOJ’s Emacs plugin so we can interact with our local AI from our Editor of Choice™ instead of a web browser 😉
The instructions below use straight.el:
(use-package khoj
  :after org
  :straight (khoj :type git :host github :repo "khoj-ai/khoj"
                  :files (:defaults "src/interface/emacs/khoj.el"))
  ;; Not mentioned in the online quick start, but this prevents the Emacs package
  ;; from downloading and configuring a separate khoj instance and makes it use
  ;; the already running one instead.
  :config (setq khoj-auto-setup nil))
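If you changed the "42110:42110" port mapping in docker-compose.yml, the Emacs package also needs to be pointed at the new address via the khoj-server-url variable mentioned in the compose file comments. A hedged one-liner, shown here with the default address as a placeholder:

(setq khoj-server-url "http://localhost:42110") ;; adjust host/port to match your ports: mapping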
That’s all that’s needed to interact with the locally running KHOJ AI assistant from Emacs.
Conclusions
I won’t write any conclusions yet, as I myself am still exploring how useful such a tool is for my own workflow. I hope these instructions are helpful in at least trying out AI on your own personal notes and drawing your own conclusions.
Happy hacking!