Note: ElasticSearch, Kibana, FSCrawler on TrueNAS
When you finally realize that searching through decades of documents has become a challenge, it is time to acknowledge that you have reached a “big-data” stage. Now, you need some big data utilities to understand, index, search and find out where and what you’ve got.
This process is a nightmare. Bits of instructions are scattered across a half-dozen blogs, random posts and a mass of poorly described online instructions. It took me two days to get it working. These are my notes:
Overview
- Create a new Jail in TrueNAS to hold and isolate the indexing system
- Configure the Jail
- Install ElasticSearch, Kibana, Tesseract
- Configure… Configure More… Ask why it is so complicated and disjointed
- Download the FSCrawler
- Configure more
- Make a script to run in background
- More configuring in Kibana
- Fix the massive memory demand issues
Create a new Jail in TrueNAS
- Log in to your TrueNAS system and pick the Jails option
- Add a new Jail by clicking ADD
- Give it a Name: ElasticSearch
- Pick the newest Release you have installed
- Select DHCP to get a unique IP address for this sub-system
- Select Auto-start
- Next
- Select: allow_set_hostname
- Select: allow_raw_sockets
- Select: allow_mount
- Select: allow_mount_procfs
- Click: Next
- Click: Save
- Wait for the jail creation to complete.
- Stop the Jail…
Configure the shared files you want to index by adding a Mount Point.
- Click on MOUNT POINTS
- Click on ACTIONS
- Click on ADD
- Select the host folder you want to index in the top panel
- Select /mnt for the location in the jail
- Select: Read-Only
- Click: Submit
- Go to System Shell
You need to change the enforce_statfs setting from the command line. It controls which mount points the jail is allowed to see, and it has to be lower than 2 before the jail can mount /proc. We will use “iocage” to change jail settings. Usage looks like this: iocage <get / set> <setting> <Name of Jail>
- Type: iocage get enforce_statfs ElasticSearch
It will show a value of 2
- This needs to be changed to a value of 1 by typing this: iocage set enforce_statfs=1 ElasticSearch
- Go Back to the Jails Section
- Start or Restart the Jail
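The enforce_statfs change can also be run as one short sequence from the TrueNAS host shell. This is a minimal sketch that assumes the jail is named ElasticSearch, as in these notes; the guard around iocage is only there so the snippet does nothing on a machine that does not have it.

```shell
JAIL="ElasticSearch"          # jail name chosen when the jail was created
SETTING="enforce_statfs=1"    # value the notes change it to

if command -v iocage >/dev/null 2>&1; then
    iocage get enforce_statfs "$JAIL"   # a fresh jail reports 2
    iocage set "$SETTING" "$JAIL"
    iocage get enforce_statfs "$JAIL"   # should now report 1
else
    echo "iocage not found: run this from the TrueNAS host shell, not inside the jail"
fi
```

The jail still has to be restarted afterwards for the new value to take effect.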
Make Note of the IP address
Then Open the Shell for this Jail
- Start by installing the package manager and nano file editor. Type: pkg install nano
- It will prompt you to install pkg. Press “Y”
It will then install pkg
Then, it prompts you to install nano - Press “Y”
Now we need to add the /proc mount point into the /etc/fstab
- type: nano /etc/fstab
- copy and paste this line into the new file:
proc /proc procfs rw 0 0
- press control+o to save
- press enter to confirm the file name
- press control+x to exit
- type: mount /proc to attach the proc filesystem.
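If you would rather not open an editor, the fstab line can be appended from the shell. The sketch below uses a small hypothetical helper, append_once (not part of FreeBSD), that only adds the line when it is not already present, and it demonstrates the behavior on a scratch file so nothing on the system is touched.

```shell
# append_once FILE LINE: append LINE to FILE unless that exact line is already there.
append_once() {
    file=$1
    line=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file instead of the real /etc/fstab.
SCRATCH=$(mktemp /tmp/fstab-demo.XXXXXX)
append_once "$SCRATCH" "proc /proc procfs rw 0 0"
append_once "$SCRATCH" "proc /proc procfs rw 0 0"   # second call changes nothing
cat "$SCRATCH"
```

Inside the jail, the real call would be append_once /etc/fstab "proc /proc procfs rw 0 0", followed by mount /proc.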
Install ElasticSearch, Kibana, Tesseract
- To install elasticsearch, type: pkg install elasticsearch8
- press enter
- press “y” to acknowledge
- wait…
- update rc.conf to allow elasticsearch to start automatically as a service
- Type: nano /etc/rc.conf
- move to the last line and copy/paste in this: elasticsearch_enable="YES"
Use plain straight quotes here; curly quotes pasted from a web page will not be read correctly.
- press control+o
- enter to save
- press control+x to exit
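FreeBSD also ships sysrc, which edits /etc/rc.conf without opening an editor and avoids pasting curly quotes by accident. This is a minimal sketch offered as an alternative to the nano steps, not something from the original notes; the guard only keeps the snippet harmless on a system without sysrc.

```shell
SERVICE="elasticsearch"
RC_LINE="${SERVICE}_enable=\"YES\""   # the line that ends up in /etc/rc.conf

if command -v sysrc >/dev/null 2>&1; then
    sysrc "${SERVICE}_enable=YES"     # writes the setting into /etc/rc.conf
    sysrc -n "${SERVICE}_enable"      # prints the stored value
else
    echo "sysrc not found; add this line to /etc/rc.conf by hand: $RC_LINE"
fi
```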
Configure some of Elasticsearch
The config files are stored in /usr/local/etc/elasticsearch
- type: nano /usr/local/etc/elasticsearch/elasticsearch.yml
Now we make changes to the name of the elastic cluster:
- Uncomment the line that says cluster.name: and change the name to whatever you want
- Uncomment the node.name and change as needed
- Scroll down to the Network Section
- uncomment the network.host: line
- update the IP address with the address of your jail, which you wrote down earlier to avoid having to exit the editor/shell and look for it.
- press control+o
- press enter to save
- press control+x to exit
- start elasticsearch by typing: service elasticsearch start
It should eventually start….
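Since the service can take a while to come up, a small polling loop saves you from guessing. The sketch below defines a hypothetical helper, wait_for_http, built on curl; port 9200 is the Elasticsearch default, while the address and the http/https scheme in the usage note are assumptions you should match to your own setup.

```shell
# wait_for_http URL [TRIES] [DELAY]: poll URL until it returns any HTTP response.
wait_for_http() {
    url=$1
    tries=${2:-30}
    delay=${3:-5}
    i=1
    while [ "$i" -le "$tries" ]; do
        # -k skips certificate checks, -s and -o silence output, --max-time bounds each probe.
        if curl -k -s -o /dev/null --max-time 5 "$url"; then
            echo "up after $i attempt(s): $url"
            return 0
        fi
        i=$((i + 1))
        [ "$i" -le "$tries" ] && sleep "$delay"
    done
    echo "no response from $url after $tries attempt(s)" >&2
    return 1
}
```

For example, wait_for_http "https://192.168.1.234:9200/" would retry for a couple of minutes. Any HTTP reply counts as "up", including an authentication error, because the passwords have not been reset yet at this point.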
- Now we reset the password with this command: elasticsearch-reset-password -u elastic
- press “y”
- It will print out the password on the screen. Write your new password down or copy it out for use later!
- Reset the password for the Kibana system user by typing: elasticsearch-reset-password -u kibana_system
- press “y”
- Write down the password for the kibana_system user
- Now we create basic TLS security by typing: elasticsearch-certutil ca
- Press enter.
- You can add passwords here if needed and/or just press enter to leave blank
More information is available here
- Now, type: elasticsearch-certutil cert --ca elastic-stack-ca.p12
- Press enter a few times and/or add passwords as needed
If you entered a password… more steps are needed.
Enter this: elasticsearch-keystore add xpack.security.transport.ssl.keystore.secure_password
Enter this: elasticsearch-keystore add xpack.security.transport.ssl.truststore.secure_password
This section also automatically updated the configuration file at /usr/local/etc/elasticsearch/elasticsearch.yml
Install Kibana
- Type: pkg install kibana8
- press enter
- press “y”
Edit the rc.conf file to autostart Kibana
- Type: nano /etc/rc.conf
- press enter
- scroll to the bottom of the file and add or copy/paste this: kibana_enable="YES"
- press control+o to save
- press enter
- press control+x to exit
- Now, edit the kibana config file by typing: nano /usr/local/etc/kibana/kibana.yml
- press enter
- uncomment the server.host: line
- type in your jail’s IP address
- Start Kibana by typing: service kibana start
- Use the browser to open a new tab or window and navigate to the kibana address.
In this case it is http://192.168.1.234:5601
Replace with your jail’s ip address
The site should look like this:
- Now, go back to the jail’s shell terminal and fix a write issue.
The kibana.yml config file is write locked and the online enrollment will fail. The file permissions need to be changed for the online enrollment to work.
- Type: chmod 777 /usr/local/etc/kibana/kibana.yml
- press enter
- Now, we make the token for the web setup for kibana by typing:
elasticsearch-create-enrollment-token --scope kibana
- Press enter
- It will give you a code to copy and paste back into kibana browser window:
- Press: Configure Elastic
It will then ask you for a verification code
- Go back to the jail’s shell and copy/paste the command: kibana-verification-code
- Press enter
- Note the code to type in the box. Type in code and click: “Verify”
Wahoo!
You can now log in with the elastic account that you created/reset the password for way back when…
Let’s set up monitoring.
- Click on the menu on the left – the three lines
- Go to the bottom of the menu and click Stack Monitoring
- Click: “Or, set up with self monitoring”
- Click: “Turn on Monitoring”
- Wait for it to finish
Make sure everything is good!
Install FSCrawler and Tesseract OCR
- type: pkg install wget
- press enter
- press “y”
We need the location of the newest file: Located here. Note: Only the latest version has the ElasticSearch version 8 extensions. The older versions will not work. You need the link of the latest one… at the bottom of the page… right click and copy link
- Go back to the jail’s shell and type this: cd
- press enter
You should be back at the root home folder
- Type: wget https://s01.oss.sonatype.org/content/repositories/snapshots/fr/pilato/elasticsearch/crawler/fscrawler-distribution/2.10-SNAPSHOT/fscrawler-distribution-2.10-20231023.160816-291.zip
- press enter
- it will download the files
- type: unzip fscrawler-distribution-2.10-20231023.160816-291.zip
This extracts the files
- If you type ls and press enter, you should see the new files:
- for ease of typing… rename the folder by typing this:
mv fscrawler-distribution-2.10-SNAPSHOT fscrawler
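The download, unzip and rename steps can be collected into one function so they are easy to repeat when a newer snapshot appears. This is a sketch only: fetch_fscrawler is a hypothetical helper name, and the build number is the one used in these notes, so swap in whatever the snapshot page currently lists.

```shell
BASE="https://s01.oss.sonatype.org/content/repositories/snapshots/fr/pilato/elasticsearch/crawler/fscrawler-distribution"
VERSION="2.10-SNAPSHOT"
BUILD="2.10-20231023.160816-291"    # snapshot build from these notes; newer builds have other numbers
ZIP="fscrawler-distribution-${BUILD}.zip"
URL="${BASE}/${VERSION}/${ZIP}"

# fetch_fscrawler: download, unpack, and rename into ~/fscrawler (hypothetical helper).
fetch_fscrawler() {
    cd "$HOME" || return 1
    wget "$URL" || return 1
    unzip "$ZIP" || return 1
    mv "fscrawler-distribution-${VERSION}" fscrawler
}

echo "$URL"   # show what would be downloaded; nothing is fetched until the function is called
```

Running fetch_fscrawler inside the jail then performs the three steps in order and stops at the first one that fails.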
Now… we initiate the first run of the fscrawler to create a basic settings file for us to edit
- type: ./fscrawler/bin/fscrawler scanfiles
- press enter
- type “y” to create
- press enter
- open the newly created settings file to edit by typing: nano .fscrawler/scanfiles/_settings.yaml
- edit the url line to indicate the location of the files which are now appearing in /mnt
url: "/mnt"
- edit the elasticsearch: nodes: url: to show your jail’s ip address: 192.168.1.234
- add a line and type: username: "elastic"
- add a line and type: password: "bZb0v7BlO38sWyB0a1M7"
use your password here from when you reset it earlier and wrote it down
- we are not doing SSL verification
- press control+o to save
- press enter
- press control+x to exit
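For reference, the edited parts of the settings file end up looking roughly like what the helper below prints. Treat it as a sketch rather than the exact generated file: print_fscrawler_settings is a hypothetical helper, the https scheme and the ssl_verification: false line are my assumptions about how "not doing SSL verification" is expressed, 9200 is the Elasticsearch default port, and the real file contains many more generated keys that should be left alone.

```shell
# print_fscrawler_settings IP PASSWORD: print a minimal _settings.yaml (hypothetical helper).
print_fscrawler_settings() {
    ip=$1
    password=$2
    cat <<EOF
---
name: "scanfiles"
fs:
  url: "/mnt"
elasticsearch:
  nodes:
  - url: "https://${ip}:9200"
  username: "elastic"
  password: "${password}"
  ssl_verification: false
EOF
}

print_fscrawler_settings "192.168.1.234" "CHANGE_ME"
```

Note that every value uses plain straight quotes; curly quotes copied from a web page would end up as part of the value.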
- Install tesseract OCR package by typing: pkg install tesseract
- press enter
- press “Y”
- wait for it to finish
- See if it all starts okay by typing the following: ./fscrawler/bin/fscrawler scanfiles
Set a Cron to Schedule the FSCrawler
- type crontab -e
- press enter
- The screen enters a vi editor, press “i” to enter edit mode
- set the interval with @600 to run every 600 seconds (10 minutes). Type: @600
- then, on the same line, type a space and the command: /root/fscrawler/bin/fscrawler scanfiles
- press esc
- press :w
- press enter
- press :
- press q
- press enter
The search will run on the new schedule and create an index for you to search in Kibana
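The finished crontab entry is a single line: the interval, a space, then the command. The sketch below just assembles and prints that line from the values used in these notes, so you can check it before typing it into the editor.

```shell
INTERVAL=600                                   # seconds between runs (10 minutes)
CMD="/root/fscrawler/bin/fscrawler scanfiles"  # full path to the crawler plus the job name
ENTRY="@${INTERVAL} ${CMD}"

echo "$ENTRY"
```

If you prefer to skip the vi session, a common pattern is to pipe the existing crontab plus the new line back into crontab, for example (crontab -l 2>/dev/null; echo "$ENTRY") | crontab -, but check the result with crontab -l afterwards.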
Fix the memory Issue
ElasticSearch and FSCrawler run in Java virtual machines (Kibana runs on Node.js), and left at their defaults the JVMs can each take up to 50% of the total system memory. I’m sure this made some sense to someone… THIS WILL CRASH your TrueNAS server. You need enough memory for ZFS to work, and if you accidentally decide to do anything else, poof… system halt… process protection starts to kill off tasks that you might actually need.
Now we tweak the memory footprint for the JVM.
We are going to edit the system wide environment variables. In FreeBSD these are stored in the csh settings file
- Type: nano /etc/csh.cshrc
- We can add a line to set the JAVA_HOME. Type: setenv JAVA_HOME /usr/local/openjdk17
- Add a line to fix the JAVA Memory. This has 2 parts, the starting (Xms) and max (Xmx)
I’m picking approx 2 GB of memory use for each instance, that is 2 GB x 3 processes = 6 GB.
Type: setenv FS_JAVA_OPTS "-Xmx2048m -Xms2048m"
- press ctrl+o
- press enter to save
- press ctrl+x to exit
- type: exit
- press enter
- You should be back at the jail’s control panel, press the “restart” button
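The memory budget is simple arithmetic, and it is worth redoing if you pick a different heap size. A minimal sketch using the numbers from these notes; it only prints the total and the setenv line, it does not edit /etc/csh.cshrc.

```shell
HEAP_MB=2048     # per-process heap picked in these notes (about 2 GB)
PROCESSES=3      # ElasticSearch, Kibana, FSCrawler
TOTAL_MB=$((HEAP_MB * PROCESSES))

echo "planned memory budget: ${TOTAL_MB} MB"
# The line to place in /etc/csh.cshrc, with plain straight quotes:
printf 'setenv FS_JAVA_OPTS "-Xmx%sm -Xms%sm"\n' "$HEAP_MB" "$HEAP_MB"
```

Setting Xms and Xmx to the same value means the heap starts at its maximum size and never grows, which makes the footprint predictable next to ZFS.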
DONE!
At this point, the jail system should reboot and start without any errors.
It will take a few moments for the servers to all start up. Then you can go to the Kibana dashboard and monitor the progress, do a search, or look through your index.
The cron scheduler will start FSCrawler in a few minutes. It will first index the folders, and then start on the files. You may want to modify the settings in the fscrawler settings further to do OCR, not do OCR, exclude some files, etc.
Keep in mind… depending on how many files and the complexity, it might take days to complete.