Top 50 Splunk interview questions and answers

1) Define Splunk

Splunk is a software platform used for searching, visualizing, and monitoring machine-generated big data. It monitors and reads different types of log files and stores the data in indexers.

2) List out common ports used by Splunk.

Common ports used by Splunk are as follows:

Web Port: 8000

Management Port: 8089

Network port: 514

Index Replication Port: 8080

Indexing Port: 9997

KV store: 8191

3) Explain Splunk components

The fundamental components of Splunk are:

Universal forwarder: It is a lightweight component that collects data and forwards it to a Splunk indexer or to a heavy forwarder.

Heavy forwarder: It is a heavier component that can parse and filter data, so only the required data is forwarded.

Search head: This component is used to gain intelligence and perform reporting.

License manager: The license is based on volume and usage, for example 50 GB per day. Splunk regularly checks the licensing details.

Load balancer: In addition to the functionality of the default Splunk load balancer, it also enables you to use your own load balancer.

4) What do you mean by Splunk indexer?

It is a component of Splunk Enterprise which creates and manages indexes. The primary functions of an indexer are 1) Indexing raw data into an index and 2) Search and manage Indexed data.

5) What are the disadvantages of using Splunk?

Some disadvantages of using Splunk tool are:

Splunk can prove expensive for large data volumes. Dashboards are functional but not as effective as those of some other monitoring tools. Its learning curve is steep because of its multi-tier architecture, so you need Splunk training and a lot of time to learn the tool. Searches are difficult to understand, especially regular expressions and search syntax.

6) What are the pros of getting data into a Splunk instance using forwarders?

The advantages of getting data into Splunk via forwarders are TCP connection, bandwidth throttling, and secure SSL connection for transferring crucial data from a forwarder to an indexer.

7) What is the importance of license master in Splunk?

License master in Splunk ensures that the right amount of data gets indexed. It ensures that the environment remains within the limits of the purchased volume as Splunk license depends on the data volume, which comes to the platform within a 24-hour window.

8) Name some important configuration files of Splunk

Commonly used Splunk configuration files are:

Inputs file

Transforms file

Server file

Indexes file

Props file

9) Explain license violation in Splunk.

It is a warning that occurs when you exceed the licensed data limit. The warning persists for 14 days. With a commercial license, you may have 5 warnings within a 1-month rolling window before your indexer search results and reports stop triggering.

With the free version, however, only 3 warnings are allowed.

10) What is the use of Splunk alert?

Alerts can be used when you have to monitor for and respond to specific events. For example, sending an email notification to the user when there are more than three failed login attempts in a 24-hour period.
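The failed-login example can be sketched in Python as a simple threshold check. This is only an illustration of the alert condition, not how Splunk evaluates alerts; the function name `should_alert` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

def should_alert(failed_logins, now, threshold=3, window=timedelta(hours=24)):
    """Return True when more than `threshold` failures fall inside the window ending at `now`."""
    recent = [t for t in failed_logins if now - window <= t <= now]
    return len(recent) > threshold

now = datetime(2024, 1, 2, 12, 0)
# Failures 1, 5, 9, 20 and 30 hours ago; the last one is outside the 24-hour window.
failures = [now - timedelta(hours=h) for h in (1, 5, 9, 20, 30)]
triggered = should_alert(failures, now)
```

In Splunk itself, the same condition would be expressed as a saved search with a trigger condition and an email action.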

11) Explain map-reduce algorithm

The map-reduce algorithm is a technique used by Splunk to increase data searching speed. It is inspired by two functions from functional programming: 1) map() and 2) reduce().

Here, the map() function is associated with a Mapper class and the reduce() function is associated with a Reducer class.
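To make the two functions concrete, here is a minimal Python sketch that counts events per log level with `map()` and `functools.reduce()`. The sample events and the helper `add_pair` are made up for illustration; this shows the general idea, not Splunk's internal implementation.

```python
from functools import reduce

events = ["ERROR disk full", "INFO started", "ERROR timeout", "WARN slow"]

# Map: turn each event into a (level, 1) pair.
mapped = map(lambda event: (event.split()[0], 1), events)

# Reduce: fold the pairs into a dictionary of per-level counts.
def add_pair(counts, pair):
    level, n = pair
    counts[level] = counts.get(level, 0) + n
    return counts

counts = reduce(add_pair, mapped, {})
```

In a distributed setting, the map step can run on each indexer in parallel and the reduce step can merge the partial results, which is where the speed-up comes from.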

12) Explain different types of data inputs in Splunk?

Following are different types of data inputs in Splunk:

Using files and directories as input

Configuring Network ports to receive inputs automatically

Adding Windows inputs. These Windows inputs are of four types: 1) Active Directory monitor, 2) printer monitor, 3) network monitor, and 4) registry inputs monitor.

13) How does Splunk avoid duplicate log indexing?

Splunk keeps track of indexed events in the fishbucket directory. It contains CRCs and seek pointers for the files you are indexing, so Splunk can tell if it has already read them.
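The idea of pairing a CRC with a seek pointer can be sketched in Python as follows. This is a toy model under stated assumptions: the class name `FishBucket`, the in-memory dictionary, and the `head_bytes` parameter are hypothetical and do not describe Splunk's on-disk format.

```python
import zlib

class FishBucket:
    """Toy tracker: maps the CRC of a file's first bytes to a seek pointer."""

    def __init__(self, head_bytes=256):
        self.head_bytes = head_bytes
        self.seen = {}  # crc of file head -> offset already read

    def read_new(self, data: bytes) -> bytes:
        """Return only the bytes that have not been read before."""
        crc = zlib.crc32(data[:self.head_bytes])
        offset = self.seen.get(crc, 0)
        self.seen[crc] = len(data)
        return data[offset:]

tracker = FishBucket(head_bytes=8)  # small head so the short demo log is long enough
log = b"line1\nline2\n"
first = tracker.read_new(log)                # whole file is new
second = tracker.read_new(log)               # nothing new, so nothing is re-indexed
third = tracker.read_new(log + b"line3\n")   # only the appended line
```

Note that in this sketch a file shorter than `head_bytes` would get a different CRC as it grows and be treated as a new file.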

14) Explain pivot and data models.

Pivots are used to create the front views of your output and then choose the proper filter for a better view of this output. Both options are beneficial for the people from a semi-technical or non-technical background.

Data models are most commonly used for creating a hierarchical model of data. However, it can also be used when you have a large amount of unstructured data. It helps you make use of that information without using complicated search queries.

15) Explain search factor and replication factor?

Search factor determines the number of searchable copies of data maintained by the indexer cluster, that is, how many searchable copies of each bucket are available.

Replication factor determines the number of copies maintained by the cluster as well as the number of copies that each site maintains.

16) What is the use of lookup command?

Lookup command is generally used when you want to get some fields from an external file. It helps you to narrow the search results as it helps to reference fields in an external file that match fields in your event data.
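Conceptually, a lookup is a join between event fields and rows of an external table. The Python sketch below illustrates that idea only; the table contents, field names, and the function `apply_lookup` are hypothetical.

```python
# Hypothetical external table: HTTP status code -> description.
lookup_table = {"404": "Not Found", "500": "Server Error"}

def apply_lookup(events, table, key="status", out="status_description"):
    """Add the field `out` to every event whose `key` value appears in the table."""
    enriched = []
    for event in events:
        row = dict(event)  # copy so the original event is left untouched
        if row.get(key) in table:
            row[out] = table[row[key]]
        enriched.append(row)
    return enriched

events = [{"status": "404", "uri": "/a"}, {"status": "200", "uri": "/b"}]
enriched = apply_lookup(events, lookup_table)
```

Events without a match are passed through unchanged, so the new field can then be used to filter or group the results.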

17) Explain default fields for an event in Splunk

There are 5 default fields that are tagged on every event in Splunk. They are: 1) host, 2) source, 3) sourcetype, 4) index, and 5) timestamp.

18) How can you extract fields?

You can extract fields from the fields sidebar, the event list, or the Settings menu in the UI.

Another way to extract fields in Splunk is to write your regular expressions in a props configuration file.
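A regular-expression extraction relies on named capture groups, where each group name becomes a field name. The Python sketch below shows the idea with the standard `re` module; the sample event and the field names `user` and `status` are made up, and Python writes named groups as `(?P<name>...)`.

```python
import re

# Each named group becomes an extracted field.
PATTERN = re.compile(r"user=(?P<user>\w+)\s+status=(?P<status>\d+)")

def extract_fields(raw):
    """Return a dict of extracted fields, or an empty dict when nothing matches."""
    match = PATTERN.search(raw)
    return match.groupdict() if match else {}

fields = extract_fields("2024-01-02 12:00:00 user=alice status=404")
```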

19) What do you mean by summary index?

A summary index is a special index that stores results calculated by Splunk. It is a fast and cheap way to run a query over a longer period of time.

20) How to prevent events from being indexed by Splunk?

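You can prevent events from being indexed by routing them to the null queue; for example, you can exclude debug messages this way. The null queue routing is defined in the transforms.conf file (referenced from props.conf) and is applied where the data is parsed, that is, at the heavy forwarder level or on the indexer.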
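The routing decision can be pictured with a short Python sketch: events matching a regular expression go to the null queue and are discarded, and everything else continues to indexing. The pattern and the functions `route` and `index_events` are hypothetical; in Splunk this is done through configuration, not code.

```python
import re

DEBUG = re.compile(r"\bDEBUG\b")

def route(event):
    """Send debug messages to the null queue and everything else to indexing."""
    return "nullQueue" if DEBUG.search(event) else "indexQueue"

def index_events(events):
    """Keep only the events that are not discarded."""
    return [e for e in events if route(e) == "indexQueue"]

kept = index_events(["INFO service started", "DEBUG cache miss", "ERROR timeout"])
```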

21) What is a Splunk Forwarder? What are the different types of Splunk Forwarders?

Splunk Forwarder or Splunk Universal Forwarder is a free, dedicated version of Splunk Enterprise that contains only the essential components required to forward data. It is designed to run on production servers, having minimal CPU and memory usage. It is used to gather data from various inputs and forward the data to Splunk indexers. After that, the data would be available for searching.

There are mainly two types of Splunk Forwarders:

Universal Forwarder (UF): It is used to gather data locally. It can’t parse or index data.

Heavyweight Forwarder (HWF): It has advanced functionalities and generally works as a remote collector, intermediate forwarder, and possible data filter. Because it parses data, it uses more resources and is not recommended for production systems.

22) What are the most important configuration files in Splunk?

Following is the list of most important configuration files in Splunk:

props.conf

indexes.conf

inputs.conf

transforms.conf

server.conf

23) What are the common port numbers used by Splunk?

Following is the list of the common port numbers used by Splunk:

Splunk Web Port: 8000

Splunk Management Port: 8089

Splunk Index Replication Port: 8080

Splunk Network port: 514 (Used to get data from the Network port, i.e., UDP data)

Splunk Indexing Port: 9997

KV store: 8191

24) What do you understand by Splunk App?

In Splunk, the Splunk app is a container or directory of configurations, searches, dashboards, etc.

25) What are the features not available in Splunk Free?

Following is a list of features that are not available in the Splunk Free version:

Authentication and scheduled searches/alerting

Deployment management

Distributed search

Forwarding in TCP/HTTP (to non-Splunk)

26) What are the different types of Splunk dashboards available in Splunk?

Following are the three different types of Splunk dashboards available in Splunk:

Real-time dashboards

Dynamic form-based dashboards

Dashboards for scheduled reports

27) What will happen if the License Master is unreachable in Splunk?

In Splunk, if the license master is not available or unreachable, the license slave will start a 24-hour timer, after which the search will be blocked on the license slave (though indexing continues). After that, the users will not be able to search for data in that slave until it can reach the license master again.

28) What are the different types of search modes supported in Splunk?

Splunk supports the following three search modes:

Fast mode

Smart mode

Verbose mode

29) Where is the Splunk Default Configuration stored?

The Splunk default configuration is stored at $SPLUNK_HOME/etc/system/default

30) What are the advantages of feeding data into a Splunk instance through Splunk Forwarders?

The biggest advantages of feeding data into a Splunk instance through Splunk Forwarders are that you can get the three significant benefits:

TCP connection

Bandwidth throttling

An encrypted SSL connection to transfer data from a Forwarder to an Indexer.

Splunk’s architecture is made so that the data forwarded to the Indexer is load-balanced by default. In this case, if one Indexer goes down for some reason, the data can quickly re-route itself via another Indexer instance. Another advantage is that the Splunk Forwarders cache the events locally before forwarding them, creating a temporary backup of the data.
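The re-routing behaviour can be illustrated with a small Python sketch. It assumes a simple round-robin rotation over a list of indexers, which is a simplification chosen for the example; the indexer names and the function `pick_indexer` are hypothetical.

```python
from itertools import cycle

def pick_indexer(indexers, is_up, rotation):
    """Return the next reachable indexer in the rotation, or None if all are down."""
    for _ in range(len(indexers)):
        candidate = next(rotation)
        if is_up(candidate):
            return candidate
    return None

indexers = ["idx1", "idx2", "idx3"]
rotation = cycle(indexers)
down = {"idx2"}  # pretend this indexer is unreachable

picks = [pick_indexer(indexers, lambda name: name not in down, rotation) for _ in range(4)]
```

Data destined for the unreachable indexer is simply sent to the next one in the rotation, so forwarding continues without manual intervention.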

31) What is a license violation in Splunk?

In Splunk, a license violation is a warning raised when the licensed data limit is exceeded. The warning persists for 14 days. If you have a commercial license, you may see 5 warnings within a 1-month rolling window before your indexer search results and reports stop triggering. If you have the free Splunk version, only 3 license violation warnings are allowed.

32) What is the use of Splunk DB Connect?

Splunk DB Connect is a generic SQL database plugin specially designed for Splunk. It facilitates users to integrate database information with Splunk queries and seamlessly get reports.

33) Why is license master important in Splunk?

The license master is important in Splunk because it ensures that the right amount of data gets indexed. It also ensures that the environment remains within the limits of the purchased volume. The Splunk license depends on the data volume, which comes to the platform within a 24-hour window.

34) What is the “Summary Index” in Splunk? What is its advantage?

In Splunk, the Summary Index specifies a default Splunk index used to store data retrieved from scheduled searches over time. Splunk Enterprise uses the Summary Index by default if a user does not specify or indicate another.

The biggest advantage of the Summary Index is that it facilitates users to retain the analytics and reports even after the data has aged.

35) What is the main function of the Splunk Indexer?

As the name specifies, the Splunk Indexer is used to create and manage indexes.

There are the two main functions of the Splunk Indexer:

It is used to index raw data into an index.

It is used to search and manage the indexed data.

36) What does the Splunk License specify?

The Splunk license specifies how much data we can index per calendar day (within 24 hours).

37) How does the Splunk License determine 1 day?

The Splunk License determines 1 day from midnight to midnight on the clock of the license master.
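The midnight-to-midnight rule means that usage is summed per calendar day rather than over a sliding 24-hour window. The Python sketch below shows that bookkeeping; the record format, the quota value, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def daily_volume(records):
    """Sum ingested bytes per calendar day (midnight to midnight)."""
    totals = defaultdict(int)
    for timestamp, size in records:
        totals[timestamp.date()] += size
    return dict(totals)

def days_over_quota(records, quota):
    """Return the days on which the indexed volume exceeded the quota."""
    return sorted(day for day, total in daily_volume(records).items() if total > quota)

records = [
    (datetime(2024, 1, 1, 23, 59), 60),  # counts toward January 1
    (datetime(2024, 1, 2, 0, 1), 60),    # two minutes later, but counts toward January 2
    (datetime(2024, 1, 2, 13, 0), 50),
]
violations = days_over_quota(records, quota=100)
```

The first two records are only two minutes apart, yet they count toward different days, so only January 2 exceeds the quota in this example.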

38) What is the difference between Splunk and Spark?

Following is a list of key differences between Splunk and Spark:

Criteria	Splunk	Spark
Deployment area	Splunk is used for collecting large amounts of machine-generated data.	Spark is used for iterative applications and in-memory processing.
Nature of tool	It is proprietary software. It is not open-sourced.	It is open-source software.
Working mode	It works on streaming mode.	It works on both streaming and batch modes.

39) What are the disadvantages of using the Splunk tool?

Following is a list of some disadvantages of using the Splunk tool:

Splunk is not open-source software.

You have to pay for a complete Splunk solution, so it may prove expensive for large data volumes.

Splunk dashboards are functional but not as effective as some other monitoring tools.

Splunk has a multi-tier architecture, and its learning curve is steep.

So, you need to invest a lot of time to learn this tool.

You need Splunk training to use it effectively.

In Splunk, searches are difficult to understand especially regular expressions and search syntax.

40) What are the advantages of using forwarders to get data into a Splunk instance?

Some key advantages of getting data into Splunk via forwarders are:

TCP connection

Bandwidth throttling

A secure SSL connection for transferring important data from a forwarder to an indexer.

41) What are some important Splunk search commands used in the Splunk tool?

Following is a list of some important Splunk search commands used in the Splunk tool:

abstract

addtotals

accum

anomalies

erex

filldown

rename

typer, etc.

42) What is the use of Transaction and Stats commands in Splunk?

In Splunk, the transaction and stats commands are used for different purposes. The transaction command is mostly used in two specific cases:

The transaction command is used when the unique ID (from one or more fields) alone is not sufficient to discriminate between two transactions. This is the case when the identifier is reused. For example, web sessions are identified by a cookie/client IP; in this case, the time span or pauses are also used to segment the data into transactions. In other cases where the identifier is reused, for example in DHCP logs, a particular message is used to identify the beginning or end of a transaction.

It is also used when we want to see the raw text of events combined rather than an analysis of the constituent fields of the events.

In other cases, it is preferred to use the stats command. The performance of the stats command is higher, so it is best suited for a distributed search environment. We can also use the stats command when a unique ID is sufficient to identify a transaction.
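The case of a reused identifier segmented by pauses can be sketched in Python. This is an illustration of the grouping logic only, assuming events are `(timestamp_in_seconds, session_id)` tuples; the function `group_transactions` and the `max_pause` parameter are hypothetical names.

```python
def group_transactions(events, max_pause):
    """Group events by session ID, starting a new transaction after a long pause."""
    transactions = []
    open_tx = {}  # session_id -> transaction currently being built
    for ts, sid in sorted(events):
        current = open_tx.get(sid)
        # Start a new transaction for a new ID or when the pause is too long.
        if current is None or ts - current[-1][0] > max_pause:
            current = []
            transactions.append(current)
            open_tx[sid] = current
        current.append((ts, sid))
    return transactions

events = [(0, "a"), (10, "a"), (500, "a"), (5, "b")]
transactions = group_transactions(events, max_pause=60)
```

Session "a" is split into two transactions because of the long gap before its last event, which a plain group-by on the ID (as stats would do) cannot express.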

43) What are some important configuration files used in Splunk?

Some important and most commonly used Splunk configuration files are:

Inputs file

Transforms file

Server file

Indexes file

Props file

44) What do you understand by Buckets? Explain the Bucket Lifecycle of Splunk.

In Splunk, buckets are the directories used to store the indexed data. It is a physical directory that chronicles the events of a specific period. A bucket undergoes the following stages of transformation over time.

Hot Bucket: A hot bucket stores the newly indexed data. It is open for writing and new additions. An index can have one or more hot buckets.

Warm Bucket: A warm bucket is used to store the data rolled out from a hot bucket.

Cold Bucket: The cold bucket is used to store the data rolled out from a warm bucket.

Frozen Bucket: A frozen bucket stores the data rolled out from a cold bucket. By default, the Splunk Indexer deletes the frozen data. However, Splunk provides an option to archive it. One thing you must remember is that frozen data is not searchable.
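The lifecycle above is a one-way sequence of stages, which a few lines of Python can model. This is a conceptual sketch only; the names `STAGES`, `next_stage`, and `is_searchable` are hypothetical, and the conditions that trigger each roll are not modelled.

```python
STAGES = ["hot", "warm", "cold", "frozen"]

def next_stage(stage):
    """Return the stage a bucket rolls to; frozen is the final stage."""
    position = STAGES.index(stage)
    return STAGES[min(position + 1, len(STAGES) - 1)]

def is_searchable(stage):
    """Frozen data is deleted or archived, so it is not searchable."""
    return stage != "frozen"

lifecycle = ["hot"]
while lifecycle[-1] != "frozen":
    lifecycle.append(next_stage(lifecycle[-1]))
```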

45) What is the difference between Index time and Search time?

In Splunk, index time is the period from when the data is consumed to the point when it is written to disk. Search time, on the other hand, is when a search is run and events are retrieved by the search.

46) What is the difference between stats and eventstats commands?

Stats Command: The stats command generates summary statistics of all the existing fields in the search results. After generating summary statistics, it saves them as values in new fields.

Eventstats: The eventstats command is similar to the stats command, but it adds the aggregation results inline to each event for which the aggregation is pertinent. It computes the requested statistics as the stats command does, but attaches them to the original raw data instead of replacing it.
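The difference is easiest to see side by side. In the Python sketch below, `stats_avg` collapses the events into one summary value per group, while `eventstats_avg` keeps every event and attaches the group's value to it. Both function names, the field names, and the sample events are hypothetical.

```python
from collections import defaultdict

def stats_avg(events, field, by):
    """Like stats: return one average per group; the original events are gone."""
    sums, counts = defaultdict(float), defaultdict(int)
    for event in events:
        sums[event[by]] += event[field]
        counts[event[by]] += 1
    return {group: sums[group] / counts[group] for group in sums}

def eventstats_avg(events, field, by, out="avg"):
    """Like eventstats: keep every event and add its group's average inline."""
    averages = stats_avg(events, field, by)
    return [{**event, out: averages[event[by]]} for event in events]

events = [
    {"host": "a", "bytes": 10},
    {"host": "a", "bytes": 30},
    {"host": "b", "bytes": 5},
]
summary = stats_avg(events, "bytes", "host")
inline = eventstats_avg(events, "bytes", "host")
```

Here `summary` has two entries (one per host), whereas `inline` still has three events, each carrying an extra `avg` field.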

47) How can you reset the Splunk administrator password?

We can reset the administrator password by performing the following steps:

First, log in to the server on which you have installed Splunk.

Now, rename the password file ($SPLUNK_HOME/etc/passwd) and then start Splunk again.

After the restart, you can sign in with the username admin (or administrator) and the default password 'changeme'.

48) What are the top direct competitors of Splunk tool?

The top direct competitors of Splunk tool are Logstash, Loggly, LogLogic, Sumo Logic, etc.

49) How can you troubleshoot Splunk performance issues?

You should perform the following steps to troubleshoot the Splunk performance issues:

First, check the splunkd.log to find if there is any error.

Then, check for server performance issues (CPU and memory usage, disk I/O, etc.).

After that, check the number of saved searches running in the background and their system resource consumption.

Install the SOS (Splunk on Splunk) app and check if the dashboard shows any warnings or errors.

Now, install a Firefox extension called Firebug and enable it.

Now, log into Splunk using Firefox, open the Firebug’s panels, and go to the ‘Net’ panel to enable it. The Net panel displays the HTTP requests and responses and the time spent in each. Here, you will see which requests are slowing down Splunk and affecting the overall performance.

By following the above steps, you can troubleshoot the Splunk performance issues and enhance the performance.

50) What is Sourcetype in Splunk?

In Splunk, sourcetype is a default field used to identify the data structure of an incoming event. It should be set at the forwarder level so that the indexer can easily identify the different data formats. It also determines how Splunk Enterprise formats the data during the indexing process, so you have to assign the correct sourcetype to your data. If the indexed data has accurate timestamps and event breaks, searching the data becomes even easier.

51) What is the Summary Index in Splunk?

A summary index is the default Splunk index used to store the results of scheduled searches (the index that Splunk Enterprise uses if we do not indicate another one).

52) What is Splunk DB Connect?

Splunk DB Connect is a generic SQL database plugin for Splunk that allows us to easily integrate database information with Splunk queries and reports.