Apache Lucene Query Example

Lucene query cheatsheet

Basic Search

  • Single Term: term
    • Finds documents containing term.
  • Phrase Search: "exact phrase"
    • Finds documents containing the exact phrase.

Boolean Operators

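  • AND: term1 AND term2
    • Both terms must be present.
  • OR: term1 OR term2
    • At least one of the terms must be present.
  • NOT: term1 NOT term2
    • Documents must contain term1 and must not contain term2. In the classic Lucene query parser, NOT cannot be used with a single term on its own; a query such as NOT term returns no results.
  • Combination: (term1 AND term2) OR term3
    • Complex boolean logic can be applied by combining operators. Operators must be written in uppercase.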

Wildcard Searches

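  • Single Character Wildcard: te?t
    • Matches terms with any single character in place of ?, such as test or text.
  • Multiple Character Wildcard: test*
    • Matches terms that start with test followed by zero or more characters, such as test, tests, or tester.
  • Wildcard at Start: *test
    • Rejected by the classic Lucene query parser unless leading wildcards are enabled with setAllowLeadingWildcard(true). Elasticsearch's query_string query allows them by default (allow_leading_wildcard), but they can be slow on large indexes.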
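
To make the wildcard rules above concrete, here is a minimal Python sketch that translates a wildcard pattern into a regular expression and tests it against whole terms. The helper names wildcard_to_regex and matches are hypothetical, and this only imitates how ? and * behave; it is not how Lucene evaluates wildcard queries internally.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate ? and * into a regex; every other character is literal."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")       # exactly one character
        elif ch == "*":
            parts.append(".*")      # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def matches(pattern: str, term: str) -> bool:
    # fullmatch: the pattern has to cover the whole indexed term
    return wildcard_to_regex(pattern).fullmatch(term) is not None

terms = ["test", "text", "tests", "tester", "latest", "tet"]
print([t for t in terms if matches("te?t", t)])   # ['test', 'text']
print([t for t in terms if matches("test*", t)])  # ['test', 'tests', 'tester']
print([t for t in terms if matches("*test", t)])  # ['test', 'latest']
```

Note that wildcards apply to individual indexed terms, not to the original text of the document.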

Fuzzy Searches

  • Fuzzy: term~
    • Matches terms within a small edit distance of term (up to 2 edits by default). Use term~1 to allow only a single edit.
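
As a rough illustration of what "similar" means here, the sketch below computes an edit distance that counts insertions, deletions, substitutions, and swaps of adjacent characters, then filters a small vocabulary by it. The function names and the sample vocabulary are made up for this example, and Lucene's real implementation is far more efficient than this brute-force version.

```python
def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # swap of two adjacent characters counts as one edit
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]

def fuzzy_candidates(term, vocabulary, max_edits=2):
    """Return the vocabulary terms within max_edits of term."""
    return [t for t in vocabulary if edit_distance(term, t) <= max_edits]

vocabulary = ["roam", "foam", "roams", "room", "raom", "dream", "remote"]
print(fuzzy_candidates("roam", vocabulary))               # like roam~
print(fuzzy_candidates("roam", vocabulary, max_edits=1))  # like roam~1
```

With this vocabulary, dream is two edits away from roam, so it is kept at the default setting and dropped when only one edit is allowed.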

Proximity Searches

  • Proximity: "term1 term2"~N
    • Matches documents in which term1 and term2 occur within N positions of each other.

Range Searches

  • Inclusive Range: [start TO end]
    • Finds documents whose value lies between start and end, including both endpoints.
  • Exclusive Range: {start TO end}
    • Same range, but excludes the exact start and end values.
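
The difference between the two bracket styles, and a common pitfall with string-typed fields, can be sketched in a few lines of Python. The in_range helper is hypothetical and only mimics the comparison; whether a real range query compares numerically or lexicographically depends on how the field is indexed.

```python
def in_range(value, start, end, include_start=True, include_end=True):
    """Mimic [start TO end] (inclusive) and {start TO end} (exclusive)."""
    lower_ok = value >= start if include_start else value > start
    upper_ok = value <= end if include_end else value < end
    return lower_ok and upper_ok

# Square brackets include the endpoints, curly braces exclude them.
print(in_range(400, 400, 499))                       # True
print(in_range(400, 400, 499, include_start=False))  # False

# On plain string terms the comparison is lexicographic,
# so "45" falls inside ["400" TO "499"] even though 45 < 400.
print(in_range("45", "400", "499"))                  # True
print(in_range(45, 400, 499))                        # False
```

This is why numeric and date fields should be indexed with a numeric or date type if range queries are expected to behave numerically.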

Regular Expressions

  • Regex: /regex/
    • Matches terms by regular expression. The pattern must match the entire term.

Boosting Terms

  • Boost: term^N
    • Multiplies the term's contribution to the relevance score by a factor of N (the default boost is 1).

Field-Specific Searches

  • Specific Field: fieldname:term
    • Searches for the term within a specific field.

Grouping

  • Group Queries: (query1) AND (query2)
    • Groups parts of queries for complex searches.
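
When queries are assembled in code, explicit grouping avoids surprises with operator precedence. The following is a minimal Python sketch of that idea; the helpers field and group are hypothetical, and the field names are borrowed from the Apache HTTPD examples later in this post.

```python
def field(name: str, value: str) -> str:
    """Build a fieldname:term clause."""
    return f"{name}:{value}"

def group(operator: str, *clauses: str) -> str:
    """Join clauses with AND or OR and wrap the result in parentheses."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    return "(" + f" {operator} ".join(clauses) + ")"

errors = group("OR", field("response", "404"), field("response", "500"))
query = group("AND", field("request_method", "POST"), errors)
print(query)
# (request_method:POST AND (response:404 OR response:500))
```

The sketch does no escaping, so it is only safe for simple values without special characters.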

How to search Apache HTTPD logs using Lucene

These examples assume that the logs have been indexed in a Lucene-based system such as Elasticsearch. They show how Lucene query features can be used to filter and search log data. The field names used here (ip, timestamp, response, request, etc.) are placeholders; replace them with the fields defined in your own index mapping or schema for Apache HTTPD logs.


// 1. Find logs for a specific IP address
ip:"192.168.1.1"

// 2. Search logs within a specific date range
timestamp:[20230101 TO 20230131]

// 3. Identify logs with 4xx client error response codes
response:[400 TO 499]

// 4. Locate logs for requests to a specific URL
request:"GET /index.html HTTP/1.1"

// 5. Filter logs by a specific user-agent string
agent:"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

// 6. Search for logs with a specific referrer
referrer:"http://example.com/"

// 7. Find all logs of GET requests
request_method:GET

// 8. Filter logs resulting in 5xx server errors
response:[500 TO 599]

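// 9. Identify requests to a specific directory
//    (unquoted, because wildcards are not expanded inside a quoted phrase; slashes are
//    escaped; assumes request is a single untokenized term holding the full request line)
request:*\/images\/*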

// 10. Locate requests taking longer than 2 seconds
//     (assumes duration is stored in milliseconds; Elasticsearch also accepts duration:>2000)
duration:{2000 TO *]

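// 11. Exclude logs from a specific IP address
//     (match all documents, then exclude the IP; a purely negative query returns nothing in plain Lucene)
*:* -ip:"192.168.1.1"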

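// 12. Find requests for a specific file type (.jpg)
//     (unquoted so the wildcards are expanded; same assumption about the request field as in example 9)
request:*.jpg*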

// 13. Identify logs from a specific day
timestamp:20230115

// 14. Search logs with responses in a byte range
bytes:[1000 TO 5000]

// 15. Filter logs by HTTP method and response code
request_method:POST AND response:200

// 16. Search for failed login attempts (custom log message)
message:"Failed login attempt"

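// 17. Find logs from a range of IP addresses
//     (assumes ip is indexed as an IP-typed field; on a plain string field the range is compared lexicographically)
ip:[192.168.1.1 TO 192.168.1.100]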

// 18. Identify logs with a 200 OK response
response:200

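// 19. Search for logs with specific query parameters
//     (unquoted so the wildcards are expanded; the literal ? is escaped because ? is itself a wildcard)
request:*\?user=john&*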

// 20. Locate logs with a 404 Not Found response
response:404
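
Log values such as URLs contain characters that the query syntax treats as operators (/, :, ?, and others), so they usually need escaping before being placed in a query. Below is a minimal Python sketch, assuming the queries are sent to Elasticsearch as a query_string query; the helper names escape_term and query_string_body are hypothetical, and the field names are the same placeholders used in the examples above.

```python
import json

# Characters with special meaning in the Lucene classic query syntax.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')

def escape_term(text: str) -> str:
    """Backslash-escape special characters so the text is matched literally."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

def query_string_body(query: str) -> str:
    """Wrap a Lucene query in an Elasticsearch query_string request body."""
    return json.dumps({"query": {"query_string": {"query": query}}})

path = escape_term("/images/logo.png")
print(path)  # \/images\/logo.png
print(query_string_body(f"request:{path} AND response:200"))
```

The sketch does not escape whitespace, so values containing spaces still need to be quoted or handled separately.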

Rajesh Kumar