The setup
You're staring at a multi-gigabyte log file. You want to filter, sort, or otherwise process entries by timestamp. The dates are in some format, possibly more than one format, possibly with subtle variations between server versions.
This is a regex cookbook for the formats I run into most often. Each entry includes:
- An example log line
- A regex that matches the timestamp
- The PCRE / Python flavor (most other regex flavors work the same)
- Known failure modes — situations where the regex will give you wrong answers
1. ISO 8601 / RFC 3339 — the universal format
2026-05-15T14:30:00.123Z user logged in
2026-05-15T14:30:00+05:30 request handled
Regex:
\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})
Failure modes: this matches "looks like ISO 8601" not "is valid ISO 8601." It accepts impossible dates like 2026-13-45. For most log-extraction purposes, that's fine — you want everything that could be a timestamp, and you'll validate downstream.
2. Apache/nginx access log default format
192.0.2.1 - - [15/May/2026:14:30:00 +0000] "GET /api HTTP/1.1" 200 1234
Regex (extracts the bracketed timestamp including the offset):
\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\s+[+-]\d{4})\]
The month is the three-letter abbreviation (Jan, Feb, etc.) — not the number. Parsing this back into a datetime requires a lookup. In Python:
from datetime import datetime
ts = datetime.strptime("15/May/2026:14:30:00 +0000",
"%d/%b/%Y:%H:%M:%S %z")
Failure modes: locale-dependent. The month abbreviations are English in the standard Apache/nginx config, but some non-English servers might log in their own language. Most don't.
3. syslog (RFC 3164, the old one)
May 15 14:30:00 hostname process[123]: log message
Regex:
^([A-Za-z]{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})
Big failure mode: no year. RFC 3164 syslog timestamps don't include the year. You have to infer it from the log file's mtime or from when you collected the logs. This means if you're reading old logs that span a year boundary, you can get January entries that are technically from the previous year.
Most modern systems have moved to RFC 5424 syslog (next entry), which fixes this. But if you're working with Cisco gear, old Linux distros, or random embedded devices, RFC 3164 is what you get.
4. syslog (RFC 5424, the new one)
2026-05-15T14:30:00.123456+00:00 hostname process - - - log message
This is just ISO 8601 with extra fields after, so regex 1 works:
^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2}))
5. journald — the structured one
If you're reading journalctl output directly:
May 15 14:30:00 hostname process[123]: message
Same as RFC 3164 — same year ambiguity. But if you use journalctl --output=short-iso or --output=json, you get proper ISO 8601 instead. Always prefer the structured output:
journalctl --output=short-iso --no-pager -u myservice
# → 2026-05-15T14:30:00+0000 hostname process[123]: message
6. Unix epoch seconds (10-digit)
1747318200 SOMETHING_HAPPENED key=value
Regex (matches any 10-digit integer at the start of a line — be careful, this also matches phone numbers and IDs):
^(\d{10})\b
To avoid false matches, anchor to your specific log format:
^(\d{10})\s+[A-Z_]+\s
Failure mode: any 10-digit number looks like a Unix timestamp. A user ID, a session ID, an IP address (as integer), a random hex value cast to decimal — they all match this regex. If you can constrain by what comes after the number, do so.
7. Unix epoch milliseconds (13-digit)
Same as above but 13 digits:
^(\d{13})\b
Used by JavaScript console logs, many Node-based loggers, and most web servers. Less likely to collide with other ID types since 13-digit integers are less common than 10.
8. ISO 8601 with milliseconds (often comma-separated, European style)
2026-05-15 14:30:00,123 INFO message
2026-05-15 14:30:00.123 INFO message
Java's log4j and Python's standard logging module use this format by default. Note the space between date and time instead of T, and the comma (or period) for fractional seconds:
(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}[.,]\d{3})
Convert to a parseable form by replacing comma with period:
# Python
ts_str = match.group(1).replace(',', '.')
ts = datetime.strptime(ts_str, "%Y-%m-%d %H:%M:%S.%f")
9. RFC 2822 (email-style)
Date: Wed, 15 May 2026 14:30:00 +0000
[Wed May 15 14:30:00 UTC 2026]
Wed, 15 May 2026 14:30:00 GMT
This one's irregular — the comma after the day-of-week is optional, the timezone can be +0000, UTC, GMT, or various other strings. A forgiving regex:
(?:[A-Za-z]{3},?\s+)?\d{1,2}\s+[A-Za-z]{3}\s+\d{4}\s+\d{2}:\d{2}:\d{2}\s+(?:[+-]\d{4}|[A-Z]{3,4})
Most languages have a dedicated parser for this format because it's so common in email and HTTP headers. Use the parser, not the regex, to actually interpret the value. The regex is just for extracting the substring.
10. Custom epoch (the "what is this even" format)
Sometimes you'll find logs with timestamps that don't match any standard. A SaaS app might use milliseconds-since-app-start. An IoT device might use seconds-since-its-own-boot. A printer driver might use Windows FILETIME because the original engineer copy-pasted it from MSDN.
If you see a number that looks like a timestamp but doesn't decode reasonably as Unix time, try the other epochs (see the cheat sheet in the specialty timestamp formats post). The most common alternatives:
- 17-19 digits: FILETIME or .NET ticks (1601 or year-1 epoch in 100ns)
- 15-17 digits: Chrome timestamp (1601 in microseconds)
- 9-10 digits: Unix seconds or Cocoa (2001) seconds
- 12-13 digits: Unix ms
- 5-6 digits: Excel OADate or Modified Julian Day
Universal "extract anything that looks like a date" regex
If you just want to find all the date-like strings in a file regardless of format:
(?:
\d{4}-\d{2}-\d{2}(?:[T\s]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?)? # ISO
|
\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\s+[+-]\d{4} # Apache/nginx
|
[A-Za-z]{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2} # syslog
|
\b\d{10,13}\b # Unix s or ms
)
Use it with re.VERBOSE in Python or the equivalent in other languages to keep it readable.
Three rules of thumb
- Extract first, validate later. The regex just gets you a candidate substring. Validating that it's a real date (with a real datetime library) is a separate step.
- If logs span a year boundary, prefer formats with the year. RFC 3164 syslog without a year is a real source of off-by-one-year bugs.
- Test with edge cases. Leap seconds. New Year's transitions. Daylight saving transitions. Single-digit days vs zero-padded days. The same regex can match all variations if you're careful with optional groups.
You can spot-check any extracted timestamp by pasting it into the main converter — it auto-detects most common formats and tells you what it thinks the value is.
Published April 17, 2026. Tagged: regex, logs, parsing.