CURRENT EPOCH · EPOCHTIME.TOOLS · A PRECISION INSTRUMENT FOR TIME
Converter Batch Difference Blog
Languages
JavaScript Python TypeScript Go Rust Java PHP SQL Bash
Specialty
LDAP Timestamp .NET Ticks Chrome/WebKit Cocoa / Core Data Discord Timestamp Excel OADate Unix Hex
Standards
ISO 8601 Guide Year 2038 NTP Timestamp GPS Time Julian Day

The setup

You're staring at a multi-gigabyte log file. You want to filter, sort, or otherwise process entries by timestamp. The dates are in some format, possibly more than one format, possibly with subtle variations between server versions.

This is a regex cookbook for the formats I run into most often. Each entry includes:

1. ISO 8601 / RFC 3339 — the universal format

2026-05-15T14:30:00.123Z user logged in
2026-05-15T14:30:00+05:30 request handled

Regex:

\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})

Failure modes: this matches "looks like ISO 8601" not "is valid ISO 8601." It accepts impossible dates like 2026-13-45. For most log-extraction purposes, that's fine — you want everything that could be a timestamp, and you'll validate downstream.

2. Apache/nginx access log default format

192.0.2.1 - - [15/May/2026:14:30:00 +0000] "GET /api HTTP/1.1" 200 1234

Regex (extracts the bracketed timestamp including the offset):

\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\s+[+-]\d{4})\]

The month is the three-letter abbreviation (Jan, Feb, etc.) — not the number. Parsing this back into a datetime requires a lookup. In Python:

from datetime import datetime
ts = datetime.strptime("15/May/2026:14:30:00 +0000",
                      "%d/%b/%Y:%H:%M:%S %z")

Failure modes: locale-dependent. The month abbreviations are English in the standard Apache/nginx config, but some non-English servers might log in their own language. Most don't.

3. syslog (RFC 3164, the old one)

May 15 14:30:00 hostname process[123]: log message

Regex:

^([A-Za-z]{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})

Big failure mode: no year. RFC 3164 syslog timestamps don't include the year. You have to infer it from the log file's mtime or from when you collected the logs. This means if you're reading old logs that span a year boundary, you can get January entries that are technically from the previous year.

Most modern systems have moved to RFC 5424 syslog (next entry), which fixes this. But if you're working with Cisco gear, old Linux distros, or random embedded devices, RFC 3164 is what you get.

4. syslog (RFC 5424, the new one)

2026-05-15T14:30:00.123456+00:00 hostname process - - - log message

This is just ISO 8601 with extra fields after, so regex 1 works:

^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2}))

5. journald — the structured one

If you're reading journalctl output directly:

May 15 14:30:00 hostname process[123]: message

Same as RFC 3164 — same year ambiguity. But if you use journalctl --output=short-iso or --output=json, you get proper ISO 8601 instead. Always prefer the structured output:

journalctl --output=short-iso --no-pager -u myservice
# → 2026-05-15T14:30:00+0000 hostname process[123]: message

6. Unix epoch seconds (10-digit)

1747318200 SOMETHING_HAPPENED key=value

Regex (matches any 10-digit integer at the start of a line — be careful, this also matches phone numbers and IDs):

^(\d{10})\b

To avoid false matches, anchor to your specific log format:

^(\d{10})\s+[A-Z_]+\s

Failure mode: any 10-digit number looks like a Unix timestamp. A user ID, a session ID, an IP address (as integer), a random hex value cast to decimal — they all match this regex. If you can constrain by what comes after the number, do so.

7. Unix epoch milliseconds (13-digit)

Same as above but 13 digits:

^(\d{13})\b

Used by JavaScript console logs, many Node-based loggers, and most web servers. Less likely to collide with other ID types since 13-digit integers are less common than 10.

8. ISO 8601 with milliseconds (often comma-separated, European style)

2026-05-15 14:30:00,123 INFO message
2026-05-15 14:30:00.123 INFO message

Java's log4j and Python's standard logging module use this format by default. Note the space between date and time instead of T, and the comma (or period) for fractional seconds:

(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}[.,]\d{3})

Convert to a parseable form by replacing comma with period:

# Python
ts_str = match.group(1).replace(',', '.')
ts = datetime.strptime(ts_str, "%Y-%m-%d %H:%M:%S.%f")

9. RFC 2822 (email-style)

Date: Wed, 15 May 2026 14:30:00 +0000
[Wed May 15 14:30:00 UTC 2026]
Wed, 15 May 2026 14:30:00 GMT

This one's irregular — the comma after the day-of-week is optional, the timezone can be +0000, UTC, GMT, or various other strings. A forgiving regex:

(?:[A-Za-z]{3},?\s+)?\d{1,2}\s+[A-Za-z]{3}\s+\d{4}\s+\d{2}:\d{2}:\d{2}\s+(?:[+-]\d{4}|[A-Z]{3,4})

Most languages have a dedicated parser for this format because it's so common in email and HTTP headers. Use the parser, not the regex, to actually interpret the value. The regex is just for extracting the substring.

10. Custom epoch (the "what is this even" format)

Sometimes you'll find logs with timestamps that don't match any standard. A SaaS app might use milliseconds-since-app-start. An IoT device might use seconds-since-its-own-boot. A printer driver might use Windows FILETIME because the original engineer copy-pasted it from MSDN.

If you see a number that looks like a timestamp but doesn't decode reasonably as Unix time, try the other epochs (see the cheat sheet in the specialty timestamp formats post). The most common alternatives:

Universal "extract anything that looks like a date" regex

If you just want to find all the date-like strings in a file regardless of format:

(?:
  \d{4}-\d{2}-\d{2}(?:[T\s]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?)?  # ISO
  |
  \d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\s+[+-]\d{4}  # Apache/nginx
  |
  [A-Za-z]{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}  # syslog
  |
  \b\d{10,13}\b  # Unix s or ms
)

Use it with re.VERBOSE in Python or the equivalent in other languages to keep it readable.

Three rules of thumb

  1. Extract first, validate later. The regex just gets you a candidate substring. Validating that it's a real date (with a real datetime library) is a separate step.
  2. If logs span a year boundary, prefer formats with the year. RFC 3164 syslog without a year is a real source of off-by-one-year bugs.
  3. Test with edge cases. Leap seconds. New Year's transitions. Daylight saving transitions. Single-digit days vs zero-padded days. The same regex can match all variations if you're careful with optional groups.

You can spot-check any extracted timestamp by pasting it into the main converter — it auto-detects most common formats and tells you what it thinks the value is.


Published April 17, 2026. Tagged: regex, logs, parsing.

← Back to blog  ·  Try the converter