14th December 2009

sophiaserpentia, posting in php @ 10:41am: poorly formed JSON - options?
Hello everyone!

For a project I'm working on, I'm writing a screen scraper. Unfortunately the material I am attempting to mine is poorly formed JSON (probably meant to be parsed by proprietary code), so json_decode() returns NULL.
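Since json_decode() also returns NULL for the perfectly valid input "null", it can help to confirm that the NULL really is a parse failure. A minimal sketch, assuming PHP 5.5 or later for json_last_error_msg(); the helper name describeJsonFailure is made up for illustration:

```php
<?php
// Hypothetical helper: returns null when $text is valid JSON,
// otherwise PHP's own description of why decoding failed.
function describeJsonFailure(string $text): ?string
{
    json_decode($text);
    return json_last_error() === JSON_ERROR_NONE
        ? null
        : json_last_error_msg();
}

// Unquoted keys and single-quoted strings are not valid JSON,
// so this reports a failure rather than null.
var_dump(describeJsonFailure("{name:'blah blah blah'}"));
```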

My first thought was that I'd just parse it with regular expressions. Here I've run into a question I can't seem to work out. Much of the data I'm trying to parse looks like

name:'blah blah blah'

So I thought perhaps I could use
/name:\'([^']*)\'/
to grab the stuff between single quotes.

Unfortunately some of the data has escaped single quotes inside, like this:
name:'blah\'s blah blah'
in which case my regex returns only "blah\".

So is there a way to say in regex-speak, "stop at the next single quote, unless it is escaped with a backslash"? I've not been able to find any reference to a way of writing a regex that makes an exception to a negated character class.

As an alternative, I'm wondering if maybe there's something like Tidy, but for JSON. So far I'm not finding much.

Thanks!

ETA: After much trial and error I found a solution:
name:\'([^']+\\\'[^']+)\'|name:\'([^']*)\'
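That pattern handles at most one escaped quote per value, and the capture lands in group 1 or group 2 depending on which branch matched. As a rough sketch of a more general approach, the pattern below treats "a backslash plus whatever follows it" as a single unit, so any number of escaped quotes is allowed and there is only one capture group. The function name extractNames and the sample input are made up for illustration, and stripslashes() is only appropriate on the assumption that backslash escapes are the only escaping used in the data.

```php
<?php
// Hypothetical helper: collect every name:'...' value, allowing \' inside.
function extractNames(string $text): array
{
    // Nowdoc keeps the regex free of PHP string escaping.
    // [^'\\]  any character except a quote or a backslash
    // \\.     a backslash followed by any one character (an escape pair)
    $pattern = <<<'REGEX'
/name:'((?:[^'\\]|\\.)*)'/s
REGEX;

    preg_match_all($pattern, $text, $matches);

    // Assumption: turning \' into ' (and \\ into \) is all the unescaping needed.
    return array_map('stripslashes', $matches[1]);
}

$sample = <<<'TXT'
{name:'blah\'s blah blah', other:'x', name:'plain'}
TXT;

print_r(extractNames($sample));
```

The key idea is that the negated class excludes the backslash as well as the quote, so a backslash can only ever be consumed together with the character it escapes.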

But I'd still be curious to learn if there's a program like Tidy for JSON.
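In the absence of a ready-made tool, one option is a small hand-rolled normalizer that rewrites the relaxed syntax into strict JSON and then hands it to json_decode(). What follows is only a sketch under stated assumptions: the input differs from JSON solely by using bare identifiers as keys and single-quoted strings, and the function name relaxedToJson is made up for illustration. It does not handle comments, trailing commas, or JavaScript-only escapes such as \x41.

```php
<?php
// Hypothetical normalizer: quote bare keys and convert '...' strings to "...".
function relaxedToJson(string $text): string
{
    // One pass over three token types, so keys inside strings are never touched:
    //   1. a double-quoted string (kept as is)
    //   2. a single-quoted string (body captured in group 1)
    //   3. a bare identifier that is followed by a colon
    $pattern = <<<'REGEX'
/"(?:[^"\\]|\\.)*"|'((?:[^'\\]|\\.)*)'|[A-Za-z_]\w*(?=\s*:)/s
REGEX;

    return preg_replace_callback($pattern, function (array $m): string {
        if ($m[0][0] === '"') {
            return $m[0];                      // already a JSON-style string
        }
        if ($m[0][0] === "'") {
            // \' becomes ', a bare " becomes \", other escape pairs are kept.
            $body = preg_replace_callback('/\\\\.|"/s', function (array $e): string {
                if ($e[0] === "\\'") {
                    return "'";
                }
                return $e[0] === '"' ? '\\"' : $e[0];
            }, $m[1]);
            return '"' . $body . '"';
        }
        return '"' . $m[0] . '"';              // bare key
    }, $text);
}

$sample = <<<'TXT'
{name:'blah\'s blah blah', count:3, tags:['a','b']}
TXT;

var_dump(json_decode(relaxedToJson($sample), true));
```

Matching strings as whole tokens first is what keeps text such as "foo:" inside a value from being mistaken for a key.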
