php - Non-greedy regex doesn't work
regex:
preg_match('/<td[^<^>]*>(.*?)<\/td><td[^<^>]*>'.preg_quote('<input type=\'text\' name=\'nazwisko\'>', '/').'<\/td>/ui', $form_string, $matches);
input:
<form action='http://freebot.pl/post.php' name='implebot.plshow' method='post' onsubmit='return sprawdzformularz(this)'> <table><tr><td align=right> <input type='hidden' name='uid' value='60431'> email :</td><td><input type='text' name='email'></td></tr> <tr><td align=right>imię :</td><td><input type='text' name='imie'></td></tr><tr><td align=right>nazwisko :</td><td><input type='text' name='nazwisko'></td></tr><tr><td align=right>#opcja1 :</td><td><input type='text' name='pole_1' value='war.1'></td></tr><input type='hidden' name='pole_2' value='war.2'><tr><td align=right>#opcja3 :</td><td><select name='pole_3'><option></option><option value='s1'>s1</option><option value='s2'>s2</option><option value='s3'>s3</option><option value='s4'>s4</option><option value='s5'>s5</option></select><tr><td align=right>#opcja4 :</td><td><select name='pole_4'><option></option><option value='a'>a</option><option value='b'>b</option><option value='c'>c</option><option value='d'>d</option><option value='e'>e</option><option value='f'>f</option><option value='g'>g</option></select><tr><td align=right>#opcja5 :</td><td><input type='text' name='pole_5' value='war.5'></td></tr></table><input type='hidden' name='zrodlo' value='formularz1'>zgadzam się z <input type='checkbox' name='pp' checked><a href='http://' >polityką prywatności</a><br><input type='submit' value='wyślij'></form>
$matches[1]:
<input type='hidden' name='uid' value='60431'>email :</td><td><input type='text' name='email'></td></tr><tr><td align=right>imię :</td><td><input type='text' name='imie'></td></tr><tr><td align=right>nazwisko :
instead of:
nazwisko :
I expected the (.*?) in
<td[^<^>]*>(.*?)<\/td>
to give me just nazwisko :
What am I doing wrong?
I don't see a reason to use ungreedy quantifiers in this pattern. Try this instead:
preg_match('~<td[^>]*>([^<]*)</td><td[^>]*>' .preg_quote("<input type='text' name='nazwisko'>") .'</td>~i', $form_string, $matches);
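To see the difference between the two patterns, here is a minimal sketch on a trimmed-down version of the form (the shortened $form_string and the variable names are assumptions made for illustration, not the asker's real data). A match is attempted from the leftmost <td>, and .*? is free to run across </td> boundaries until the rest of the pattern fits, whereas [^<]* cannot cross a tag.

```php
<?php
// Trimmed-down sample of the form (assumed for illustration).
$form_string = "<table>"
    . "<tr><td align=right>email :</td><td><input type='text' name='email'></td></tr>"
    . "<tr><td align=right>nazwisko :</td><td><input type='text' name='nazwisko'></td></tr>"
    . "</table>";

$field = "<input type='text' name='nazwisko'>";

// Pattern from the question: the match starts at the first <td>, and the
// lazy .*? keeps expanding over </td> until the tail of the pattern fits.
$lazy = '/<td[^<^>]*>(.*?)<\/td><td[^<^>]*>' . preg_quote($field, '/') . '<\/td>/ui';
preg_match($lazy, $form_string, $lazy_matches);

// Suggested pattern: [^<]* stops at the first <, so the capture can only
// be the text of the cell directly before the nazwisko input.
$strict = '~<td[^>]*>([^<]*)</td><td[^>]*>' . preg_quote($field, '~') . '</td>~i';
preg_match($strict, $form_string, $strict_matches);

echo $lazy_matches[1], "\n";   // starts with "email :" and spans several cells
echo $strict_matches[1], "\n"; // nazwisko :
```

Passing the delimiter as the second argument of preg_quote is optional here, since the quoted string contains no ~, but it keeps the pattern safe if the field markup changes.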
If the td tags can contain HTML content, you can replace ([^<]*) with:
((?>[^<]+|<+(?!/td>))*)
Explanation:
(?>             # atomic group
    [^<]+       # characters except <, 1 or more times
  |             # or
    <+(?!/td>)  # <, 1 or more times, not followed by /td> (negative lookahead)
)*              # close atomic group, 0 or more times
In other words, this part matches characters that are not <, or < not followed by /td>, each 1 or more times, with the whole group repeated 0 or more times. It's a little longer than (.*?) but far more efficient.
The reason is that with the ungreedy pattern the regex engine must test, for each character, one by one, whether it is followed by </td>. With this pattern the regex engine only tests when the character is <.
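As a rough sketch of the atomic-group variant in use, the example below runs it on a row whose label cell contains markup (the <b> tag and the shortened $form_string are hypothetical, added only to exercise the pattern).

```php
<?php
// Hypothetical rows where the label cell itself contains HTML.
$form_string = "<tr><td align=right>email :</td><td><input type='text' name='email'></td></tr>"
    . "<tr><td align=right><b>nazwisko</b> :</td><td><input type='text' name='nazwisko'></td></tr>";

$field = preg_quote("<input type='text' name='nazwisko'>", '~');

// The capture accepts text and any < that does not open a closing </td>,
// so it can contain inner tags but never leaves its own cell.
$pattern = '~<td[^>]*>((?>[^<]+|<+(?!/td>))*)</td><td[^>]*>' . $field . '</td>~i';

if (preg_match($pattern, $form_string, $matches)) {
    echo $matches[1], "\n"; // <b>nazwisko</b> :
}
```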
I use an atomic group (?>...)
instead of a non-capturing group (?:...)
whenever possible; it's good practice. You can find more information here.
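The difference between the two kinds of group can be seen on a small example that is not part of the original answer (the subject string is made up): once an atomic group has matched, the engine will not go back inside it to give characters up, which is what saves the useless retries.

```php
<?php
$subject = 'aaab';

// Non-capturing group: a+ first takes "aaa", then backtracks to "aa"
// so that the trailing "ab" can match.
$non_capturing = preg_match('~(?:a+)ab~', $subject);

// Atomic group: a+ keeps every "a" it has taken, so the trailing "ab"
// can never match and the whole pattern fails.
$atomic = preg_match('~(?>a+)ab~', $subject);

var_dump($non_capturing); // int(1)
var_dump($atomic);        // int(0)
```

In the td pattern above this loss of backtracking is harmless, because the two alternatives inside the group can never match the same text in two different ways.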