EDDYMENS

Published a year ago

Regex Get All Images In A Markdown File | PHP

The PHP script below extracts the URL and alt text of every image found in a piece of Markdown text.

Note: I added support for the HTML image tag as well, since Markdown allows inline HTML.

Script

<?php
// Matches three image forms: Markdown ![alt](url), <img src alt />, and <img alt src />.
// Negated character classes keep each match from running past its own closing bracket.
$re = '/!\[(?<altText>[^\]]*)\]\s*\((?<imageURL>[^)]+)\)|img\s*src="(?<imageURL1>[^"]*)"\s*alt="(?<altText1>[^"]*)" \/>|img\s*alt="(?<altText2>[^"]*)"\s*src="(?<imageURL2>[^"]*)" \/>/';
$str = 'AWS S3 is a cloud storage service that caters to the storage needs of modern software applications. S3 buckets can be used to host static sites.

## Getting started
Once you have your AWS account all setup you can login and then use the search bar up top to search for the S3 service.

![alt1](/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.jpg)

<img src="/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.png" alt="alt2" />

<img alt="alt3" src="/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.jpg" />';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

$images = [];
// Return the first non-empty candidate, or '' if none of them matched.
function getData($set1, $set2, $set3) {
    if (strlen($set1)) return $set1;
    if (strlen($set2)) return $set2;
    if (strlen($set3)) return $set3;
    return '';
}
foreach ($matches as $eachMatch) {
    // Groups after the last one that matched may be missing from $eachMatch,
    // so default them to '' to avoid undefined-key warnings.
    $images[] = [
        'src' => getData($eachMatch['imageURL'] ?? '', $eachMatch['imageURL1'] ?? '', $eachMatch['imageURL2'] ?? ''),
        'alt' => getData($eachMatch['altText'] ?? '', $eachMatch['altText1'] ?? '', $eachMatch['altText2'] ?? '')
    ];
}
// Print the extracted images as pretty-printed JSON
echo json_encode($images, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
?>

The regex matches three types of image tag structures:

  • Markdown syntax, i.e. ![alt](URL)
  • HTML image tag where src comes before the alt
  • HTML image tag where alt comes before src

Each structure captures into its own pair of named groups, which is why we have the getData function: it returns the data from whichever structure happened to match.
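If you would rather not carry a fallback helper around, here is a rough sketch of the same idea wrapped in a function. It is only one way to do it, not the method used in the script: extractImages is a name I made up for illustration, the pattern keeps the same three alternatives (with negated character classes in the Markdown branch), and it assumes PHP 7.2 or newer for the PREG_UNMATCHED_AS_NULL flag. With that flag, groups that did not take part in a match come back as null, so a chain of ?? operators can pick the one that did.

```php
<?php
// Hypothetical helper, not part of the original script.
// Returns a list of ['src' => ..., 'alt' => ...] arrays for every image found.
function extractImages(string $markdown): array
{
    // Same three alternatives: Markdown image, <img src alt />, <img alt src />.
    $re = '/!\[(?<altText>[^\]]*)\]\s*\((?<imageURL>[^)]+)\)'
        . '|img\s*src="(?<imageURL1>[^"]*)"\s*alt="(?<altText1>[^"]*)" \/>'
        . '|img\s*alt="(?<altText2>[^"]*)"\s*src="(?<imageURL2>[^"]*)" \/>/';

    // Unmatched groups are reported as null instead of '' (PHP 7.2+).
    preg_match_all($re, $markdown, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);

    $images = [];
    foreach ($matches as $m) {
        // ?? skips both null values and missing keys, so the first group
        // that actually took part in the match wins.
        $images[] = [
            'src' => $m['imageURL'] ?? $m['imageURL1'] ?? $m['imageURL2'] ?? '',
            'alt' => $m['altText'] ?? $m['altText1'] ?? $m['altText2'] ?? '',
        ];
    }
    return $images;
}

// Usage: one image per supported structure.
$sample = "![logo](/a.png)\n"
    . "<img src=\"/b.png\" alt=\"second\" />\n"
    . "<img alt=\"third\" src=\"/c.png\" />";
print_r(extractImages($sample));
```

One side effect of this approach is that an image with an empty alt text, such as ![](/x.png), still comes back with an empty string for alt, because the group matched (just with nothing in it) rather than being skipped.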

Output

[
    {
        "src": "/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.jpg",
        "alt": "alt1"
    },
    {
        "src": "/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.png",
        "alt": "alt2"
    },
    {
        "src": "/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.jpg",
        "alt": "alt3"
    }
]
