The PHP script below parses markdown text and returns the URL and alt text of every image it finds.
Note: I added support for the HTML image tag as well, since markdown allows inline HTML.
Script
<?php
// Three alternatives: markdown ![alt](url), <img src alt />, and <img alt src />.
$re = '/!\[(?<altText>[^\]]*)\]\s*\((?<imageURL>[^)]+)\)|img\s*src="(?<imageURL1>[^"]*)"\s*alt="(?<altText1>[^"]*)" \/>|img\s*alt="(?<altText2>[^"]*)"\s*src="(?<imageURL2>[^"]*)" \/>/';
$str = 'AWS S3 is a cloud storage service that caters to the storage needs of modern software applications. S3 buckets can be used to host static sites.

## Getting started
Once you have your AWS account all setup you can login and then use the search bar up top to search for the S3 service.

![alt1](/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.jpg)

<img src="/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.png" alt="alt2" />

<img alt="alt3" src="/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.jpg" />';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

$images = [];
// Return the first non-empty value among the three candidate captures.
function getData($set1, $set2, $set3) {
    if (strlen($set1)) return $set1;
    if (strlen($set2)) return $set2;
    if (strlen($set3)) return $set3;
    return '';
}
foreach ($matches as $eachMatch) {
    // Groups from alternatives that did not match can be missing, so default them to ''.
    $images[] = [
        'src' => getData($eachMatch['imageURL'] ?? '', $eachMatch['imageURL1'] ?? '', $eachMatch['imageURL2'] ?? ''),
        'alt' => getData($eachMatch['altText'] ?? '', $eachMatch['altText1'] ?? '', $eachMatch['altText2'] ?? '')
    ];
}
// Print the result as pretty-printed JSON
echo json_encode($images, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
?>
The regex matches three types of image tag structures:
- markdown syntax, i.e.:
![alt](URL)
- HTML image tag where src comes before the alt
- HTML image tag where alt comes before src
Only one of these three alternatives matches at a time, so each match fills just one pair of named groups. That is why we have the getData function: it returns the data from whichever structure happens to be a match.
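The same fall-through can be sketched without a helper like getData. The version below is only an illustration under a few assumptions: the function name extractImages is made up for this example, the PREG_UNMATCHED_AS_NULL flag requires PHP 7.2 or later, and the markdown branch uses negated character classes ([^\]]* and [^)]+) so that two images on the same line are captured separately. With that flag, groups from alternatives that did not match come back as null, so the ?? operator can pick the first one that was actually filled.

```php
<?php
// Hypothetical helper for illustration; not part of the original script.
function extractImages(string $text): array
{
    // Same three alternatives: markdown, <img src alt />, <img alt src />.
    $re = '/!\[(?<altText>[^\]]*)\]\s*\((?<imageURL>[^)]+)\)'
        . '|img\s*src="(?<imageURL1>[^"]*)"\s*alt="(?<altText1>[^"]*)" \/>'
        . '|img\s*alt="(?<altText2>[^"]*)"\s*src="(?<imageURL2>[^"]*)" \/>/';

    // Unmatched groups are reported as null instead of '' (PHP 7.2+).
    preg_match_all($re, $text, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);

    $images = [];
    foreach ($matches as $m) {
        // ?? skips null or missing keys and keeps the first captured value.
        $images[] = [
            'src' => $m['imageURL'] ?? $m['imageURL1'] ?? $m['imageURL2'] ?? '',
            'alt' => $m['altText'] ?? $m['altText1'] ?? $m['altText2'] ?? '',
        ];
    }
    return $images;
}

// Usage: one image per supported structure.
$sample = "![alt1](/a.jpg)\n"
    . "<img src=\"/b.png\" alt=\"alt2\" />\n"
    . "<img alt=\"alt3\" src=\"/c.jpg\" />";
$result = extractImages($sample);
echo json_encode($result, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
```

Compared with checking strlen on each candidate, this keeps an intentionally empty alt text (for example ![](URL)) tied to the structure that actually matched.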
Output
[
    {
        "src": "/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.jpg",
        "alt": "alt1"
    },
    {
        "src": "/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.png",
        "alt": "alt2"
    },
    {
        "src": "/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.jpg",
        "alt": "alt3"
    }
]