The script below parses markdown text and extracts the URL and alt text of every image in it, using JavaScript.
Script
01: <script>
02:
03: const regex = /!\[(?<altText>[^\]]*)\]\s*\((?<imageURL>[^)]+)\)|img\s*src="(?<imageURL1>[^"]*)"\s*alt="(?<altText1>[^"]*)" \/>|img\s*alt="(?<altText2>[^"]*)"\s*src="(?<imageURL2>[^"]*)" \/>/gm;
04:
05: const str = `
06: ## What is Lorem Ipsum?
07: Lorem Ipsum is simply dummy text of the printing and typesetting industry.
08:
09: ![alt1](/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.png)
10:
11: <img src="/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.png" alt="alt2" />
12:
13: <img alt="alt3" src="/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.png" />`;
14:
15: let m;
16: let images = [];
17: while ((m = regex.exec(str)) !== null) {
18: if (m.index === regex.lastIndex) regex.lastIndex++;
19: images.push({
20: alt : m.groups.altText ?? m.groups.altText1 ?? m.groups.altText2,
21: src : m.groups.imageURL ?? m.groups.imageURL1 ?? m.groups.imageURL2
22: });
23: }
24: console.log(images);
25: </script>
The regex matches three types of image tags:
- markdown syntax, i.e. ![alt](URL)
- HTML image tag where src comes before the alt
- HTML image tag where alt comes before src
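These three possible outcomes are why lines 20 and 21 chain the nullish coalescing operator (??). Only the groups of the alternative that actually matched contain data; the others are undefined, so each chain returns the value from whichever group was filled.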
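The fallback across the three alternatives can also be sketched as a small reusable function. This is only an illustration under a few assumptions: the name extractImages is hypothetical, it iterates with String.prototype.matchAll instead of the exec loop, and it uses negated character classes ([^\]]* and [^)]+) for the markdown branch so that two images on the same line are matched separately.

```javascript
// Hypothetical helper, not part of the original script.
function extractImages(text) {
  // Same three alternatives: markdown, src-before-alt, alt-before-src.
  const pattern = /!\[(?<altText>[^\]]*)\]\s*\((?<imageURL>[^)]+)\)|img\s*src="(?<imageURL1>[^"]*)"\s*alt="(?<altText1>[^"]*)" \/>|img\s*alt="(?<altText2>[^"]*)"\s*src="(?<imageURL2>[^"]*)" \/>/g;
  const images = [];
  for (const match of text.matchAll(pattern)) {
    const g = match.groups;
    images.push({
      // Groups from alternatives that did not match are undefined,
      // so ?? falls through to the first group that was filled.
      alt: g.altText ?? g.altText1 ?? g.altText2,
      src: g.imageURL ?? g.imageURL1 ?? g.imageURL2,
    });
  }
  return images;
}

// Two markdown images on one line, plus an image with empty alt text.
console.log(extractImages('![a](/1.png) ![b](/2.png)'));
// [ { alt: 'a', src: '/1.png' }, { alt: 'b', src: '/2.png' } ]
console.log(extractImages('![](/3.png)'));
// [ { alt: '', src: '/3.png' } ]
```

The last call shows why ?? fits better here than ||: an empty alt text is a matched, empty string, and ?? keeps it, whereas || would treat it as missing and fall through to undefined.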
Output
[
  {
    "alt": "alt1",
    "src": "/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.png"
  },
  {
    "alt": "alt2",
    "src": "/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.png"
  },
  {
    "alt": "alt3",
    "src": "/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.png"
  }
]