EDDYMENS

Published a year ago

Regex Get All Images In A Markdown File | JS

The script below parses Markdown text with JavaScript and extracts the URL and alt text of every image it finds, whether the image is written in Markdown syntax or as an HTML img tag.

Script

01: <script>
02:
03: const regex = /!\[(?<altText>.*?)\]\s*\((?<imageURL>.+?)\)|img\s*src="(?<imageURL1>[^"]*)"\s*alt="(?<altText1>[^"]*)" \/>|img\s*alt="(?<altText2>[^"]*)"\s*src="(?<imageURL2>[^"]*)" \/>/gm;
04:
05: const str = `
06: ## What is Lorem Ipsum?
07: Lorem Ipsum is simply dummy text of the printing and typesetting industry.
08:
09: ![alt1](/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.png)
10:
11: <img src="/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.png" alt="alt2" />
12:
13: <img alt="alt3" src="/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.png" />`;
14:
15: let m;
16: let images = [];
17: while ((m = regex.exec(str)) !== null) {
18:   if (m.index === regex.lastIndex) regex.lastIndex++; // guard against zero-length matches
19:   images.push({
20:     alt : m.groups.altText ?? m.groups.altText1 ?? m.groups.altText2,
21:     src : m.groups.imageURL ?? m.groups.imageURL1 ?? m.groups.imageURL2
22:   });
23: }
24: console.log(images);
25: </script>

Note that the Markdown part of the pattern uses lazy quantifiers (.*? and .+?). With greedy ones, two Markdown images on the same line would be captured as a single match.

The regex matches three types of image tags:

  • Markdown version, i.e. ![alt](URL)
  • HTML image tag where src comes before the alt
  • HTML image tag where alt comes before src
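
To make the three cases easier to see, the pattern can be split into one sub-pattern per case and joined back together with |. This is only a sketch: the names markdownImage, htmlSrcFirst, htmlAltFirst, combined and describeMatch are made up for illustration, and the Markdown sub-pattern assumes lazy quantifiers.

```javascript
// One sub-pattern per image syntax (hypothetical names, for illustration only).
const markdownImage = /!\[(?<altText>.*?)\]\s*\((?<imageURL>.+?)\)/;
const htmlSrcFirst = /img\s*src="(?<imageURL1>[^"]*)"\s*alt="(?<altText1>[^"]*)" \/>/;
const htmlAltFirst = /img\s*alt="(?<altText2>[^"]*)"\s*src="(?<imageURL2>[^"]*)" \/>/;

// Join the three sources with | to rebuild a single global pattern.
const combined = new RegExp(
  [markdownImage, htmlSrcFirst, htmlAltFirst].map((r) => r.source).join("|"),
  "g"
);

// Returns the names of the groups that captured something for the first image in `text`.
function describeMatch(text) {
  combined.lastIndex = 0; // reset, because a global regex remembers its position
  const m = combined.exec(text);
  if (m === null) return null;
  return Object.keys(m.groups).filter((name) => m.groups[name] !== undefined);
}

console.log(describeMatch("![logo](/a.png)"));              // [ 'altText', 'imageURL' ]
console.log(describeMatch('<img src="/b.png" alt="b" />')); // [ 'imageURL1', 'altText1' ]
console.log(describeMatch('<img alt="c" src="/c.png" />')); // [ 'altText2', 'imageURL2' ]
```

Each input fills exactly one pair of named groups and leaves the other four undefined.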

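Only one of the three alternatives can match at a time, so the named groups belonging to the other two are left undefined. That is why lines 20 and 21 chain the nullish coalescing operator (??): each chain returns the value of whichever regex group actually contains data.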
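
The same idea can be wrapped in a reusable function. The sketch below is one possible variation, not the script above: extractImages and imagePattern are hypothetical names, String.prototype.matchAll replaces the exec loop, and the Markdown part of the pattern assumes lazy quantifiers.

```javascript
// Same three alternatives as the script above, with lazy quantifiers in the Markdown part.
const imagePattern = /!\[(?<altText>.*?)\]\s*\((?<imageURL>.+?)\)|img\s*src="(?<imageURL1>[^"]*)"\s*alt="(?<altText1>[^"]*)" \/>|img\s*alt="(?<altText2>[^"]*)"\s*src="(?<imageURL2>[^"]*)" \/>/g;

// Hypothetical helper: returns one { alt, src } object per image found in `markdown`.
function extractImages(markdown) {
  return Array.from(markdown.matchAll(imagePattern), ({ groups }) => ({
    // ?? skips only undefined/null, so an empty alt text ("") is kept as is.
    alt: groups.altText ?? groups.altText1 ?? groups.altText2,
    src: groups.imageURL ?? groups.imageURL1 ?? groups.imageURL2,
  }));
}

const found = extractImages('![](/x.png) and <img alt="y" src="/y.png" />');
console.log(found);
// [ { alt: '', src: '/x.png' }, { alt: 'y', src: '/y.png' } ]
```

Using ?? rather than || matters for the first image here: its alt text is an empty string, which || would treat as missing and replace with undefined.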

Output

[
  {
    "alt": "alt1",
    "src": "/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.png"
  },
  {
    "alt": "alt2",
    "src": "/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.png"
  },
  {
    "alt": "alt3",
    "src": "/images/172067068-61db1af9-bcaf-46ce-9e44-4ca919199bff.png"
  }
]
