Text cleaning

Match English words whose parts are joined by hyphens with no spaces in between; digits, punctuation, and other characters are ignored

text = "This is a sam-ple text, with punctuation and 123 num-bers, co- llor and hel- lo"

----
print(matches)  # ['sam-ple', 'num-bers']
\b[a-zA-Z]+(?:-[a-zA-Z]+)+\b
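A minimal Python sketch of how this pattern behaves on the sample text, assuming the standard `re` module. The pattern as written does not allow a space after the hyphen, so "co- llor" and "hel- lo" are skipped. The `loose` variant and the space-stripping step are assumptions added for illustration; they are not part of the original notes.

```python
import re

text = "This is a sam-ple text, with punctuation and 123 num-bers, co- llor and hel- lo"

# Pattern from the notes: letters, then one or more "-letters" groups, no spaces.
strict = re.compile(r"\b[a-zA-Z]+(?:-[a-zA-Z]+)+\b")
strict_matches = strict.findall(text)
print(strict_matches)  # ['sam-ple', 'num-bers']

# Assumed variant: tolerate one optional space after each hyphen,
# then drop that space from every match.
loose = re.compile(r"\b[a-zA-Z]+(?:- ?[a-zA-Z]+)+\b")
matches = [m.replace("- ", "-") for m in loose.findall(text)]
print(matches)  # ['sam-ple', 'num-bers', 'co-llor', 'hel-lo']
```

The group is non-capturing (`(?:...)`), so `findall` returns whole matches rather than only the last hyphenated part.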

Match only words that are split in the middle by a hyphen followed by a space

text = "This is a test of hyphen- ated words. Here is another word: com- puter."

----
# ['hyphen- ated', 'com- puter']
\b[a-zA-Z]+- [a-zA-Z]+\b
----
\b[a-zA-Z]+-[ ]?[a-zA-Z]+\b
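The two patterns above differ only in whether the space after the hyphen is required. A short Python sketch comparing them, assuming the standard `re` module. The second sample string and the `rejoin` substitution are assumptions added for illustration, showing one way such matches might be glued back together during cleaning.

```python
import re

text = "This is a test of hyphen- ated words. Here is another word: com- puter."

# Strict form: the space after the hyphen is required.
with_space = re.compile(r"\b[a-zA-Z]+- [a-zA-Z]+\b")
split_words = with_space.findall(text)
print(split_words)  # ['hyphen- ated', 'com- puter']

# Optional-space form: also matches ordinary hyphenated words.
optional_space = re.compile(r"\b[a-zA-Z]+-[ ]?[a-zA-Z]+\b")
mixed = optional_space.findall("a well-known hyphen- ated word")
print(mixed)  # ['well-known', 'hyphen- ated']

# Assumed cleaning step: capture both halves and join them without "- ".
rejoin = re.compile(r"\b([a-zA-Z]+)- ([a-zA-Z]+)\b")
cleaned = rejoin.sub(r"\1\2", text)
print(cleaned)  # This is a test of hyphenated words. Here is another word: computer.
```

The strict form is the safer choice for rejoining, since the optional-space form would also catch legitimate compounds such as "well-known".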

VTT editing

Text lines longer than 40 characters

^.{41,}$
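A small Python sketch of applying this pattern to subtitle text, assuming the standard `re` module. The `sample_vtt` content is made up for illustration. In Python the `re.MULTILINE` flag is needed so that `^` and `$` anchor at every line rather than only at the start and end of the whole string.

```python
import re

# Hypothetical VTT snippet, used only to exercise the pattern.
sample_vtt = """WEBVTT

00:00:01.000 --> 00:00:04.000
Short caption line.

00:00:04.500 --> 00:00:08.000
This caption line is clearly longer than forty characters.
"""

# "." never matches a newline here, so each match is exactly one line
# of 41 or more characters.
long_line = re.compile(r"^.{41,}$", re.MULTILINE)
long_lines = long_line.findall(sample_vtt)
print(long_lines)
```

Timestamp lines such as `00:00:01.000 --> 00:00:04.000` are 29 characters long, so they stay under the threshold and only over-long caption text is reported.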