Text cleaning
Match only English words joined by hyphens (no space after the hyphen); ignore all other characters
text = "This is a sam-ple text, with punctuation and 123 num-bers, co- llor and hel- lo"
----
print(matches) # ['sam-ple', 'num-bers']
\b[a-zA-Z]+(?:-[a-zA-Z]+)+\b
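A minimal sketch of how this pattern might be applied, assuming Python's standard `re` module and `re.findall` (the variable names are illustrative, not from the original note). Because the pattern allows no space after the hyphen, "co- llor" and "hel- lo" are skipped.

```python
import re

# Pattern from the note: letter-only words joined by one or more hyphens.
pattern = r"\b[a-zA-Z]+(?:-[a-zA-Z]+)+\b"
text = "This is a sam-ple text, with punctuation and 123 num-bers, co- llor and hel- lo"

# The group is non-capturing, so findall returns the full matched words.
matches = re.findall(pattern, text)
print(matches)  # ['sam-ple', 'num-bers']
```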
Match only words split by a hyphen followed by a space
text = "This is a test of hyphen- ated words. Here is another word: com- puter."
----
# ['hyphen- ated', 'com- puter']
\b[a-zA-Z]+- [a-zA-Z]+\b
----
\b[a-zA-Z]+-[ ]?[a-zA-Z]+\b
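The `[ ]?` variant makes the space after the hyphen optional, so it catches both "sam-ple" and "hyphen- ated". The sketch below runs it with Python's `re.findall`. The final `re.sub` step is a hypothetical cleaning pass, not part of the original note. It assumes the "- " break is a line-wrap artifact and that the two halves should be rejoined without the hyphen.

```python
import re

# Pattern from the note: two letter-only parts joined by a hyphen,
# with an optional single space after the hyphen.
pattern = r"\b[a-zA-Z]+-[ ]?[a-zA-Z]+\b"
text = "This is a test of hyphen- ated words. Here is another word: com- puter."

matches = re.findall(pattern, text)
print(matches)  # ['hyphen- ated', 'com- puter']

# Assumed cleaning step: remove the "- " break and glue the halves together.
rejoined = re.sub(r"\b([a-zA-Z]+)- ([a-zA-Z]+)\b", r"\1\2", text)
print(rejoined)  # This is a test of hyphenated words. Here is another word: computer.
```

Real hyphenated compounds such as "well-known" contain no space, so the substitution leaves them untouched.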
VTT editing
Text lines longer than 40 characters
^.{41,}$
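Outside an editor's search box, this pattern generally needs multiline mode so that `^` and `$` anchor at each line rather than at the whole string. A minimal Python sketch, assuming the VTT file has already been read into a string (the sample cue below is made up for illustration):

```python
import re

# Pattern from the note; re.MULTILINE makes ^ and $ match at every line.
long_line = re.compile(r"^.{41,}$", re.MULTILINE)

# Made-up sample content, for illustration only.
vtt = """WEBVTT

00:00:01.000 --> 00:00:04.000
Short caption line.
This caption line is deliberately longer than forty characters.
"""

# Timestamp lines are 29 characters long, so they are not reported.
long_lines = long_line.findall(vtt)
print(long_lines)
```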