I have a site that I converted from D7. Much of the content was written before there were WYSIWYG editing plugins. As I'm updating some of the content, though, I'm enabling CKE5 for obvious reasons. The problem I'm having is that in about half the cases, the trimmed output is completely empty after the CKE5 conversion.

I suspect that it's happening because when it is "raw" HTML (which in my case has <br> tags between paras rather than <p> tags), the text_summary function trims it at sentence boundaries, but CKE wraps the paragraphs with <p> tags, so then the trimming happens at paragraph boundaries. My trim limit is set at 600 characters, which has always worked well in the past.

There may be an additional issue which has been a problem in Drupal for 15 years: https://www.drupal.org/project/drupal/issues/221257

Anyway, is there a way to get trimming to happen on sentence boundaries as before?

It seems to me that in typical cases, trimmed output shouldn't be nothing at all, but that could just be me. :)

Thanks.

Accepted answer

Bug report filed and I fixed it:

https://github.com/backdrop/backdrop-issues/issues/6423

Would I need to fork the entire Backdrop project to file a PR?

Comments

I suspect that it's happening because when it is "raw" HTML (which in my case has <br> tags between paras rather than <p> tags), the text_summary function trims it at sentence boundaries, but CKE wraps the paragraphs with <p> tags, so then the trimming happens at paragraph boundaries.

Hmmm... I'm not sure this is what's happening. I just tried the following (paragraph with <br>) in the Execute PHP screen provided by Devel:

dpm(text_summary('Fusce vulputate eleifend sapien. Donec vitae orci sed dolor rutrum auctor. Nunc interdum lacus sit amet orci. Nullam tincidunt adipiscing enim. Nulla neque dolor, sagittis eget, iaculis quis, molestie non, velit.<br>Sed lectus. Phasellus blandit leo ut odio. Etiam feugiat lorem non metus. Aliquam erat volutpat. Nullam tincidunt adipiscing enim.<br>Ut a nisl id ante tempus hendrerit. Vestibulum fringilla pede sit amet augue. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Suspendisse potenti. Sed augue ipsum, egestas nec, vestibulum et, malesuada adipiscing, dui.<br>Aenean imperdiet. Fusce convallis metus id felis luctus adipiscing. Pellentesque libero tortor, tincidunt et, tincidunt eget, semper nec, quam. Ut id nisl quis enim dignissim sagittis. Duis leo.<br>', 'filtered_html'));

And I'm getting this (correct) output:

Fusce vulputate eleifend sapien. Donec vitae orci sed dolor rutrum auctor. Nunc interdum lacus sit amet orci. Nullam tincidunt adipiscing enim. Nulla neque dolor, sagittis eget, iaculis quis, molestie non, velit.<br />Sed lectus. Phasellus blandit leo ut odio. Etiam feugiat lorem non metus. Aliquam erat volutpat. Nullam tincidunt adipiscing enim.

Interesting. I will try some similar testing and see what I find. 

Any suggestions on how to figure out what's happening?

@argiepiano - I get the same results as you do when I run it from Execute PHP (using the new field content including <p> tags), but the Teaser view is still empty.

ETA: If I manually shorten the first paragraph to below the trim limit (600 char) then that text does appear in the Teaser.

Did you solve this in the end?

Can you post here (or in a github gist) the raw content of the field?

No, I haven't yet figured it out. I did do some more poking and found that if I set a format that does not use any of the Filters, then it works fine. It seems unrelated to CKE. But if the format does use a filter, it breaks. So then I went to see if I could ID which filter(s) are the problem.

I have multiple defined formats (for historical reasons) and so was able to compare results with 3 different formats which do not use CKE at all. This one works:

But this one doesn't:

Raw content (test value):

Array
(
    [value] => <p>    Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32. </p> <p>    The standard chunk of Lorem Ipsum used since the 1500s is reproduced below for those interested. Sections 1.10.32 and 1.10.33 from "de Finibus Bonorum et Malorum" by Cicero are also reproduced in their exact original form, accompanied by English versions from the 1914 translation by H. Rackham. </p>
    [format] => 1
    [safe_value] => <p>    Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32. 
</p> <p>    The standard chunk of Lorem Ipsum used since the 1500s is reproduced below for those interested. Sections 1.10.32 and 1.10.33 from "de Finibus Bonorum et Malorum" by Cicero are also reproduced in their exact original form, accompanied by English versions from the 1914 translation by H. Rackham. </p>
)

So it seems this problem is unrelated to the text editor and is instead a problem with one of the filters, most likely the one that converts line breaks into HTML.

Have you tried disabling that filter in the second format (Full HTML) to see if it works? 

To clarify, where does this problem happen: in the node edit form? When you view the actual node? Both?

I just tried that. It turns out that I can disable all the filters on Full HTML and it still doesn't work. (Cue the Twilight Zone music ....)

The config files are still rather different. Idk if it's meaningful, but here's what I've got:

Full HTML, after turning off all filters, even html corrector:

{
    "_config_name": "filter.format.3",
    "format": "3",
    "name": "Full HTML",
    "cache": true,
    "status": "1",
    "weight": "0",
    "roles": {
        "4": "4",
        "3": 0,
        "anonymous": 0,
        "authenticated": 0
    },
    "filters": {
        "filter_image_caption": {
            "status": 0,
            "weight": 4,
            "module": "filter",
            "settings": []
        },
        "filter_autop": {
            "status": 0,
            "weight": 1,
            "module": "filter",
            "settings": []
        },
        "filter_url": {
            "status": 0,
            "weight": 0,
            "module": "filter",
            "settings": []
        },
        "filter_htmlcorrector": {
            "status": 0,
            "weight": 10,
            "module": "filter",
            "settings": []
        },
        "filter_html_escape": {
            "status": 0,
            "weight": -10,
            "module": "filter",
            "settings": []
        },
        "filter_image_align": {
            "status": 0,
            "weight": 5,
            "module": "filter",
            "settings": []
        },
        "filter_html": {
            "status": 0,
            "weight": -10,
            "settings": {
                "allowed_html": "<a> <em> <strong> <cite> <blockquote> <code> <ul> <ol> <li> <dl> <dt> <dd> <h3> <h4> <h5> <p> <img> <figure> <figcaption>"
            },
            "module": "filter"
        }
    },
    "editor": "",
    "editor_settings": []
}

Unfiltered HTML:

{
    "_config_name": "filter.format.4",
    "format": "4",
    "name": "Unfiltered HTML",
    "cache": "0",
    "status": "1",
    "weight": "0",
    "roles": [
        4
    ],
    "filters": {
        "filter_htmlcorrector": {
            "module": "filter",
            "status": "1",
            "weight": "10",
            "settings": []
        }
    }
}

To clarify, where does this problem happen: in the node edit form? When you view the actual node? Both?

Both in the "Teaser" preview and in any Views that use the Teaser value.

My suggestion, if things go weird with CKEditor 5 - try TinyMCE. That module exists for exactly such a use-case.

If things are also weird or broken with Tiny, the problem isn't the editor. If things are fine with Tiny - stick with it. It's a viable alternative.

Thanks for the suggestion. It's broken for formats that don't use the editor at all, so I'll keep it in mind if I have future problems with the editor.

Figured out where the problem is coming from. This produces the expected output:

dpm(text_summary($field_value));

BUT this prints nothing:

dpm(text_summary($field_value, 1));

In my site, "1" is the ID of the "Filtered HTML" text format. So text_summary really is failing to return the proper value when $format_id is set to almost any of the text formats. Only the "Unfiltered HTML" format still works.

More specifically, the problem is caused by lines 469-471 in core/modules/field/modules/text/text.module

  if (isset($format->filters['filter_autop'])) {
    $line_breaks["\n"] = 1;
  }

If I comment these out, the summary function works correctly.
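
For anyone wondering how one extra break point can empty the teaser, here is a rough standalone sketch. It is not the actual text_summary() code. It assumes the function cuts the text at the trim limit and then backs up to the last break point it can find, trying "</p>" first, then line breaks, then sentence ends. It also assumes the stored value has a newline right after the opening <p>, which the raw dump above suggests but does not prove. The function name summary_sketch() and the offsets are made up for illustration.

```php
<?php
// Simplified stand-in for a reverse break-point search.
// NOT Backdrop's text_summary(); names and offsets are illustrative assumptions.
function summary_sketch(string $text, int $size, bool $newline_is_break): string {
  $summary = substr($text, 0, $size);
  $max_rpos = strlen($summary);
  $min_rpos = $max_rpos;
  $reversed = strrev($summary);

  $line_breaks = array('<br />' => 6, '<br>' => 4);
  if ($newline_is_break) {
    // This mirrors the effect of the lines quoted above.
    $line_breaks["\n"] = 1;
  }
  // Groups are tried in order; the first group with a match wins.
  $break_points = array(
    array('</p>' => 0),
    $line_breaks,
    array('. ' => 1, '! ' => 1, '? ' => 1),
  );

  foreach ($break_points as $points) {
    foreach ($points as $point => $offset) {
      // Searching the reversed string finds the LAST occurrence in $summary.
      $rpos = strpos($reversed, strrev($point));
      if ($rpos !== FALSE) {
        $min_rpos = min($rpos + $offset, $min_rpos);
      }
    }
    if ($min_rpos !== $max_rpos) {
      return ($min_rpos === 0) ? $summary : substr($summary, 0, -$min_rpos);
    }
  }
  return $summary;
}

// One long paragraph (over the 600 character limit) with a newline after <p>.
$text = "<p>\n" . str_repeat('Lorem ipsum dolor sit amet. ', 30) . "</p>";

var_dump(summary_sketch($text, 600, TRUE));              // string(3) "<p>"
var_dump(substr(summary_sketch($text, 600, FALSE), -5)); // string(5) "amet."
```

With the newline treated as a break point, the only newline inside the first 600 characters is the one right after <p>, so everything after it is cut and only the bare opening tag is left. Without it, the search falls through to sentence boundaries, which is the behavior I was used to. That would also fit with the teaser coming back once the first paragraph is shorter than the trim limit, because then "</p>" is found first.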

I think this is a bug and will file a report. For one thing, regardless of whether 'filter_autop' is enabled, the value of $format->filters['filter_autop'] will be set to something. It should be $format->filters['filter_autop']->status that is being checked.
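
A minimal sketch of the difference between the two checks, runnable outside Backdrop. The make_format() helper and the two function names are hypothetical, and it assumes (as I do above) that each entry in $format->filters is an object with a status property. The starting entries in $line_breaks are only there for illustration.

```php
<?php
// Hypothetical stand-in for a loaded text format; not a real Backdrop API.
function make_format($autop_status): stdClass {
  $format = new stdClass();
  $format->filters = array(
    'filter_autop' => (object) array('status' => $autop_status),
  );
  return $format;
}

// The current check: true whenever the filter entry exists at all.
function line_breaks_isset_only(stdClass $format): array {
  $line_breaks = array('<br />' => 6, '<br>' => 4);
  if (isset($format->filters['filter_autop'])) {
    $line_breaks["\n"] = 1;
  }
  return $line_breaks;
}

// The suggested check: only true when the filter is actually enabled.
// !empty() treats 0, "0" and a missing entry as disabled.
function line_breaks_status_checked(stdClass $format): array {
  $line_breaks = array('<br />' => 6, '<br>' => 4);
  if (!empty($format->filters['filter_autop']->status)) {
    $line_breaks["\n"] = 1;
  }
  return $line_breaks;
}

$disabled = make_format(0);
var_dump(array_key_exists("\n", line_breaks_isset_only($disabled)));     // bool(true)
var_dump(array_key_exists("\n", line_breaks_status_checked($disabled))); // bool(false)
```

Using !empty() rather than a strict comparison is deliberate here, since the exported config above stores status sometimes as 0 and sometimes as "1".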