Why do Glib::ustring::operator[] and at() return values, not references?

Daniel Boles
Title says it all really. I had to use an std::string in one place because I needed to modify it quickly and with less hassle than using .replace() - and Glib::ustring won't let me get a reference to perform such modification.

The documentation states
    "No reference return; use replace() to write characters."
but does not explain why, nor does the commit log: these comments were added way back in the initial revision.

So I'm wondering whether there's really a reason for this, or if it is just something that no one has wanted to change (until now!)


_______________________________________________
gtkmm-list mailing list
[hidden email]
https://mail.gnome.org/mailman/listinfo/gtkmm-list

Re: Why do Glib::ustring::operator[] and at() return values, not references?

Chris Vine-3
On Sat, 24 Jun 2017 15:51:49 +0100
Daniel Boles <[hidden email]> wrote:

> Title says it all really. I had to use an std::string in one place
> because I needed to modify it quickly and with less hassle than
> using .replace() - and Glib::ustring won't let me get a reference to
> perform such modification.
>
> The documentation states
>     "No reference return; use replace()
> <https://developer.gnome.org/glibmm/stable/classGlib_1_1ustring.html#a0f0c9b5aaad58279d3ac87a86a173f4a>
> to write characters."
> but does not explain why, nor does the commit log: these comments were
> added way back in the initial revision.
>
> So I'm wondering whether there's really a reason for this, or if it
> is just something that no one has wanted to change (until now!)

It is because UTF-8 is a multibyte encoding, and any one character may
require between 1 and 4 bytes to represent it.  If you were allowed to
change a byte at will you would be able to introduce invalid encoding
sequences.  As to the absence of documentation, maybe it is because this
was thought to be self-evident, dunno.
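
To make the point above concrete, here is a minimal sketch in standard C++. It uses plain std::string rather than Glib::ustring, and the names encode_utf8 and is_structurally_valid_utf8 are made up for this illustration; they are not part of glibmm. Under RFC 3629 a code point takes at most 4 bytes, and overwriting a single byte of a multibyte sequence leaves the string malformed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes under RFC 3629).
// Surrogate code points are not rejected, to keep the sketch short.
std::string encode_utf8(char32_t cp)
{
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Structural check only: each lead byte must be followed by the right
// number of continuation bytes (10xxxxxx). Overlong forms are not detected.
bool is_structurally_valid_utf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (b < 0x80) extra = 0;
        else if ((b & 0xE0) == 0xC0) extra = 1;
        else if ((b & 0xF0) == 0xE0) extra = 2;
        else if ((b & 0xF8) == 0xF0) extra = 3;
        else return false;  // stray continuation byte or invalid lead byte
        if (extra > s.size() - i - 1) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

For example, encode_utf8(0xE9) yields the two bytes C3 A9; assigning 'x' to the second byte through a char reference gives C3 78, which the checker rejects.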

Re: Why do Glib::ustring::operator[] and at() return values, not references?

Chris Vine-3
On Sat, 24 Jun 2017 19:08:36 +0100
Chris Vine <[hidden email]> wrote:

> On Sat, 24 Jun 2017 15:51:49 +0100
> Daniel Boles <[hidden email]> wrote:
> > Title says it all really. I had to use an std::string in one place
> > because I needed to modify it quickly and with less hassle than
> > using .replace() - and Glib::ustring won't let me get a reference to
> > perform such modification.
> >
> > The documentation states
> >     "No reference return; use replace()
> > <https://developer.gnome.org/glibmm/stable/classGlib_1_1ustring.html#a0f0c9b5aaad58279d3ac87a86a173f4a>
> > to write characters."
> > but does not explain why, nor does the commit log: these comments
> > were added way back in the initial revision.
> >
> > So I'm wondering whether there's really a reason for this, or if it
> > is just something that no one has wanted to change (until now!)  
>
> It is because UTF-8 is a multibyte encoding, and any one character may
> require between 1 and 4 bytes to represent it.  If you were allowed to
> change a byte at will you would be able to introduce invalid encoding
> sequences.  As to the absence of documentation, maybe it is because
> this was thought to be self-evident, dunno.

And I should perhaps also make the point that these operators return a
32-bit Unicode character, not a byte, which follows from the same
point.  If you allowed mutation, the length of the string (in bytes)
might change.
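
A minimal sketch of that effect, again on a plain std::string holding UTF-8 rather than on Glib::ustring. The helpers byte_offset_of_char and replace_char are hypothetical names for this illustration, and the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Byte offset at which the n-th character (0-based) of a UTF-8 string starts.
// Assumes `s` is valid UTF-8; returns s.size() if n is past the end.
std::size_t byte_offset_of_char(const std::string& s, std::size_t n)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Continuation bytes look like 10xxxxxx; any other byte starts a character.
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
            if (n == 0) return i;
            --n;
        }
    }
    return s.size();
}

// Replace the n-th character with `replacement`, which is itself UTF-8 encoded.
// The replaced and replacing characters may differ in byte length.
void replace_char(std::string& s, std::size_t n, const std::string& replacement)
{
    const std::size_t begin = byte_offset_of_char(s, n);
    const std::size_t end = byte_offset_of_char(s, n + 1);
    assert(begin < s.size());
    s.replace(begin, end - begin, replacement);
}
```

Replacing the last character of "cafe" with "\xC3\xA9" (é) grows the string from 4 bytes to 5 while the character count stays at 4, which a reference to a single fixed-size element could not express.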

Re: Why do Glib::ustring::operator[] and at() return values, not references?

Daniel Boles
On 24 June 2017 at 19:12, Chris Vine <[hidden email]> wrote:
> On Sat, 24 Jun 2017 19:08:36 +0100
> Chris Vine <[hidden email]> wrote:
>
> > It is because UTF-8 is a multibyte encoding, and any one character may
> > require between 1 and 4 bytes to represent it.  If you were allowed to
> > change a byte at will you would be able to introduce invalid encoding
> > sequences.  As to the absence of documentation, maybe it is because
> > this was thought to be self-evident, dunno.
>
> And I should perhaps also make the point that these operators return a
> 32-bit Unicode character, not a byte, which follows from the same
> point.  If you allowed mutation, the length of the string (in bytes)
> might change.


Right, of course. It does seem very obvious now. It seemed to completely slip my mind that we're dealing with characters of arbitrary width, not e.g. UTF-16. :( Thanks for the comprehensive answer to a stupid question!



Re: Why do Glib::ustring::operator[] and at() return values, not references?

Chris Vine-3
On Sat, 24 Jun 2017 19:48:07 +0100
Daniel Boles <[hidden email]> wrote:

> On 24 June 2017 at 19:12, Chris Vine <[hidden email]> wrote:
>
> > On Sat, 24 Jun 2017 19:08:36 +0100
> > Chris Vine <[hidden email]> wrote:
> >  
> > > It is because UTF-8 is a multibyte encoding, and any one
> > > character may require between 1 and 4 bytes to represent it.  If
> > > you were allowed to change a byte at will you would be able to
> > > introduce invalid encoding sequences.  As to the absence of
> > > documentation, maybe it is because this was thought to be
> > > self-evident, dunno.
> >
> > And I should perhaps also make the point that these operators
> > return a 32-bit Unicode character, not a byte, which follows
> > from the same point.  If you allowed mutation, the length of the
> > string (in bytes) might change.
>
> Right, of course. It does seem very obvious now. It seemed to
> completely slip my mind that we're dealing with characters of
> arbitrary width, not e.g. UTF-16. :( Thanks for the comprehensive
> answer to a stupid question!

UTF-16 is also a variable-width encoding, with surrogate pairs for
anything outside the Basic Multilingual Plane, which is why UTF-16 is
regarded by many as a fairly unhelpful encoding.  It does have the
feature that for the average Japanese text it occupies slightly
less space than UTF-8.  The same is not true of Chinese text though.
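
As a small sketch of the surrogate-pair mechanism in standard C++ (the function name encode_utf16 is made up for this illustration): a code point inside the Basic Multilingual Plane takes one 16-bit unit, and anything above U+FFFF takes two.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-16 code units.
// Surrogate code points passed as input are not rejected, for brevity.
std::u16string encode_utf16(char32_t cp)
{
    assert(cp <= 0x10FFFF);
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single 16-bit unit.
        out += static_cast<char16_t>(cp);
    } else {
        // Supplementary planes: split the 20-bit offset into a surrogate pair.
        const char32_t v = cp - 0x10000;
        out += static_cast<char16_t>(0xD800 + (v >> 10));    // high surrogate
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // low surrogate
    }
    return out;
}
```

U+65E5 (日) fits in one unit, i.e. 2 bytes against 3 bytes in UTF-8, whereas U+1F600 becomes the pair D83D DE00, so indexing UTF-16 by code unit has the same problem as indexing UTF-8 by byte.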

Re: Why do Glib::ustring::operator[] and at() return values, not references?

Daniel Boles
I wonder, would anyone be interested in adding a proxy class, to be returned by operator[] and .at() on a non-const ustring, which would provide operator gunichar() and operator=()? It would then hold a reference to the string and the index with which it was instantiated, and delegate to .replace() to do 'assignment'.

The benefits of this are:
  • more intuitive/familiar
  • could make ustring substitutable into code that currently uses std::string, or usable in template code to work on either

But the main drawback I could think of is this: It would change semantics for anyone currently using auto some_character = non_const_ustring[N], as the auto would now capture the proxy type, not a gunichar. To get the latter, the type would have to be explicitly specified to invoke the conversion operator. Or is there a clever way around this that I don't know?
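
A rough sketch of what such a proxy could look like, using a stand-in class rather than Glib::ustring itself. Everything here is hypothetical: ToyString, CharProxy, get and replace_char are invented for the illustration, and the storage is UTF-32 only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a string that cannot hand out character references.
class ToyString
{
public:
    class CharProxy
    {
    public:
        CharProxy(ToyString& owner, std::size_t index) : owner_(owner), index_(index) {}

        // Reading: convert to a plain character value.
        operator char32_t() const { return owner_.get(index_); }

        // Writing: delegate to the owner's replace operation.
        CharProxy& operator=(char32_t c)
        {
            owner_.replace_char(index_, c);
            return *this;
        }

    private:
        ToyString& owner_;   // dangles if the proxy outlives the string
        std::size_t index_;
    };

    explicit ToyString(std::u32string text) : text_(std::move(text)) {}

    char32_t get(std::size_t i) const
    {
        assert(i < text_.size());
        return text_[i];
    }

    void replace_char(std::size_t i, char32_t c)
    {
        assert(i < text_.size());
        text_[i] = c;
    }

    CharProxy operator[](std::size_t i) { return CharProxy(*this, i); }  // non-const: proxy
    char32_t operator[](std::size_t i) const { return get(i); }          // const: value

    const std::u32string& str() const { return text_; }

private:
    std::u32string text_;
};

// The drawback described above: indexing a non-const string yields the proxy
// type, so `auto c = s[0];` would deduce CharProxy, not char32_t.
static_assert(std::is_same<decltype(std::declval<ToyString&>()[0]),
                           ToyString::CharProxy>::value,
              "non-const operator[] returns the proxy");
```

With this, `s[0] = U'b';` writes through to the string and `char32_t c = s[1];` reads a plain value, while `auto p = s[2]; p = U'x';` silently modifies the string as well.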




Re: Why do Glib::ustring::operator[] and at() return values, not references?

Jonathan Wakely
On 28 June 2017 at 09:39, Daniel Boles wrote:
> But the main drawback I could think of is this: It would change semantics
> for anyone currently using auto some_character = non_const_ustring[N], as
> the auto would now capture the proxy type, not a gunichar. To get the
> latter, the type would have to be explicitly specified to invoke the
> conversion operator. Or is there a clever way around this that I don't know?

There's no way around it. There have been proposals for an "operator
auto" that would make it possible to control the deduced type, mostly
for use by expression templates, but nothing that is part of C++ yet.

Another downside of a proxy is it can outlive the string, so this
would be undefined (without some internal complexity to track
lifetimes):

auto c = Glib::ustring("foo")[0];
c = 'b';  // tries to modify the expired temporary

Again, this only happens when using 'auto' because otherwise there's
no attempt to write back into the ustring:

gunichar c = Glib::ustring("foo")[0];
c = 'w'; // just modifies c

Re: Why do Glib::ustring::operator[] and at() return values, not references?

Daniel Boles
On 28 June 2017 at 10:47, Jonathan Wakely <[hidden email]> wrote:
> On 28 June 2017 at 09:39, Daniel Boles wrote:
> > But the main drawback I could think of is this: It would change semantics
> > for anyone currently using auto some_character = non_const_ustring[N], as
> > the auto would now capture the proxy type, not a gunichar. To get the
> > latter, the type would have to be explicitly specified to invoke the
> > conversion operator. Or is there a clever way around this that I don't know?
>
> There's no way around it. There have been proposals for an "operator
> auto" that would make it possible to control the deduced type, mostly
> for use by expression templates, but nothing that is part of C++ yet.

Interesting to hear about the proposals; thanks.

> Another downside of a proxy is it can outlive the string, so this
> would be undefined (without some internal complexity to track
> lifetimes):

Yeah, I thought about that briefly but forgot by the time I was writing the email. I wonder whether sigc::trackable can help here, although maybe that's getting too complex to be worthwhile.





Re: Why do Glib::ustring::operator[] and at() return values, not references?

Daniel Boles
On 28 June 2017 at 10:51, Daniel Boles <[hidden email]> wrote:
> Yeah, I thought about that briefly but forgot by the time I was writing the email. I wonder whether sigc::trackable can help here, although maybe that's getting too complex to be worthwhile.

In fact, that definitely sounds too complex for what we'd gain. We would do better just to tell users to be careful with lifetimes, as they must already do in various other places, than to impose a new base class on ustring.


