0

In this case I want to read integers from standard input such that they are separated by spaces and newline. My first attempt was similar to the following code:

fn splitter(x: String) -> impl Iterator<Item=&'static str> {
    x.as_str().split_whitespace()
}

fn valuereader<A: std::str::FromStr>() -> impl Iterator<Item=A> 
where <A as std::str::FromStr>::Err: std::fmt::Debug
{
    let a = std::io::stdin().lines();
    let b = a.map(Result::unwrap);
    let c = b.flat_map(splitter);
    c.map(|x|x.parse().expect("Not an integer!"))
}

fn main() {
    let temp: Vec<usize> = valuereader().collect();
    println!("{:?}", temp);
}

The problem is that split_whitespace wants a &str, but std::io::stdin().lines() returns an owned String. I don't want to use x.as_str().split_whitespace().collect(), because I don't want to allocate a temporary vector.

The best solution I could come up with was to use a wrapper that contains the owned String and the iterator that depends on the String, using unsafe code. The wrapper's implementation of Iterator is simply a wrapper for the iterator that depends on the String. This was the result:

mod move_wrapper {
    use std::pin::Pin;
    pub fn to_wrapper<'b, A: 'b, F, B: 'b> (a: A, f: F) -> Wrapper<A,B>
    where
        F: FnOnce (&'b A) -> B
    {
        let contained_a = Box::pin(a);
        // Here is the use of unsafe. It is necessary to create a reference to a that can live as long as long as needed.
        // This should not be dangerous as no-one outside this module will be able to copy this reference, and a will live exactly as long as b inside Wrapper.
        let b = f(unsafe{&*core::ptr::addr_of!(*contained_a)});
        Wrapper::<A,B> {_do_not_use:contained_a, dependent:b}
    }

    pub struct Wrapper<A,B> {
        _do_not_use: Pin<Box<A>>,
        dependent: B
    }

    impl<A,B: Iterator> Iterator for Wrapper<A,B>
    {
        type Item = B::Item;
        fn next(&mut self) -> Option<Self::Item> {
            self.dependent.next()
        }
    }
}

fn splitter(x: String) -> impl Iterator<Item=&'static str> {
    move_wrapper::to_wrapper(x, |a|a.as_str().split_whitespace())
}

fn valuereader<A: std::str::FromStr>() -> impl Iterator<Item=A> 
where <A as std::str::FromStr>::Err: std::fmt::Debug
{
    let a = std::io::stdin().lines();
    let b = a.map(Result::unwrap);
    let c = b.flat_map(splitter);
    c.map(|x|x.parse().expect("Not an integer!"))
}

fn main() {
    let temp: Vec<usize> = valuereader().collect();
    println!("{:?}", temp);
}

Now to the actual question. How would you do this as idiomatic as possible, if possible without using any unsafe code (does the function here called to_wrapper exist)? Have I written safe unsafe code? Is there any way to make my Wrapper work for all traits, not just Iterator?

EDIT

To be clearer, this question is about creating a method you can apply anytime you have to give ownership to something that wants a reference, not about how to read from standard input and parse to integers.

11
  • Unsafe code refers to code within an unsafe context, that is, code that has been explicitly marked unsafe. All this to say that there is no "risk" of accidentally writing unsafe code. Commented Apr 11, 2023 at 19:24
  • @jthulhu Did I say it is any risk I accidentally write unsafe code? In this case I have done it deliberately. The question is if I have managed to do it in a safe way. Commented Apr 11, 2023 at 19:34
  • The unsafe in your code is hard to spot. You should at the least have // SAFETY comment explaining why you think it's sound. Commented Apr 11, 2023 at 19:36
  • BTW: lines() is slow because it has to allocate a new String for each line, it's better to use read_line directly. But of course, then you can't use Iterator. Commented Apr 11, 2023 at 19:40
  • @PitaJ: Though the OP may still want to do that; if they're so concerned with performance that x.as_str().split_whitespace().collect() is objectionable due to allocating a temporary Vec, presumably the allocation activity for a String per line is also an issue. Mind you, this is almost certainly premature optimization; even on the fastest modern hard drives, I/O would almost certainly swamp any allocator overhead, so I'd just do the convenient thing until I had evidence it was causing problems. Commented Apr 11, 2023 at 19:56

2 Answers 2

1

Disclaimer: I don't know why anyone would want to use the method described here; it is better to implement the lending iterator manually or avoid using it in the first place. But this method is at least cool.

My original solution in the question is not sound, as the items in the iterator have the type &'static str which is wrong. As drewtato mentioned, we need to use lending iterators, in this answer I will use the lending-iterator crate.

I was not able to do this using a single generic wrapper function because of the need for one of the types to be covariant, so I will use a macro instead.

To create the self-referential struct, I will use self_cell.

The following code can then be put in a crate and used for "wrapping an iterator that is dependent on a reference so the wrapper contains the referent":

pub use self_cell;
pub use lending_iterator;
pub use lending_iterator::HKT;
pub use lending_iterator::LendingIterator;

#[macro_export]
macro_rules! create_wrapper {
    ($wrapper_name:ident, $itertype:ty, $input_type:ty, $create_exp:expr) => {
        type Dependent<'a> = <$itertype as $crate::lending_iterator::higher_kinded_types::WithLifetime<'a>>::T;
        $crate::self_cell::self_cell!(
            pub struct $wrapper_name {
                owner: $input_type,
                #[not_covariant]
                dependent: Dependent,
            }
        );
        impl $wrapper_name {
            fn create(a: $input_type) -> Self {
                Self::new(a, $create_exp)
            }
        }

        #[$crate::lending_iterator::prelude::gat]
        impl $crate::lending_iterator::prelude::LendingIterator for $wrapper_name {
            type Item<'next> = <<$itertype as $crate::lending_iterator::higher_kinded_types::WithLifetime<'next>>::T as ::core::iter::Iterator>::Item;
            fn next<'a>(&'a mut self) -> ::core::option::Option<Self::Item<'a>> {
                self.with_dependent_mut(|_, iter|iter.next())
            }
        }
    };
}

When the above macro is expanded, it creates a wrapper type using self_cell. Then it creates a function to create it and it implements LendingIterator for it. Unfortunately, the type Dependent is visible outside of the macro, and I didn't come up with a way to fix that.

Solving the original problem by using this code is not directly possible. This is because flat_map doesn't exist in the lending_iterator library. Instead I will flatten after the final map when we have a normal iterator again.

I will call the library above iter_to_lending.

use iter_to_lending::LendingIterator;

iter_to_lending::create_wrapper!(SplitWhitespaceWrapper, iter_to_lending::HKT!(core::str::SplitWhitespace<'_>), String, |data|data.split_whitespace());

pub fn splitter(x: String) -> SplitWhitespaceWrapper {
    SplitWhitespaceWrapper::create(x)
}

fn valuereader<A: std::str::FromStr>() -> impl Iterator<Item=A> 
where <A as std::str::FromStr>::Err: std::fmt::Debug
{
    let a = std::io::stdin().lines();
    let b = a.map(Result::unwrap);
    let c = b.map(splitter);
    c.map(|inner|inner.map_into_iter(|x|x.parse().expect("Not an integer!"))).flatten()
}

fn main() {
    let temp: Vec<usize> = valuereader().collect();
    println!("{:?}", temp);
}

Note that no unsafe was used. This means that this method is probably sound, except for possible compiler bugs invoked by lending_iterator (see danielhenrymantilla/lending-iterator.rs#5) or bugs in self_cell.

Also note that the answer by drewtato is better for my example-problem, but it doesn't work in general.

Sign up to request clarification or add additional context in comments.

Comments

0

You can't create self-referential types in safe rust, as explained here. Unfortunately, fixing this issue still leaves the next one.

Iterators that return items that can't outlive the iterator are impossible, explained here. Yours is even more restrictive: it's trying to create items that can't exist by the time the next item is fetched, which means you need a lending iterator to make this in its current state.

It would be quite easy to create your Vec with some nested for loops:

fn valuereader<A: FromStr>() -> Vec<A>
where
    A::Err: Debug,
{
    let mut v = Vec::new();
    for line in std::io::stdin().lines() {
        for word in line.unwrap().split_whitespace() {
            let a = word.parse().unwrap();
            v.push(a);
        }
    }
    v
}

However, this is not very instructive, and creates many temporary allocations, especially if you only need an iterator and not a Vec.

In order to make your original idea work idiomatically, you need an iterator that produces owned items. Fortunately, your final item type is usize (or anything that comes out of parse, which has to be owned), so you can create an iterator that creates those. This only allocates one String, which will grow to the length of the longest line. (playground)

use std::fmt::Debug;
use std::io::BufRead;
use std::marker::PhantomData;
use std::str::FromStr;

#[derive(Debug, Clone)]
struct ValueReader<B, V> {
    // The underlying BufRead
    buffer: B,
    // The current line being read
    line: String,
    // The current offset into the current line
    index: usize,
    // The type being parsed
    _value_type: PhantomData<V>,
}

impl<B, V> ValueReader<B, V> {
    fn new(b: B) -> Self {
        Self {
            buffer: b,
            line: String::new(),
            index: 0,
            _value_type: PhantomData,
        }
    }

    fn value(&mut self) -> Option<V>
    where
        V: FromStr,
        V::Err: Debug,
        B: BufRead,
    {
        loop {
            // Check if line is consumed, or the iterator just started
            if self.line.is_empty() {
                let bytes = self.buffer.read_line(&mut self.line).unwrap();
                // Buffer is completely consumed
                if bytes == 0 {
                    return None;
                }
            }

            let unconsumed = self.line[self.index..].trim_start();
            self.index = self.line.len() - unconsumed.len();

            let Some(word) = unconsumed.split_whitespace().next() else {
                // Line is consumed, reset to original state
                self.index = 0;
                self.line.clear();
                continue;
            };
            self.index += word.len();

            return Some(word.parse().unwrap());
        }
    }
}

impl<B, V> Iterator for ValueReader<B, V>
where
    V: FromStr,
    V::Err: Debug,
    B: BufRead,
{
    type Item = V;

    fn next(&mut self) -> Option<Self::Item> {
        self.value()
    }
}

This could be made more efficient by using fill_buf and consume to only read one word at a time, shortening the max length of the String. It would also be sensible to report errors instead of unwrapping.

1 Comment

With my edit of the question it should be clear that this does not really answer the question. But it at least answers a big part of the old question!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.