INTRODUCTION
Multiuser multiple-input multiple-output (MU-MIMO) systems have attracted considerable interest in recent years due to their potential to provide large system capacity and high spectral efficiency (Goldsmith et al., 2003; Yu and Cioffi, 2004; Hassibi and Sharif, 2007). In the broadcast channel of these MU-MIMO systems, a Base Station (BS) communicates with multiple co-channel mobile users simultaneously, and thereby a high system throughput is achieved. This broadcast system is also known as the downlink of the MU-MIMO system. Since the signal intended for a particular user also acts as interference to the other users, the main challenge for this system is the mitigation of this Co-Channel Interference (CCI) to ensure the simultaneous transmission of independent messages to different users (Spencer et al., 2004a). However, due to the lack of coordination among the decentralized mobile users, CCI suppression is hard to handle at the receiver side. Furthermore, motivated by the expectation of cheap mobile terminals with low power consumption, systems in which the complex signal processing is performed at the BS transmitter are preferred (Gibbard and Sesay, 1999). Therefore, it is essential to design the precoding scheme at the BS transmitter for CCI suppression.
Most research on multiuser downlink precoding has tended to assume a single-stream transmission to each user (Spencer et al., 2004a; Schubert and Boche, 2004; Windpassinger et al., 2004; Tarighat et al., 2005). This assumption, however, restricts the possible gains from additional antennas at the mobile terminals. In next-generation wireless communication systems, it is possible to employ antenna arrays at the mobile terminals. This allows the mobile users to receive multiple spatially multiplexed data streams and thereby results in very high data rates (Stankovic and Haardt, 2008; Cheng et al., 2010). At the same time, however, the data streams transmitted simultaneously to a particular user introduce additional Inter-Stream Interference (ISI) at the receiver input. In this scenario, the system CCI comprises the Inter-User Interference (IUI) between different users and the ISI among the spatially multiplexed streams directed to a given user. In order to recover the transmitted data streams, it is essential to mitigate both the IUI and the ISI at the transmitter. Therefore, compared to the single-stream MU-MIMO system, the precoder design for the multi-stream MU-MIMO system is more challenging (Liu and Krzymien, 2008). In this study, we concentrate on the multiuser precoder design for the MIMO broadcast channel that allows multi-stream transmission per user.
A common linear precoding algorithm that supports multi-stream transmission is Block Diagonalization (BD) (Spencer et al., 2004b). At the expense of a dimensionality constraint, this algorithm ensures zero IUI and ISI at the receiver of each user. To perform symbol detection, however, conventional BD requires global Channel State Information (CSI) at each receiver. Due to the lack of coordination among the spatially separated mobiles, additionally transmitted information is needed to find the decoding matrix at every receiver. This is a limitation of BD. An alternative linear approach is to design the precoding matrix by maximizing the Signal-to-Leakage-plus-Noise Ratio (SLNR) (Sadek et al., 2007; Park et al., 2009; Cheng et al., 2010). This leakage-based multiuser precoder relaxes the dimensionality constraint of BD and yields an analytical closed-form solution for the precoding matrix. However, the precoding matrix, as additionally transmitted information, is still required at each receiver to decouple the multiple streams. For the linear precoding systems employing the original SLNR maximization and the BD algorithm, the additionally transmitted information is used to perform equalization at the receiver. This linear equalization makes the system suffer from noise enhancement and results in poor power efficiency. To avoid noise enhancement at the decoders and reduce system cost, simple receivers with little additionally transmitted information are preferred.
Nonlinear precoding algorithms can achieve better performance (Windpassinger et al., 2004; Liu and Krzymien, 2008). Since no cooperation among the spatially separated receivers is possible in the multiuser MIMO broadcast system, Tomlinson-Harashima Precoding (THP) algorithms that are applicable to broadcast transmission must move both the backward filter and the forward filter to the transmitter. This architecture enables very simple receivers, and noise enhancement at the receiver is avoided. In general, according to the position of the diagonal scaling filter, there are two basic THP structures in downlink multiuser MIMO systems (Huang et al., 2008): one with the diagonal scaling filter decentralized at the receivers (Windpassinger et al., 2004) and one with the diagonal scaling filter centralized at the transmitter (Windpassinger, 2004). Under a dimensionality constraint, these THP-based solutions achieve complete equalization at the transmitter. However, the THP algorithm incurs high complexity due to its nonlinear nature and the combinatorial problem of user order selection. For a large number of receivers, the complexity at the transmitter becomes very high.
In this study, a leakage-based nonlinear precoder is proposed for the multi-stream MIMO broadcast channel. We extend the THP algorithm to the stream domain; it is performed per user to eliminate the interference between the data streams directed to that user. The interference between users is minimized based on SLNR maximization. Since the THP structure with the diagonal scaling filter centralized at the transmitter (Windpassinger, 2004) is considered in the precoder design, the proposed precoding scheme achieves a very simple receiver at each user, and only a power scaling factor is needed at the receiver. Unlike conventional THP-based nonlinear precoding techniques (Windpassinger et al., 2004; Doostnejad et al., 2005; Stankovic and Haardt, 2008; Windpassinger, 2004; Liu and Krzymien, 2008), in which THP is performed in the user domain, the implementation of THP in the proposed precoder requires a much smaller dimension, and the dimensionality constraint is also relaxed. Moreover, we prove that our scheme supports the data streams within one user equally. Thereby, our scheme overcomes the inherent drawback of the original SLNR maximization scheme. Simulations demonstrate the performance of the proposed scheme.
MULTIUSER MIMO BROADCAST SYSTEM MODEL
Consider a MIMO broadcast channel with K decentralized users and a single Base Station (BS). The BS has N transmit antennas, and user k is equipped with M_{k} receive antennas.
For the case of multi-stream transmission, let s_{k} = [s_{k}(1), s_{k}(2), ..., s_{k}(L_{k})]^{T} denote the data vector transmitted to the kth user, where L_{k} (≤ M_{k}) is the number of streams for user k and (•)^{T} denotes the transpose operator. The modulated symbols in s_{k} are assumed to be independent and to have variance σ_{s}^{2}. For notational simplicity, the time index is dropped.
The channel from the BS to user k is assumed to be flat fading and is denoted by an M_{k}×N matrix H_{k}. The elements of H_{k} are samples of an independent and identically distributed (i.i.d.) complex Gaussian process with zero mean and unit variance.
At the kth user, the received signal vector is given by:
y_{k} = H_{k} Σ_{j=1}^{K} x_{j} + n_{k} (1)
where x_{k} is the N×1 transmitted vector corresponding to user k, which is generated by precoding the data vector s_{k}; n_{k} is the M_{k}×1 additive complex white Gaussian noise (AWGN) vector, whose elements are i.i.d. samples distributed as CN(0, σ_{n}^{2}).
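As a minimal sketch of this signal model, the following NumPy snippet draws i.i.d. CN(0, 1) channels and forms each user's received vector as its channel applied to the superposition of all transmitted vectors, plus noise. The helper names `crandn` and `received_signals`, the (12, 4, 4, 4) configuration and the random placeholder vectors standing in for precoded data are illustrative assumptions, not part of the original work.

```python
import numpy as np

def crandn(rng, *shape, var=1.0):
    """i.i.d. circularly symmetric complex Gaussian samples, CN(0, var)."""
    return np.sqrt(var / 2) * (rng.standard_normal(shape)
                               + 1j * rng.standard_normal(shape))

def received_signals(H, x, noise_var, rng):
    """Received vector of every user: channel times the sum of all x_j, plus AWGN."""
    x_total = sum(x)  # the BS transmits the superposition of all users' vectors
    return [Hk @ x_total + crandn(rng, Hk.shape[0], var=noise_var) for Hk in H]

rng = np.random.default_rng(0)
N, M = 12, [4, 4, 4]                    # assumed antenna configuration
H = [crandn(rng, Mk, N) for Mk in M]    # flat-fading channels, one per user
x = [crandn(rng, N) for _ in M]         # placeholders for precoded vectors
y = received_signals(H, x, noise_var=0.1, rng=rng)
print([yk.shape for yk in y])
```

Each user observes the signals intended for all other users through its own channel, which is the IUI the precoder has to control.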
LEAKAGE-BASED PRECODER FOR MULTI-STREAM MIMO BROADCAST SYSTEM
For the multi-stream MIMO broadcast system, a multiuser precoding algorithm is designed to combat the IUI between users and to decouple the multiple streams within each user.
Original SLNR maximization algorithm: In Sadek et al. (2007), a linear precoder based on Signal-to-Leakage-plus-Noise Ratio (SLNR) maximization is presented for the multi-stream MIMO broadcast channel. At the transmitter, the data vector s_{i} is multiplied by the precoding matrix W_{i} to generate the transmitted vector x_{i}, i.e., x_{i} = W_{i}s_{i}. To suppress the IUI, the N×L_{i} precoding matrix W_{i} is chosen to maximize the SLNR of user i, i = 1, ..., K. The total power leaked from user i to all the other users is defined as Σ_{k=1, k≠i}^{K} ||H_{k}W_{i}||_{F}^{2}, and then the SLNR of user i is defined as:


(2) 
where (•)^{H} denotes the conjugate transpose operator, Tr(•) stands for the trace, ||•||_{F} represents the Frobenius norm, and I_{N} stands for the N×N identity matrix. To perform power control, the precoding matrix W_{i} must satisfy the constraint Tr(W^{H}_{i}W_{i}) = L_{i}.
At the receiver, a matched filter is used to decode the signal vector, i.e. the decoded signal is given by:
In order to decouple the multiple streams sent to a given user, the following constraint should be satisfied while designing the precoder:
W_{i}^{H}H_{i}^{H}H_{i}W_{i} = D_{i} (4)
where D_{i} is a diagonal matrix.
Since the leakage-plus-noise matrix in the denominator of (2) is Hermitian positive definite, there exists a nonsingular matrix T_{i} ∈ C^{N×N}, which satisfies:
where Λ_{i} is an N×N diagonal matrix with nonnegative entries.
Using the properties of the generalized eigenvalue decomposition and the singular value decomposition, the optimal W_{i}, which maximizes SLNR_{i} in (2) and satisfies the constraint (4), is given by:
where γ_{i} is a scaling factor that ensures the power constraint Tr(W_{i}^{H}W_{i}) = L_{i}, and the columns of T_{i} are the generalized eigenvectors that correspond to the L_{i} largest generalized eigenvalues.
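The closed-form solution described above can be sketched numerically. The snippet below is only an illustration under stated assumptions: the leakage-plus-noise matrix is taken to be `noise_load * I_N` plus the Gram matrix of the other users' stacked channels, where `noise_load` is a free parameter (the exact noise weighting of the SLNR expression is not reproduced here), and the generalized eigenproblem is reduced to a standard Hermitian one through a Cholesky factor. The function name `slnr_precoder` is hypothetical.

```python
import numpy as np

def slnr_precoder(H, i, L_i, noise_load):
    """Leading L_i generalized eigenvectors of (H_i^H H_i, leakage-plus-noise)."""
    Hi = H[i]
    N = Hi.shape[1]
    # Stack the channels of all users other than i (assumes at least two users).
    H_tilde = np.vstack([Hk for k, Hk in enumerate(H) if k != i])
    A = Hi.conj().T @ Hi
    B = noise_load * np.eye(N) + H_tilde.conj().T @ H_tilde  # assumed form
    # B = C C^H, so A t = lam B t becomes a standard Hermitian eigenproblem.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    lam, V = np.linalg.eigh(C_inv @ A @ C_inv.conj().T)
    order = np.argsort(lam)[::-1]            # largest eigenvalues first
    T = C_inv.conj().T @ V[:, order]         # T^H B T = I, T^H A T = diag(lam)
    W = T[:, :L_i]
    W = W * np.sqrt(L_i / np.trace(W.conj().T @ W).real)  # Tr(W^H W) = L_i
    return W, lam[order]

rng = np.random.default_rng(0)
H = [(rng.standard_normal((4, 12)) + 1j * rng.standard_normal((4, 12))) / np.sqrt(2)
     for _ in range(3)]
W, lam = slnr_precoder(H, i=0, L_i=4, noise_load=0.1)
D = W.conj().T @ H[0].conj().T @ H[0] @ W    # should be diagonal
print(np.round(np.abs(D), 3))
```

Because the selected columns simultaneously diagonalize both matrices, the product W_i^H H_i^H H_i W_i comes out diagonal, which is the stream-decoupling property required of the precoder.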
SLNR is a promising criterion for linear precoder design in the multiuser MIMO broadcast system (Park et al., 2009; Cheng et al., 2010). It decouples the precoder design problem and makes an analytical closed-form solution available. The original SLNR maximization algorithm assumes perfect knowledge of CSI at the BS. To decouple the multiple streams, knowledge of H_{i} and W_{i} should be available at the receiver of user i. Since no cooperation among the mobiles is possible, the N×L_{i} precoding matrix W_{i} needs to be additionally transmitted to user i. The main drawback of this scheme is that the use of a matched filter at each receiver limits the BER performance due to noise enhancement and brings in additional system overhead to find the decoding matrix at every receiver. Moreover, from Eq. 3-6, it is easy to see that the substreams within one user have different SINRs. This imbalance among the substreams can be seen as an inherent drawback of the original SLNR precoding scheme, since the overall performance of a user with multiple data streams is limited by the stream with the worst channel condition (Cheng et al., 2010).
The proposed leakage-based nonlinear algorithm
Precoder design: In this section, we propose a leakage-based nonlinear precoder for the multi-stream MIMO broadcast system. Assuming perfect knowledge of CSI at the BS, the proposed precoder mitigates both the IUI and the ISI at the transmitter. Therefore, a very simple receiver is achieved at every user, and only a power scaling factor needs to be additionally transmitted to each user.
Since the THP algorithm is made to operate in the stream domain, the proposed precoding scheme encodes the data streams of each user independently at the transmitter. Therefore, parallel processing at the transmitter becomes possible. The proposed precoding system is shown in Fig. 1.
At the transmitter, to separate the data streams within one user, the nonlinear THP algorithm is introduced. Let s_{i} denote the modulated data vector for user i.

Fig. 1: The proposed precoding system for the multi-stream MIMO broadcast channel
The symbols of s_{i} pass through the backward filter B_{i} and the modulo operator MOD(•) iteratively. A symbol vector is thereby generated; its lth element is given by:
where s_{i}(l) denotes the lth element of vector s_{i} and [B_{i}]_{lj} denotes the (l, j)th element of B_{i}. The modulo operator is introduced to reduce the signal power increase caused by B_{i}. The modulo operator for a complex variable a is defined as:
where ⌊•⌋ is the floor operator, which gives the largest integer smaller than or equal to its argument; Re(a) denotes the real part of a, and Im(a) denotes the imaginary part of a; the constant A is determined by the signal constellation used.
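The feedback loop and modulo reduction described above can be sketched as follows. This is an illustration under assumptions rather than the exact operator of the original equations: the modulo is taken to fold the real and imaginary parts into [-A/2, A/2), with A = 8 for an unnormalized 16-QAM alphabet {±1, ±3}, and the names `mod_A` and `thp_feedback` are hypothetical.

```python
import numpy as np

def mod_A(a, A):
    """Fold the real and imaginary parts of a into [-A/2, A/2) (assumed convention)."""
    a = np.asarray(a, dtype=complex)
    return (a - A * np.floor(a.real / A + 0.5)
              - 1j * A * np.floor(a.imag / A + 0.5))

def thp_feedback(s, B, A):
    """Successive pre-subtraction; B is lower triangular with unit diagonal."""
    s_tilde = np.zeros(len(s), dtype=complex)
    for l in range(len(s)):
        # Interference from the already-encoded streams 0 .. l-1.
        interference = B[l, :l] @ s_tilde[:l]
        s_tilde[l] = mod_A(s[l] - interference, A)
    return s_tilde

rng = np.random.default_rng(0)
L, A = 4, 8.0
levels = np.array([-3, -1, 1, 3])
s = rng.choice(levels, L) + 1j * rng.choice(levels, L)   # unnormalized 16-QAM
B = np.tril(rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L)), -1) + np.eye(L)
s_tilde = thp_feedback(s, B, A)
p = B @ s_tilde - s      # modulo offsets: integer multiples of A per component
print(np.round(p / A))
```

The output satisfies B s̃ = s + p with p a vector of multiples of A, so a receiver that sees B s̃ can strip p with the same modulo operation.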
At the transmitter, the output vector from the feedback section is subsequently multiplied by a precoding matrix W_{i} before transmission over the channel. In the proposed precoder, we design the precoding matrix W_{i} as a cascade of two matrices G_{i} and F_{i}, i.e., W_{i} = G_{i}F_{i}. The matrix G_{i} is designed to suppress the IUI, and the matrix F_{i} works as the forward filter matrix of THP for the mitigation of ISI.
To perform THP over the data streams within one user, the processing matrices B_{i}, F_{i} and Γ_{i} directly depend on the effective channel matrix, which is defined as:
H_{eff, i} = H_{i}G_{i} (9)
Therefore, the N×L_{i} matrix G_{i}, which maximizes the SLNR of user i, must be determined first. With the backward filter B_{i} and the forward filter F_{i} deployed at the transmitter, the SLNR of user i can be written as:
Consider the output vector from the feedback section. As shown in Eq. 7, with THP its symbols are no longer taken from the signal constellation. Their values are (approximately) uniformly distributed over the boundary region of the signal constellation, which leads to a somewhat increased transmit power. This increment is negligible for moderate modulation sizes, because it decreases rapidly as the modulation size increases and vanishes as the modulation size goes to infinity. For square M-QAM constellations, this increment is found to be M/(M-1). In Fig. 1, this slight transmit power increase is not compensated; in our simulations, however, we do take it into account. Therefore, the symbols of this output vector can be considered to have the same power as those in s_{i}. Moreover, they can be assumed to be mutually uncorrelated (Windpassinger et al., 2004). Thus, the SLNR expression in (10) can be simplified as:
In the THP algorithm, the forward filter matrix F_{i} is a unitary matrix. Then Eq. 11 can be written as:
It follows that the forward filter matrix F_{i} of THP has no effect on SLNR_{i}, and that SLNR_{i} in Eq. 12 has the same expression as in the original SLNR maximization algorithm. Therefore, the optimal precoding matrix G_{i}, which maximizes SLNR_{i} in Eq. 12, can also be obtained from Eq. 6.
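The invariance stated above follows from the cyclic property of the trace. As a short worked step, assuming that the signal and leakage terms of SLNR_{i} are Frobenius norms of the form ||H_{k}G_{i}F_{i}||_{F} (the exact expressions of Eq. 10-12 are not reproduced here), for any unitary F_{i} and any user index k:

```latex
\begin{align}
\| H_k G_i F_i \|_F^2
  &= \mathrm{Tr}\!\left( F_i^{H} G_i^{H} H_k^{H} H_k G_i F_i \right) \\
  &= \mathrm{Tr}\!\left( G_i^{H} H_k^{H} H_k G_i F_i F_i^{H} \right)
   = \mathrm{Tr}\!\left( G_i^{H} H_k^{H} H_k G_i \right)
   = \| H_k G_i \|_F^2 ,
\end{align}
```

since F_{i}F_{i}^{H} = I. The same argument gives Tr(F_{i}^{H}G_{i}^{H}G_{i}F_{i}) = Tr(G_{i}^{H}G_{i}), so the power constraint on W_{i} = G_{i}F_{i} reduces to a constraint on G_{i} alone.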
Since simple receivers with little additionally transmitted information are preferred, the diagonal scaling matrix Γ_{i} of THP is also located at the transmitter in our precoding scheme. It should be noted that placing Γ_{i} at the transmitter affects the SLNR of user i; however, this influence on SLNR_{i} is not considered in our scheme, because the mutual dependence between Γ_{i} and G_{i} makes it difficult to find the optimal solution based on SLNR maximization. Then, the processing matrices for THP can be obtained by performing the LQ decomposition (Windpassinger et al., 2004) on the effective channel matrix H_{eff, i}, i.e.:
H_{eff, i} = S_{i}Q_{i} (13)
where S_{i} is a lower triangular matrix and Q_{i} is a unitary matrix, and:
where [S_{i}]_{mn} denotes the element in the mth row and the nth column of S_{i}.
With all these processing matrices placed at the transmitter, complete transmitter side equalization is achieved in our scheme, and this enables very simple receivers at every user. At the receiver, a modulo operation MOD (•) is applied to remove the effect of the modulo operation at the transmitter. To keep transmit power constant, a scaling factor:
is required at the transmitter, and this gain is compensated at the receiver correspondingly.
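The transmitter-side equalization described above can be illustrated end to end for a single user without noise. The sketch makes several assumptions that are not taken from the original equations: the effective channel is square, the filters are chosen as F = Q^H, Γ = diag(1/[S]_{11}, ...) and B = SΓ (one common arrangement for scaling centralized at the transmitter), the power scaling factor is left out (equivalently set to 1), and the modulo convention and the function names are hypothetical.

```python
import numpy as np

def mod_A(a, A):
    """Fold the real and imaginary parts of a into [-A/2, A/2) (assumed convention)."""
    a = np.asarray(a, dtype=complex)
    return (a - A * np.floor(a.real / A + 0.5)
              - 1j * A * np.floor(a.imag / A + 0.5))

def thp_matrices(H_eff):
    """B, F, Gamma from the LQ decomposition H_eff = S Q (square H_eff assumed)."""
    Qh, R = np.linalg.qr(H_eff.conj().T)   # H_eff^H = Qh R  =>  H_eff = R^H Qh^H
    S, Q = R.conj().T, Qh.conj().T         # S lower triangular, Q unitary
    F = Q.conj().T                         # unitary forward filter
    Gamma = np.diag(1.0 / np.diag(S))      # diagonal scaling kept at the transmitter
    B = S @ Gamma                          # unit-diagonal lower triangular feedback
    return B, F, Gamma

def thp_transmit(s, B, F, Gamma, A):
    """Feedback loop with modulo, then scaling and forward filtering."""
    s_tilde = np.zeros(len(s), dtype=complex)
    for l in range(len(s)):
        s_tilde[l] = mod_A(s[l] - B[l, :l] @ s_tilde[:l], A)
    return F @ Gamma @ s_tilde

rng = np.random.default_rng(0)
L, A = 4, 8.0
H_eff = (rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))) / np.sqrt(2)
levels = np.array([-3, -1, 1, 3])
s = rng.choice(levels, L) + 1j * rng.choice(levels, L)   # unnormalized 16-QAM
B, F, Gamma = thp_matrices(H_eff)
x = thp_transmit(s, B, F, Gamma, A)
s_hat = mod_A(H_eff @ x, A)    # receiver: modulo only (noise-free, no IUI)
print(np.allclose(s_hat, s, atol=1e-6))
```

With these choices H_eff F Γ = B, so the channel output equals the data vector plus modulo offsets, and the receiver needs nothing beyond the modulo operation (and, in the full scheme, the power scaling factor).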
In the proposed precoding scheme, both the IUI and the AWGN are pre-suppressed at the transmitter based on SLNR maximization. In particular, the multiple substreams within one user are also pre-decoupled at the transmitter by performing the THP algorithm in the stream domain. The additionally transmitted information in this scheme is thus reduced to one real scalar (Eq. 17) for every user. At each receiver, the residual IUI and noise are folded into a fundamental region by the nonlinear modulo operation. Thereby, the noise enhancement suffered in linear precoding schemes is avoided.
Performance discussion: In the proposed leakage-based nonlinear precoding scheme, we extend the THP algorithm to the stream domain; the THP algorithm is therefore implemented with a much smaller dimension, and the dimensionality constraint on the system is also relaxed. In conventional THP-based precoding schemes, THP is performed in the user domain to ensure the transmission of independent
data streams, which results in the dimensionality constraint of
and an implementation of
dimension THP. For the proposed precoding system, however, the dimensionality constraint is relaxed to N ≥ max(L_{i}, i = 1, ..., K) and the THP implementation dimension is reduced to L_{i}.
Furthermore, in contrast to the original SLNR maximization algorithm, the proposed precoding scheme supports the multiple substreams within one user equally. At the decoder, the scalar weight α_{i} is applied to all data streams of user i. We show in the following that the substreams within one user have identical SINRs.
In the proposed precoding system, the transmitted signal is given by:
Clearly, as an input signal to the modulo operator at user i, the received signal after scaling should take the form:
For user i, the output vector from the feedback section, after multiplication by B_{i}, equals s_{i} + p_{i}, where p_{i} is the modulo factor vector (Windpassinger et al., 2004). Applying Eq. 9 and 13-16, we obtain:
where:
is the residual interference in the received signal of user
i, is the received noise.
Since the modulo factor vector p_{i} in Eq. 20 will be removed by passing y'_{i} through the modulo operator, s_{i} + p_{i} in (20) can be seen as the desired signal for user i. Then, it is clear that the desired signal power for each substream within user i is identical.
The power of the residual interference and that of the received noise for each substream of user i are investigated next.
Let h_{i, l} represent the lth row of matrix H_{i}; then the power of the residual interference in the lth substream of user i is given by:
where E{•} stands for expectation and ||•|| stands for the norm operation. Since the symbols of the feedback-section output vectors have the same power as those in s_{i} and are assumed to be mutually uncorrelated (Windpassinger et al., 2004), Eq. 21 can be written as:
Applying the Singular Value Decomposition (SVD) to the channel matrix of user i, i.e., H_{i} = U_{i}Σ_{i}V_{i}^{H}, we have:
where u_{i, l} is the lth row of matrix U_{i}. Then Eq. 22 can be written as:
Then, it is easy to see that each substream of user i has the same power of the residual interference.
Moreover, from the system model, n_{i} is the additive complex white Gaussian noise vector, whose elements are i.i.d. samples distributed as CN (0, σ^{2}_{n}). Therefore, the power of the received noise for the l th substream of user i is given by:
Thus, from Eq. 20, 24 and 25, it follows that all the substreams within one user have the same SINR; the drawback of the original SLNR maximization precoding scheme is thereby overcome.
SIMULATION RESULTS
Here, simulation results are presented to demonstrate the performance of the proposed leakage-based nonlinear precoder.
Let (N, M_{1}, ..., M_{K}) denote a multiuser MIMO system with a base station and K users, where the base station employs N transmit antennas and the kth user is equipped with M_{k} receive antennas. In view of the demand for high transmission rates, we focus on high spectral efficiencies. Thus, the number of data streams transmitted to user k is assumed to be equal to M_{k}. Without loss of generality, the same number of receive antennas is assumed for all users. A quasi-static flat fading channel is assumed in this multiuser MIMO system. The channel matrix is known at the transmitter. In the simulations, the channel matrix is held constant over each block of 100 symbols and changes independently from block to block. The simulation results are averaged over 10000 channel realizations for the BER curves. The range of SNR considered in our simulations is between 0 dB and 30 dB. The SNR is the ratio of the average power of the precoded symbols to the noise variance, i.e., the SNR of user i is defined as:
Consider a 3-user system with 4 receive antennas per user and 12 transmit antennas. Figure 2 shows the performance comparison of the proposed leakage-based nonlinear precoder, the original SLNR maximization approach (Sadek et al., 2007), the conventional BD algorithm without power allocation (Spencer et al., 2004b) and the THP-based precoding algorithms in (Windpassinger, 2004). 16-QAM or 16-PSK modulation with Gray mapping is used throughout the simulations. The BER in the figure is the average over all users.
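For readers who want to reproduce the modulation step, a Gray-mapped 16-QAM mapper can be sketched as below. The particular bit-to-level table, the per-axis split of the four bits and the function name `qam16_gray` are assumptions for illustration; the source does not specify which Gray labeling was used.

```python
import numpy as np

# Assumed 2-bit Gray labeling of the four amplitude levels on each axis:
# neighbouring levels differ in exactly one bit.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

def qam16_gray(bits):
    """Map bits (length a multiple of 4) to 16-QAM symbols with unit average power."""
    b = np.asarray(bits).reshape(-1, 4)
    re = np.array([GRAY_PAM4[(int(r[0]), int(r[1]))] for r in b])
    im = np.array([GRAY_PAM4[(int(r[2]), int(r[3]))] for r in b])
    return (re + 1j * im) / np.sqrt(10)   # E|s|^2 = 10 before normalization

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=16)
symbols = qam16_gray(bits)
print(symbols.shape)
```

The division by sqrt(10) normalizes the constellation so that the symbol variance is 1, which keeps an SNR definition based on average symbol power easy to apply.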
From the simulation results shown in Fig. 2, it can be seen that, with the same antenna configuration and data rate, the proposed precoder outperforms the linear precoding schemes and the THP algorithms without ordering or with suboptimal ordering.
The original SLNR maximization approach performs poorly with M-QAM modulation, since the use of a matched filter at each receiver results in noise and residual IUI enhancement, whereas the detection of QAM symbols is sensitive to amplitude. The proposed precoder performs significantly better than the original SLNR maximization and the BD algorithm. With a simple receiver, the proposed precoder avoids the noise enhancement suffered in the original SLNR maximization and the BD algorithm.

Fig. 2: Performance comparison of the proposed leakage-based nonlinear precoder, original SLNR maximization, BD, THP without ordering (Windpassinger, 2004), THP with suboptimal ordering (Doostnejad et al., 2005) and THP with optimal ordering in a (12, 4, 4, 4) system

Fig. 3: BER performance of the proposed precoder in a 3-user system with 4 receive antennas per user and a varying number of transmit antennas
The performance of the proposed precoder is also better than that of the nonlinear THP algorithm without ordering and the THP with suboptimal ordering presented in (Doostnejad et al., 2005). From Fig. 2, it can be observed that the performance of the conventional THP algorithm greatly depends on the user order selection. With the THP algorithm, system performance can be significantly improved by properly ordering the channel matrices of the different mobiles (Doostnejad et al., 2005). Although the THP-based precoding system with optimal ordering outperforms the other methods, its computational complexity is much higher. To find the optimal ordering, the THP-based nonlinear precoding algorithm (Windpassinger, 2004) has to be examined over K! rearrangements, which is not practical.
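To give a feel for how quickly the exhaustive search grows, the short snippet below counts the K! user orderings for a few system sizes. The function name `num_user_orderings` and the chosen values of K are illustrative only.

```python
import math

def num_user_orderings(K):
    """Number of user permutations an exhaustive ordering search must examine."""
    return math.factorial(K)

for K in (3, 5, 8):
    print(K, num_user_orderings(K))
```

Each ordering requires its own set of THP processing matrices, so the search cost scales with this count.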

Fig. 4: BER performance of every user in a (12, 6, 6) system with the proposed precoder (16-QAM) and the original SLNR maximization (16-PSK)
For a large number of receivers, the complexity at the transmitter becomes very high. The suboptimal ordering method presented in (Doostnejad et al., 2005) is a low-complexity solution; however, the performance with this suboptimal ordering method improves only slightly over that without ordering. From Fig. 2, the proposed precoder achieves satisfactory performance with a greatly reduced computational complexity compared to the optimally ordered THP algorithm. Although the ordering problem is not considered, the proposed precoder performs much closer to the optimally ordered THP than the THP without ordering and with suboptimal ordering. Since a considerable performance improvement is achieved, the proposed precoder is promising.
In Fig. 3-5, we concentrate on the performance of the proposed leakage-based nonlinear precoder.
From Fig. 3, the performance of the proposed scheme improves significantly as the number of transmit antennas increases. For the (11, 4, 4, 4) system, both the BD algorithm and the THP-based precoding algorithms cannot work due to the dimensionality constraint. Based on SLNR maximization, the proposed precoding scheme relaxes the dimensionality constraint; however, its BER curve flattens in the high-SNR region because the total number of data streams exceeds the number of transmit antennas.
In Fig. 4 and 5, we examine the performance of each user within the system and the performance of each substream directed to a given user. The proposed precoder is simulated in a (12, 6, 6) system with 16-QAM. It is shown in Fig. 5 that, as proven in the performance discussion section, the proposed algorithm achieves uniform performance in the stream domain. This merit guarantees a better performance for each user within the system.

Fig. 5: BER performance of every substream for user 1 in a (12, 6, 6) system with the proposed precoder (16-QAM) and the original SLNR maximization (16-PSK)
Moreover, from Fig. 4, the proposed algorithm achieves equivalent performance across the users within the system.
For comparison, the performance of the original SLNR maximization approach is also presented in Fig. 4 and 5. The (12, 6, 6) system with 16-PSK is used. As shown in Fig. 5, with the original SLNR precoding scheme, the substream gains within one user are severely unbalanced. Thus, the overall performance of a user, given in Fig. 4, is dramatically limited by the stream with the worst channel condition.
CONCLUSION
A leakage-based nonlinear precoding scheme was proposed for the multi-stream MIMO broadcast channel. In this precoding scheme, the nonlinear THP algorithm was extended to operate in the stream domain, both the IUI and the ISI were mitigated at the transmitter, the additionally transmitted information was reduced to a scaling factor, and noise enhancement was avoided with a very simple receiver. Simulation results showed that, without order selection, the proposed precoding scheme achieved satisfactory performance with relatively low computational complexity. Therefore, the proposed precoder is promising for high-rate transmission.
ACKNOWLEDGMENT
This work is supported by the National Natural Science Foundation of China (No. 60772161) and the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 200801410015).